Itemscript Schema

This is a work in progress; a validator is not yet
available, and certain aspects may be inconsistent at present.

Introduction

Itemscript Schema is a language for validating and specifying
JSON values. It is deliberately simple compared to languages like XML
Schema.

Itemscript Schema is intended to be read by humans. Even
though computers can read and process it, the main use for schema
languages is in communicating to other people precise expectations
about the JSON you expect to see. It is also designed to be
simple enough to be used without prior training by people who are not
experts in computer languages, although they may be experts in other
things.

Itemscript Schema is intended to be a descriptive
language; that is, it does not say that a JSON value is of type
X, but that it meets the requirements for type X. That value
may also meet the requirements for other, unrelated types. In this way
it differs a little from other schema languages and strong-typing
schemes in programming languages.

You should be familiar with JSON
before reading this specification. JSON provides a rich set of basic
types for strings, numbers, booleans, arrays, and objects (objects are
also known as hashes, maps, or dictionaries).

If you intend to exchange data, you need to agree on the data
format. Itemscript Schema describes objects and their keys and the
types of values that are allowed. It also lets you describe and
restrict subclasses of the other built-in JSON types, and it adds a
few common subclasses of those types for convenience, such as an
integer type.

In Itemscript Schema, the schema description and the actual
instances of an item look a lot like each other. This is unlike the
situation with most schema languages, where the schema descriptions
look very different from the data they're describing. We think it's
easier to understand a schema definition when it's very similar in
structure to the data; it's more like a template.

A slightly longer worked example:

We want to describe cats and their owners. The best way to
start a schema definition is to imagine what an object of that
type might look like in JSON. So:

Inside the schema object is a string under the key "prefix"
that defines a common prefix for all types defined in the schema.
There is also an object under the key "types" which contains the
actual type definitions. Inside that object is a key with the name of
a type, "Cat". Object type names are usually spelled with an initial
capital letter. Once the schema is processed, the prefix specified
above is added to each type name.

The "Cat" key has a type definition attached to it, in an object;
type definitions are always objects. For simplicity, most of the
examples from here on show just the type definition itself, not the
surrounding schema object.
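The following is a minimal sketch, not part of any Itemscript API, of how a processor might add the schema's prefix to each type name. The helper name qualify_types, the prefix value "com.example.SCHEMA", and the use of a dot to join prefix and name are all assumptions (the dot follows the dotted location convention described under Locations). The Cat definition is reconstructed from the description in this section.

```python
# Hypothetical helper (not an Itemscript API): builds a map from fully
# qualified type name to type definition. Joining prefix and name with a
# dot is an assumption based on the dotted location convention.
def qualify_types(schema: dict) -> dict:
    prefix = schema.get("prefix", "")
    qualified = {}
    for name, definition in schema.get("types", {}).items():
        full_name = f"{prefix}.{name}" if prefix else name
        qualified[full_name] = definition
    return qualified


# Illustrative schema object; the prefix value is assumed.
cat_schema = {
    "prefix": "com.example.SCHEMA",
    "types": {
        # "" means any string; "integer" restricts a number to integers.
        "Cat": {"name": "", "color": "", "age": "integer"},
    },
}

print(list(qualify_types(cat_schema)))  # ['com.example.SCHEMA.Cat']
```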

The default kind of type is an object, so we don't need to
explicitly say that. Inside the type definition, we have a set of keys
- "name", "color", "age" - that indicate keys that we expect to see
inside a Cat object. You can see that this looks a lot like the
example cat object we gave at the beginning. Now, the values of those
keys determine what type we expect the value to be in the actual
object. In this case, we can see that the values that go with "name"
and "color" are the empty string "". In Itemscript Schema, this means
that the value can be any string. This is the most common type for a
simple value like "name".

The type for "age" is "integer". JSON doesn't have a built-in
integer type, just a floating-point number type, but by specifying the
type as "integer", we can restrict a number value to only being an
integer.
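To make the meaning of "" and "integer" concrete, here is a small sketch of a checker for just those two type references. The function names and the sample cat values are hypothetical. Two behaviours are assumptions rather than rules stated here: a number with no fractional part (such as 3.0) counts as an integer, and declared keys are not treated as required.

```python
# Sketch only: checks a value against the two simple type references used
# in the Cat definition. Function names are hypothetical.
def matches_simple(type_ref, value) -> bool:
    if type_ref == "":
        # The empty string as a type reference means "any string".
        return isinstance(value, str)
    if type_ref == "integer":
        # JSON has one number type, so accept any number with no fractional
        # part. bool is excluded because Python treats True/False as ints.
        if isinstance(value, bool):
            return False
        if isinstance(value, int):
            return True
        return isinstance(value, float) and value.is_integer()
    raise ValueError(f"type reference not handled in this sketch: {type_ref!r}")


def matches_object(definition: dict, obj: dict) -> bool:
    # Checks only the declared keys that are present; extra keys are allowed,
    # and this sketch does not treat declared keys as required.
    return all(matches_simple(t, obj[k]) for k, t in definition.items() if k in obj)


cat_type = {"name": "", "color": "", "age": "integer"}
print(matches_object(cat_type, {"name": "Loki", "color": "black", "age": 4}))    # True
print(matches_object(cat_type, {"name": "Loki", "color": "black", "age": 4.5}))  # False
```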

The Owner type definition is pretty similar, except that the value
for the "address" field is an object. When a type value is an object,
we know that we're expecting to see an object, and when that object
contains fields, like this one, we know we're expecting to see those
fields inside it.

OK, now we want to show which owner owns which cats. At this
point, it's worth introducing a couple of new simple types so we can
mark values as being an OwnerId or a CatId and use that to look up the
right kind of object. We also want to be able to specify that the
number isn't negative, since integer types can by default contain
negative values. So:

OwnerId and CatId are two new type definitions. Just like the Cat
and Owner types, they are defined using an object. But the first
field in each of them is the "EXTENDS" field, which tells us not to
assume that they are objects. In this case, they extend "integer".
After that comes a field that is special to integer types, called
"MIN", which gives the minimum value that we'll accept in that
field.
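A short sketch of what checking a value against OwnerId or CatId might involve. The definitions below follow the prose ("EXTENDS" integer, "MIN" 0) rather than the original example, which is not reproduced here; the helper names are hypothetical, and treating MIN as inclusive is an assumption based on "the minimum value that we'll accept".

```python
# Sketch only: validates a value against a type that EXTENDS "integer",
# honouring an optional MIN. Helper names are hypothetical.
def is_integer(value) -> bool:
    if isinstance(value, bool):  # True/False are ints in Python; reject them
        return False
    return isinstance(value, int) or (isinstance(value, float) and value.is_integer())


def check_integer_type(definition: dict, value) -> bool:
    if definition.get("EXTENDS") != "integer":
        raise ValueError("this sketch only handles types that extend integer")
    if not is_integer(value):
        return False
    # MIN is treated as inclusive: it is the smallest value accepted.
    return "MIN" not in definition or value >= definition["MIN"]


# Definitions following the prose: non-negative integers.
owner_id_type = {"EXTENDS": "integer", "MIN": 0}
cat_id_type = {"EXTENDS": "integer", "MIN": 0}

print(check_integer_type(cat_id_type, 7))   # True
print(check_integer_type(cat_id_type, -1))  # False: below MIN
```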

Now let's use those in the Owner and Cat definitions, and add
extra fields to those objects that give the relationships between
objects of those types:

So, on Cat, we introduced a single field called "ownerId", of
type "OwnerId". This is useful even if we hardly make any restrictions
on OwnerId, because we can recognize values of this type and see that
they represent relationships between objects. (We'll assume cats only
have one owner for now, just for simplicity.)

On Owner, we introduced a field called "cats", whose value is an
array containing the single string "CatId". This (quite naturally)
means we expect "cats" in an owner object to have an array value
containing integers of type CatId. One thing
to note here is that in Itemscript Schema, when a value can be an
array of some type, it can also be a single value of that type not
wrapped in an array. This is a convenient shortcut, since you often
want people to be able to specify either a single value of a given
type or a list of values of that type.
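The single-value shortcut can be sketched as normalising both forms to a list before checking each element. The names below are hypothetical, and is_cat_id is only a stand-in for the CatId type. One caveat of this naive approach: when the element type is itself an array, a bare value and a one-element array can no longer be told apart.

```python
# Sketch of the single-value shortcut: where a type reference is an array
# such as ["CatId"], both [3, 5] and a bare 3 are accepted. Names are
# hypothetical.
def as_list(value) -> list:
    return value if isinstance(value, list) else [value]


def is_cat_id(value) -> bool:
    # Stand-in for the CatId type: a non-negative integer.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def check_array_of(element_ok, value) -> bool:
    return all(element_ok(item) for item in as_list(value))


print(check_array_of(is_cat_id, [3, 5]))   # True
print(check_array_of(is_cat_id, 3))        # True: single value, no array
print(check_array_of(is_cat_id, [3, -1]))  # False
```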

So let's look at what actual values of these types might look
like now that we added in the ID fields:

Pretty easy, right? We think so. Of course, we don't talk here
about how those values get put in place, how they are generated, or
how they are kept consistent. That's not the responsibility of the
schema language - all the schema language does is tell you what the
data should look like.

Attaching types to values

Only objects and items can directly declare their own type. For
all other types, the type is determined by matching to a type declared
in the containing object, item, or array. The type can also be
determined this way for objects and items if it is unambiguous. (The
type may also be inferred from the name of an item, or from another
key declared inside an object or item, rather than from an explicit
TYPE specification - but that's an advanced and
implementation-specific topic.)

In this example, the types of the "Loki" and "Victoria" items
are inferred from the fact that the containing item is of type "Cats",
which declares an ANY key of type "Cat". So anything inside it that
does not match a fixed key name (like "location") is inferred to be
of type "Cat", which means the "Loki" and "Victoria" objects are of
type "Cat".

Locations

One concept defined on top of JSON is that of a location. Every
object in an Itemscript system has a location, which is a dotted
string like "com.example.petstore.Dog", or "itemscript.SCHEMA". Any
time you encounter something that can be a location, if it has at
least one dot in it, it is treated as a qualified location: the full
path to that object. If it has no dots, it's treated as a relative
location, and the surrounding location is added to it.

In both of these examples, the location of the Schema object is
"itemscript.SCHEMA.Schema". As you can see, there is more than one way
to specify a location in Itemscript, depending on context. For now,
you can just assume that it "does what you expect".
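The qualified/relative rule can be written as a one-line decision. This is a sketch only: resolve_location is a hypothetical name, the surrounding location "itemscript.SCHEMA" is chosen for illustration, and joining the two parts with a dot is an assumption consistent with the dotted examples.

```python
# Sketch of the location rule: a name with at least one dot is already a
# qualified location; otherwise the surrounding location is prepended.
# Joining with "." is an assumption consistent with the dotted examples.
def resolve_location(name: str, surrounding: str) -> str:
    if "." in name:
        return name
    return f"{surrounding}.{name}"


print(resolve_location("Schema", "itemscript.SCHEMA"))
# itemscript.SCHEMA.Schema
print(resolve_location("com.example.petstore.Dog", "itemscript.SCHEMA"))
# com.example.petstore.Dog
```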

You don't need to use locations for your own system; you can
just use this as a JSON schema language without adopting all of the
concepts. You do need to understand the basics of locations to define
an Itemscript Schema, though, since the Schema system uses locations
to refer to other definitions.

Items

Another concept on top of JSON is also worth noting in advance:
the "item" type. As you can imagine from the name "Itemscript",
items are a core concept in the system. An item is just an object that
can exist separately from other objects, and an item type is just a
marker that a particular object should be treated as an item. For
instance, the "Customer" item type might describe an item instance
named "Jacob". All you need to know here is that even though "Jacob"
might be logically contained in another object, item, or array, it may
physically exist in a separate record from the object it's contained
in. This means that when you ask an object for a value that is
of type "item", it knows to go looking for it in a database (or to
make a call to a server) to find it.

If you are using this simply as a JSON schema language, you
don't need to worry about items. They are strictly optional.

Itemscript Schema Language Definition

An Itemscript schema is described by a single Itemscript
object.

Within a schema, each key declares the name of a type, and the
corresponding value is an item that defines the type. Each kind of
type definition item can contain certain keys that specify
restrictions on the type.

Type specifications

The basic Itemscript types

boolean

number

integer

string

binary

decimal

array

object

item

All of these are built-in JSON types except integer,
binary, decimal, and item. Those can be
represented in JSON (they map directly to basic JSON types) but add
some extra convenience and restrictions over the JSON types they are
based on.

You can specify a key called "EXTENDS" with a value of the name
of the basic type to be extended.

"ExampleType" : {
    "EXTENDS" : "array"
}

In all type definitions, the only thing that is required is a
name. The default is generally to accept any value of any type. If you
want to restrict things, you have to do so explicitly.

boolean types

(none) : Itemscript Schema does not define any keys for
boolean types.

"IsRegistered" : {
    "EXTENDS" : "boolean"
}

number types

MIN : A number representing the minimum value
that this number may take.

MAX : A number representing the maximum value
that this number may take.

FRACTIONDIGITS : An integer number representing
the number of digits allowed after the decimal point.
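A sketch of how the three number keys might be checked. The helper names and the example type are hypothetical. Two assumptions: MIN and MAX are inclusive, and because a parsed JSON number no longer remembers how it was written, fraction digits are counted from Python's shortest round-trip repr() with trailing zeros dropped.

```python
import math
from decimal import Decimal


# Sketch only: validates a JSON number against MIN, MAX and FRACTIONDIGITS.
# Helper names and the example type are hypothetical.
def fraction_digits(x) -> int:
    # repr() gives the shortest string that round-trips the value; after
    # normalize() drops trailing zeros, a negative exponent is the count
    # of digits after the decimal point.
    exponent = Decimal(repr(x)).normalize().as_tuple().exponent
    return max(0, -exponent)


def check_number(definition: dict, value) -> bool:
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    if isinstance(value, float) and not math.isfinite(value):
        return False  # JSON has no NaN or Infinity
    if "MIN" in definition and value < definition["MIN"]:
        return False
    if "MAX" in definition and value > definition["MAX"]:
        return False
    limit = definition.get("FRACTIONDIGITS")
    return limit is None or fraction_digits(value) <= limit


percentage_type = {"EXTENDS": "number", "MIN": 0, "MAX": 100, "FRACTIONDIGITS": 1}
print(check_number(percentage_type, 42.5))   # True
print(check_number(percentage_type, 42.55))  # False: two fraction digits
print(check_number(percentage_type, 101))    # False: above MAX
```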

PATTERN values

Strings may specify one or more patterns that their values must
match. These patterns are very simple:

An at-sign "@" means any letter character may occupy that
position.

A hash mark "#" means any digit may occupy that position.

An ampersand "&" means any letter or any digit may
occupy that position.

A question mark "?" means any non-space character at all may
occupy that position.

A space means a space must occupy that position.

A plus sign "+" means that any character, including space,
may occupy that position.

Any other character means that character itself must occupy
that position.

An asterisk "*" at either the beginning or the end of a pattern
(but not both!) means that zero or more characters of any kind may
precede (or follow) the rest of the pattern, as long as the rest of
the pattern matches.

There are no escape characters, character groups,
variable-length groups, or other regex-type features. Full string
validation can be done at the application level. The Itemscript
pattern declarations handle simple checks and enable preliminary
classification of data, leaving more specific tests to the
applications that handle the data.
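The pattern rules above are simple enough to implement without regular expressions. This sketch uses a hypothetical name, matches_pattern. It tests letters and digits with Python's Unicode-aware str.isalpha() and str.isdigit(); the spec does not say whether only ASCII letters and digits are intended.

```python
# Sketch of the PATTERN rules. matches_pattern is a hypothetical name.
def _char_ok(p: str, c: str) -> bool:
    if p == "@":
        return c.isalpha()                 # any letter
    if p == "#":
        return c.isdigit()                 # any digit
    if p == "&":
        return c.isalpha() or c.isdigit()  # any letter or digit
    if p == "?":
        return c != " "                    # any non-space character
    if p == "+":
        return True                        # any character, including space
    return p == c                          # a space or any other literal


def _fixed_match(pattern: str, text: str) -> bool:
    return len(pattern) == len(text) and all(map(_char_ok, pattern, text))


def matches_pattern(pattern: str, text: str) -> bool:
    leading = pattern.startswith("*")
    trailing = len(pattern) > 1 and pattern.endswith("*")
    if leading and trailing:
        raise ValueError("'*' may appear at the beginning or the end, not both")
    if leading:   # anything may come before the rest of the pattern
        body = pattern[1:]
        return len(text) >= len(body) and _fixed_match(body, text[len(text) - len(body):])
    if trailing:  # anything may come after the rest of the pattern
        body = pattern[:-1]
        return _fixed_match(body, text[:len(body)])
    return _fixed_match(pattern, text)


print(matches_pattern("(###) ###-####", "(555) 123-4567"))  # True
print(matches_pattern("*.com", "example.com"))             # True
print(matches_pattern("@@-###", "A1-234"))                 # False
```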

binary types

MAX : An integer number representing the
largest size, in bytes, that may be stored. This refers to the actual
(decoded) binary data, not the Base64 representation. The Base64 size
in JSON will be about 33% larger than the original binary data.
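Because MAX applies to the decoded bytes, a validator has to decode the Base64 text before comparing sizes. A minimal sketch follows; check_binary and the example type are hypothetical, and standard Base64 (RFC 4648) is assumed, since the spec does not say whether the URL-safe alphabet is allowed.

```python
import base64


# Sketch only: checks a Base64 string against a binary type's MAX, which
# limits the decoded size in bytes, not the length of the Base64 text.
def check_binary(definition: dict, value) -> bool:
    if not isinstance(value, str):
        return False
    try:
        data = base64.b64decode(value, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False    # not valid Base64
    return "MAX" not in definition or len(data) <= definition["MAX"]


thumbnail_type = {"EXTENDS": "binary", "MAX": 4}
encoded = base64.b64encode(b"abcd").decode("ascii")
print(encoded, len(encoded))                  # YWJjZA== 8
print(check_binary(thumbnail_type, encoded))  # True: 4 decoded bytes
```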

decimal types

Values of decimal types are strings containing a decimal
number. The schema only describes what those numbers look like;
applications must process them correctly according to their own
rounding rules. For instance, money is often represented in a decimal
field with 2 or 3 digits after the decimal point, but the rules for
rounding it may be much more complex. A value of type number is not
an appropriate choice for such data: most JSON implementations store
numbers as binary floating-point, which cannot represent most decimal
fractions exactly and so introduces rounding errors.

MIN : A decimal number representing the
smallest value this decimal number may hold.

MAX : A decimal number representing the largest
value this decimal number may hold.

FRACTIONDIGITS : An integer number giving the
number of digits after the decimal point that can be stored.

CHOOSE : An array of decimals representing the
only allowed values for this type.
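A sketch of decimal validation using Python's decimal module, which keeps the digits exactly as written. The function names and example types are hypothetical. It assumes that MIN, MAX, and CHOOSE values in the schema are written as strings like the data itself, and that CHOOSE compares numerically (so "1" matches "1.0"); neither point is stated above.

```python
from decimal import Decimal, InvalidOperation


# Sketch only: validates a decimal value (a JSON string) against MIN, MAX,
# FRACTIONDIGITS and CHOOSE. Limits in the schema are assumed to be strings.
def parse_decimal(value):
    if not isinstance(value, str):
        return None
    try:
        d = Decimal(value)
    except InvalidOperation:
        return None
    return d if d.is_finite() else None  # reject "NaN" and "Infinity"


def check_decimal(definition: dict, value) -> bool:
    d = parse_decimal(value)
    if d is None:
        return False
    if "MIN" in definition and d < Decimal(definition["MIN"]):
        return False
    if "MAX" in definition and d > Decimal(definition["MAX"]):
        return False
    if "FRACTIONDIGITS" in definition:
        # Digits are counted as written, so "12.50" has two.
        if max(0, -d.as_tuple().exponent) > definition["FRACTIONDIGITS"]:
            return False
    if "CHOOSE" in definition:
        return any(d == Decimal(c) for c in definition["CHOOSE"])
    return True


price_type = {"EXTENDS": "decimal", "MIN": "0.00", "MAX": "9999.99", "FRACTIONDIGITS": 2}
print(check_decimal(price_type, "19.99"))   # True
print(check_decimal(price_type, "19.999"))  # False: three fraction digits
print(check_decimal(price_type, 19.99))     # False: a JSON number, not a string
```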

Type references in arrays and objects

Since arrays and objects can in turn contain other objects, the
type of these contained objects may be specified.

To reference a built-in type and allow any conforming value,
you can declare an empty value of that type: an empty string, object,
or array, 0 for a number, or true or false for a boolean:

You can only put one value inside an array like this, and it
has to be a valid type reference itself. (It can be an empty object,
an inline object definition, or an empty or non-empty array itself,
though.)
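A processor reading a type reference first has to recognise these empty-value forms. The sketch below maps each one to the built-in type it stands for; builtin_for_empty is a hypothetical name, and returning None for everything else (type names, inline definitions, non-empty arrays, non-zero numbers) is a choice of this sketch, leaving those cases to other resolution steps.

```python
# Sketch: maps an "empty value" type reference to the built-in type it
# stands for, or returns None for any other kind of type reference.
def builtin_for_empty(ref):
    if isinstance(ref, bool):  # check bool first: True/False are ints in Python
        return "boolean"
    if isinstance(ref, (int, float)) and ref == 0:
        return "number"
    if ref == "":
        return "string"
    if isinstance(ref, dict) and not ref:
        return "object"
    if isinstance(ref, list) and not ref:
        return "array"
    return None


for ref in ["", {}, [], 0, False, "integer", {"name": ""}]:
    print(repr(ref), "->", builtin_for_empty(ref))
```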

If you want to specify that a value can be one of several
different types, you can declare an ANY type definition
and then specify that as the type of the value:

Inline object declarations

You can declare an object type inline by giving a non-empty
object. An inline object type has no name and is handled as an object.
The intent of the specification is to describe the schema in a way
that looks like the data it represents. Inline object declarations
concisely describe schema types using standard object notation.

array types

MIN : An integer number giving the minimum
number of elements allowed in this array.

MAX : An integer number giving the maximum
number of elements allowed in this array.


CONTAINS : A type reference indicating the
type that can be stored in this array - this can be a string
containing the name or location of a type, an inline object
definition, an array containing a type reference, or an empty
object, array, string, true/false, or 0. If you want to allow
multiple types, define an ANY type grouping them and then assign
that type to this key.

"AgeList" : {
    "EXTENDS" : "array",
    "MIN" : 0,
    "MAX" : 10,
    "CONTAINS" : "integer"
}
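A sketch of the element-count checks for array types. check_array and the example type are hypothetical; element_ok is a caller-supplied stand-in for resolving the CONTAINS type reference, which this sketch does not do itself. It also requires a real JSON array: whether the single-value shortcut applies to types that EXTEND "array" is not stated.

```python
# Sketch only: checks a value against an array type's MIN and MAX element
# counts. element_ok stands in for resolving the CONTAINS type reference.
def check_array(definition: dict, value, element_ok) -> bool:
    if not isinstance(value, list):
        return False
    if "MIN" in definition and len(value) < definition["MIN"]:
        return False
    if "MAX" in definition and len(value) > definition["MAX"]:
        return False
    return all(element_ok(item) for item in value)


def is_string(item) -> bool:
    return isinstance(item, str)


tag_list_type = {"EXTENDS": "array", "MIN": 1, "MAX": 3, "CONTAINS": ""}
print(check_array(tag_list_type, ["indoor", "senior"], is_string))  # True
print(check_array(tag_list_type, [], is_string))                    # False: below MIN
print(check_array(tag_list_type, ["a", "b", "c", "d"], is_string))  # False: above MAX
```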

object types

When you declare an object, you can require that certain fields be
present and contain particular values. You can require that all
elements with any name be of a particular type (or set of types). You
can't restrict an object so that no other keys than those listed may
appear. This is intentional.

"NoRestrictionsObject" : {}

is the same as saying:

"NoRestrictionsObject" : {
    "EXTENDS" : "object",
    "ANY" : "any"
}

Keys that are not in all-caps declare the keys that may be
present in the object. These values are declared as standard type
references.

You can attach multiple types to an object by using an ANY
type, and each type may declare its own set of keys. Each type can
then be processed separately. The behavior in the case of key
collisions is undefined. So avoid them! One strategy for
avoiding them is to use fully-qualified keys. These are keys
containing dots that identify the type that "owns" that key.

"QualifiedKeyObject" : {
    "com.example.SCHEMA.Cat.name" : "Victoria"
}

Applications should pass along unknown keys. Someone may be
expecting them to be present later in the processing.

Objects can define a key named ANY to require that any unknown
keys be of a particular type or set of types. This is
useful if you want to allow any key name while declaring the type of
the corresponding value. If an object declares an ANY
key, the validator will first try to find a specific named key and
its corresponding type, and will fall back to the ANY key's type only
if none is found. So you can still have named keys that are treated
specially even if you declare an ANY key.

PATTERN keys

Instead of declaring a specific object key, you can use a PATTERN
key to match a variety of different keys:

As with ANY keys, a PATTERN key will be used only if no
fixed-string key is declared with the same name as the key being
matched. The order of search is:

Keys with fixed string names.

Keys with "PATTERN " key names.

The ANY key.
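The three-step search order can be sketched as follows. resolve_key_type and the example definition are hypothetical. The key form "PATTERN <pattern>" is inferred from the search-order list, and trying PATTERN keys in declaration order is an assumption, since the spec does not say what happens when several match. digits_only is a deliberately minimal stand-in matcher that handles only "#" and literal characters, not the full pattern rules.

```python
from typing import Callable, Optional

PATTERN_PREFIX = "PATTERN "


def resolve_key_type(definition: dict, key: str,
                     pattern_ok: Callable[[str, str], bool]) -> Optional[object]:
    # 1. Keys with fixed string names. All-caps keys are reserved schema
    #    keywords (ANY, ORDER, ...), so they are never fixed-name matches.
    if key in definition and not key.isupper():
        return definition[key]
    # 2. "PATTERN " keys, tried in declaration order (an assumption).
    for decl, type_ref in definition.items():
        if decl.startswith(PATTERN_PREFIX) and pattern_ok(decl[len(PATTERN_PREFIX):], key):
            return type_ref
    # 3. The ANY key, or None if the object declares none.
    return definition.get("ANY")


def digits_only(pattern: str, key: str) -> bool:
    # Minimal stand-in matcher: handles only "#" (any digit) and literals.
    return len(pattern) == len(key) and all(
        c.isdigit() if p == "#" else p == c for p, c in zip(pattern, key))


cats_type = {"location": "", "PATTERN ###": "integer", "ANY": "Cat"}
print(repr(resolve_key_type(cats_type, "location", digits_only)))  # '' (fixed key)
print(repr(resolve_key_type(cats_type, "123", digits_only)))       # 'integer'
print(repr(resolve_key_type(cats_type, "Loki", digits_only)))      # 'Cat'
```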

Object type definition keys

Keys in all-caps in object definitions are Itemscript Schema
restrictions on the object. Itemscript reserves all all-caps keys (and
all keys starting with a single all-caps word) in all objects for its
own use.

ID keyname : This key is the ID field for the
item. If this field is blank in an instance item, it will be set by
the system to the name of the item as determined externally or
through a LOCATION key.

ANY : The value is a type reference; the
presence of an ANY key means any key may be used with
values of the specified type. (But see the footnote on dotted names.)

ORDER : A single string or an array of strings
that declare the natural order of elements in an object by key name.
If ORDER is not declared, the ordering is undefined. Any elements not
named in the ORDER clause will be placed after the named
elements in undefined order. If the value of a named key is missing,
the application decides whether to treat it as present with a null
value or as absent. You cannot declare a "TypeRef" key or
an ANY key in ORDER. These keys will always be placed after the
ordered keys in an undefined order.
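A sketch of how an application might apply ORDER when listing an object's keys. ordered_keys is a hypothetical name. Where the spec leaves choices open, this sketch picks one: named keys that are missing are treated as absent (skipped), and the unnamed keys keep the object's own insertion order, although any order would conform.

```python
# Sketch only: lists an object's keys in the order an ORDER declaration
# gives. Keys named in ORDER come first; the rest follow in the object's
# insertion order (the spec leaves their order undefined). Missing named
# keys are skipped, i.e. treated as absent.
def ordered_keys(definition: dict, obj: dict) -> list:
    order = definition.get("ORDER", [])
    if isinstance(order, str):  # a single string is also allowed
        order = [order]
    named = [k for k in order if k in obj]
    rest = [k for k in obj if k not in order]
    return named + rest


cat_order = {"ORDER": ["name", "age"]}
print(ordered_keys(cat_order, {"color": "black", "age": 4, "name": "Loki"}))
# ['name', 'age', 'color']
```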

META : The value is an object whose contents
may contain additional keys required by the application. Any object
keys unknown to Itemscript Schema will be treated as keys that should
be present in the object to be validated, rather than being ignored.
Itemscript is not concerned with the internals of objects, but it
preserves and protects their right to express and share internal
metadata in Itemscript.

EITHER : The value is an array of objects;
each object contains a set of one or more keys with type reference
values. One and only one of the objects must be chosen, and the keys
contained in it are treated as if they were REQUIRED
keys. The values for those keys are type references. You cannot list
a PATTERN key or an ANY key in an EITHER
object. The attempt to match is made in the order that the objects
are listed in the array, and the first one to match is the one whose
keys and types will be used. So you should start with the most
specific set of keys and work down to the least specific set.
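The first-match rule for EITHER can be sketched as a loop over the alternatives. choose_either, the contact example, and the any_string checker are hypothetical; any_string only understands "" and stands in for real type checking. The function returns the index of the chosen alternative, or None when no alternative matches (which would be a validation failure).

```python
from typing import Callable, Optional


# Sketch only: picks the first EITHER alternative whose keys are all
# present (they are treated as REQUIRED) with values matching their type
# references. value_ok stands in for real type checking.
def choose_either(either: list, obj: dict,
                  value_ok: Callable[[object, object], bool]) -> Optional[int]:
    for index, alternative in enumerate(either):
        if all(key in obj and value_ok(ref, obj[key]) for key, ref in alternative.items()):
            return index
    return None


def any_string(ref, value) -> bool:
    # Minimal stand-in: only understands "" (any string).
    return ref == "" and isinstance(value, str)


# Most specific alternative first, as recommended above.
contact_either = [{"email": "", "phone": ""}, {"email": ""}, {"phone": ""}]
print(choose_either(contact_either, {"email": "a@example.com", "phone": "555-0100"}, any_string))  # 0
print(choose_either(contact_either, {"phone": "555-0100"}, any_string))  # 2
print(choose_either(contact_either, {}, any_string))                     # None
```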