GraphQL Data Shapes Directives

This document defines an easy-to-use set of GraphQL directives that can significantly improve the
value of GraphQL schemas for JSON-based data processing.
The @uri directive can be used to state how globally unique identifiers can be
produced for JSON objects, based on field values.
The @class directive defines subclass relationships between GraphQL types, going
beyond the limited inheritance mechanism of the current GraphQL spec.
The @shape directive defines semantic constraints
that all JSON objects of the GraphQL type are expected to conform to.
The @display directive specifies user interface metadata for form-building
including default values.

Unique Identifiers for GraphQL

JSON objects delivered by GraphQL services typically represent entities from an underlying
database or some object repository.
However, each time someone requests information about these entities, a new JSON object is
produced, and there is no reliable mechanism to ensure that these objects indeed refer to the
same entity.

Let's look at an example: below are the results of various GraphQL requests:

A client that is trying to make sense of these objects has individual, disconnected JSON tree structures
to begin with:

However, both the conceptual model and the underlying database probably look more like this:

With a combined object structure as shown above, a client consuming the JSON objects delivered by
a GraphQL service could collect data from any subsequent request and put them into the right slots.
For example, if a future request delivers height of one of the humans above, a corresponding
field can be added to the corresponding object.
The result is a true graph structure that can hold the collective results from many individual
JSON tree structures.

We have defined GraphQL directives that we are using to derive unique identifiers.
In the image above, these unique identifiers are, for example, ex:human-HAN.
The technology used for these identifiers is well established in Web technologies including RDF:
Uniform Resource Identifiers (URIs).

Above, we use abbreviated URIs:

human:HAN is the abbreviation for http://example.org/human/HAN using the
namespace prefix human:

The use of URIs makes it less likely that identifiers used for data from a given data source
that is represented by a given GraphQL schema will clash with data from another data source.
Furthermore, URIs can be cross-referenced from outside of your local data graph, leading to a
potentially huge knowledge graph that can drive user applications and allow queries
that go across local boundaries.

URI Templates

Many GraphQL objects include one (or more) ID fields that are used to identify an entity
delivered by a server.
The @uri directive uses such ID fields to derive globally unique URIs.
All we need to do is annotate the GraphQL schema as follows:

Using the URI templates, the three JSON result objects from the introduction can be turned into a
data graph that collects and merges information from multiple requests:

URI from JSON Object            Field     Value
http://example.org/human/HAN    id        "HAN"
http://example.org/human/HAN    name      "Han Solo"
http://example.org/human/HAN    friends   http://example.org/human/LEIA
http://example.org/human/HAN    friends   http://example.org/human/LUKE
http://example.org/human/LEIA   id        "LEIA"
http://example.org/human/LEIA   name      "Leia Organa"
http://example.org/human/LUKE   id        "LUKE"
http://example.org/human/LUKE   name      "Luke Skywalker"
http://example.org/human/LUKE   friends   http://example.org/human/HAN

(Readers familiar with RDF will recognize that the above table is identical to how
graph databases store information in so-called triples, using a
subject, a predicate and an object.)
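As a rough illustration of how such a table could be produced, the following Python sketch walks nested JSON results and emits one (subject, field, value) triple per table row. The sample responses, the helper names and the hard-coded template are assumptions made for this example; they are not part of the directive definitions.

```python
from urllib.parse import quote

# Template taken from the table above; hard-coded here for brevity.
HUMAN_TEMPLATE = "http://example.org/human/{$id}"


def human_uri(obj):
    """Build the URI of a JSON object from its id field."""
    return HUMAN_TEMPLATE.replace("{$id}", quote(str(obj["id"]), safe=""))


def to_triples(obj, triples):
    """Add triples for obj (and any nested objects) to the set; return obj's URI."""
    subject = human_uri(obj)
    for field, value in obj.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                # Nested objects become links to their own URI.
                triples.add((subject, field, to_triples(item, triples)))
            else:
                triples.add((subject, field, item))
    return subject


# Hypothetical results of two separate GraphQL requests.
responses = [
    {"id": "HAN", "name": "Han Solo",
     "friends": [{"id": "LEIA", "name": "Leia Organa"}, {"id": "LUKE"}]},
    {"id": "LUKE", "name": "Luke Skywalker", "friends": [{"id": "HAN"}]},
]

graph = set()
for response in responses:
    to_triples(response, graph)
```

Because the triples are kept in a set, facts that arrive again in a later response (such as the id of HAN) are merged rather than duplicated.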

URI templates can use values of any single-valued field using the {$fieldName} syntax,
and insert the URI-encoded string representation of the corresponding value of these fields.
Values of these fields are assumed to be present in the JSON object, and are therefore
typically used with non-nullable fields marked with the ! operator.
If such values are absent, the URI cannot be generated, and the associated JSON object
cannot be added to the data graph.

URI templates may reference multiple fields, such as http://example.org/person/{$firstName}-{$lastName}
although this is probably rare.
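The template rules described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name expand_uri_template is hypothetical, and returning None for a missing value is an assumption about how a processor might signal that no URI can be generated.

```python
import re
from urllib.parse import quote

# Matches placeholders of the form {$fieldName}.
_PLACEHOLDER = re.compile(r"\{\$([_A-Za-z][_0-9A-Za-z]*)\}")


def expand_uri_template(template, obj):
    """Return the expanded URI, or None if a referenced field has no single value."""
    missing = False

    def substitute(match):
        nonlocal missing
        value = obj.get(match.group(1))
        # Only single-valued, present fields can be used in a template.
        if value is None or isinstance(value, (list, dict)):
            missing = True
            return ""
        # Insert the URI-encoded string representation of the value.
        return quote(str(value), safe="")

    uri = _PLACEHOLDER.sub(substitute, template)
    return None if missing else uri


single = expand_uri_template("http://example.org/human/{$id}", {"id": "HAN"})
multi = expand_uri_template(
    "http://example.org/person/{$firstName}-{$lastName}",
    {"firstName": "Han", "lastName": "Solo"},
)
absent = expand_uri_template("http://example.org/human/{$id}", {"name": "Han Solo"})
```

An object for which the function returns None would simply be left out of the data graph.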

Declaring Namespace Prefixes

Namespace prefixes are used to abbreviate URIs so that they do not need to be repeated over
and over again, and so that changes to URIs just need to be made in a single place.
We introduce the @prefixes directive to define such namespace prefixes for
a GraphQL schema:

The notion of URIs and namespaces does not need to be limited to instances:
even GraphQL types themselves may have a URI so that a type can become an entity in the data graph.
For this purpose, the example above marks one of the namespace prefixes as the "default".
This means that all types defined in the schema will (by default) use this namespace,
and Human gets the URI http://starwars.com/data/Human,
abbreviated as starwars:Human.
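To make the prefix mechanics concrete, here is a small Python sketch of expanding and abbreviating URIs with a prefix table and a default namespace. The dictionary layout and the function names are assumptions for illustration; only the two namespaces and the resulting URIs come from the text above.

```python
# Prefix table using the namespaces mentioned in this document.
PREFIXES = {
    "human": "http://example.org/human/",
    "starwars": "http://starwars.com/data/",
}
DEFAULT_PREFIX = "starwars"  # the prefix marked as "default"


def expand(abbreviated):
    """Turn an abbreviated URI such as human:HAN into a full URI."""
    prefix, sep, local = abbreviated.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {abbreviated!r}")
    return PREFIXES[prefix] + local


def abbreviate(uri):
    """Turn a full URI into its abbreviated form, if a namespace matches."""
    # Try longer namespaces first so the most specific one wins.
    for prefix, namespace in sorted(PREFIXES.items(), key=lambda kv: -len(kv[1])):
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri


def type_uri(type_name):
    """Types without an explicit namespace fall into the default namespace."""
    return PREFIXES[DEFAULT_PREFIX] + type_name
```

With this table, expand("human:HAN") yields http://example.org/human/HAN, and type_uri("Human") yields http://starwars.com/data/Human.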

Importing and Reusing GraphQL Schemas

We took the ideas of URIs to the next level and assigned a URI to each GraphQL schema.
By doing so, we make it possible for other schemas to reference our schema, solving
a well-known GraphQL problem referred to as Schema Duplication.
Basically, if one schema defines a type Movie and you have a field that
links your local Actor type with such movies, then wouldn't it be nice to just reference
the existing Movie type instead of repeating it over and over?
The directive @graph can help.

In the following example, the schema gets the URI http://starwars.com/data/
and it imports (or: includes) another, more general schema about movies.

The concept of referencing URIs from external files is well-known from the RDF and Linked Data worlds,
where engines may even dynamically look up other schemas from the Web, using the provided URIs.
You don't need to take it that far - the URIs may just as well be aliases to local files.
In any case, given the namespace prefixes shown above, a field such as Actor.appearedIn
can now reference the Movie type from the namespace that has the prefix movies:.
The example above shows that in order to use such prefixes in GraphQL identifiers, you need to
use the underscore character as the separator, because : is not allowed in GraphQL names; the : form can only be used where the reference is written as a string.
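One way a pre-processor might map the underscore form back to the colon form is sketched below in Python. The rule used here (split at the first underscore, but only if the part before it is a declared prefix) is an assumption for illustration, as is the function name; the actual pre-processor may work differently.

```python
def name_to_prefixed(name, declared_prefixes):
    """Convert a GraphQL name like movies_Movie to movies:Movie.

    Names whose leading part is not a declared prefix are returned unchanged,
    so ordinary names containing underscores are left alone.
    """
    prefix, sep, local = name.partition("_")
    if sep and local and prefix in declared_prefixes:
        return f"{prefix}:{local}"
    return name


declared = {"movies", "starwars"}
converted = name_to_prefixed("movies_Movie", declared)
untouched = name_to_prefixed("first_name", declared)
```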

Organizing GraphQL schemas into multiple files is key to realizing modularity and reuse potential.
However, since this mechanism is not part of the standard GraphQL spec, we had to develop a
pre-processor that combines the various files into one.

Now that types can have unique identifiers, the ability to reference GraphQL types from other
schemas opens up some interesting possibilities as shown in the next section.

Classes and Inheritance for GraphQL

Many data models that are accessible via GraphQL APIs are in fact object-oriented,
using classes that can be arranged into a subclass hierarchy.
GraphQL only supports a very shallow, one-level notion of inheritance:
GraphQL types can implement interfaces, but interfaces cannot inherit from each other.
Furthermore, even if your GraphQL types implement an interface, you still need to
repeat all field declarations.
While there may be good reasons for this simple design when simplicity is the primary goal,
we argue that there is a wasted opportunity here to use GraphQL schemas as a much more general
data modeling language.

The @class directive presented here aligns GraphQL with a number of object-oriented
technologies and modeling languages such as RDF and SHACL.
@class can be used to annotate GraphQL types to instruct a processor that the type represents
a class of objects, and that the class inherits fields from its superclasses.
To make best use of this directive, you need to use a GraphQL pre-processor, or compiler,
that takes the GraphQL type definitions and "flattens" them so that all fields from superclasses
are repeated in the subclasses.

The following example states that Human and Droid are subclasses of
Character.
After pre-processing, all fields from the Character base class also apply to
Human and Droid.

The advantages of this syntax become apparent if you want to add more subclasses, such
as different types of droids with additional properties. With pure GraphQL you would
need to repeat all field definitions on each level of the class "hierarchy", quickly
producing an unmaintainable code base.

Note that you can define multiple superclasses by using an array as the value of subClassOf.
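The "flattening" step can be pictured with the following Python sketch. The in-memory representation of the classes, the field lists and the function name flatten are all hypothetical; the sketch only shows the idea of copying superclass fields into each subclass, including the case of several superclasses.

```python
# Hypothetical class model: each class lists its superclasses and own fields.
CLASSES = {
    "Character": {"subClassOf": [],
                  "fields": {"id": "ID!", "name": "String", "friends": "[Character]"}},
    "Human": {"subClassOf": ["Character"], "fields": {"height": "Float"}},
    "Droid": {"subClassOf": ["Character"], "fields": {"primaryFunction": "String"}},
}


def flatten(classes):
    """Return a mapping of class name to all fields, inherited ones first."""
    cache = {}

    def fields_of(name, seen=()):
        if name in seen:
            raise ValueError(f"cyclic subclass relationship at {name}")
        if name not in cache:
            merged = {}
            # Fields of every superclass come first, in declaration order.
            for superclass in classes[name].get("subClassOf", []):
                merged.update(fields_of(superclass, seen + (name,)))
            # A class's own declarations are added last and may override.
            merged.update(classes[name]["fields"])
            cache[name] = merged
        return cache[name]

    return {name: dict(fields_of(name)) for name in classes}


flat = flatten(CLASSES)
```

Adding a further subclass of Droid to CLASSES would only require listing its new fields; everything else is inherited by the same function.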

Now let's combine this concept with the unique identifiers from the first section.
We stated that types (here: classes) can have URIs and thus be treated as data in the data graph.
In the case of the data derived from the JSON delivered by the GraphQL service, we can now
add the following "triples" to our data graph:

URI from JSON Object   Field   Value
human:HAN              type    starwars:Human
human:LEIA             type    starwars:Human
human:LUKE             type    starwars:Human

The field type is simply assumed to be present for all classes,
and is used to link an instance with its type(s).
For those familiar with RDF, this is exactly what the property rdf:type does.

Combined with the information about the class hierarchy, a client can now make some
"inferences".
For example, if it knows that human:HAN is a starwars:Human then
it knows that HAN is also an instance of starwars:Character, and whatever knowledge
or functionality that we have about characters also applies to humans.
This includes the rich semantic constraints introduced in the following section.
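The inference described here amounts to following subclass links upwards. A minimal Python sketch, assuming the class hierarchy is available as a simple mapping (the mapping and the function name are illustrative assumptions):

```python
# Direct subclass relationships, using the abbreviated URIs from this document.
SUBCLASS_OF = {
    "starwars:Human": ["starwars:Character"],
    "starwars:Droid": ["starwars:Character"],
}


def all_types(direct_types, subclass_of):
    """Return the direct types plus every superclass reachable from them."""
    result = set()
    pending = list(direct_types)
    while pending:
        current = pending.pop()
        if current not in result:
            result.add(current)
            pending.extend(subclass_of.get(current, []))
    return result


# human:HAN has the single direct type starwars:Human.
han_types = all_types(["starwars:Human"], SUBCLASS_OF)
```

A client could then apply anything it knows about starwars:Character to human:HAN, since that class is among the inferred types.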

Constraints on GraphQL Fields

The GraphQL schema language intentionally defines only a very focused and simple syntax to define
fields. Basically each field has a type, may be null, and may be an array. But that's all.
In many cases it would be beneficial to declare additional constraints on fields.
For example:

latitude values must be between -90 and 90

startDate of an event must be before the endDate

the minimum length of a userName is 8 characters

the values of countryCode must be from a given list of string constants

Such constraints can be used to validate whether instance data (and JSON objects) conform to
certain quality checks.
Constraints can also be used to consciously constrain the user input widgets, for example
to offer a drop down list for the list of available country codes.
Constraints are well-known to other schema languages such as XML Schema but also the
RDF-based Shapes Constraint Language SHACL.
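To show what validating against such constraints might look like, here is a small Python sketch covering the four examples listed above. The parameter names (minInclusive, maxInclusive, minLength, in, lessThan) are borrowed from SHACL as an assumption, the list of country codes is made up, and the validate function is hypothetical rather than an actual API.

```python
from datetime import date

# Hypothetical constraint declarations, keyed by field name.
SHAPES = {
    "latitude": {"minInclusive": -90, "maxInclusive": 90},
    "userName": {"minLength": 8},
    "countryCode": {"in": ["AU", "DE", "US"]},
    "startDate": {"lessThan": "endDate"},
}


def validate(obj, shapes):
    """Return a list of (field, violated constraint) pairs for a JSON object."""
    violations = []
    for field, constraints in shapes.items():
        value = obj.get(field)
        if value is None:
            continue  # absent values are not checked in this sketch
        if "minInclusive" in constraints and value < constraints["minInclusive"]:
            violations.append((field, "minInclusive"))
        if "maxInclusive" in constraints and value > constraints["maxInclusive"]:
            violations.append((field, "maxInclusive"))
        if "minLength" in constraints and len(value) < constraints["minLength"]:
            violations.append((field, "minLength"))
        if "in" in constraints and value not in constraints["in"]:
            violations.append((field, "in"))
        if "lessThan" in constraints:
            other = obj.get(constraints["lessThan"])
            if other is not None and not value < other:
                violations.append((field, "lessThan"))
    return violations


bad = {"latitude": 91, "userName": "han", "countryCode": "XX",
       "startDate": date(2020, 2, 1), "endDate": date(2020, 1, 1)}
good = {"latitude": -33.9, "userName": "hansolo77", "countryCode": "AU",
        "startDate": date(2020, 1, 1), "endDate": date(2020, 2, 1)}
```

The same declarations could drive a form: the values under "in" are exactly the entries a drop-down list for countryCode would offer.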

The table below is a summary of the built-in constraint parameters for @shape
directives. They intuitively map to corresponding SHACL constraints from the sh: namespace.
The type Name represents GraphQL names (without quotes) or String values
that can be translated into URI using the given prefix mapping.

Thankfully, GraphQL includes an extension mechanism, directives, that can be used to define
additional constraints.
Some tools will want to use them; other tools can simply ignore them without ill effects.
We use the directive @shape to annotate field definitions with constraints.
The term "shape" was chosen because it nicely illustrates that these are about the shape of the allowed
data, but also to relate to the Shapes Constraint Language that defines exactly equivalent constraint types.
Let's look at an example:

Other typical examples of such scalar types include zip codes, social security numbers and country codes.
Defining them for a scalar type means that they can be reused and do not need to be repeated at
each occurrence.
Furthermore, scalar types that define a datatype shape have well-defined meaning across
implementations, while the GraphQL standard is very unspecific and can lead to questions about whether
a scalar value should become a string or number, for example.

Display Metadata for GraphQL

This section introduces the @display directive that naturally extends
GraphQL type definitions with information that is useful for user interfaces.
Imagine you have a Customer record and would like to display it roughly as
follows:

The basic elements of such a (form) layout are that fields can be organized into
groups (such as "Address" above), and these groups as well as their fields are in
a given order and have human-readable display labels.
GraphQL already covers the ordering (fields are naturally sorted from top to bottom),
so it is just a small step to make the schema significantly more useful:

Field Groups

To define field groups, we use the @groups directive on the schema.
All declared groups are global (and engines can even derive URIs for them based on
the name of the arguments of the @groups directive).

Each entry in the @groups directive is an object that defines one or
more labels.
Use label to specify the "default" label.
You can use properties like label_en to define labels for specific languages,
including 5-character language codes such as label_en_AU for Australian visitors
of your application.
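A user interface might choose among these labels roughly as in the following Python sketch. The fallback order (exact language code, then the bare language, then the default label) is an assumption about sensible behaviour, and the sample entry and function name are made up for illustration.

```python
def pick_label(entry, locale=None):
    """Pick the most specific label for a locale such as "en_AU" or "de"."""
    if locale:
        parts = locale.replace("-", "_").split("_")
        # Try label_en_AU first, then label_en, dropping one part at a time.
        while parts:
            key = "label_" + "_".join(parts)
            if key in entry:
                return entry[key]
            parts.pop()
    # Fall back to the "default" label.
    return entry.get("label")


# Hypothetical group entry with a default and two language-specific labels.
address_group = {
    "label": "Address",
    "label_de": "Adresse",
    "label_en_AU": "Address (AU)",
}

australian = pick_label(address_group, "en_AU")
british = pick_label(address_group, "en_GB")
austrian = pick_label(address_group, "de_AT")
```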

Field groups are by default ordered from top to bottom, but you can also use order
to assign a numeric value for groups. That is really only ever needed if you are merging
groups from multiple GraphQL schema files.

Once a group has been defined, individual fields can reference it using
@display(group: ...) as shown in the example above.

Field Display Labels and Ordering

Fields can define display labels using @display(label: ...) as shown in the example
above. This includes the option to define multiple, language-specific labels using
label_de etc.

The natural order of fields for display purposes is the same as (from top to bottom) in the GraphQL schema.
If you are relying on class inheritance and your object types are stitched
together from multiple object types, then this approach does not work, and you may need to rely
on the order argument to assign individual numeric values to each field.
Basically, assign 0 to the topmost field, then use increasing numbers for the fields below it.
You can use floating point numbers if you want to squeeze your subclass field in between existing
fields from a superclass.
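The effect of the order argument can be sketched in Python as a simple sort. The field list below is hypothetical, and falling back to the declaration position for fields without an order value is an assumption made for this example.

```python
def display_order(fields):
    """Return field names sorted by their order value.

    Fields without an explicit order fall back to their declaration index;
    the index also breaks ties so the result is deterministic.
    """
    indexed = list(enumerate(fields))
    ranked = sorted(indexed, key=lambda pair: (pair[1].get("order", pair[0]), pair[0]))
    return [field["name"] for _, field in ranked]


# Two superclass fields, followed by a subclass field squeezed in between.
fields = [
    {"name": "firstName", "order": 0},
    {"name": "lastName", "order": 1},
    {"name": "middleName", "order": 0.5},
]

ordered = display_order(fields)
```

Here middleName is declared last but displayed second, because 0.5 lies between the order values of the two inherited fields.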

Default Values

Fields can have default values, meaning that such values can be inserted into displays
and forms even if a given object does not actually declare any value for the field.
Use @display(defaultValue: ...) as shown in the example above.
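For a form or display component, applying default values could look like the following Python sketch. The function name, the sample record and the default value "AU" are assumptions for illustration; the stored object itself is not modified.

```python
def with_defaults(obj, defaults):
    """Return the values to display: declared values win over defaults."""
    shown = dict(defaults)
    # Only values that are actually present override a default.
    shown.update({field: value for field, value in obj.items() if value is not None})
    return shown


# Hypothetical record that declares no value for the country field.
customer = {"name": "Han Solo", "country": None}
displayed = with_defaults(customer, {"country": "AU"})
```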

Next Steps

The directives described here and implemented in TopQuadrant's products can be used by any GraphQL system.
Some systems may for example only use the @display annotations for form building,
others may also look for @shape constraints to validate input on said input forms.

The design of the directives can also be seen as part of a more consistent design in which
GraphQL is playing a more central role than just sending JSON back and forth.
In this design, GraphQL lays the foundation of modeling, representing and storing data for an application.

In order to make use of data delivered by an existing (possibly 3rd party) GraphQL web service,
the URI templating and namespace mechanisms can be used to turn JSON objects into RDF graphs.
This means that the results of multiple JSON requests can be added into a single unified graph database
instead of remaining in disconnected tree structures.

To support this vision, we have developed a translation between GraphQL Schemas and the RDF world
using the Shapes Constraint Language SHACL.
This basically enables the use of GraphQL schemas as RDF domain models (aka ontologies).
Benefits include the ability to define complex class hierarchies with a more flexible form of inheritance,
the ability to extend GraphQL schemas with semantically richer constraints and inferencing rules,
and cross-linkage between multiple schemas to reuse definitions, leading to a powerful enterprise knowledge graph.
Details of the translation are described on the GraphQL Schemas to RDF/SHACL page.

Closing the circle, we also made it possible to automatically generate new GraphQL schemas that expose
different view points on underlying RDF data, including the ability to ask almost arbitrary queries.
The mechanism that turns an RDF/SHACL data model into GraphQL schemas is described in
Querying RDF Graphs with GraphQL.
Since the SHACL can be auto-generated from GraphQL schemas, this approach means that users can benefit
from RDF graph technology even if they are not familiar with SHACL.