Graph Use Cases

Storage Use Cases

Organizing Information

When storing RDF information in a graph store, we would like to organize related information into separate graphs. Each graph must be identified with a URI to facilitate retrieval.
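The following is a minimal sketch of organizing triples into URI-identified graphs. It assumes an in-memory Python store in which triples are plain string tuples. The class name `NamedGraphStore` and the example URIs are hypothetical and do not reflect any particular triple store's API.

```python
from collections import defaultdict

# A triple is modelled as a plain (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


class NamedGraphStore:
    """Minimal in-memory store that keeps each graph under its own URI."""

    def __init__(self) -> None:
        self._graphs: dict[str, set[Triple]] = defaultdict(set)

    def add(self, graph_uri: str, triple: Triple) -> None:
        self._graphs[graph_uri].add(triple)

    def graph(self, graph_uri: str) -> set[Triple]:
        # Retrieval by URI; an unknown URI yields an empty graph. A copy is
        # returned so callers cannot mutate the store directly.
        return set(self._graphs.get(graph_uri, set()))

    def graph_uris(self) -> list[str]:
        return sorted(self._graphs)


store = NamedGraphStore()
store.add("http://example.org/graphs/people", ("ex:alice", "foaf:name", '"Alice"'))
store.add("http://example.org/graphs/projects", ("ex:project1", "dc:title", '"Graph store"'))
print(store.graph_uris())
```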

Slicing datasets according to multiple dimensions

Within the BBC, we want to slice large RDF datasets according to multiple dimensions: statements about individual programmes, access control, 'ownership' of the data (which product owns or maintains which set of triples), versioning, etc. These graphs may overlap or be contained within one another. Such issues are very common in large organisations that use a single, centralised triple store.
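Overlapping slices can be sketched by letting the same triple belong to several graphs at once. In this Python sketch, the graph URIs (one per dimension: programme, owning product, access level) are hypothetical and are not real BBC identifiers. It also shows how containment of one graph in another could be checked.

```python
from collections import defaultdict

Triple = tuple[str, str, str]

# Hypothetical graphs, one per slicing dimension.
graphs: dict[str, set[Triple]] = defaultdict(set)

t1 = ("ex:programme1", "dc:title", '"Evening News"')
t2 = ("ex:programme1", "ex:broadcastOn", "ex:channel1")

# The same triple may be placed in several graphs at once.
for g in ("ex:graph/programme1", "ex:graph/owner/newsProduct", "ex:graph/acl/public"):
    graphs[g].add(t1)
for g in ("ex:graph/programme1", "ex:graph/owner/schedulesProduct"):
    graphs[g].add(t2)


def graphs_containing(triple: Triple) -> list[str]:
    """Return every graph URI whose graph contains the given triple."""
    return sorted(g for g, triples in graphs.items() if triple in triples)


def is_contained(inner: str, outer: str) -> bool:
    """True if every triple of graph `inner` also belongs to graph `outer`."""
    return graphs.get(inner, set()) <= graphs.get(outer, set())


print(graphs_containing(t1))
print(is_contained("ex:graph/owner/newsProduct", "ex:graph/programme1"))
```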

Permissions

Another reason for storing RDF content in different graphs is to enforce a permissions model, so that sensitive information cannot be accessed by unauthorized users.
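Here is a minimal sketch of graph-level permissions. It assumes a hypothetical access-control list that maps each user to the graph URIs they may read, and it evaluates queries only over the union of those graphs. Real stores enforce this in very different ways, so the dataset, the users, and the `match` helper are illustrative.

```python
Triple = tuple[str, str, str]

# Hypothetical dataset: graph URI -> triples.
dataset: dict[str, set[Triple]] = {
    "ex:graph/public": {("ex:alice", "foaf:name", '"Alice"')},
    "ex:graph/hr": {("ex:alice", "ex:salary", '"50000"')},
}

# Hypothetical access-control list: user -> graph URIs the user may read.
acl: dict[str, set[str]] = {
    "guest": {"ex:graph/public"},
    "hr-manager": {"ex:graph/public", "ex:graph/hr"},
}


def visible_triples(user: str) -> set[Triple]:
    """Union of the graphs the user may read; unknown users see nothing."""
    result: set[Triple] = set()
    for graph_uri in acl.get(user, set()):
        result |= dataset.get(graph_uri, set())
    return result


def match(user: str, predicate: str) -> set[Triple]:
    """Answer a simple predicate lookup over the user's permitted graphs only."""
    return {t for t in visible_triples(user) if t[1] == predicate}


print(match("guest", "ex:salary"))       # set(): the HR graph is hidden
print(match("hr-manager", "ex:salary"))
```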

Graph Changes Over Time

When storing graph information retrieved from a URL external to an application, it becomes important to store snapshots of the content at that location over time. When these graph snapshots are taken, it is useful to annotate each one with information such as the retrieval time, the HTTP headers sent, the HTTP response returned, and any other factors that may have affected the contents of the snapshot.
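The snapshot annotations listed above could be kept alongside each retrieved graph, as in the following Python sketch. The `GraphSnapshot` record, its `#snapshot-<timestamp>` naming scheme, and the example URL and triples are all assumptions made for illustration. The code keeps every snapshot rather than overwriting older ones and can report what changed between two of them.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

Triple = tuple[str, str, str]


@dataclass
class GraphSnapshot:
    """One retrieval of an external URL plus the context that shaped its contents."""

    source_url: str
    retrieved_at: datetime
    request_headers: dict[str, str]
    response_status: int
    response_headers: dict[str, str]
    triples: frozenset[Triple]

    @property
    def snapshot_id(self) -> str:
        # Hypothetical naming scheme: the source URL plus the UTC retrieval time.
        return f"{self.source_url}#snapshot-{self.retrieved_at.strftime('%Y%m%dT%H%M%SZ')}"


history: dict[str, list[GraphSnapshot]] = {}


def record(snapshot: GraphSnapshot) -> None:
    """Keep every snapshot of a location instead of overwriting the previous one."""
    history.setdefault(snapshot.source_url, []).append(snapshot)


def diff(old: GraphSnapshot, new: GraphSnapshot) -> tuple[set[Triple], set[Triple]]:
    """Triples added and removed between two snapshots of the same location."""
    return set(new.triples - old.triples), set(old.triples - new.triples)


url = "http://example.org/about"
headers = {"Accept": "text/turtle"}
old = GraphSnapshot(url, datetime(2011, 5, 1, tzinfo=timezone.utc), headers, 200,
                    {"Content-Type": "text/turtle"},
                    frozenset({("ex:me", "foaf:homepage", "http://example.org/old")}))
new = GraphSnapshot(url, datetime(2011, 6, 1, tzinfo=timezone.utc), headers, 200,
                    {"Content-Type": "text/turtle"},
                    frozenset({("ex:me", "foaf:homepage", "http://example.org/new")}))
record(old)
record(new)
added, removed = diff(old, new)
print(new.snapshot_id)  # http://example.org/about#snapshot-20110601T000000Z
```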

Here is a quick JSON-LD (assuming g-snap support) example showing two graph snapshots. The home page changes between the two snapshots:

A more complex example involves supporting decentralized product listings via PaySwarm. In PaySwarm, products for sale (access to a particular blog post, or to a particular Web App) are expressed in a decentralized manner on a website. What is for sale is described by a graph containing information about the asset, its pricing, and the licensing terms associated with the sale. Taken together, this information is effectively an offer of sale:

Note the "ps:validFrom" and "ps:validUntil" dates: that information changes once a day. Since the information in the graph changes, the signatures on the graph change as well. Because of these daily changes, it is important to be able to track snapshots of this graph from day to day. Storing this data in a graph store is particularly challenging without the fundamental concept of a graph snapshot (Graph Literal).
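A small Python sketch shows why a daily change to the validity window yields a different graph to sign. It hashes a naively canonicalized graph, using sorted lines that are only adequate without blank nodes. Real RDF graph normalization is considerably more involved. The `offer` helper, the `ps:asset` property, and all values are hypothetical.

```python
import hashlib

Triple = tuple[str, str, str]


def graph_digest(triples: set[Triple]) -> str:
    """SHA-256 over the sorted triples (simplified canonicalization, no blank nodes)."""
    canonical = "\n".join(" ".join(t) + " ." for t in sorted(triples))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def offer(valid_from: str, valid_until: str) -> set[Triple]:
    # Hypothetical offer graph; only the validity window varies from day to day.
    return {
        ("ex:offer", "ps:asset", "ex:blogPost42"),
        ("ex:offer", "ps:validFrom", f'"{valid_from}"'),
        ("ex:offer", "ps:validUntil", f'"{valid_until}"'),
    }


day1 = offer("2011-05-01T00:00:00Z", "2011-05-02T00:00:00Z")
day2 = offer("2011-05-02T00:00:00Z", "2011-05-03T00:00:00Z")

# Different digests mean any signature over them differs too, so each day's
# signed graph is a distinct snapshot that has to be stored on its own.
snapshots = {graph_digest(day1): day1, graph_digest(day2): day2}
print(len(snapshots))  # 2
```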

Dependencies, e.g. tracing inferences and their results

By identifying the graphs that were consumed and produced by an inference, one can trace the inferences that enriched a triple store, for instance in order to undo some reasoning when the store is updated.
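This Python sketch assumes a hypothetical provenance log that records, for each graph produced by an inference, which graphs it was inferred from. When an input graph is updated, the derived graphs that depend on it, directly or transitively, can then be found and undone.

```python
from collections import defaultdict

# Hypothetical provenance log: derived graph URI -> graph URIs it was inferred from.
derived_from: dict[str, set[str]] = {
    "ex:inferred/1": {"ex:graph/schema", "ex:graph/data"},
    "ex:inferred/2": {"ex:inferred/1"},  # a second inference built on the first
    "ex:inferred/3": {"ex:graph/schema"},
}


def dependents(graph_uri: str) -> set[str]:
    """All derived graphs that (transitively) depend on graph_uri."""
    consumers: dict[str, set[str]] = defaultdict(set)
    for output, inputs in derived_from.items():
        for source in inputs:
            consumers[source].add(output)
    result: set[str] = set()
    stack = [graph_uri]
    while stack:
        current = stack.pop()
        for output in consumers[current]:
            if output not in result:
                result.add(output)
                stack.append(output)
    return result


# If ex:graph/data is updated, these inference results must be undone (and possibly recomputed).
print(sorted(dependents("ex:graph/data")))  # ['ex:inferred/1', 'ex:inferred/2']
```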

Query Use Cases

While query services are not explicitly addressed in the RDF spec, SPARQL does make use of graph IRIs and we should ensure that the semantics of graph identifiers are compatible with the way in which RDF datasets are defined by SPARQL.

Find Information In a Graph

When a query service processes a query containing a graph identifier, it must resolve the graph identifier to some collection of materialized RDF content that will be returned in the result set.
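As a rough illustration of how a graph identifier is resolved against a SPARQL-style dataset (one default graph plus IRI-identified named graphs), here is a Python sketch. The `Dataset` class and the single-pattern `match` helper are hypothetical simplifications and not a SPARQL engine. Resolving an unknown IRI to the empty graph is just one possible policy.

```python
from typing import Optional

Triple = tuple[str, str, str]


class Dataset:
    """SPARQL-style RDF dataset: one default graph plus IRI-identified named graphs."""

    def __init__(self, default: set[Triple], named: dict[str, set[Triple]]) -> None:
        self.default = default
        self.named = named

    def resolve(self, graph_iri: Optional[str]) -> set[Triple]:
        # None selects the default graph. Returning an empty graph for an unknown IRI
        # is one possible policy; a service could instead compute or fetch the graph.
        if graph_iri is None:
            return self.default
        return self.named.get(graph_iri, set())


def match(graph: set[Triple], pattern: Triple) -> list[dict[str, str]]:
    """Match one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in graph:
        binding: dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable bound to two different values
            elif term != value:
                break
        else:
            results.append(binding)
    return results


ds = Dataset(
    default={("ex:a", "ex:p", "ex:b")},
    named={"http://example.org/g1": {("ex:x", "foaf:name", '"X"')}},
)
# Roughly: SELECT ?s ?n WHERE { GRAPH <http://example.org/g1> { ?s foaf:name ?n } }
print(match(ds.resolve("http://example.org/g1"), ("?s", "foaf:name", "?n")))
```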

Computed Graphs

Often, graphs exposed by a query service are not present in any sort of physical storage, but rather their contents are computed at query time. Examples include:

A federated query service may define a graph URI to be the union of graphs accessible through other query services.

A service that does RDB to RDF mapping via R2RML may dynamically compute RDF results based on SQL results at query time.
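Both kinds of computed graph can be sketched as functions that are only evaluated when a query asks for the graph. In this Python sketch, a federated union is approximated by two plain dicts that stand in for remote services. A relational mapping is imitated with an in-memory SQLite table. That mapping follows the spirit of R2RML but does not implement the R2RML language. All graph IRIs and table names are hypothetical.

```python
import sqlite3
from typing import Callable, Iterable

Triple = tuple[str, str, str]

# Registry mapping a graph IRI to a function that computes its triples at query time.
computed: dict[str, Callable[[], Iterable[Triple]]] = {}

# Federated union: dicts stand in for graphs held by two other query services.
remote_a = {"ex:g": {("ex:s1", "ex:p", "ex:o1")}}
remote_b = {"ex:g": {("ex:s2", "ex:p", "ex:o2")}}
computed["ex:union"] = lambda: remote_a["ex:g"] | remote_b["ex:g"]

# Relational mapping: each SQL row becomes triples when the graph is requested.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO employee (id, name) VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])


def employees_graph() -> Iterable[Triple]:
    for emp_id, name in db.execute("SELECT id, name FROM employee ORDER BY id"):
        subject = f"http://example.org/employee/{emp_id}"
        yield (subject, "rdf:type", "ex:Employee")
        yield (subject, "foaf:name", f'"{name}"')


computed["ex:employees"] = employees_graph


def resolve(graph_iri: str) -> set[Triple]:
    """Materialize the graph only when a query asks for it."""
    return set(computed[graph_iri]())


print(len(resolve("ex:employees")))  # 4
```

Nothing is cached in this sketch, so a later change to the table is reflected the next time the graph is resolved.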

Graph URIs as Locations

In the situation where a query service is presented with a graph identifier that is not present in local storage, the query service may wish to resolve the graph URI as a URL and make a request to that URL (possibly with conneg) for a document that serializes the content of that graph.

NB: It is important to consider what the linked data "Follow your nose" approach means for identified graphs.
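A follow-your-nose fallback might look like the following Python sketch, which uses only the standard library's `urllib.request`. It checks local storage first. Otherwise it dereferences the graph URI with an `Accept` header that asks for RDF serializations, and it returns the media type and body for an RDF parser to handle, since parsing is outside the scope of the sketch. The local store contents and the particular media-type preferences and q-values are assumptions.

```python
import urllib.request
from typing import Union

Triple = tuple[str, str, str]

# Graphs already held locally, keyed by graph URI (hypothetical contents).
local_store: dict[str, set[Triple]] = {
    "http://example.org/graphs/local": {("ex:a", "ex:p", "ex:b")},
}

# Preferred RDF media types; the list and the q-values are assumptions.
ACCEPT = ("text/turtle, application/n-triples;q=0.9, "
          "application/ld+json;q=0.8, application/rdf+xml;q=0.5")


def build_graph_request(graph_uri: str) -> urllib.request.Request:
    """Treat the graph URI as a URL and ask for an RDF serialization via conneg."""
    return urllib.request.Request(graph_uri, headers={"Accept": ACCEPT})


def fetch_graph_document(graph_uri: str, timeout: float = 10.0) -> tuple[str, bytes]:
    """Dereference the URI; returns (media type, body) for an RDF parser to handle."""
    with urllib.request.urlopen(build_graph_request(graph_uri), timeout=timeout) as resp:
        return resp.headers.get_content_type(), resp.read()


def resolve_graph(graph_uri: str) -> Union[set[Triple], tuple[str, bytes]]:
    """Prefer local storage; otherwise follow your nose to the graph URI."""
    if graph_uri in local_store:
        return local_store[graph_uri]
    return fetch_graph_document(graph_uri)
```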

Contextual constraints in queries

In e-Science projects, we can use identified graphs to represent and query contextual metadata. For instance, evidence-based reasoning requires distinguishing assertions considered universally true from assertions that are competing hypotheses or interpretations. One can use identified graphs when annotating experiments (e.g. in biology) or analyses (e.g. in geology). Identified graphs are used to represent the different contexts within which alternative metadata can be described.

Identifying the graphs also allows us to organize the RDF datasets hierarchically, based on RDFS entailment. When RDF datasets are considered as contexts, the root of the hierarchy contains the triples that are true in every context below it, i.e., every other node of the hierarchy entails them. The other nodes of the hierarchy represent specific contexts; each one recursively inherits and adds to the triples of its ancestors. Each node then provides a different context for querying and reasoning. When a hypothesis is tested (as a SPARQL query), the context of the test is specified by the identifier of the graph to be used.
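The inheritance of triples down the hierarchy can be sketched in Python as follows. Each context sees its own triples plus those of all its ancestors, and a hypothesis, here reduced to a single triple, is tested in a named context. The context names and triples are hypothetical. Inheritance is modelled as plain union along the path to the root, with no RDFS inference rules applied.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Hypothetical context hierarchy: child context -> parent context (root has None).
parent: dict[str, Optional[str]] = {
    "ctx:root": None,
    "ctx:hypothesisA": "ctx:root",
    "ctx:hypothesisA/variant1": "ctx:hypothesisA",
    "ctx:hypothesisB": "ctx:root",
}

# Triples asserted directly in each context.
asserted: dict[str, set[Triple]] = {
    "ctx:root": {("ex:sample1", "ex:collectedAt", "ex:site7")},
    "ctx:hypothesisA": {("ex:layer3", "ex:formedDuring", "ex:Jurassic")},
    "ctx:hypothesisA/variant1": {("ex:layer3", "ex:depositedBy", "ex:river")},
    "ctx:hypothesisB": {("ex:layer3", "ex:formedDuring", "ex:Cretaceous")},
}


def context_triples(ctx: str) -> set[Triple]:
    """A context inherits every triple of its ancestors and adds its own."""
    triples: set[Triple] = set()
    node: Optional[str] = ctx
    while node is not None:
        triples |= asserted.get(node, set())
        node = parent[node]
    return triples


def holds_in(ctx: str, triple: Triple) -> bool:
    """Test a hypothesis (here, a single triple) within the named context."""
    return triple in context_triples(ctx)


print(holds_in("ctx:hypothesisA/variant1", ("ex:sample1", "ex:collectedAt", "ex:site7")))  # True
print(holds_in("ctx:hypothesisB", ("ex:layer3", "ex:formedDuring", "ex:Jurassic")))  # False
```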

A special case is the introduction of temporal or geographical aspects in querying and reasoning over the triple store: a query may be solved considering only the assertions that are true in a specific range of time or geographical area.

One way to address this family of scenarios is to allow a basic algebra of sets over the identified graphs, for instance allowing one to assert that one graph is included in another.
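A basic set algebra over identified graphs could be sketched in Python as follows. Union, intersection, and difference define new graphs from existing ones. Inclusion assertions are recorded and then checked against the current graph contents. The graph IRIs, the triples, and the `inclusions` list are hypothetical.

```python
from functools import reduce

Triple = tuple[str, str, str]

# Hypothetical identified graphs.
graphs: dict[str, frozenset[Triple]] = {
    "ex:g/europe": frozenset({("ex:site1", "ex:inCountry", "ex:France")}),
    "ex:g/world": frozenset({
        ("ex:site1", "ex:inCountry", "ex:France"),
        ("ex:site2", "ex:inCountry", "ex:Chile"),
    }),
}

# Hypothetical inclusion assertions: (smaller, larger) states smaller is a subset of larger.
inclusions: list[tuple[str, str]] = [("ex:g/europe", "ex:g/world")]


def union(*uris: str) -> frozenset[Triple]:
    return frozenset().union(*(graphs[u] for u in uris))


def intersection(first: str, *rest: str) -> frozenset[Triple]:
    return reduce(lambda acc, u: acc & graphs[u], rest, graphs[first])


def difference(a: str, b: str) -> frozenset[Triple]:
    return graphs[a] - graphs[b]


def violated_inclusions() -> list[tuple[str, str]]:
    """Inclusion assertions that the current graph contents do not satisfy."""
    return [(small, large) for small, large in inclusions if not graphs[small] <= graphs[large]]


# A new identified graph can be defined from existing ones.
graphs["ex:g/outsideEurope"] = difference("ex:g/world", "ex:g/europe")
print(violated_inclusions())  # []
```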

Provenance Use Cases

Digital Signatures on Graphs

There are a number of ways to create digital signatures on RDF graphs. Often, you do not want to commingle the signature information and the graph. Commingling signature information in a graph requires software to clean the graph with an algorithm before it can regenerate the signature hash for verification. It also makes it very difficult to sign a graph that contains a digital signature at its top-most level. In order to express a digital signature on a graph of information, the idea of a Graph Literal becomes useful. Take the following as an example of a JSON-LD graph that we would like to digitally sign:

However, nobody else could sign that graph without introducing ambiguity as to who signed the graph first. That is, the second signer couldn't sign the initial signer's signature. Therefore, having the concept of a graph snapshot which can be annotated in the same way that triples are annotated becomes very useful. The first signature could be performed like so:

Note that a "dc:date" has been associated with the initial signed graph. Using this technique, one could verify that:

The initial graph was signed by a primary author.

The initial graph with its signature was annotated and signed by a secondary author.

This is useful when dealing with web-of-trust issues, such as deciding whether to trust graphs that have been cached by third parties. This happens when product listings are cached by companies like Google and then proxied by other parties. You want to ensure both that the initial product listing is valid according to the asset owner and that the state of the cache has been verified by Google. This prevents a nefarious proxy from meddling with the information that will be used to perform a financial transaction.
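The two-level signing can be sketched in Python as follows. The sketch represents a graph literal as a frozen set of string triples and uses a naive sorted-lines canonicalization, which is valid only without blank nodes. It substitutes HMAC-SHA256 for a real public-key signature scheme purely to stay self-contained. The keys and the `ex:graphDigest` and `ex:signature` properties are hypothetical. The owner's signature covers only the offer graph. The caching party signs a separate annotation graph about that signed literal, so neither signature is mixed into the graph it covers.

```python
import hashlib
import hmac

Triple = tuple[str, str, str]


def canonical(triples: frozenset[Triple]) -> bytes:
    # Simplified canonical form: sorted lines; assumes no blank nodes.
    return "\n".join(" ".join(t) + " ." for t in sorted(triples)).encode("utf-8")


def sign(key: bytes, data: bytes) -> str:
    # HMAC-SHA256 stands in for a real public-key signature scheme.
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(key: bytes, data: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign(key, data), signature)


owner_key = b"asset-owner-secret"  # hypothetical keys
cache_key = b"cache-secret"

# Level 1: the asset owner signs the offer as a graph literal.
offer = frozenset({("ex:offer", "ps:validFrom", '"2011-05-01"')})
owner_sig = sign(owner_key, canonical(offer))

# Level 2: a separate graph annotates the signed literal (digest, signature, dc:date)
# and is itself signed by the caching party, leaving the original graph untouched.
annotation = frozenset({
    ("ex:signedOffer", "ex:graphDigest", hashlib.sha256(canonical(offer)).hexdigest()),
    ("ex:signedOffer", "ex:signature", owner_sig),
    ("ex:signedOffer", "dc:date", '"2011-05-01T12:00:00Z"'),
})
cache_sig = sign(cache_key, canonical(annotation))

print(verify(owner_key, canonical(offer), owner_sig))         # True
print(verify(cache_key, canonical(annotation), cache_sig))    # True
tampered = frozenset({("ex:offer", "ps:validFrom", '"2011-04-01"')})
print(verify(owner_key, canonical(tampered), owner_sig))      # False
```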

Capture elements of the production context

A graph may be produced by a variety of means and in very different contexts. For instance, it could be the result of natural language processing or other extraction techniques.
An identified graph may be linked to the context in which it was produced (source, properties, etc.).

Separate Ontology Use Case

This use case is derived from a proposal to have OWL annotations that
can be collected together into a separate ontology (and that might even
be able to affect the main ontology). The proposal itself can be seen
at http://www.w3.org/2007/OWL/wiki/Annotation_System; however, this "use
case" somewhat modifies the suggestions in the proposal.

The basic need is to be able to generate multiple ontologies from a
single OWL document. One ontology is the ontology that corresponds to
the main information in the document. The other ontology (or
ontologies) would sit alongside the main ontology. These secondary
ontologies might be used to store and reason about things like
provenance or certainty.

Aside from the ability to generate multiple ontologies from a single
document, there is a need for syntactic entities in the main document
to show up as semantic entities in the secondary ontologies. Note that
this does *not* directly require reflection, as the syntactic entities
do not carry their semantic import into the secondary ontologies. Any
semantic relationship between the main ontology and the secondary
ontologies is mediated by relationships outside the semantics of the
formalism, again so that there is no need for reflection, reification,
or similar mechanisms.

So far this is about (OWL) ontologies, not graphs, but it can be turned
into a use case for referenceable graphs either by replacing OWL
ontologies by RDF graphs or by considering RDF graph naming as the
syntactic mechanism for separate ontologies in the RDF encoding of OWL.
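Read as named graphs, the idea can be sketched in Python as follows. One source document, modelled as a hypothetical list of statements tagged with their target ontology, is split into one graph per ontology. The secondary graphs refer to a main-ontology axiom only through an identifier, `ex:axiom1`, so the main graph gains no annotation triples. The tagging scheme, the identifiers, and the `axiom_ids` table are assumptions, not part of the OWL proposal.

```python
Triple = tuple[str, str, str]

# A single hypothetical source document; each statement is tagged with its ontology.
document: list[tuple[str, Triple]] = [
    ("main", ("ex:Dog", "rdfs:subClassOf", "ex:Animal")),
    ("provenance", ("ex:axiom1", "ex:assertedBy", "ex:alice")),
    ("certainty", ("ex:axiom1", "ex:confidence", '"0.9"')),
]

# Identifier -> the main-ontology statement it names. The link is syntactic:
# the secondary ontologies refer to the axiom but do not import its meaning.
axiom_ids: dict[str, Triple] = {"ex:axiom1": ("ex:Dog", "rdfs:subClassOf", "ex:Animal")}


def split(doc: list[tuple[str, Triple]]) -> dict[str, set[Triple]]:
    """Generate one graph per ontology from the single document."""
    ontologies: dict[str, set[Triple]] = {}
    for name, triple in doc:
        ontologies.setdefault(name, set()).add(triple)
    return ontologies


def dangling_references(ontologies: dict[str, set[Triple]]) -> set[str]:
    """Axiom identifiers used in secondary ontologies whose axiom is missing from main."""
    main = ontologies.get("main", set())
    return {
        ident
        for name, triples in ontologies.items() if name != "main"
        for ident, _, _ in triples
        if ident in axiom_ids and axiom_ids[ident] not in main
    }


ontologies = split(document)
print(sorted(ontologies))               # ['certainty', 'main', 'provenance']
print(dangling_references(ontologies))  # set()
```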

General Annotation Framework

As mentioned above, there are use cases for temporal annotation (a graph is valid during a certain time frame), provenance, etc. We want to support reasoning based on these annotations, using a generic approach as defined in [1][2][3]. For instance, in the temporal setting (i.e., where graphs or triples are annotated with time frames), if:

ex:chadhurley rdf:type ex:YoutubeEmployee . [2005,2010]

(that is, the triple holds at least between 2005 and 2010) and:

ex:YoutubeEmployee rdfs:subClassOf ex:GoogleEmployee . [2006,2011]

then:

ex:chadhurley rdf:type ex:GoogleEmployee . [2006,2010]
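The temporal inference above can be sketched in Python as one application of the RDFS subclass rule. The derived triple is annotated with the intersection of the premises' intervals. Because the rule is the subclass rule, the class-to-class statement is modelled with `rdfs:subClassOf`. The sketch assumes closed year intervals and a single rule pass. The framework in [1][2][3] is more general than this.

```python
from typing import Optional

Triple = tuple[str, str, str]
Interval = tuple[int, int]  # closed interval of years, [start, end]

facts: list[tuple[Triple, Interval]] = [
    (("ex:chadhurley", "rdf:type", "ex:YoutubeEmployee"), (2005, 2010)),
    (("ex:YoutubeEmployee", "rdfs:subClassOf", "ex:GoogleEmployee"), (2006, 2011)),
]


def meet(a: Interval, b: Interval) -> Optional[Interval]:
    """Intersection of two closed intervals, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def infer_types(facts: list[tuple[Triple, Interval]]) -> list[tuple[Triple, Interval]]:
    """One pass of: x type C, C subClassOf D => x type D, over the shared interval."""
    derived = []
    for (x, p, cls), t1 in facts:
        if p != "rdf:type":
            continue
        for (sub, p2, sup), t2 in facts:
            if p2 == "rdfs:subClassOf" and sub == cls:
                t = meet(t1, t2)
                if t is not None:
                    derived.append(((x, "rdf:type", sup), t))
    return derived


all_facts = facts + infer_types(facts)

# "Who was a GoogleEmployee and when?"
answers = [(s, t) for (s, p, o), t in all_facts if p == "rdf:type" and o == "ex:GoogleEmployee"]
print(answers)  # [('ex:chadhurley', (2006, 2010))]
```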

If annotations exist in a dataset, one can query for them, asking for instance when a triple holds (e.g., "who was a GoogleEmployee and when?").
This generalises to other types of annotations, as described in [1][2][3]. E.g., with provenance annotation:

A typical use case for including provenance in the data is exchanging datasets that come from multiple sources. This is done, for instance, for the Billion Triple Challenge, which in fact provides not a billion triples but a billion quadruples, so that the sources are identified.
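Here is a minimal Python sketch of how the fourth element of a quad keeps each triple's source identifiable. It groups N-Quads lines by graph label. The regular expression handles only the simplest lines: IRIs, plain literals without escapes, and an explicit graph IRI. It does not handle blank nodes, language tags, datatypes, or escaped quotes, so a real N-Quads parser is needed in practice. The sample data is hypothetical.

```python
import re
from collections import defaultdict

Triple = tuple[str, str, str]

# Simplest N-Quads lines only: <s> <p> <o> or "literal", then <g>, then a full stop.
QUAD = re.compile(r'^(<[^>]*>)\s+(<[^>]*>)\s+(<[^>]*>|"[^"]*")\s+(<[^>]*>)\s*\.\s*$')


def triples_by_source(lines: list[str]) -> dict[str, set[Triple]]:
    """Group triples by the graph (source) label carried in each quad."""
    by_source: dict[str, set[Triple]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = QUAD.match(line)
        if m is None:
            raise ValueError(f"unsupported N-Quads line: {line}")
        s, p, o, g = m.groups()
        by_source[g].add((s, p, o))
    return dict(by_source)


sample = [
    '<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" <http://example.org/crawl/alice.rdf> .',
    '<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> <http://example.org/crawl/bob.rdf> .',
]
result = triples_by_source(sample)
print(sorted(result))
```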

Following the same generic framework, it is also possible to deal with fuzzy, probabilistic and uncertain information, for example:

Fuzzy-annotated RDF data are likely to be produced automatically by tools relying on statistical data or heuristic algorithms. Terminological statements with uncertainty are very common outputs of ontology matching algorithms.
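The same subclass rule can be reused with fuzzy degrees instead of time intervals. Only the operators that combine annotations change, which is the point of a generic framework. This Python sketch assumes degrees in [0, 1]. It uses the minimum (Gödel) t-norm for conjunction and the maximum for merging alternative derivations, which is one common choice among several. The facts and their degrees are hypothetical.

```python
from typing import Callable

Triple = tuple[str, str, str]

# Hypothetical fuzzy-annotated facts: triple -> degree in [0, 1].
facts: dict[Triple, float] = {
    ("ex:doc1", "rdf:type", "ex:SportsArticle"): 0.8,               # e.g. from a text classifier
    ("ex:SportsArticle", "rdfs:subClassOf", "ex:NewsArticle"): 0.6,  # e.g. from ontology matching
}


def infer_types(
    facts: dict[Triple, float],
    conj: Callable[[float, float], float] = min,
    disj: Callable[[float, float], float] = max,
) -> dict[Triple, float]:
    """One pass of the subclass rule over annotated triples.

    conj combines the degrees of the two premises; disj merges alternative
    derivations of the same triple.
    """
    result = dict(facts)
    for (s, p, cls), d1 in facts.items():
        if p != "rdf:type":
            continue
        for (sub, p2, sup), d2 in facts.items():
            if p2 == "rdfs:subClassOf" and sub == cls:
                key = (s, "rdf:type", sup)
                degree = conj(d1, d2)
                result[key] = disj(result[key], degree) if key in result else degree
    return result


derived = infer_types(facts)
print(derived[("ex:doc1", "rdf:type", "ex:NewsArticle")])  # 0.6
```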

In all these situations, it is necessary to identify the triples or graphs to which the annotations are attached.