
W3C Data Activity: Building the Web of Data

More and more Web applications provide a means of accessing data. From simple
visualizations to sophisticated interactive tools, there is a growing reliance
on the availability of data that can be “big” or “small”, of diverse origin and
in different formats. Such data is usually published without prior coordination with
other publishers, let alone with precise modeling or common vocabularies. The Data
Activity recognizes this diversity and works to overcome it, facilitating data integration
and processing at potentially Web scale. It does this by providing standard data exchange formats, models, tools and guidance.

The overall vision of the Data Activity is that people and organizations should
be able to share data, as far as possible using their existing tools and working
practices, in a way that enables others to derive and add value and to use the data
in ways that suit them. Achieving that requires a focus on the interoperability not just of data but also of communities.

Context & Vision

The Data Activity merges and builds upon the
eGovernment and
Semantic Web Activities.
The eGovernment Activity comprised an interest group that offered members a series of talks from
well-placed speakers in governments around the world, including from countries that are often under-represented
at W3C, such as Jordan and Uganda. The primary topics were the use of social media for citizen engagement, and open data.

The Semantic Web Activity was launched in 2001 to lead the use of the Web as an exchange medium for data as well as
documents. That overall aim, along with a series of associated activities by W3C and others, has
been highly successful, although not necessarily in the way originally envisioned. For example,
the vision was that organizations and individuals would publish data in much the same way that
they were already publishing Web pages. Enormous volumes of data are available on the Web today, but
they are typically published through portals that act on behalf of multiple agencies, not on
the Web sites operated by those agencies themselves. Data publication is seen as a specialist activity
rather than something anyone can do, and it is therefore more centralized than expected.

The Activity will make data publication less of a specialist activity and ensure that
the excellent work done by portals does not lead to de facto data silos.

There is a benign trend toward centralization in vocabularies. The success of
Linked Open Vocabularies as a central information
point about vocabularies is symptomatic of a need, or at least a desire, for an
authoritative reference point to aid the encoding and publication of data.
This need or desire is expressed even more forcefully in the rapid success and adoption of
schema.org. The large and growing set of terms in the
schema.org namespace includes (and references) many established terms defined elsewhere,
such as in vCard, FOAF, GoodRelations and rNews. Designed and promoted as a means of
helping search engines make sense of unstructured data (i.e., text), schema.org terms are
also being adopted in other contexts, for example in the ADMS
vocabulary originally developed by the European Commission.

The Data Activity will continue to support this work as well as promote W3C's existing
open approach to the coordination, recognition and persistent hosting of vocabularies, which the
user community sees as critical companions to Web standards such as XML, RDF and HTML.

The use of the Web as a platform for delivering data has been driven by policy as much
as by technology. The G8 Open Data Charter
is a prime example. Other examples include President Obama’s Executive Order
and the European Union’s revised PSI Directive.
These policies apply equally to government information, scientific research and cultural
heritage, which creates a further source of diversity in workflows, people and the technologies they use.

The W3C Data Activity will support technologists tasked with responding to this political pressure.
It will do so in a way that works for those individuals and at the same time delivers maximum return on
the political and financial investments made, minimizing the risk that data produced in one community
remains usable only by other members of that same community.

Although the needs and views of application developers are, of course, of critical importance, the Data
Activity is designed to support the needs of public- and private-sector organizations working to
publish and integrate data across the Web. W3C has traditionally worked on Semantic Web technologies
and has promoted the publication of data through the (Enterprise) Linked Data and (5-star) Linked Open
Data approaches. The primary value of Linked Data, RDF and related technologies is that they
have the Web at their core, providing a unique means of integrating data at Web
scale. Such integration may happen online but often happens within industry (offline or in the cloud).
YarcData gave some examples of this in an interview with Ian Jacobs,
in which Shoaib Mufti explained how Semantic Web technologies can help to process Big Data and derive insights
from it that might otherwise remain hidden.
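The integration benefit of having the Web at the core can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any W3C tool works: triples are plain tuples rather than objects from an RDF library, and every URI except the schema.org `name` property, as well as every value, is hypothetical. Because both (hypothetical) publishers identify the same place with the same URI, their data can be combined without any mapping step.

```python
# Minimal sketch, not a real RDF library: each triple is a plain
# (subject, predicate, object) tuple of strings. All URIs under
# example.org and all values are hypothetical.

EX = "http://example.org/"          # hypothetical namespace
TOWN = EX + "place/town-1"          # a URI both publishers happen to use
NAME = "http://schema.org/name"     # shared schema.org term

# Dataset published by one (hypothetical) agency.
dataset_a = {
    (TOWN, NAME, "Example Town"),
    (TOWN, EX + "vocab/population", "5000"),
}

# Dataset published independently by another (hypothetical) agency.
dataset_b = {
    (TOWN, NAME, "Example Town"),
    (TOWN, EX + "vocab/schools", "12"),
}


def merge(*graphs):
    """Merge graphs by taking the union of their triples.

    A plain union suffices here because the sketch uses no blank nodes;
    a full RDF merge must also keep blank nodes from different graphs apart.
    """
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect predicate -> sorted list of objects for one subject."""
    found = {}
    for s, p, o in graph:
        if s == subject:
            found.setdefault(p, []).append(o)
    return {p: sorted(objs) for p, objs in found.items()}


merged = merge(dataset_a, dataset_b)
print(len(merged))  # 3: the identical name triple is stored only once
for predicate, objects in sorted(describe(merged, TOWN).items()):
    print(predicate, objects)
```

A production system would use an RDF library and a standard serialization; the point here is only that identifiers shared across publishers turn integration into a simple union.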

However, not all applications need the power of Semantic Web technologies to achieve
data integration; in many cases applications work with one or two specific datasets that can
be accessed and managed individually. Many datasets of significant size are published on the Web
in different formats, and converting them to RDF, or accessing them specifically as RDF, is not
always necessary. The Data Activity will contribute to the larger data ecosystem to ensure interoperability and
ease of application development.
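When an application works with a single dataset, it can often consume the published format directly. The minimal Python sketch below assumes a hypothetical CSV dataset with `region`, `year` and `schools` columns. It uses only the standard library's `csv` module and never converts the data to RDF.

```python
import csv
import io

# Hypothetical CSV dataset; in practice it might be downloaded from a data portal.
CSV_TEXT = """region,year,schools
North,2013,12
South,2013,8
North,2014,14
"""


def total_by_region(text):
    """Sum the 'schools' column per region, reading the CSV as published."""
    totals = {}
    for row in csv.DictReader(io.StringIO(text)):
        totals[row["region"]] = totals.get(row["region"], 0) + int(row["schools"])
    return totals


print(total_by_region(CSV_TEXT))  # {'North': 26, 'South': 8}
```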