News

2014-09-10: An updated version of the LOD Cloud diagram has been published. The new version contains 570 linked datasets which are connected by 2909 linksets. New statistics about the adoption of the Linked Data best practices are found in an updated version of the State of the LOD Cloud document.

2013-04-25: The Ordnance Survey, Great Britain's national mapping agency, has launched its new Linked Data service.

2012-03-25: The accepted papers of the 5th Linked Data on the Web Workshop (LDOW2012) are online now. LDOW2012 will take place at WWW2012 in April 2012, Lyon, France. Besides the paper presentations, there will be a panel discussion at the workshop about the deployment of Linked Data in different application domains and the motivation, value proposition and business models behind these deployments, especially in relation to complementary and alternative techniques for data provision (e.g. Web APIs, Microdata, Microformats) and proprietary data sharing platforms (e.g. Facebook, Twitter, Flickr, LastFM).

2012-02-03: LODStats released. LODStats constantly monitors the Linked Data cloud and calculates statistics about the content of the data sets, their accessibility as well as the usage of different vocabularies. LODStats complements the meta-information about LOD data sets provided by CKAN and LOV.

2011-11-10: The W3C has launched a community directory of Linked Data projects and suppliers in the domain of eGovernment.

2011-10-12: Facebook has started to support RDF and Linked Data URIs and now provides access to parts of its user data via a Linked Data API. For details, see these posts (1, 2) by Jesse Weaver.

2011-09-19: Updated version of the LOD Cloud Diagram and State of the LOD Cloud statistics published. Thanks a lot to everybody who contributed to the creation of the diagram by providing meta-information about the data sets on the Data Hub. Altogether, the data sets in the LOD cloud currently consist of over 31 billion RDF triples and are interlinked by around 504 million RDF links.

2011-06-02: Google, Yahoo and Microsoft have agreed on vocabularies for publishing structured data on the Web. Their shared 'ontology' is maintained on schema.org. The Linked Data community congratulates them on this important step towards making Web content more structured and thus allowing applications to do smarter things with it!

2011-03-29: The 4th Linked Data on the Web Workshop (LDOW2011) took place at WWW2011 in Hyderabad, India. The workshop was attended by around 70 people and we had lots of interesting discussions. The papers and presentation slides are available from the workshop website. Photos from the workshop can be found on Flickr.

2010-10-20: Linked Enterprise Data book released. The book records some of the earliest production applications of linking enterprise data and is freely available as HTML in addition to being published by Springer.

2010-09-24: New version of the LOD Cloud diagram released. Over the last weeks, various members of the LOD community have collected detailed meta-data about linked datasets on CKAN. This data was used to draw a new September 2010 version of the LOD diagram. The new diagram contains 203 linked datasets which together serve 25 billion RDF triples to the Web and are interconnected by 395 million RDF links. State of the LOD Cloud provides further statistics about the datasets in the cloud.

2010-05-28: Newsweek is now using RDFa, Dublin Core, FOAF and SIOC to annotate the articles on their website.

2010-01-12: A Japanese translation of this page is available here. Thanks a lot to Noboru Shimizu and Shuji Takashima for translating the page and for promoting Linked Data in Japan. An ongoing (traditional) Chinese translation is also now (2/10) available. More translations are welcome!

The goal of the W3C SWEO Linking Open Data community project is to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting RDF links between data items from different data sources.

RDF links enable you to navigate from a data item within one data source to related data items within other sources using a Semantic Web browser. RDF links can also be followed by the crawlers of Semantic Web search engines, which may provide sophisticated search and query capabilities over crawled data. As query results are structured data and not just links to HTML pages, they can be used within other applications.
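The navigation described above can be sketched in a few lines of Python. This is only a toy illustration: the two in-memory "data sources", the example.org URIs and the `describe` and `follow_links` helpers are assumptions made for the sketch, not part of any real data set or tool. A real Semantic Web browser or crawler would dereference each URI over HTTP instead of looking it up in a dictionary.

```python
# Toy in-memory "data sources": each maps a subject URI to (predicate, object) pairs.
# All URIs and triples below are illustrative assumptions, not real data.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

SOURCES = {
    "source_a": {
        "http://example.org/dbpedia/Berlin": [
            (RDFS_LABEL, "Berlin"),
            # An RDF link: its object is a URI that lives in another source.
            (OWL_SAME_AS, "http://example.org/geonames/Berlin"),
        ],
    },
    "source_b": {
        "http://example.org/geonames/Berlin": [
            ("http://example.org/vocab#country", "Germany"),
        ],
    },
}


def describe(uri):
    """Return every (predicate, object) pair that any source holds about a URI."""
    pairs = []
    for source in SOURCES.values():
        pairs.extend(source.get(uri, []))
    return pairs


def follow_links(start_uri, link_predicate=OWL_SAME_AS):
    """Walk RDF links breadth-first and list all URIs reachable from start_uri."""
    seen = [start_uri]
    queue = [start_uri]
    while queue:
        current = queue.pop(0)
        for predicate, obj in describe(current):
            if predicate == link_predicate and obj not in seen:
                seen.append(obj)
                queue.append(obj)
    return seen


if __name__ == "__main__":
    for uri in follow_links("http://example.org/dbpedia/Berlin"):
        print(uri, describe(uri))
```

Because the results are structured (URIs plus predicate/object pairs) rather than HTML pages, another program can consume them directly.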

The figures below show the data sets that have been published and interlinked by the project so far. Collectively, the 570 data sets are connected by 2909 link sets.

The figure below, offered by the UMBEL project, shows (some of) the class-level interlinking of the data dictionaries (shared vocabularies, schemas, ontologies) associated with the data sets shown above. Click the image for a node-clickable version. More information about class-level/vocabulary-level interlinking is provided by the Linked Open Vocabularies (LOV) project.

Project Pages

The project collects relevant material on several wiki pages. Please feel free to add additional material, so that we get an overview of what is already there and what is currently happening.

The goal of the Linking Open Data project is to build a data commons by making various open data sources available on the Web as RDF and by setting RDF links between data items from different data sources.

RDF links enable you to navigate from a data item within one data source to related data items within other sources using a Semantic Web browser. RDF links can also be followed by the crawlers of Semantic Web search engines, which may provide sophisticated search and query capabilities over crawled data. As query results are structured data and not just links to HTML pages, they can be used within other applications.

For demonstrating the value of the Semantic Web it is essential to have more real-world data online. RDF is also the obvious technology to inter-link data from various sources.

3. Why do you think this project will have a wide impact?

A huge inter-linked data set would be beneficial for various Semantic Web development areas, including Semantic Web browsers and other user interfaces, Semantic Web crawlers, RDF repositories and reasoning engines.

Having a variety of useful data online would encourage people to link to it and could help bootstrap the Semantic Web as a whole.

4. Can your project be easily integrated with other wide-spread systems? If so, which and how?

5. Why is it that this project should be done right now, i.e. why should people prioritize this ahead of other projects?

It is getting boring to play around with toy examples as most Semantic Web projects do.

6. What can you contribute to the project?

We will keep on working on DBpedia and start serving RDF for all 1.6 million concepts in Wikipedia in a couple of weeks. As Wikipedia contains information about various domains, we think DBpedia URIs could function as a valuable linking-hub for interconnecting various data sources. We could link to related data from DBpedia as we already did with the links to Geonames.

7. What contribution would you need from others?

Propose additional open data sources that could be mapped to RDF.

Convert a data source to RDF and serve it as linked data or SPARQL endpoint on the Web.

Invent heuristics to auto-generate links between data items from different sources.
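As a starting point for the last item, here is a minimal sketch of one possible link-generation heuristic: proposing owl:sameAs links between items whose labels match after normalization. It is an assumption-laden illustration, not a method the project prescribes; the function names and the example.org URIs are hypothetical, and exact label matching will produce false positives (homonyms) that a real heuristic would have to filter using further evidence such as types or coordinates.

```python
import unicodedata

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    """Lower-case, strip accents and collapse whitespace so labels compare loosely."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def propose_links(source_a, source_b):
    """Return candidate (uri_a, owl:sameAs, uri_b) triples for matching labels.

    Both arguments are dicts mapping an item URI to its label (hypothetical input shape).
    """
    # Index the second source by normalized label for constant-time lookups.
    index = {}
    for uri_b, label in source_b.items():
        index.setdefault(normalize(label), []).append(uri_b)

    links = []
    for uri_a, label in source_a.items():
        for uri_b in index.get(normalize(label), []):
            links.append((uri_a, OWL_SAME_AS, uri_b))
    return links


if __name__ == "__main__":
    a = {"http://example.org/a/Zurich": "Zürich"}
    b = {"http://example.org/b/1": "zurich ", "http://example.org/b/2": "Geneva"}
    for triple in propose_links(a, b):
        print(triple)
```

The output is a list of candidate links only; treating them as suggestions to be reviewed or scored keeps wrong owl:sameAs statements out of the published data.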

8. What standardization should the Semantic Web community at large undertake to support the project?

9. How does your project encourage others not currently involved with Semantic Web technologies to get involved (by providing data or making a coding commitment)?

Having useful data online might trigger network effects. The project could raise awareness within the Open Data community about the benefits that RDF as a shared data model offers them. Having richly inter-linked data online might inspire people to create interesting mashups and other RDF-aware applications.

10. What would be the main benefit of using Semantic Web technologies to achieve the goals of the project, compared to other technologies?

RDF provides a flexible data model for integrating information from different sources. In particular, its linking capabilities are not provided by any other data model.
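To illustrate the integration point, the sketch below models RDF graphs as plain Python sets of (subject, predicate, object) tuples, which is a simplification assumed for this example (no blank nodes, literals as bare strings). Under that assumption, merging data from two sources is just a set union, with no schema alignment step; the URIs and helper names are hypothetical.

```python
def merge(*graphs):
    """Merge graphs by set union; identical triples collapse automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def properties_of(graph, subject):
    """Collect predicate -> sorted list of objects for one subject."""
    result = {}
    for s, p, o in graph:
        if s == subject:
            result.setdefault(p, []).append(o)
    return {p: sorted(objs) for p, objs in result.items()}


if __name__ == "__main__":
    berlin = "http://example.org/place/Berlin"  # hypothetical shared URI
    # Two sources describe the same URI with different, uncoordinated predicates.
    source_a = {
        (berlin, "http://example.org/a#label", "Berlin"),
        (berlin, "http://example.org/a#country", "Germany"),
    }
    source_b = {
        (berlin, "http://example.org/a#label", "Berlin"),
        (berlin, "http://example.org/b#timezone", "Europe/Berlin"),
    }
    print(properties_of(merge(source_a, source_b), berlin))
```

Because both sources use the same URI for the item, their statements line up without any further mapping; where they use different URIs, an RDF link between the two plays that role.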

Commitments

If you like this project, please write your name below and indicate what contribution you can make to the project. Possible forms of commitment are:

I think this project is a good idea and its realization would be useful.

I would like to propose further data sources for being published as RDF

I would like to convert a data source to RDF

I could serve some data from my server (if somebody would give it to me)

I would like to work on heuristics to auto-generate links between data items from different sources

I could talk with other people that might want to contribute to the project

Chris Bizer and Richard Cyganiak proposed the project to the W3C SWEO. We maintain several Linked Data sources, including DBpedia, DBLP Berlin, CIA Factbook, Book Mashup and Eurostat, and do outreach and coordination work for the project.

Sören Auer - I try to contribute with regard to converting, serving data-sources and talking to people ;-)

Bernard Vatant - Already involved in Geonames ontology, and linking Geonames data and concepts to other sources such as INSEE data. Projects to do more, linking to GEMET concepts, Wikipedia categories etc.

Josh Tauberer - 700 million triples of U.S. Census data coming very soon now.... (Just having some free disk space issues loading it into MySQL.) Tying this to GeoNames will be an interesting/useful project for someone looking for a project.

Tom Heath - Great idea. I can contribute the involvement of Revyu.com, (AFAIK) the only RDF-based reviewing and rating site in the wild. The site exposes data using FOAF, the Review Vocab and Richard Newman's Tag Ontology, and everything gets dereferenceable URIs. The data set is modest, but growing. I'm really interested in developing heuristics to auto-generate sameAs links between URIs from Revyu and elsewhere, and ways to infer locations of things from reviews and tags and hook this into Geonames.

Felix Van de Maele - A very interesting project. I developed an ontology-focused crawler and am currently working on the community-driven ontology matcher and mediator which might be handy to interlink RDF data sources.

Stefano Mazzocchi - As part of the MIT Simile Project, I've been RDFizing large data sets for years (unfortunately, most of these are not data I can make publicly available). We provide a way to export RDF data from all of our RDF browsing tools, but we haven't focused on providing URI dereferencing for such data and I agree that it might be important to start doing so. The juiciest dataset we have to offer is a 50Mt dump of the MIT Libraries catalog covering about a million books. I'm also currently working on an owl:sameAs-based RDF smoosher (which is already functional from the command line) and I'm planning on working on equivalence mining next. It is also worth noting that the SIMILE Project has a large collection of RDFizing programs that can be used to generate large quantities of RDF from existing data.

Ed Summers - I'm a software developer at the Library of Congress interested in making bibliographic and authority data sets available to the semantic web.

Yves Raimond - I am a PhD student in the Centre for Digital Music, Queen Mary, University of London, and I am interested in linking music-related open data (Musicbrainz, Magnatune, Jamendo, Dogmazic, Mutopia, among others...). I am also part of two projects, in which I am trying to promote such an approach: EASAIER (Enabling Access to Sound Archives through Enrichment and Retrieval) and OMRAS2 (Online Music Recognition and Searching).

Vangelis Vassiliadis - I could work on heuristics to auto-generate links between data items from different sources and on adding domain knowledge to different data sets by means of OWL.

Georgi Kobilarov - I maintain the DBpedia extraction framework. I'm interested in developing tools to help data publishers interlink their databases and in building UIs for end-users.

Huajun Chen - developer of DartGrid, which is a relational data integration toolkit using semantic web technologies. Two major components of DartGrid are a visualized semantic mapping tool and a view-based (or more generally rule-based) SPARQL-SQL query rewriting component. An ISWC2006 paper introduces the details.

Giovanni Tummarello - I created Sindice, together with Eyal Oren. Sindice is a linked data search engine which returns ranked lists of "SeeAlso" URLs which contain information about a given URI. In a sense it overcomes the need for mandatory "SeeAlso" statements by looking anywhere on the web (via people providing direct pings either to ourselves or to PTSW, and via our array of swse bots). "SeeAlso" statements remain useful, however, for ranking purposes. The service has a simple HTTP API; see for example all the links Sindice knows which talk about Tim Berners-Lee here.

Sherman D. Monroe - I'm the author of Cypher, which is a transcoder which generates the RDF and SeRQL (working on SPARQL port) representation of natural language phrases and sentences. The project page can be found here. The Cypher project aims to collect and unify the various sources of data used for NLP tasks, such as WordNet, FrameNet, PropBank, as well as annotated corpora, and also to provide standard ontologies for things like part-of-speech tagging. We wish to provide a single resource which NLP applications can use to leverage this data. I'm also working on a Semantic Web web service called overdogg (currently in alpha) which is a new type of marketplace based on reverse auction for services, and another soon to be announced service which builds FOAF databases of users.

Adam Sobieski - Interested in event ontology and also wikitology. I'm making a website that allows users to select or create predicates and drag and drop nouns (noun phrases) from sentences into predicate slots. The interface will capture pronoun resolution and semantics from visitors reading. The sentences will be viewed in order from articles to obtain context information. The downloadable resource will be both a corpus (hopefully as useful as Penn treebank and Redwoods) and a collaborative ontology relating nouns from real-world encyclopedic articles.

Troy Self - I maintain SemWebCentral, which is a development site for Open Source Semantic Web tools. I also maintain the RDF browser, ObjectViewer, the ontology summarizer, Ocelot, and was one of the primary developers of the Semantic Web Development Environment, SWeDE.

David Peterson - I am working on getting large Australian science data sets converted to RDF and accessible via SPARQL. We work with over 13 large and diverse science organisations so I believe this will be a valuable contribution.

Joerg Diederich - I am working on Semantic Web topics and Digital Libraries and I am the maintainer of FacetedDBLP. I am planning to contribute my local DBLP data (updated weekly) by means of the D2R technology from FUBerlin very soon.

Danny Gagne -- I think this is a great idea. I'm going to work on building some tools, trying to create a small dataset, who knows what else :)

MichaelHausenblas -- I joined the LOD community project in June 2007 after Chris Bizer had told me about this great idea. In the meantime quite some things emerged and I think I have left some traces ;) In the beginning I was interested in building a linked dataset, which actually yielded riese, the RDFised and interlinked version of Eurostat's statistical data. As a by-product we developed a paradigm allowing for manual interlinking, called UCI (User-Contributed Interlinking). Then my focus shifted and I wanted to build applications on top of linked data, which triggered the creation of voiD, the 'Vocabulary of Interlinked Datasets'. Now, as I'm a multimedia guy at heart, I also started to apply linked data principles to multimedia content. Finally, I gave a LOD tutorial at ISWC08 with Tom, Chris, Richard and Olaf (see also my quick intro into linked data). I've participated in (and co-organised) several LOD Gatherings so far and plan to do so in the future.

Bernhard Schandl -- I am working on integration of semantic data into user desktops and file systems and am interested in the possibilities of publishing such data on the web.

David Huynh -- I'm interested in building UIs for browsing and viewing the collected data. I don't think that a SPARQL interface appeals to the general public, nor that a pure search text box a la Google takes sufficient advantage of the graph nature of the data.

Daniel Lewis -- I am a Technology Evangelist for OpenLink Software, my interests are in making the Social Web more Semantic, making the Semantic Web more User Friendly and making the web more intelligent. The only way to advance web applications is to expose and link data between domains - Semantic Web technology can do that. My blog is available here and I tend to talk about various subjects (not just about the Semantic Web).

Andreas Langegger -- I'm currently developing a middleware for virtual data integration based on SW technologies. It will be used inside the Austrian Grid for sharing structured scientific data but because of the relevance for the SW community, it will be released as SemWIQ (Semantic Web Integrator and Query Engine) in mid-2008. Among other goals, I try to keep setup / configuration as simple as possible. Stay tuned (currently optimization is on the top of the agenda)...

Rob Cakebread -- I run Doapspace.org and am mainly interested in DOAP and linking it with FOAF, BEATLe, SIOC. I'm a Gentoo Linux developer and I'm working on tools for users and package maintainers to benefit from the metadata provided by DOAP and related ontologies.

Francois Scharffe -- I'm a researcher at STI Innsbruck in the area of ontology alignment. I'm involved in the EASAIER project where we publish music-related data sets using the music ontology. I've worked on SPARQL++, a SPARQL extension that allows RDF data to be transferred from one ontology to another. I'm interested in data fusion techniques. I'm also interested in creating a classical music reference knowledge base that could be used as an anchor to publish classical music data sets.

Andraz Tori -- I'm CTO at Zemanta. I work on architecting Zemanta's engine to disambiguate to Linking Open Data entities. Would like to see which parts of LOD are candidates for inclusion into Zemanta and get the feedback of what the API needs to return to be most useful for enabling LOD mashups.

Ted Thibodeau Jr - I've been with OpenLink Software since December 2000, working with Data Access and all that entails, including the Linking Open Data project, many aspects of the Virtuoso Universal Server, exposing more-and-less structured data (from RDBMS content to plaintext) as RDF, dreaming up new wish lists for Linked Data features and applications, connecting people and projects, and more. My dream includes realization of the Knowledge Navigator concept, complete with unrolled and/or unfolded screen, voice commands, intelligent agents, etc. Before joining OpenLink, I spent time serving various roles in various industries, all of which would have benefited from the LOD project, and those experiences guide my efforts today.

Andreas Harth -- I'm a researcher at DERI Galway, working on integrating web data without making assumptions on the schema used. I currently crawl parts of the LOD (at least weekly) and provide search, browsing and navigation functionality over that LOD data in VisiNav. The site is useful for the LOD crowd as it allows people to inspect LOD data (including provenance tracking) in case there are hiccups in the data published in the LOD cloud.

Hydrasi -- I'm JohnMetta, a scientist, 20+ year Open Source Software developer, and founder at Hydrasi. Hydrasi is a company focused on the water and climate sciences and we are planning on developing a non-profit foundation, the Climate Cloud Initiative, as an open data network of water and climate data. I'm happy to see such a diverse and active crowd in open data-- from so many fields.

TobyInkster - I'm Toby Inkster. Not sure why I've added myself to this list so late. Without exaggeration, I think Linked Data could play a part in solving some of the biggest problems of the 21st century. We can't really know in advance which information will turn out to be needed to save the world - the early scientists who caught lightning with kites could never have known that electricity and the things electronics have made possible would be so important in the modern age - so it's key to link together as much data as possible.

Mariano Rico -- I developed VPOET, a web application oriented to create presentation templates for handling semantic data easily. As an example, click here to browse TBL's FOAF profile under a given template. These templates can also be used by web developers through HTTP messages, and end-users can use these templates by using the Google Gadget GG-VPOET. You are welcome to create your own templates, reuse others' templates, or create your own templates repository (code available). I am enthusiastic about LOD and I think that semantic templates can help reduce the adoption barrier of semantic data for common web developers and end-users.

How To Join the Project

Send a little self introduction to the mailing list (include an intro to your project and associated RDF Data Sets where such exist or are planned).

Register at [2], which automatically gives you a WebID (an ID for you, the Person Entity, e.g., WebID for Kingsley Idehen, an Entity of Type: Person) and an OpenID URL. Use the Profile page to Link to your other URIs (if such exist) via the "Synonyms" input field.