DBpedia Version 2014 released

The most important improvements of the new release compared to DBpedia 3.9 are:

1. the new release is based on updated Wikipedia dumps dating from April / May 2014 (the 3.9 release was based on dumps from March / April 2013), leading to an overall increase of the number of things described in the English edition from 4.26 to 4.58 million things.

2. the DBpedia ontology is enlarged and the number of infobox to ontology mappings has risen, leading to richer and cleaner data.

The English version of the DBpedia knowledge base currently describes 4.58 million things, out of which 4.22 million are classified in a consistent ontology (http://wiki.dbpedia.org/Ontology2014), including 1,445,000 persons, 735,000 places (including 478,000 populated places), 411,000 creative works (including 123,000 music albums, 87,000 films and 19,000 video games), 241,000 organizations (including 58,000 companies and 49,000 educational institutions), 251,000 species and 6,000 diseases.

We provide localized versions of DBpedia in 125 languages. All these versions together describe 38.3 million things, out of which 23.8 million are localized descriptions of things that also exist in the English version of DBpedia. The full DBpedia data set features 38 million labels and abstracts in 125 different languages, 25.2 million links to images and 29.8 million links to external web pages; 80.9 million links to Wikipedia categories, and 41.2 million links to YAGO categories. DBpedia is connected with other Linked Datasets by around 50 million RDF links.

Altogether the DBpedia 2014 release consists of 3 billion pieces of information (RDF triples) out of which 580 million were extracted from the English edition of Wikipedia, 2.46 billion were extracted from other language editions.

The editors community of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 2014 extraction, we used 4,339 mappings (DBpedia 3.9: 3,177 mappings), which are distributed as follows over the languages covered in the release.

English: 586 mappings

Dutch: 469 mappings

Serbian: 450 mappings

Polish: 383 mappings

German: 295 mappings

Greek: 281 mappings

French: 221 mappings

Portuguese: 211 mappings

Slovenian: 170 mappings

Korean: 148 mappings

Spanish: 137 mappings

Italian: 125 mappings

Belarusian: 125 mappings

Hungarian: 111 mappings

Turkish: 91 mappings

Japanese: 81 mappings

Czech: 66 mappings

Bulgarian: 61 mappings

Indonesian: 59 mappings

Catalan: 52 mappings

Arabic: 52 mappings

Russian: 48 mappings

Basque: 37 mappings

Croatian: 36 mappings

Irish: 17 mappings

Wiki-Commons: 12 mappings

Welsh: 7 mappings

Bengali: 6 mappings

Slovak: 2 Mappings

3. Extended Type System to cover Articles without Infobox

Until the DBpedia 3.8 release, a concept was only assigned a type (like person or place) if the corresponding Wikipedia article contains an infobox indicating this type. Starting from the 3.9 release, we provide type statements for articles without infobox that are inferred based on the link structure within the DBpedia knowledge base using the algorithm described in Paulheim/Bizer 2014 (http://www.heikopaulheim.com/documents/ijswis_2014.pdf). For the new release, an improved version of the algorithm was run to produce type information for 400,000 things that were formerly not typed. A similar algorithm (presented in the same paper) was used to identify and remove potentially wrong statements from the knowledge base.

4. New and updated RDF Links into External Data Sources

We updated the following RDF link sets pointing at other Linked Data sources: Freebase, Wikidata, Geonames and GADM. For an overview about all data sets that are interlinked from DBpedia please refer to http://wiki.dbpedia.org/Interlinking.

Daniel Fleischhacker (University of Mannheim) and Volha Bryl (University of Mannheim) for improving the DBpedia extraction framework, for extracting the DBpedia 2014 data sets for all 125 languages, for generating the updated RDF links to external data sets, and for generating the statistics about the new release.

All editors that contributed to the DBpedia ontology mappings via the Mappings Wiki.

Petar Ristoski (University of Mannheim) for generating the updated links pointing at the GADM database of Global Administrative Areas. Petar will also generate an updated release of DBpedia as Tables soon.