Wikipedia articles consist mostly of free text, but also include structured information embedded in the articles, such as "infobox" tables (the pull-out panels that appear in the top right of the default view of many Wikipedia articles, or at the start of the mobile versions), categorisation information, images, geo-coordinates and links to external Web pages. DBpedia extracts this structured information and puts it into a uniform data set that can be queried.
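
The extraction step can be illustrated with a minimal sketch that pulls the named parameters out of an infobox in raw wikitext. It is not DBpedia's own extraction framework: the function name `parse_infobox` and the sample page are hypothetical, and the parser only tracks `[[...]]` and `{{...}}` nesting so that a `|` inside a link or nested template is not mistaken for a parameter separator.

```python
def parse_infobox(wikitext: str) -> tuple[str, dict[str, str]]:
    """Return (template name, {parameter: raw value}) for the first infobox.

    Hypothetical, simplified parser: real wikitext also contains comments,
    <ref> tags, triple-brace parameters and other constructs ignored here.
    """
    start = wikitext.find("{{Infobox")
    if start == -1:
        raise ValueError("no infobox found")

    depth = 0                  # combined nesting level of {{...}} and [[...]]
    parts: list[str] = []      # top-level fields, split on '|' at depth 1
    current: list[str] = []
    i = start
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            if depth > 1:      # nested markup stays part of the value
                current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            if depth == 0:     # closing braces of the infobox itself
                parts.append("".join(current))
                break
            current.append(pair)
            i += 2
        else:
            ch = wikitext[i]
            if ch == "|" and depth == 1:
                parts.append("".join(current))
                current = []
            else:
                current.append(ch)
            i += 1
    else:
        raise ValueError("unterminated infobox")

    name = parts[0].strip()
    params: dict[str, str] = {}
    for field in parts[1:]:
        key, sep, value = field.partition("=")
        if sep:                # skip positional (unnamed) fields
            params[key.strip()] = value.strip()
    return name, params
```

For a page containing `{{Infobox person | birth_place = [[Exampleville]] ... }}`, the sketch returns the template name `Infobox person` and a dictionary such as `{"birth_place": "[[Exampleville]]", ...}`, with the link markup left for a later cleaning step.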

In September 2014, version 2014 was released.[5] Compared with previous versions, one of the main changes was the way abstract texts were extracted: DBpedia ran a local mirror of Wikipedia and retrieved the rendered abstracts from it, which made the extracted texts considerably cleaner. Furthermore, a new data set containing content extracted from Wikimedia Commons was introduced. The whole DBpedia data set describes 4.58 million entities, of which 4.22 million are classified in a consistent ontology, including 1,445,000 persons, 735,000 places, 123,000 music albums, 87,000 films, 19,000 video games, 241,000 organizations, 251,000 species and 6,000 diseases.[6] The data set features labels and abstracts for these entities in up to 125 different languages, 25.2 million links to images and 29.8 million links to external web pages. In addition, it contains around 50 million links into other RDF data sets, 80.9 million links to Wikipedia categories, and 41.2 million YAGO2 categories.[6] The DBpedia project uses the Resource Description Framework (RDF) to represent the extracted information; the data set consists of 3 billion RDF triples, of which 580 million were extracted from the English edition of Wikipedia and 2.46 billion from other language editions.[6]
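
To make the triple model concrete, the sketch below serialises a few (subject, predicate, object) statements in the W3C N-Triples format. It uses the `http://dbpedia.org/resource/` and `http://dbpedia.org/ontology/` namespaces that DBpedia uses for entities and ontology terms; the helper names are hypothetical, and the `illustrator` property is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union

DBR = "http://dbpedia.org/resource/"    # entity namespace
DBO = "http://dbpedia.org/ontology/"    # ontology namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


@dataclass(frozen=True)
class Literal:
    """A string literal with an optional language tag such as 'en'."""
    value: str
    lang: Optional[str] = None


Term = Union[str, Literal]   # a plain str is treated as an IRI


def term_to_nt(term: Term) -> str:
    """Render one term: IRIs in angle brackets, literals in quotes."""
    if isinstance(term, Literal):
        # Escape the characters N-Triples forbids inside quoted literals.
        escaped = (term.value.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
        return f'"{escaped}"@{term.lang}' if term.lang else f'"{escaped}"'
    return f"<{term}>"


def to_ntriples(triples: list[tuple[str, str, Term]]) -> str:
    """Serialise (subject, predicate, object) triples, one per line."""
    return "\n".join(
        f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} ."
        for s, p, o in triples
    )
```

Each extracted fact becomes one line of this kind, which is why the size of the data set is naturally counted in triples.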

From this data set, information spread across multiple pages can be combined; for example, book authorship can be assembled from either the page about the work or the page about the author.
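
A minimal sketch of that idea: authorship pairs are collected from both directions and de-duplicated, so a work whose own page omits the author is still covered if the author's page lists it. The infobox parameter names `author` and `notable_works`, and the function name, are hypothetical.

```python
def collect_authorship(
    work_pages: dict[str, dict[str, str]],
    author_pages: dict[str, dict[str, list[str]]],
) -> set[tuple[str, str]]:
    """Return a de-duplicated set of (work, author) pairs.

    work_pages:   {work title: {"author": author name}}          (hypothetical)
    author_pages: {author name: {"notable_works": [work, ...]}}  (hypothetical)
    """
    pairs: set[tuple[str, str]] = set()
    # Direction 1: the page about the work names its author.
    for work, params in work_pages.items():
        author = params.get("author")
        if author:
            pairs.add((work, author))
    # Direction 2: the page about the author lists the work.
    for author, params in author_pages.items():
        for work in params.get("notable_works", []):
            pairs.add((work, author))
    return pairs
```

Using a set means a fact stated on both pages is recorded only once.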

One of the challenges in extracting information from Wikipedia is that the same concept can be expressed using different parameters in infoboxes and other templates, such as |birthplace= and |placeofbirth=. Because of this, a query about where people were born would have to search for both properties to get complete results. The DBpedia Mapping Language was therefore developed to map such parameters to the properties of an ontology, reducing the number of synonyms. Given the large diversity of infoboxes and properties in use on Wikipedia, the process of developing and improving these mappings has been opened to public contributions.[7]
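
The effect of such a mapping can be sketched with a lookup table that sends several synonymous parameter names to one ontology property (`dbo:birthPlace`), after which a single lookup finds everyone regardless of which parameter the article used. This is a plain dictionary, not the DBpedia Mapping Language itself, whose mappings are scoped to individual infobox templates; the table contents and function names are hypothetical.

```python
# Hypothetical mapping table. Real DBpedia mappings are written in the
# DBpedia Mapping Language and scoped per infobox template; a flat
# dictionary is a simplification for illustration.
PROPERTY_MAPPINGS: dict[str, str] = {
    "birthplace": "dbo:birthPlace",
    "placeofbirth": "dbo:birthPlace",
    "birth_place": "dbo:birthPlace",
}


def normalise(params: dict[str, str]) -> dict[str, str]:
    """Rename known infobox parameters to ontology properties; drop the rest."""
    mapped: dict[str, str] = {}
    for key, value in params.items():
        prop = PROPERTY_MAPPINGS.get(key.strip().lower())
        if prop is not None:
            mapped.setdefault(prop, value)   # first synonym seen wins
    return mapped


def born_in(people: dict[str, dict[str, str]], place: str) -> list[str]:
    """One lookup on dbo:birthPlace, whatever parameter the article used."""
    return sorted(
        name for name, params in people.items()
        if normalise(params).get("dbo:birthPlace") == place
    )
```

Without the mapping step, `born_in` would have to check every synonym itself, which is exactly the burden the paragraph above describes.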

DBpedia extracts factual information from Wikipedia pages, allowing users to find answers to questions whose information is spread across many different Wikipedia articles. Data is accessed using SPARQL, an SQL-like query language for RDF. For example, suppose you were interested in the Japanese shōjo manga series Tokyo Mew Mew and wanted to find the genres of other works written by its illustrator, Mia Ikumi. DBpedia combines information from Wikipedia's entries on Tokyo Mew Mew, Mia Ikumi and works such as Super Doll Licca-chan and Koi Cupid. Because DBpedia normalises this information into a single database, one query can list the related genres without the user needing to know which entry carries each fragment of information.
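
A minimal sketch of such a query, sent from Python, follows. It assumes DBpedia's public endpoint at `https://dbpedia.org/sparql`; the raw infobox properties `dbp:illustrator`, `dbp:author` and `dbp:genre` are assumptions about how the relevant parameters surface in the data and should be checked against the live data set. The helper names are hypothetical; they send the query as an HTTP GET per the SPARQL 1.1 Protocol and flatten the standard SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"   # DBpedia's public SPARQL endpoint

# Genres of works that Tokyo Mew Mew's illustrator wrote or illustrated.
# dbp:illustrator, dbp:author and dbp:genre are assumed property names.
QUERY = """
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT DISTINCT ?work ?genre WHERE {
  dbr:Tokyo_Mew_Mew dbp:illustrator ?person .
  { ?work dbp:illustrator ?person } UNION { ?work dbp:author ?person }
  ?work dbp:genre ?genre .
  FILTER (?work != dbr:Tokyo_Mew_Mew)
}
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """GET request carrying the query, asking for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into {variable: value} rows."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def run_query(query: str) -> list[dict[str, str]]:
    """Send the query over the network and return the flattened rows."""
    with urllib.request.urlopen(build_request(query), timeout=30) as resp:
        return parse_bindings(json.load(resp))
```

Calling `run_query(QUERY)` needs network access and returns one `{"work": ..., "genre": ...}` row per match. The `UNION` covers works the illustrator either wrote or drew, and the `FILTER` drops Tokyo Mew Mew itself.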

DBpedia Spotlight, a tool for annotating mentions of DBpedia resources in text, is publicly available as a web service for testing purposes, or as a Java/Scala API licensed under the Apache License. The DBpedia Spotlight distribution also includes a jQuery plugin that allows developers to annotate pages anywhere on the Web by adding one line to their page.[18] Clients are also available in Java and PHP.[19] The tool handles various languages through its demo page[20] and web services. Internationalization is supported for any language that has a Wikipedia.[21]
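
The web service can also be called directly over HTTP. The sketch below assumes the public English endpoint `https://api.dbpedia-spotlight.org/en/annotate`, its `text` and `confidence` query parameters, and the `Resources`, `@URI`, `@surfaceForm` and `@offset` fields of its JSON reply, as we understand the current demo service; treat these as assumptions to verify against the service's documentation. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed public demo endpoint for English; it may change over time.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_annotate_request(text: str, confidence: float = 0.5,
                           url: str = SPOTLIGHT_URL) -> urllib.request.Request:
    """Build a GET request asking Spotlight to annotate `text`, as JSON."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(url + "?" + query,
                                  headers={"Accept": "application/json"})


def extract_entities(response: dict) -> list[tuple[str, str, int]]:
    """Return (surface form, DBpedia URI, character offset) per annotation.

    The "Resources" key is treated as optional, since it may be absent
    when nothing in the text was recognised.
    """
    return [
        (r["@surfaceForm"], r["@URI"], int(r["@offset"]))
        for r in response.get("Resources", [])
    ]


def annotate(text: str, confidence: float = 0.5) -> list[tuple[str, str, int]]:
    """Send the text over the network and return the recognised entities."""
    req = build_annotate_request(text, confidence)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return extract_entities(json.load(resp))
```

Raising `confidence` trades recall for precision: fewer mentions are linked, but those that are tend to be less ambiguous.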

"Life in the Linked Data Cloud". www.opencalais.com. Retrieved 2009-11-10. Wikipedia has a Linked Data twin called DBpedia. DBpedia has the same structured information as Wikipedia – but translated into a machine-readable format.

"BBC Learning - Open Lab - Reference". bbc.co.uk. Retrieved 2009-11-10. Dbpedia is a database version of Wikipedia. It is used in a lot of projects for a wide range of different reasons. At the BBC we are using it for tagging content.