First 5,000 Tags Released to the Linked Data Cloud

For more than 150 years, The New York Times has meticulously indexed its archives. Through this process, we have developed an enormous collection of subject headings, ranging from “Absinthe”[1] to “Zoos”[2].

Unfortunately, our list of subject headings is an island. For example, even though we can show you every article written about “Colbert, Stephen [3],” our databases can’t tell you that he was born on May 13, 1964, or that he lost the 2008 Grammy for best spoken word album to Al Gore. To do this we would need to map our subject headings onto other Web databases such as Freebase and DBpedia.

So that’s exactly what we did.

Over the last several months we have manually mapped more than 5,000 person name subject headings onto Freebase and DBpedia. And today we are pleased to announce the launch of http://data.nytimes.com and the release of these 5,000 person name subject headings as Linked Open Data.

Over the next several months, we plan to expand http://data.nytimes.com to include each of the nearly 30,000 subject headings we use to power Times Topics pages, a collection that includes locations, organizations and descriptors in addition to person names.

So have you ever wanted to query The New York Times for all the articles mentioning people born in your hometown? Or maybe all the articles written about United States senators born in states other than the state they represent? Head over to http://data.nytimes.com to browse our subject headings; download a SKOS file containing all 5,000 subject headings; and start hacking.
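
As a rough illustration of the kind of question such a mapping makes possible, the sketch below joins a subject heading to facts held in an external database through a link table. The dictionaries, the identifier strings and the function name are all invented for illustration; the actual data is published as SKOS/RDF and would normally be loaded with an RDF toolkit rather than plain dictionaries.

```python
# Hypothetical link table: Times subject heading -> identifier in an external database.
# The identifier below is a placeholder, not a real Freebase or DBpedia URI.
SAME_AS = {
    "Colbert, Stephen": "external:stephen_colbert",
}

# Hypothetical facts held by the external database, keyed by its own identifier.
# The birth date is the one quoted in the post above.
EXTERNAL_FACTS = {
    "external:stephen_colbert": {"birth_date": "1964-05-13"},
}


def headings_born_in_year(year):
    """Return subject headings whose linked external record has a birth date in `year`."""
    matches = []
    for heading, external_id in SAME_AS.items():
        facts = EXTERNAL_FACTS.get(external_id, {})
        birth_date = facts.get("birth_date", "")
        if birth_date.startswith(f"{year}-"):
            matches.append(heading)
    return sorted(matches)


print(headings_born_in_year(1964))
```

The point of the sketch is only that the subject heading itself carries no birth date; the answer comes from following the link to the external record.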

It appears to me that the combination of owl:sameAs links with the dcterms:rightsHolder properties on each entity entails claims by the New York Times that it has license and attribution rights to thousands of DBpedia and Freebase entities. The rightsHolder assertions are flat-out wrong and should be removed.
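
The reasoning in this comment can be sketched in a few lines. The example below applies a deliberately simplified reading of owl:sameAs (symmetry, plus copying every statement about one identifier onto the identifier it is linked to) to two made-up triples. The identifiers and the rights-holder value are placeholders, and the function is an illustration of the inference, not a full OWL reasoner.

```python
# Triples are (subject, predicate, object). All identifiers are illustrative placeholders.
triples = {
    ("nyt:heading_1", "owl:sameAs", "dbpedia:entity_1"),
    ("nyt:heading_1", "dcterms:rightsHolder", "placeholder rights holder"),
}


def same_as_closure(triples):
    """Expand triples under a simplified owl:sameAs reading.

    Handles symmetry and subject-position substitution only; transitivity
    across chains of sameAs links and object-position substitution are omitted.
    """
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != "owl:sameAs":
                continue
            new = {(o, "owl:sameAs", s)}  # sameAs is symmetric
            for s2, p2, o2 in list(inferred):
                # Copy every non-sameAs statement about s onto o.
                if p2 != "owl:sameAs" and s2 == s:
                    new.add((o, p2, o2))
            if not new <= inferred:
                inferred |= new
                changed = True
    return inferred


for triple in sorted(same_as_closure(triples)):
    print(triple)
```

Under this reading the rights-holder statement ends up attached to the external identifier as well, which is the consequence the comment objects to.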

Your concerns about our use of owl:sameAs relations certainly merit further discussion and I would welcome your input as to how we should best address this issue. I encourage you to post approaches to this issue at http://data.nytimes.com/community and look forward to working with all of you as we move this product forward.

There isn’t a good reason. I’ll change the URIs as you’ve suggested in our next update of the data set.

The difference between topics and tags is that we use the tags in queries from which we generate the topic pages. For example, the query we use to create the Sasha Obama topic page looks like this:

(per="OBAMA, SASHA" or (per="Obama, Barack" and des="Families and Family Life" and body="Sasha")) and tom!="Caption" and tom!="Correction" and tom!="List" and tom!="Paid Death Notice" and dsk!="Society"

Since these queries may involve multiple tags, there is not a one-to-one correspondence between topic pages and tags.
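
To make the structure of that query concrete, here is a minimal sketch that restates it as a Python predicate over an article record. The field names per, des, body, tom and dsk come from the query itself; the dictionary layout, the function name, the case-insensitive matching of tags and the substring test on the body are assumptions made for illustration, not a description of the actual query engine.

```python
# Values excluded by the tom!=... clauses of the query.
EXCLUDED_TOM = {"Caption", "Correction", "List", "Paid Death Notice"}


def matches_sasha_obama_topic(article):
    """Return True if a hypothetical article record satisfies the topic query."""
    # Assumption: tag comparison ignores case, since the query mixes
    # "OBAMA, SASHA" and "Obama, Barack".
    per = {tag.lower() for tag in article.get("per", [])}
    des = {tag.lower() for tag in article.get("des", [])}
    body = article.get("body", "")

    tagged = "obama, sasha" in per or (
        "obama, barack" in per
        and "families and family life" in des
        and "Sasha" in body  # assumption: body="Sasha" means the text contains the word
    )
    return (
        tagged
        and article.get("tom") not in EXCLUDED_TOM
        and article.get("dsk") != "Society"
    )


# Hypothetical record that carries no "Obama, Sasha" tag at all.
example = {
    "per": ["Obama, Barack"],
    "des": ["Families and Family Life"],
    "body": "The president spoke about Sasha and her sister.",
    "tom": "News",
    "dsk": "National",
}
print(matches_sasha_obama_topic(example))
```

The example record reaches the topic page through a combination of two other tags and a body match, which is why a topic page cannot be identified with any single tag.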

As to whether or not we aim to replace our topic page URLs with our RDF identifiers: The data we published yesterday is intended to complement, not supplant, what’s already out there.