He pointed out that Chronicling America has a published API (application programming interface) that explains how one can access the content of Chronicling America. The API is at: http://chroniclingamerica.loc.gov/about/api/

The API facilitates the following functions:

Search: results can be returned as HTML for simple human reading of a web page, as JSON (JavaScript Object Notation) for a web page or script to manipulate, or as an Atom feed that can be read in a feed reader such as Google Reader or Bloglines.

Link: to “titles, issues, editions, and pages” using “LCCNs, dates, issue numbers, edition numbers, and page sequence numbers.” Using some of the examples on the site, you can quickly predict and test potential URLs, then use and share them. You can also generate URLs out of a database, once you understand the rules.

Linked Data: using published, standard ontologies, you can use the Chronicling America database to get at related content on the “semantic web”, where that content is similarly tagged. Using RDF/OWL (Resource Description Framework / Web Ontology Language) technologies, this content can be delivered to users in new and creative ways.

Aggregations: Chronicling America has assembled collections of related items (such as JPEG 2000, PDF, and OCR text of the same newspaper page) using a technology called OAI/ORE (Open Archives Initiative, Object Reuse and Exchange).
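
To make the Search and Link items above concrete, here is a minimal Python sketch of how such URLs might be generated. It assumes the URL patterns and the "andtext" and "format" parameter names match the examples on the API page; the function names and the LCCN, date, and search words in the usage lines are illustrative choices of mine, not something the site prescribes.

```python
from datetime import date
from urllib.parse import urlencode

# Base URL taken from the API address quoted in the post.
BASE = "http://chroniclingamerica.loc.gov"


def search_url(words, fmt="json"):
    """Build a full-text page search URL.

    Assumption: the 'andtext' and 'format' parameter names follow the
    examples on the API page; 'html' is treated as the default view,
    so no 'format' parameter is sent for it.
    """
    if fmt not in ("html", "json", "atom"):
        raise ValueError(f"unsupported format: {fmt}")
    params = {"andtext": words}
    if fmt != "html":
        params["format"] = fmt
    return f"{BASE}/search/pages/results/?{urlencode(params)}"


def title_url(lccn):
    """URL for a newspaper title, identified by its LCCN."""
    return f"{BASE}/lccn/{lccn}/"


def issue_url(lccn, issued, edition=1):
    """URL for one issue: title + ISO date + edition number."""
    return f"{title_url(lccn)}{issued.isoformat()}/ed-{edition}/"


def page_url(lccn, issued, edition=1, seq=1):
    """URL for one page: issue + page sequence number."""
    return f"{issue_url(lccn, issued, edition)}seq-{seq}/"


if __name__ == "__main__":
    # Illustrative values only; substitute an LCCN and date you care about.
    print(search_url("thomas edison", "atom"))
    print(page_url("sn86069873", date(1900, 1, 5), edition=1, seq=2))
```

Because the builders are plain string functions, the same code could be run over rows pulled from a database to produce a batch of links, which is the "generate URLs out of a database" idea mentioned above.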

I am amazed by the scope of this project, as well as how openly the content is being made available. Here’s a brief snippet from their API page about the scope of Chronicling America:

There are more than a million digitized newspaper pages in Chronicling America. These pages span several decades and many U.S. states and territories. New batches of data come in from partner institutions throughout the year and are added to the site regularly.

The openness of the content, with such a rich, published API, means that this content is ripe for re-purposing, and the site itself can teach you how to get to its own content. Just as I noticed the predictable URLs, the folks at Chronicling America write:

Details about these interfaces are below. In case you want to dive right in, though, we use HTML link conventions to advertise the availability of these views. If you are a software developer or researcher or anyone else who might be interested in programmatic access to the data in Chronicling America, we encourage you to look around the site, “view source” often, and follow where the different links take you to get started.
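
The "HTML link conventions" mentioned in that passage can be explored programmatically as well as through "view source". The sketch below assumes the alternate views are advertised with link elements carrying rel="alternate" in the page head, which is the usual HTML convention; the SAMPLE markup is a made-up stand-in for a real page, and the class and function names are my own.

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "alternate" in rel_tokens and href:
            self.links.append((attributes.get("type"), href))


def find_alternates(html):
    """Return every alternate view advertised in the given HTML text."""
    finder = AlternateLinkFinder()
    finder.feed(html)
    return finder.links


# Hypothetical markup for illustration; a real page will differ.
SAMPLE = """<html><head>
<link rel="alternate" type="application/json" href="/lccn/sn86069873.json">
<link rel="alternate" type="application/rdf+xml" href="/lccn/sn86069873.rdf">
<link rel="stylesheet" href="/static/site.css">
</head><body></body></html>"""

if __name__ == "__main__":
    for media_type, href in find_alternates(SAMPLE):
        print(media_type, href)
```

Feeding the function the saved source of any page on the site would list whatever machine-readable views that page actually advertises.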