Translating a blog post into structured data

Recently my Bodleian colleague Alasdair Watson posted an announcement about an illuminated manuscript that is newly available online. To get the most long-term value out of the announcement, I decided to express it as Linked Open Data by representing its content in Wikidata. This blog post goes through that process.

The manuscript, the Shahnamah of Ibrahim Sultan, was not yet represented on Wikidata, although the epic poem itself, the Shahnamah (or Shahnameh), was already present, as were six of its exemplars.

Three of the grandsons of Tīmūr (Tamerlane) are known to have had lavish copies of Firdawsī’s Shāhnāmah or Persian Book of Kings made for them. The Shāhnāmah of […] Ibrāhīm Sulṭān [is] preserved in the Bodleian Libraries, Oxford,

Ibrahim Sultan is already represented in Wikidata as Q3147516, along with his immediate family tree, which connects him to his father Shahrukh (Q553204), who in turn is linked to his own father Timur (Q8462).

To create an item for the manuscript, I click on “Create a new item” on the left of any Wikidata page (or, from a script, request a new item through the API). Identifiers are auto-numbered, and more than 53 million have already been allocated, so the Shahnamah of Ibrahim Sultan gets Q53676578. After adding a name and one-line description, the first priority is to say what kind of thing I’m describing, and then to link it to the item representing the poem, as one of its exemplars.
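For anyone taking the script route, here is a minimal sketch of what those first steps might look like as a request to Wikidata's `wbeditentity` API action. It only builds the request parameters and does not send anything. The property and item IDs for "instance of", "exemplar of", "illuminated manuscript" and the poem are my assumptions, not taken from the post, and should be checked on Wikidata before use. The description text and the token value are placeholders.

```python
import json

# Label taken from the post.
MANUSCRIPT_LABEL = "Shahnamah of Ibrahim Sultan"

# Assumed IDs (not given in the post; verify each one on Wikidata).
P_INSTANCE_OF = "P31"
P_EXEMPLAR_OF = "P1574"
Q_ILLUMINATED_MANUSCRIPT = "Q48498"
Q_SHAHNAMEH = "Q8279"


def item_statement(prop, qid):
    """Build one statement whose value is another Wikidata item."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {
                "value": {"entity-type": "item", "numeric-id": int(qid[1:])},
                "type": "wikibase-entityid",
            },
        },
        "type": "statement",
        "rank": "normal",
    }


def new_item_params(label, description, statements, token):
    """Assemble the form parameters for creating a new item."""
    data = {
        "labels": {"en": {"language": "en", "value": label}},
        "descriptions": {"en": {"language": "en", "value": description}},
        "claims": statements,
    }
    return {
        "action": "wbeditentity",
        "new": "item",
        "format": "json",
        "token": token,  # a real CSRF token is needed for an actual edit
        "data": json.dumps(data),
    }


params = new_item_params(
    MANUSCRIPT_LABEL,
    "illuminated manuscript of the Shahnamah",  # illustrative description only
    [
        item_statement(P_INSTANCE_OF, Q_ILLUMINATED_MANUSCRIPT),
        item_statement(P_EXEMPLAR_OF, Q_SHAHNAMEH),
    ],
    token="CSRF-TOKEN-PLACEHOLDER",
)
```

Posting `params` to the Wikidata API endpoint from a logged-in session would create the item. Since Q53676578 already exists, this is best treated as a template for other manuscripts.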

The Wikidata interface makes great use of auto-suggestion and auto-completion, so adding these properties doesn’t require typing their whole names: a couple of clicks and the first few letters are enough. We can repeat the process to extract more statements from the text of the blog post.

Thought to have been made in Shiraz…

…sometime between 1430 and Ibrāhīm Sulṭān’s death in 1435,
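An uncertain date like this needs a little more care than a link to another item. One common Wikidata pattern, and not necessarily the one used for this manuscript, is an "inception" statement with an unknown main value, bounded by "earliest date" and "latest date" qualifiers. The sketch below builds that statement in Wikibase JSON. The property IDs and the Julian calendar model URI are my assumptions rather than details from the post.

```python
# Assumed IDs (not given in the post; verify on Wikidata).
P_INCEPTION = "P571"
P_EARLIEST_DATE = "P1319"
P_LATEST_DATE = "P1326"
# Assumed calendar model: proleptic Julian, since the dates precede 1582.
JULIAN_CALENDAR = "http://www.wikidata.org/entity/Q1985786"

YEAR_PRECISION = 9  # in the Wikibase time model, 9 means "year"


def year_value(year):
    """Build a Wikibase time datavalue for a positive year, at year precision."""
    return {
        "value": {
            "time": f"+{year:04d}-00-00T00:00:00Z",
            "timezone": 0,
            "before": 0,
            "after": 0,
            "precision": YEAR_PRECISION,
            "calendarmodel": JULIAN_CALENDAR,
        },
        "type": "time",
    }


def year_qualifier(prop, year):
    """Build a qualifier snak holding a year."""
    return {"snaktype": "value", "property": prop, "datavalue": year_value(year)}


def inception_between(earliest, latest):
    """Inception with an unknown exact value, bounded by two years."""
    return {
        "mainsnak": {"snaktype": "somevalue", "property": P_INCEPTION},
        "type": "statement",
        "rank": "normal",
        "qualifiers": {
            P_EARLIEST_DATE: [year_qualifier(P_EARLIEST_DATE, earliest)],
            P_LATEST_DATE: [year_qualifier(P_LATEST_DATE, latest)],
        },
    }


statement = inception_between(1430, 1435)
```

Keeping the two bounds as qualifiers means a query can still filter on either year without the statement having to claim one exact date.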

The manuscript was acquired by Sir Gore Ouseley, a Diplomat and Linguist, during travels in the East in the early 19th century, and came into the Bodleian in the 1850s along with many other of Sir Gore’s collections.

Now the Wikidata representation of the manuscript has eleven properties, and anyone running an appropriate query in the Wikidata Query Service will get this manuscript in the results. Let’s ask for the English names of works once owned by grandchildren of Timur, along with the link to view them. That query can be expressed in SPARQL.
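A query along these lines might look like the following sketch, wrapped in Python so that it can be turned into a request URL for the Wikidata Query Service. Only Timur's ID (Q8462) comes from this post. The properties for "child" (P40), "owned by" (P127) and "full work available at URL" (P953), and the variable names, are my assumptions about how the question could be modelled.

```python
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"

# Sketch only: the property choices are assumptions, not the post's query.
QUERY = """
SELECT ?work ?workLabel ?link WHERE {
  wd:Q8462 wdt:P40/wdt:P40 ?owner .   # Timur -> child -> grandchild
  ?work wdt:P127 ?owner ;             # work owned by that grandchild
        wdt:P953 ?link .              # URL where the full work can be viewed
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def query_url(query):
    """Return a GET URL that asks the query service for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


url = query_url(QUERY)
```

The path `wdt:P40/wdt:P40` follows two "child" links in a row, so the family tree already recorded in Wikidata does the genealogical work and the query never has to name Ibrahim Sultan.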

Running that query gives us “Shahnamah of Ibrahim Sultan” and the Digital Bodleian link. At the moment it’s the only result, but as more digitized manuscripts come online, and more of their metadata is shared on Wikidata, the query will return more results over time. More realistic queries might ask for manuscripts whose language is Persian, or for Middle-Eastern manuscripts held by institutions in England. The relevant query code can be incorporated into a manuscript-browsing app for use by a general audience.
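As a rough sketch of the app side, the function below turns a response in the standard SPARQL JSON results format into (name, link) pairs ready for display. It assumes the query selects variables named `workLabel` and `link`, which are hypothetical names, and the sample response uses a placeholder URL rather than the real Digital Bodleian address.

```python
def works_from_results(results):
    """Extract (label, link) pairs from SPARQL JSON results.

    Rows missing either value are skipped rather than shown half-empty.
    """
    rows = []
    for binding in results["results"]["bindings"]:
        label = binding.get("workLabel", {}).get("value")
        link = binding.get("link", {}).get("value")
        if label and link:
            rows.append((label, link))
    return rows


# Hand-written sample in the SPARQL JSON results shape (placeholder URL).
sample = {
    "head": {"vars": ["work", "workLabel", "link"]},
    "results": {
        "bindings": [
            {
                "workLabel": {
                    "type": "literal",
                    "xml:lang": "en",
                    "value": "Shahnamah of Ibrahim Sultan",
                },
                "link": {"type": "uri", "value": "https://example.org/viewer/shahnamah"},
            },
            {
                # A row with no viewing link, which the function drops.
                "workLabel": {"type": "literal", "value": "Untitled work"},
            },
        ]
    },
}

works = works_from_results(sample)
```

Because the app depends only on the shape of the results, the same function keeps working as more manuscripts are added to Wikidata and the result list grows.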

This kind of addition can be made manually through Wikidata’s online interface, or more rapidly in bulk by a script. The sooner digitising institutions put in place workflows to share this metadata, the sooner we all benefit from pathways through biographical, bibliographic and geographical knowledge to the resources we create.