The British National Bibliography (BNB) server is generally more responsive than the Cambridge University Library one, and title searches seem to work better than author searches. The following are hopefully useful examples:

author="fisher" seems to work (at least in the BNB) so long as the number of hits to return is small (e.g. 5).

I would really like to try and think of ways of improving free text regular expression search times for things like author and title in Sparql* although I doubt there is one that doesn’t rely on the configuration, processing power, or indexing of the server being searched.

* Thinking aloud, some ideas might include: downloading a larger, imprecise set for further local searching (e.g. for an author/title search, downloading the title matches and searching the authors locally; although this would also be slow, it would at least get round the timeout); forcing a look-up in a controlled vocabulary first in order to get an exact string match (especially for authors, although even if this is possible, it forces the user to do more work, which isn’t the point); local indexing of the triple store (this is probably the best way, but I’m not sure how to go about it, whether I really have the server capabilities to do it, or whether I can commit to the updating required).
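
The first idea in the footnote, fetching the broader title matches and then searching the authors locally, might look something like the sketch below. It is written in Python rather than Lodopac's PHP, and the function name `filter_by_author`, the row layout, and the sample rows are all invented for illustration; nothing here comes from a real endpoint.

```python
def filter_by_author(rows, author):
    """Keep only the rows whose author name contains the search string.

    `rows` is assumed to be a list of dicts with "title" and "author" keys,
    e.g. built from the results of a title-only Sparql query.
    """
    needle = author.lower()
    return [row for row in rows if needle in row.get("author", "").lower()]


# Hypothetical rows standing in for the results of a title search.
title_matches = [
    {"title": "Example title one", "author": "Shakespeare, William"},
    {"title": "Example title two", "author": "Fisher, Jane"},
    {"title": "Example title three", "author": ""},
]

local_hits = filter_by_author(title_matches, "fisher")
```

The slow regular expression is still needed for the title, but only once; the second criterion is applied to a small list in memory instead of being sent to the endpoint.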

One of the difficulties in searching RDF data is knowing what the data looks like. For instance, finding a book by its title means knowing something about how a dataset has recorded the relationship between a book and its title. There is no real standard for publishing MARC/AACR2-style bibliographic data as RDF: it seems libraries publishing RDF are approaching this largely individually, although they are using many of the same vocabularies (dc, bibo, etc.). This was one reason why I wanted to create Lodopac: to present some kind of interface so that searchers didn’t need to know these different models but could start to explore them. Below are the Sparql recipes for the different search criteria I used for the BNB and the Cambridge University Library datasets, so they can be compared, re-used, or corrected. All examples use prefixes, which are defined anew in each example. The examples are of course fragments and don’t have all the necessary SELECT and WHERE clauses.

By the way, for an excellent Sparql tutorial with ample opportunity to play as you go along, do have a look at the Cambridge University Library’s SPARQL tutorial. It also gives clues to the way their data is structured. Of use for the BNB is their data model (PDF), which is not nearly as scary as it looks at first, and incredibly helpful.

Author keyword search

This would be relatively straightforward (the unavoidable regular expression being the main complication) but for the fact that the traditional author/editor/etc. of bibliographic records can be found in dc:creator as well as dc:contributor, which necessitates a UNION. The BNB used foaf:name for the name itself.
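
As a rough illustration of the pattern just described, here is a Python sketch that assembles an author query from the three ingredients named above: dc:creator, dc:contributor, and foaf:name. It is not the recipe Lodopac actually sends. The variable names `?agent` and `?name`, the template, and the function `build_author_query` are assumptions made for the example, and it presumes the search term has already had quotes and backslashes removed.

```python
# Doubled braces are literal braces once str.format() has run.
AUTHOR_TEMPLATE = """PREFIX dc: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?book
WHERE {{
  {{ ?book dc:creator ?agent }} UNION {{ ?book dc:contributor ?agent }} .
  ?agent foaf:name ?name .
  FILTER regex(?name, "{term}", "i") .
}}"""


def build_author_query(term):
    """Return an author query for an already sanitised search term."""
    return AUTHOR_TEMPLATE.format(term=term)


query = build_author_query("fisher")
```

The UNION gathers both kinds of agent into one variable, so the regular expression only has to be written once.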

Title keyword search

This is more straightforward and is in fact the same for both the BNB and Cambridge University Library:

PREFIX dc: <http://purl.org/dc/terms/>

SELECT ?book
WHERE {
  ?book dc:title ?title .
  # "title" stands for the search string; "i" makes the match case insensitive
  FILTER regex(?title, "title", "i") .
}

Date of publication (year)

I imagined this one being simple, and for Cambridge University Library it is. However, the BNB took some unravelling, as they have modelled publication as an event related to a book. The various elements of publication are then related to the event, so the BNB recipe has to go from the book to the event and only then to the date.

ISBN

As an identifier, ISBN is relatively straightforward in both models, although care must be taken with the BNB, as 10 and 13 digit ISBNs are treated as separate properties. The following assumes that the search will cover both:

PREFIX bibo: <http://purl.org/ontology/bibo/>

SELECT ?book
WHERE {
  # "isbn" stands for the search string, tried against both ISBN properties
  {?book bibo:isbn10 "isbn"} UNION {?book bibo:isbn13 "isbn"} .
}

For Cambridge University Library, also using the bibo ontology, this is:

PREFIX bibo: <http://purl.org/ontology/bibo/>

SELECT ?book
WHERE {
  # "isbn" stands for the search string
  ?book bibo:isbn "isbn" .
}

Conclusion

I didn’t set out to provide ground-breaking conclusions. However, it is remarkable how differently similar organisations can model the same type of data. The real question is whether this is a good thing, a bad thing, or doesn’t really matter. Will it need to be standardised? As I understand how this works, probably not. I think the days of monolithic library standards are probably now gone. I wonder, for instance, if there ever will be a single MARC22 (or whatever you like to call it), and I doubt RDA will ever completely replace AACR2 in the way we imagine. What will emerge, I suspect, will be a soup of various standards and data models, some of which will be more prevalent than others. One thing I picked up from various linked data talks is that information has frequently been published and then re-used in ways that the issuers never imagined; if that is the case, the precise modelling and format are probably not as important as the fact that the data is of good quality and intelligently put together. The BNB and Cambridge University Library models are clearly quite different, but they are quite capable of being mapped and used despite this.

If there are any other bibliographic data Sparql endpoints, I would like to include them in a future version of the Lodopac search. Do let me know if you come across them.

More mundanely, do say if there are errors in my Sparql recipes or if there are ways they could be done more efficiently.

Lodopac is my entry to the UK Discovery Developer Competition. Aside from obvious mocking of the name, comments on Lodopac are very welcome. If anyone installs it locally, I’d also be very interested to know.

The purpose of Lodopac is to provide a simple, standard OPAC-style interface for searching various bibliographic RDF datasets without having to know how to formulate a Sparql query and without having to know the structure of the database. I hope it is especially useful for people wanting to get a grip on how bibliographic RDF is put together, what it looks like, and what a Sparql query looks like. For example, an author search is possible without knowing about dc:creator and dc:contributor, or how these need to be linked together in a Sparql search. Similarly, a searcher wouldn’t need to worry about how to construct date searches in different datasets. For the BNB and CUL, these are very different (three lines in the BNB, one for CUL), but in Lodopac there is only one search box to search both. Lodopac displays the Sparql query it constructed to perform the search, as well as the combined RDF for all results found, in XML, JSON, N3, and TTL.

How to search Lodopac

Select one or more of the available datasets using the checkboxes.

Author and Title searches are free text phrase searches. In other words, the string you search for will match any exact occurrence of it, including spacing and punctuation, even in the middle of words. E.g. searching for “shake” will match “Shakespeare”, “milkshakes”, and “More hits than you can shake a stick at”. Searching is case insensitive. The following punctuation is removed from searches: \"'<>$^%.
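
The matching behaviour described above can be imitated locally with a few lines of Python. This is a sketch, not Lodopac's PHP: the names `STRIPPED`, `sanitise`, and `matches` are made up for the example, and it assumes that the removed characters are the straight quotes, backslash, angle brackets, dollar, caret, and percent sign, with the full stop ending that sentence not being part of the list.

```python
# Characters removed from author and title searches.
# Assumption: backslash, straight double and single quotes, < > $ ^ %.
STRIPPED = "\\\"'<>$^%"


def sanitise(term):
    """Remove the stripped punctuation from a search term."""
    return term.translate(str.maketrans("", "", STRIPPED))


def matches(term, value):
    """Case-insensitive substring match.

    This mimics regex(?x, term, "i") only for terms that contain no
    regular expression metacharacters such as "." or "*".
    """
    return sanitise(term).lower() in value.lower()


examples = ["Shakespeare", "milkshakes", "More hits than you can shake a stick at"]
all_match = all(matches("shake", value) for value in examples)
```

Removing the quotes and the backslash before the term goes into the query also stops a search string from closing the string literal inside the FILTER early.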

You are strongly advised to keep author and title searches simple: e.g. one word of a title or a surname only.

The ISBN search accepts 10 or 13 digit ISBNs. Any dashes or other non-digit characters are stripped from the search.
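
A minimal Python sketch of that clean-up step follows. The function names are hypothetical and the ISBNs in the usage lines are dummy values. It follows the description literally, so it would also strip the "X" check character that some 10 digit ISBNs end with; how Lodopac itself treats that case is not stated here.

```python
import re


def normalise_isbn(raw):
    """Strip every character that is not an ASCII digit."""
    return re.sub(r"[^0-9]", "", raw)


def isbn_length_ok(isbn):
    """True if the cleaned string has a plausible ISBN length."""
    return len(isbn) in (10, 13)


cleaned_13 = normalise_isbn("978-0-00-000000-2")
cleaned_10 = normalise_isbn("ISBN 0 00 000000 0")
```

Checking the length after cleaning gives a cheap way to reject an obviously malformed ISBN before a query is sent at all.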

Date search will accept a year.

N.B. Keep searches as simple as possible, especially with author and title searches, to avoid them timing out. ISBN and date searches are generally quicker.

Limitations

A bad workman blames his tools and I’m no exception. The greatest limitation is the time taken by Sparql endpoints to perform a Sparql query, especially one that involves a regular expression, such as the Author and Title searches. What is needed is some more robust indexing or some cheat like Virtuoso’s unorthodox bif:contains, which the old version of the linked data BNB used. I touched on this in a blog post about the In Our Time Booklist script I wrote (see section 6).

The load on the Sparql endpoints, and their capacity at the time the query is made, is another important factor. A search which times out one minute can work fine the next.

The search options are obviously limited but do, I hope, represent the most common methods of searching normal library catalogues aside from, of course, a general keyword search. The manipulation of results is also rather sparse but allows click-through to the full data associated with a book, the structure and contents of which can then be explored more fully. The aggregation of RDF data in various formats is, I hope, useful illustratively as well as having potential for further manipulation.

Installation, source code, and configuration

The source code for Lodopac is available as a zip file, which contains all the necessary PHP, JavaScript, and CSS files. In addition, you will need to install ARC2, which makes the Sparql queries and manipulates the resultant RDF. Edit the first line of lodopac.php so that it points at your local installation of ARC2.

The programme is basically one long script (there is only one page) but is split up for convenience of editing. The key file is lodopac.php, which includes the other files as it goes along. The core of the script, which builds the queries and does the searching, is all in lodopac.php.

I have attempted to make the script as easily configurable as possible so that additional Sparql endpoints can be added. There are probably more components hard-coded into the script that I have overlooked, but all the setup for the endpoints is in the file setup_endpoints.php. The first part of this file is a list of the prefixes needed for any possible query to any of the endpoints and, although this is not ideal, all these prefixes are sent with every Sparql query. Following that and the declaration of an array of the endpoints, each endpoint has a dedicated block with the information added to a hash. To add another endpoint, duplicate a block and configure the search recipes as appropriate. The keys marked “brief_” are used to fetch information for the brief results display. I have conspicuously chickened out of providing an author in that display, given the attendant main entry and multiple author headaches involved.
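
To make the shape of that configuration more concrete, here is a loose Python analogue of the arrangement described: a shared prefix list, one block per endpoint, and a new endpoint added by copying a block. It is not the contents of setup_endpoints.php. Every key name except the “brief_” convention, the endpoint URLs (placeholders on example.org), and the function `build_query` are assumptions made for the sketch.

```python
# Prefixes sent with every query, whichever endpoint is being searched.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "bibo": "http://purl.org/ontology/bibo/",
}

# One block per endpoint. Key names and URLs are placeholders.
ENDPOINTS = {
    "bnb": {
        "url": "http://example.org/bnb/sparql",
        "title": '?book dc:title ?title . FILTER regex(?title, "{term}", "i") .',
        "isbn": '{{ ?book bibo:isbn10 "{term}" }} UNION {{ ?book bibo:isbn13 "{term}" }} .',
        "brief_title": "?book dc:title ?brief_title .",
    },
}

# Adding an endpoint: copy a block, then adjust the recipes that differ.
ENDPOINTS["cul"] = dict(ENDPOINTS["bnb"])
ENDPOINTS["cul"]["url"] = "http://example.org/cul/sparql"
ENDPOINTS["cul"]["isbn"] = '?book bibo:isbn "{term}" .'


def build_query(endpoint, search_type, term):
    """Assemble a full query from the shared prefixes and one recipe."""
    prefix_lines = [f"PREFIX {name}: <{uri}>" for name, uri in PREFIXES.items()]
    pattern = ENDPOINTS[endpoint][search_type].format(term=term)
    return "\n".join(prefix_lines + ["SELECT ?book", "WHERE {", "  " + pattern, "}"])


cul_query = build_query("cul", "isbn", "9780000000002")
bnb_query = build_query("bnb", "isbn", "0000000000")
```

Keeping the recipes as data means the search code never needs to know which dataset it is talking to; only the block for each endpoint differs.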