Final Features, Trends and Future Work

A lot of my previous articles reflected the state of play at the time they were written, so I wanted to write one final article summarising the API and reviewing what it does, as well as outlining various directions in which interested parties might like to see it develop.

Final Features

Our API consists of a Perl core which emulates Solr query and response syntaxes and performs federated searching across any number of other APIs which are available on the internet.
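To make the idea concrete, here is a minimal sketch of the federated pattern described above, written in Python rather than the Perl of our actual core. It is an illustration under assumptions, not our implementation: the provider functions, their document fields and the `federated_search` name are all hypothetical, and the response is a simplified imitation of Solr's JSON layout (`responseHeader` plus `response` with `numFound`, `start` and `docs`).

```python
from urllib.parse import parse_qs


def parse_solr_query(query_string):
    """Pull the common Solr parameters q, start and rows out of a query string."""
    params = parse_qs(query_string)
    return {
        "q": params.get("q", ["*:*"])[0],
        "start": int(params.get("start", ["0"])[0]),
        "rows": int(params.get("rows", ["10"])[0]),
    }


def federated_search(query_string, providers):
    """Send one query to every provider and wrap the combined results Solr-style."""
    query = parse_solr_query(query_string)
    docs = []
    for name, search in providers.items():
        for doc in search(query["q"]):
            # Record which provider each document came from.
            docs.append(dict(doc, provider=name))
    page = docs[query["start"]:query["start"] + query["rows"]]
    return {
        "responseHeader": {"status": 0, "params": query},
        "response": {"numFound": len(docs), "start": query["start"], "docs": page},
    }


# Hypothetical stand-ins for remote APIs; real providers would be HTTP calls.
def provider_a(q):
    return [{"id": "a1", "title": f"{q} map"}, {"id": "a2", "title": f"{q} letter"}]


def provider_b(q):
    return [{"id": "b1", "title": f"{q} photograph"}]


result = federated_search("q=london&rows=2", {"a": provider_a, "b": provider_b})
```

In this sketch the results are simply concatenated in provider order before paging, which is the simplest possible merge and takes no account of relevance.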

We have developed an extensible XSLT mapping routine to which API types and instances can be easily added. At the moment, plugin code has been written to enable our mapping routine to cope with five types of API data output:

Trends and Future Work

Timelines, Geodata and enriching the content

In working with other providers, it became clear that the applications designers are most interested in building on top of this kind of data right now centre on timelines and geodata, neither of which was of high or consistent quality across all the datasets we pulled into our project. Europeana have addressed issues like this to some extent with their enrichment scheme, whereby they have re-processed data submitted to their project and looked for meaning within that data, in an effort to bring out translations and geodata.

For a federated project, making enrichments to data on the fly represents much more of a challenge, and it was not something we managed to tackle within the scope of the project. It would certainly be interesting to explore this kind of enrichment further!

Facets and relevance ranking

Another limitation we found came when attempting to facet and relevance rank across federated searches. From a faceting perspective, we currently pass facet requests through to the other providers, and not all of them support faceting, which immediately limits the responses. The next issue is that the facets returned by providers rarely match up, and so faceted responses tend to be a concatenation of each provider’s individual facets.
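One way the mismatch between providers’ facets might be reduced is to map each provider’s facet field onto a shared name and sum the counts. The sketch below is only an illustration of that idea in Python, not something we built: the provider names, field names and the `FIELD_MAP` table are all made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from (provider, provider's facet field) to a shared facet name.
FIELD_MAP = {
    ("provider_a", "creator"): "author",
    ("provider_b", "author_name"): "author",
    ("provider_b", "lang"): "language",
}


def merge_facets(provider_facets, field_map=FIELD_MAP):
    """Combine per-provider facet counts under shared facet names."""
    merged = defaultdict(Counter)
    for provider, facets in provider_facets.items():
        for field, counts in facets.items():
            shared = field_map.get((provider, field))
            if shared is None:
                continue  # No shared name for this facet, so it is left out here.
            merged[shared].update(counts)  # Counter.update adds the counts together.
    return {field: dict(counts) for field, counts in merged.items()}


merged = merge_facets({
    "provider_a": {"creator": {"Dickens": 3, "Austen": 1}},
    "provider_b": {"author_name": {"Dickens": 2}, "lang": {"en": 5}, "shelf": {"A": 1}},
})
```

This only helps where providers return facets at all, and it assumes their facet values are spelled the same way, which in practice would need its own normalisation.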

From a relevance ranking perspective, again, not all providers return a relevance score in their API responses (although some of them clearly have one internally). One possible solution to both of these problems, which we considered, was adding a Lucene library to the API and running the results through it in order to get faceting and relevance ranking from a single, consistent source. In the end, time didn’t permit us to look into this much further.
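To show what scoring from a single, consistent source could look like, here is a small Python sketch that re-ranks merged results with a plain TF-IDF style score. This is a deliberately simple stand-in and not Lucene, whose scoring is considerably more sophisticated; the `rank` function, the `title` field and the sample documents are assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a real analyser would do much more."""
    return text.lower().split()


def rank(query, docs, field="title"):
    """Score every document against the query with one shared TF-IDF style formula."""
    tokenized = [tokenize(doc.get(field, "")) for doc in docs]
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scored = []
    for doc, tokens in zip(docs, tokenized):
        tf = Counter(tokens)
        score = sum(
            tf[term] * math.log(1 + n / df[term])
            for term in tokenize(query)
            if term in df
        )
        scored.append((score, doc))
    # Highest score first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [dict(doc, score=round(score, 3)) for score, doc in scored]


# Made-up results, as if already merged from several providers.
docs = [
    {"id": "a1", "title": "Map of Paris"},
    {"id": "b1", "title": "London map of London streets"},
    {"id": "c1", "title": "Portrait"},
]
ranked = rank("london map", docs)
```

Because every document passes through the same formula, the scores are comparable across providers, which is the property we were hoping to get from a single ranking source.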

The End!

That concludes my set of blog posts on this project! It’s been an interesting piece of work, and it would be great to work on some more of the features I have mentioned, but for now the project has come to a close. Take a look at Adrian’s posts for more of a summary about the project as a whole, and I’ll update this page and the API syntax page with any last-minute amendments!