Archive of Datasets

In last week's announcement that we're shutting down Kasabi, we said we would make an archive of the datasets available.

A spreadsheet of download links is now available. It lists each of the nearly 200 datasets that were publicly available in Kasabi, along with its license and a download link for the data.

The list doesn't contain any unpublished (private) datasets. It also omits a few datasets that Talis was hosting but which are still available elsewhere, e.g. those from data.gov.uk or those that were straight mirrors of other sources. VoiD descriptions of each dataset, including its title and other metadata, are harvestable from data.kasabi.com.

Could you please give me some directions on how to use your datasets? I was building my MSc project on your food dataset and APIs. Now I'd like to keep using it, but I have no idea how to go about it.

Most triple stores can load both N-Triples and N-Quads. We've been using the TDB store that comes with Apache Jena, and I'd recommend you start there. Loading the data into TDB and serving it with Fuseki, Jena's SPARQL server, would give you a SPARQL endpoint to use.
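Once the data is loaded and exposed through a SPARQL endpoint (for example by running Fuseki over a TDB store), you can query it over plain HTTP using the standard SPARQL protocol. The sketch below uses only the Python standard library. The endpoint URL `http://localhost:3030/kasabi/query` is a hypothetical placeholder for wherever your own endpoint ends up, and the server is assumed to return results in the SPARQL JSON results format when asked via the `Accept` header.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with the query URL of your own SPARQL server.
ENDPOINT = "http://localhost:3030/kasabi/query"


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET request URL (SPARQL 1.1 Protocol)."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_bindings(results_json: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(results_json)
    return [
        {var: term["value"] for var, term in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str) -> list[dict[str, str]]:
    """Send the query over HTTP GET and request JSON results."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

For example, `run_query(ENDPOINT, "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")` is a quick way to check that the load worked. Depending on how your store treats named graphs loaded from N-Quads files, you may need a `GRAPH` clause to see that data.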

For help with loading data files, configuring the software and so on, I suggest you look through the Apache Jena documentation and then post any unanswered questions to the Jena users mailing list.

The rest of the APIs (search, reconciliation, etc.) were all custom-built for Kasabi, so a triple store won't support them out of the box. I'm afraid you'll need to implement those yourself, should you need them.
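As a very rough stand-in for a search API, a SPARQL query can match resources by label. This is not a reimplementation of Kasabi's search API; it is a minimal sketch that assumes the dataset labels its resources with `rdfs:label`, and the helper names are hypothetical. It escapes the search term so that quotes or backslashes in user input can't break the query.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def sparql_string(value: str) -> str:
    """Quote a Python string as a SPARQL double-quoted string literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def label_search_query(term: str, limit: int = 20) -> str:
    """Build a case-insensitive substring search over rdfs:label values."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT DISTINCT ?item ?label WHERE {\n"
        f"  ?item <{RDFS_LABEL}> ?label .\n"
        f"  FILTER(CONTAINS(LCASE(STR(?label)), LCASE({sparql_string(term)})))\n"
        "}\n"
        f"LIMIT {limit}"
    )
```

The resulting string can be sent to any SPARQL 1.1 endpoint. A `CONTAINS` filter has to scan every label, so for larger datasets a proper full-text index would be the better choice.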

The VoiD descriptions, which include a summary of the schema, aren't part of the dataset archives. The archive files contain all the data submitted by the owners; the VoiD descriptions were generated automatically and weren't stored in the triple store managed by the user.

The data was separately harvestable from the Kasabi Linked Data views, but as of today those views are no longer available.

Ed Summers has already archived both the datasets & their metadata to the Internet Archive. You can find what you need there: