
The CKAN software allows portal providers to add further metadata fields to the metadata schema. When the metadata description of a dataset is retrieved via the API, these fields are included in the resulting JSON under the key “extras”. However, the DCAT conversion of the CKAN metadata is not guaranteed to contain these extra keys. Depending on the version and configuration of the DCAT export extension, there are three different cases:

1) Portal-specific mapping

The portal provider defines a mapping from certain CKAN extra fields to specific RDF properties.

2) No mapping

The same dataset also has other “extra” keys for which no mapping to an RDF property exists, e.g., the key collection-name. This metadata is lost if we only consider the exported DCAT.

3) Generic mapping by extension

Certain CKAN data portals map all available extra metadata keys generically via the dct:relation property (Dublin Core vocabulary). The key is mapped to the rdfs:label property and the value to the rdf:value property. For the contact-email metadata key, for example, the label is contact-email and the rdf:value holds its value.
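To make the generic mapping more concrete, here is a minimal sketch in Python that turns a list of CKAN “extras” entries into Turtle. It is not the code of the DCAT export extension. The function names, the example dataset URI, the e-mail address and the use of a blank node as the target of dct:relation are assumptions made for illustration only.

```python
# Illustrative sketch only: emits one dct:relation node per CKAN extra.
# The blank-node layout is an assumption, not the extension's actual output.

def turtle_literal(text):
    """Escape a value as a double-quoted Turtle string literal."""
    escaped = (
        str(text)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


def extras_to_turtle(dataset_uri, extras):
    """Map each {"key": ..., "value": ...} extra to a dct:relation node."""
    lines = [
        "@prefix dct: <http://purl.org/dc/terms/> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for extra in extras:
        lines.append(f"<{dataset_uri}> dct:relation [")
        lines.append(f"    rdfs:label {turtle_literal(extra['key'])} ;")
        lines.append(f"    rdf:value {turtle_literal(extra['value'])}")
        lines.append("] .")
    return "\n".join(lines)


# Hypothetical dataset description, shaped like the "extras" list in the API's JSON.
package = {
    "name": "example-dataset",
    "extras": [{"key": "contact-email", "value": "opendata@example.org"}],
}

print(extras_to_turtle("https://example.org/dataset/example-dataset", package["extras"]))
```

Because every key ends up in the same generic structure, a consumer has to filter the dct:relation nodes by their rdfs:label to find a particular field.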

Please be aware that this implementation is for demonstration purposes only. The underlying background knowledge graph is based on 50 DBpedia properties, which are described in detail in the paper. This is a research project; we try to fix bugs and plan to extend the knowledge graph to other data sources.

Here I started to collect URLs and APIs of existing CKAN instances. I came across the dataportals.org portal, which provides a comprehensive list of Open Data sites and portals. However, this collection contains only about 50 CKAN portals; some of them are down, and many entries lack an API URL.

Using a short script, I harvested the CKAN portals from dataportals.org and merged them with my list. Before adding a new portal to the list, I check whether its URL (and its API) is accessible by performing an HTTP GET request.
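The accessibility check and the merge step could look roughly like the following sketch. It is not the original script. The portal records as dictionaries with "url" and "api" keys, the function names and the 10-second timeout are assumptions chosen for illustration.

```python
import urllib.error
import urllib.request


def is_accessible(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if an HTTP GET on the URL answers with a 2xx status."""
    try:
        # urlopen sends a GET request when no request body is supplied.
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP errors, DNS and connection failures, timeouts, malformed URLs.
        return False


def merge_portals(known, harvested, check=is_accessible):
    """Append harvested portals that are new and reachable to the known list."""
    merged = list(known)
    seen = {portal["url"].rstrip("/") for portal in known}
    for portal in harvested:
        key = portal["url"].rstrip("/")
        if key in seen:
            continue  # already in the list
        api = portal.get("api")
        # The portal URL must respond; the API is only checked when one is given.
        if check(portal["url"]) and (api is None or check(api)):
            seen.add(key)
            merged.append(portal)
    return merged
```

Calling merge_portals(my_list, harvested_list) returns the combined list; the check parameter makes it possible to swap in a different test, for example one that avoids network access when trying the function out.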

I used the information within the square brackets: ra2014 is the internal dataset name, updated tells you whether the dataset was created or updated, and the third value, resources, appears only if a resource of the dataset has been edited.

The CKAN software is the de facto standard for Open Data portals, so I started to manually collect a list of portal URLs together with their APIs.

At the moment the list focuses on portals in Europe, and especially on portals in Austria. If you know of any portals that are not listed (no matter which country), please contact me or leave them in the comments.