The CKAN data portal software now has fully integrated output of metadata as RDF linked data, in XML or N3 format. To see it at work, simply add the appropriate suffix to the URL for a dataset. For example, here is a dataset on the DataHub (as it happens, part of the LOD cloud). Would you like an RDF/XML file of the metadata? Here it is!

Instead of changing the URL, you can also change the “Accept:” header of your HTTP request. For full details, see the CKAN documentation. The new feature is already live on the DataHub, and will be released soon as part of CKAN 1.6.1.
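To make the two mechanisms concrete, here is a small sketch in Python. The portal and dataset names are hypothetical placeholders, and the MIME types are our assumptions about what CKAN's content negotiation accepts – check the CKAN documentation for the authoritative list.

```python
# Sketch of the two ways to request a CKAN dataset's metadata as RDF.
# The base URL below is a hypothetical placeholder, not a real dataset.

BASE = "http://thedatahub.org/dataset/example-dataset"

def rdf_url(dataset_url, fmt="rdf"):
    """Return the URL for the RDF rendering: 'rdf' for RDF/XML, 'n3' for N3."""
    return "%s.%s" % (dataset_url, fmt)

# Alternative: content negotiation via the Accept header.
# These MIME types are assumptions; consult the CKAN docs to confirm.
ACCEPT_HEADERS = {
    "rdf": "application/rdf+xml",
    "n3": "text/n3",
}

def accept_header(fmt="rdf"):
    """Build the Accept header for an HTTP request for the given format."""
    return {"Accept": ACCEPT_HEADERS[fmt]}
```

Either `rdf_url(BASE)` or a GET request to `BASE` with `accept_header("n3")` should then yield the metadata in the chosen serialisation.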

I am in Vienna, along with my colleague Ira, for a plenary meeting of the assorted partners of the LOD2 project. LOD2 is an EU-funded research project on Linked Open Data, the vision of an interlinked web of data known to many from Tim Berners-Lee’s TED talk. The meeting runs for three days, in which there will be discussions about the various work packages, but I have been given the task of blogging about the opening introductory session on Wednesday afternoon. (Full disclosure: I have received a handsome LOD2 mug as advance payment for my efforts.) The Open Knowledge Foundation is one of the partners, because the pan-European CKAN data portal publicdata.eu is part of the project. But as a relative newcomer myself, I was looking forward to finding out in this introductory session what the project is really all about.

Delegates at the LOD2 plenary

Sören Auer, the project co-ordinator, kicked off with an overview of the whole project. He described the lifecycle of Linked Data, from extraction (from other structured or unstructured data) through to linking into existing data, enrichment (perhaps by adding more structure), to the point where it can be explored for interesting patterns. For each stage in the lifecycle, there are tools being developed by the project – many are already released. Collectively these tools, which are all Open Source, form the LOD2 ‘stack’. Sören also mentioned some recent milestones, including a Serbian CKAN portal holding a lot of data in RDF, the native format for Linked Data; and a planned new data-oriented conference, the European Data Forum.

The tools: Work Packages 2-6

WP2: Optimising the store

Peter Boncz of CWI spoke about Work Package 2. (What happened to WP1, you ask? It was a prototype which finished earlier in the project.) WP2 concerns Virtuoso, the database part of the LOD2 stack. The challenge with RDF is to make a database that runs efficiently with huge quantities of data, as the potential for rich interlinking means the data is not neatly segmented into tables as in a normal database. A lot of progress has already been made, and he hopes that Virtuoso 7 will be released soon. It will be structured to enable better compression (speeding up processing by reducing I/O), and will use adaptive caching to avoid re-running queries that are issued more than once.

WP3: Getting the data

Jens Lehmann of AKSW at the University of Leipzig was next, talking about WP3 on ‘extraction, enrichment and repair’: the creation of Linked Data from existing structured or unstructured sources, its enrichment with suitable taxonomies to describe it, and detecting inconsistencies or other problems with its structure. If that sounds like a wide-ranging package, it is: as Jens told me later over dinner (not entirely seriously), ‘anything that doesn’t fit in one of the other packages gets stuffed into WP3’! There are currently over 20 tools playing a role in this stage, including Natural Language Processing techniques for extracting data from free text.

WP4: Creating links

Next up was Robert Isele of the Freie Universität Berlin. WP4 aims to enrich RDF data by adding links to other data sources, as well as linking data together by identifying duplicate entities within or between datasets. Automatic tools suggest links that a user can confirm or reject. WP4 also includes work to create an RDF-enabled version of the open source data cleaning tool Google Refine.

WP5: User interfaces

Sean Policarpio of DERI reported on WP5 on browsing, visualisation and authoring interfaces. He demonstrated geospatial data on a map, filtered with a structured (faceted) search – combining the power of Linked Data with a mapping search like Google Maps. Associated with this, they have produced a ‘semantic authoring’ tool, allowing the user to add or edit Linked Data via the map. Their next tasks are to implement ‘social semantic networking’ – for example, notifications based on semantic content – and mobile interfaces for their semantic tools.

WP6: Integrating the tools

Finally, the engaging and very Belgian Bert van Nuffelen of TenForce spoke about WP6, which aims to make the various disparate tools in the LOD2 stack play nicely together. They have worked on making the stack tools easier to install, on a shared interface, and on shared authorisation using WebID. They have also recently released an intermediate version of the stack (version 1.1) with new and upgraded tools and better documentation.

By now it was 3 o’clock and, against all expectations, the meeting was ahead of schedule. So we had a relatively luxurious half-hour break for tea. Your correspondent and another relative newcomer, Jan from TenForce, took the opportunity to get some fresh air and a feel for the Viennese genius loci. Or should that be Ortsgeist?

The use cases

WP7: Publishing

We had heard about the tools that had been, and are being, developed to manipulate Linked Data. But how will they be used? Refreshed by tea we returned to the meeting to hear about the three Work Packages concerned with use cases. Perhaps the most exciting talk of the afternoon came from Christian Dirschl of WP7 and Wolters Kluwer Germany (WKD). WKD is a legal and accountancy publisher that is already adapting and using the LOD2 stack tools to enhance its publishing business. Christian told us that ‘semantic technologies enable publishing media to create added value’, and WKD’s first release of news and media datasets created using Linked Data tools is on course for publication in April. By December they will release an interlinked version of the datasets, including links to DBpedia, along with further optimised tools.

WP8: Enterprise

Amar-Djalil Mezaour of Exalead presented the ‘enterprise’ use case WP8, an application to human resources with the aim of matching job vacancies to applicants. Some early work on modelling CVs had met criticism, on the grounds, among others, that the EU reviewers had doubts about the volume of CV data freely available. WP8 has therefore refocused its attention on job vacancies rather than CVs, for which there is plenty of data and better RDF support. They hope to release the results later this year, with vacancy ‘dashboards’ and analytics, faceted by sector, region, salary, etc, using Linked Data, and enriched with mashups with other sites such as social networks.

WP9: Government data

After a long wait in the wings, it was time for the OKF’s own Ira Bolychevsky to take centre stage at last. WP9 aims to explore applications of Linked Data in making government data available and maximising its use. Its main visible output is publicdata.eu, which republishes open data from government portals throughout the European Union. publicdata.eu has recently been upgraded and repaired: it now runs the latest version of CKAN, introducing features such as data previews (like this) and – live on the DataHub and coming soon to publicdata.eu – a data API for structured data. Two subjects we hope to discuss more later in the plenary are closer integration with the LOD2 stack, and metadata standards.
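As a rough sketch of what querying the data API looks like, the snippet below builds a request URL in Python. The host, resource id, and the `datastore_search` action name are illustrative assumptions; the exact endpoint depends on the CKAN version your portal runs, so check the CKAN documentation.

```python
# Hypothetical sketch of querying CKAN's data API for a structured resource.
# Host, resource id, and action name are placeholder assumptions.
from urllib.parse import urlencode

def data_api_query(host, resource_id, limit=10):
    """Build a URL asking the data API for the first `limit` rows of a resource."""
    params = urlencode({"resource_id": resource_id, "limit": limit})
    return "%s/api/action/datastore_search?%s" % (host, params)
```

Fetching the resulting URL would return the rows as JSON, ready for use in a visualisation or mashup.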

Ira presenting WP9

Jindřich Mynarz briefly mentioned the new Czech CKAN portal. They have developed a detailed methodology as well as a ‘Quick Start guide’ for publishers, both of which they promise to make available in English soon (hurrah!).

Finally Vojtěch Svátek of UEP gave a quick overview of WP9a, which aims to use Linked Data technology in the field of public procurement, with ontologies for public sector contracts – providing matchmaking and analytics not dissimilar to those in WP8.

A jug of wine, a loaf of bread

Perhaps the reader has read enough of Work Packages for now. Anticipating your satiety, the organisers had decided to defer the presentations from WP10-12 until Friday. In their place an outsider to the LOD2 project, Allan Hanbury, gave a lightning talk on a slightly related EU project, Khresmoi, which aims to provide useful searching tools for large medical databases.

Thus concluded the day’s business, and we all dispersed to our various hotels. The OKF contingent, along with TenForce, are staying in one just a couple of roads away. Crossing a road is hazardous in Vienna, because there are sometimes cars parked in what seems to be the middle of the road. You keep half-expecting some lights to change and the cars to zoom off. In fact they are parked between the road and the tramlines, along which long and elderly trams snake through the city.

In the evening, everyone from the day’s meetings reconvened and was whisked away on one such tram to an outlying district of the city, for an evening at a (more or less) traditional Austrian Heurige, an untranslatable type of wine tavern. A true Heurige, Helmut from the Semantic Web Company explains to me as we hurtle along, is run by a vineyard, and gives people an opportunity to sample its new year’s crop of wine. (‘Heurige’ in Austrian German literally means ‘this year’.) It will have a licence to open for only 2 or 3 weeks a year, and when open will hang out a spray of branches and a lamp to signify the fact.

There is still some wine grown in Vienna, I am told, but most of the Viennese Heurigen are open all year round and are really just restaurants. But they recreate the atmosphere of the real thing. Patrons are served wine and a mixed plate of traditional local foods, which, for readers not familiar with Austrian cuisine, mainly consist of various kinds of sausage, potato and cabbage. They are delicious, and so is the Apfelstrudel that comes along later. The only thing I cannot recommend in Vienna is the tea. When will these foreigners learn that it must be made with boiling hot water?

Last week we started a spreadsheet to compile examples of EU companies using open data. There are currently 46 examples from 11 EU Member States. You can view the spreadsheet here.

In the first instance we want the list to be illustrative rather than comprehensive – highlighting interesting examples of reuse in different European countries, rather than striving to capture every example of how companies have used open data.

If you have an example which you think should be added, please feel free to edit the spreadsheet! If you want to discuss these examples further, you can join the euopendata and/or the open-government mailing lists.

Today we’re happy to release a first beta of publicdata.eu, the Open Knowledge Foundation’s European-level data registry. After releasing an experimental data catalogue federation and scraping frontend earlier this year, this is the first iteration based on CKAN, our data management system. While the basic functionality is still that of a read-only dataset search, a lot has changed behind the scenes.

The site now uses CKAN’s new harvesting capabilities, originally developed for the UK’s location programme. Using this framework, we were able to pull a large number of data catalogues into this joint index – including all instances of CKAN (such as data.gov.uk), France’s Data Publica, Sweden’s OpenGov.se, CSI Piemonte’s Dati Piemonte and several municipal catalogues, including those of London, Paris and Vienna. In the near future, we hope to also include some geodata directories, such as the EU’s national INSPIRE registries.

Another major story in the current development was RDF support. While CKAN has had batch export to RDF for a while, and the semantic.ckan.net subdomain offers those exports for download, publicdata.eu is stepping up support: we now offer a live RDF API for DCAT export, a SPARQL endpoint backed by a triple store that is updated whenever data changes, and some support for DCAT RDF imports in our harvesters. This means CKAN now potentially has round-trip support for DCAT, and that we can go ahead with implementing the proposed standard for DCAT data catalogue federation.
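For a flavour of what using the SPARQL endpoint might look like, here is a minimal sketch. The endpoint URL and the `format` parameter are assumptions (substitute whatever publicdata.eu actually advertises); only the DCAT vocabulary terms themselves are standard.

```python
# Minimal sketch of building an HTTP GET request for a SPARQL SELECT query.
# The endpoint URL below is a guess, not confirmed by the post.
from urllib.parse import urlencode

ENDPOINT = "http://publicdata.eu/sparql"  # assumed location

# Ask for a handful of catalogued datasets and their titles, via DCAT terms.
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
} LIMIT 10
"""

def sparql_request_url(endpoint, query):
    """Build a GET URL for a SPARQL query, asking for JSON results."""
    params = urlencode({
        "query": query,
        "format": "application/sparql-results+json",  # assumed parameter name
    })
    return "%s?%s" % (endpoint, params)
```

Fetching `sparql_request_url(ENDPOINT, QUERY)` would return a JSON result set that any SPARQL-aware client can consume.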

As we started to gather increasing numbers of data packages, we decided to try out a few normalization techniques on the data we had gathered. Starting in the messiest place, the first aspect to tackle was file formats. While there is no hope for datasets with “paper” as the MIME type, “shapefiles” and “commasheets” can easily be translated into their proper types via a simple script.
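A normalization pass like this can be as simple as a lookup table of known mis-labellings. The table entries below are illustrative guesses at the kind of clean-up described, not the project's actual mapping:

```python
# Sketch of normalizing messy file-format labels to canonical values.
# The mappings are invented examples of the clean-up described above.
FORMAT_FIXES = {
    "shapefiles": "SHP",
    "shapefile": "SHP",
    "commasheets": "CSV",
    "comma separated values": "CSV",
    "excel": "XLS",
}

def normalize_format(raw):
    """Return a canonical format label, or the upper-cased input if unknown."""
    key = raw.strip().lower()
    return FORMAT_FIXES.get(key, key.upper())
```

Running every harvested package through such a function makes format-based faceting and filtering far more useful.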

Another piece of information that we were easily able to generate was the member state (and in some cases NUTS classification) of the affected region. This allowed us to create a map-based overview of data availability throughout Europe. Besides being a nice way to facet the data, this also helps to show which countries are leading in their efforts to open up government information.

We then did the same for categorizations: several of the catalogues we harvested contain their own small taxonomies. Looking at the similarities, it was easy to extract a set of 14 common categories – most of which roughly align with first-level EuroVoc items. Still, a large number of source categorizations remain untranslated, which highlights the need for proper taxonomy management to be integrated with the catalogue in LOD2.
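One simple way to surface candidate common categories is to count how many source catalogues use a similar label. The sketch below illustrates the idea; the catalogue taxonomies in the test are invented, and this is only one possible approach, not the project's actual method:

```python
# Sketch: find candidate "common" categories by counting how many source
# catalogues use a (case-insensitively) matching label.
from collections import defaultdict

def common_categories(taxonomies, min_catalogues=2):
    """taxonomies: mapping of catalogue name -> list of category labels.

    Returns the sorted, lower-cased labels that appear in at least
    `min_catalogues` different catalogues.
    """
    catalogues_using = defaultdict(set)
    for catalogue, labels in taxonomies.items():
        for label in labels:
            catalogues_using[label.strip().lower()].add(catalogue)
    return sorted(label for label, cats in catalogues_using.items()
                  if len(cats) >= min_catalogues)
```

Labels shared across many catalogues are good candidates for the common taxonomy; the long tail of one-off labels is what still needs manual mapping.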

Finally comes the most visible aspect: CKAN received both a face lift and an integrated apps catalogue. Realizing the need to give some of the fabulous contestants for the Open Data Challenge a permanent home, we decided to integrate a gallery of the shortlisted entries right into the core of publicdata.eu.

The Open Data Challenge, Europe’s biggest open data competition, is now over! From the website:

There were a total of 430 entries from 24 EU Member States. Our amazing panel of judges is currently scouring the entries to select the winners, which will be announced at the European Digital Agenda Assembly in Brussels on 16th June. All winners will be listed on the website as soon as they are announced.

If you’d like to keep in touch with other people interested in open data, you can join the open-government and euopendata mailing lists or follow the #opendata hashtag on Twitter.

Background

Anyone who follows the #opendata hashtag on Twitter, or who hangs out on the Open Knowledge Foundation’s open-government mailing list will know that nearly every week there is a new local, regional, or national data catalogue being announced somewhere in the world. People interested in using data from different sources may want to search across these different catalogues to find datasets of interest to them (e.g. all the openly licensed spending datasets, or all of the legislative corpora in formats X, Y or Z, from anywhere in the world). We are currently working on things like PublicData.eu and OpenDataSearch.org to do this. However in order to make services like this work, we need up to date lists of data catalogues.

A few weeks ago we discussed exactly this at an extremely useful meeting in Edinburgh on data catalogue interoperability. One of the outcomes of this meeting was an agreement between the Open Knowledge Foundation, DCAT, CTIC, and RPI to collaborate on creating a shared, collaboratively curated, comprehensive list of data catalogues on a new website called datacatalogs.org. This would include a source list of local, regional and national catalogues, catalogues created by public bodies and catalogues created by citizens and NGOs, and so on.

Where are we now?

Today we had a brief call to discuss how to take this forward. The call included:

James Gardner, CKAN Project Lead

John Glover, CKAN Developer

Kendra Levine, Librarian in Berkeley

Jonathan Gray, OKF Community Coordinator (me)

First we went over the plan we made in Edinburgh, which is:

to define a basic set of metadata about data catalogues that we want to collect, taking into account work that has been done on DCAT, by RPI, by CTIC, and so on

to amalgamate existing lists into one big list, collecting all relevant metadata

to start a new customised instance of CKAN on datacatalogs.org – with features like moderation to allow a group of administrators to curate the list of data catalogues, with a custom ‘catalogue metadata’ plugin to show the fields we’re interested in displaying, and so on

to import the big amalgamated list into the new CKAN instance

to brand the new CKAN instance with the logos of other organisations who are supporting/updating it

to invite key stakeholders (e.g. government representatives, policy makers, researchers, open data advocates, and others) to curate the list

Customising CKAN

In addition to having a single resource list which is updated by key organisations and stakeholders, we want to create an easy mechanism for administering datacatalogs.org. At the Edinburgh meeting there was a strong feeling that the list should be curated – all newly suggested catalogues should undergo some sort of review and approval process.

The CKAN team have been busy developing a simple but surprisingly sophisticated moderation mechanism for managing suggested updates and revisions to information about data catalogues.

Here are a few sneak previews of the functionality:

Next steps?

Here’s a rough schedule of how we’d like to proceed over the next few weeks:

From 7th June – start work

On 13th June – metadata standard ready (based on DCAT and existing lists) and start populating spreadsheet based on metadata standard

On 20th June (or before if possible) – first deployment on datacatalogs.org

A few weeks ago we had a small workshop on “Open Government Data in Europe” in Budapest. The meeting brought together representatives from the European Commission, the Hungarian government, and other EU member states to discuss the current state of open government data across Europe. Discussions included legal, technical and economic aspects of running an open government data initiative.

We started out with a brief introduction from myself and David Kitzinger, co-founder of Szabad Adat, a new open data organisation in Hungary. Then we went on to presentations to introduce the idea of open data, to give an overview of the state of play in Europe, and to look in more depth at open data in Poland:

Finally we had a closing discussion on what kinds of data could be opened up at city level, how to get started, and how to engage with developers and reusers of the data. We discussed how to set up a data catalogue and how to put data into PublicData.eu, and we encouraged public bodies to enter datasets into the Open Data Challenge.

The event was organised by the Open Knowledge Foundation and supported by EVPSI, HUNAGI, and LAPSI. EurActiv.hu was the official media partner for the event (see their post about the event here).