As mentioned in my initial blog post, Open Data Node (ODN) is not a silver bullet, one solution to rule them all. One corner case is the situation where a simple shell script is suitable and sufficient for publishing certain Open Data. If this is your case, ODN is most probably not for you. Another extreme is the case where a brand new information system (or a major upgrade of an existing one) is being built and Open Data publication is factored in from the beginning. Again, if this is your case, ODN is most probably not for you. But for the cases where a simple shell script is “just not enough” and a “complete new information system” is not feasible either, ODN can help. How?

It provides powerful ETL capabilities, for both Linked Data and tabular/relational data, allowing publishers to convert, clean, enrich and link data before publishing it as Open Data.

To help data users actually understand and use the data, it also provides data publication and presentation functions.

To further help data publishers with the whole Open Data publication process (as described, for example, in the COMSODE Methodology – see here), it also provides cataloguing functionality.

Data publishers will also benefit from ODN's integration capabilities with internal systems, its modular design and its Open Source nature.

I will explain more on each in subsequent sections.

Simple installation

On Debian systems, after you prepare the COMSODE package repository, installation is a matter of a single command.

Powerful ETL functionality

ODN has the ability to create repeatable publication jobs – jobs which can convert formats, clean and enrich the data, and even link the data to other data.

Publishers can schedule such jobs to automate the publication of updates, keeping datasets up-to-date without repeated manual labour.
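To make this concrete, here is a minimal Python sketch of what such a repeatable job does conceptually – convert, clean, enrich, link. The function name, field names and example URI scheme are all illustrative assumptions, not ODN's actual API; in ODN itself these steps are configured as ETL pipelines rather than hand-written.

```python
import csv
import io

def run_publication_job(raw_csv: str) -> str:
    """One repeatable publication job: convert, clean, and enrich
    tabular data. Illustrative sketch only, not ODN code."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    cleaned = []
    for row in reader:
        # Clean: strip stray whitespace, skip rows missing the key field.
        row = {k: (v or "").strip() for k, v in row.items()}
        if not row["town"]:
            continue
        # Enrich/link: attach an external identifier (hypothetical URI scheme).
        row["town_uri"] = "http://example.org/towns/" + row["town"].lower()
        cleaned.append(row)
    # Convert: write the result back out as normalized CSV.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["town", "population", "town_uri"])
    writer.writeheader()
    writer.writerows(cleaned)
    return out.getvalue()

raw = "town,population\n Bratislava ,432000\n,0\nKosice,239000\n"
print(run_publication_job(raw))
```

Because the job is a pure function of its input, re-running it on a schedule (cron, or ODN's own scheduler) republishes updates without manual labour.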

A very important aspect is caching of the data: Open Data intended for publication is stored inside ODN. Thanks to that, internal systems are insulated from possible overload or attacks via the Open Data publishing channel. In the rare case that ODN goes down, internal systems remain operational and the organization publishing the data can still function.
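The insulation pattern described above can be sketched in a few lines – a hypothetical Python illustration, not ODN code: the scheduled publication job is the only code path that touches the internal system, while data users are always served from ODN's own copy.

```python
class PublicationCache:
    """Sketch of the insulation idea: published data lives in ODN's own
    store, so data users never query the internal system directly.
    (Illustrative class, not ODN's actual API.)"""

    def __init__(self, fetch_from_internal):
        self._fetch = fetch_from_internal  # callable hitting the internal system
        self._store = None                 # ODN-side copy of the published data

    def refresh(self):
        """Run by a scheduled publication job -- the only path that
        touches the internal system."""
        self._store = self._fetch()

    def get(self):
        """Serve data users from the ODN-side copy; this keeps working
        even if the internal system is down or overloaded."""
        if self._store is None:
            raise RuntimeError("dataset not published yet")
        return self._store

harvest = lambda: [{"town": "Bratislava", "population": "432000"}]
published = PublicationCache(harvest)
published.refresh()     # scheduled job copies data into ODN
print(published.get())  # served without touching the internal system
```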

And thanks to incorporating CKAN and other tools, ODN also provides data users with functions to preview, analyse and visualize data. As of now, that works mainly for tabular data. Later on, we will extend it to Linked Data as well, using the visualisation tool Payola.

The inclusion of cataloguing functionality is motivated by the need to make it easier for data publishers to follow the COMSODE Methodology: in the phase “Development of open data publication plan (P01)”, publishers are (among other things) mapping their internal data sources (in steps like “Analysis of data sources” and “Identification of datasets for opening up”), so they already need a place to record information about those internal data sources. An internal data catalogue is thus a very useful function for ODN to provide.

This functionality is provided by including a customized CKAN catalogue in ODN, in two roles:

CKAN in the role of “internal catalogue” is the main entry point for data publishers into ODN. This catalogue is private, visible only to the data publisher and its authorized personnel.
From this catalogue, publishers manage many aspects of their Open Data publication. Once a dataset is properly prepared for publication, it can be marked as “public” and ODN will automatically make it visible to the general public in …

… CKAN serving in the role of “public catalogue”. This public catalogue is the main entry point for the general public (a.k.a. data users). In this catalogue they will see only datasets explicitly marked as “public”, and they will use it to search for datasets, learn about them, and view and obtain the data.
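Since both catalogues are CKAN, the “mark as public” step corresponds, in plain CKAN terms, to an action-API call such as `package_patch` setting `private` to false. A stdlib-only sketch follows – the instance URL, API key and dataset id are placeholders, and the request is built but not actually sent:

```python
import json
import urllib.request

def make_publish_request(ckan_url: str, api_key: str, dataset_id: str):
    """Build (but do not send) the CKAN action-API request that marks a
    dataset public. `package_patch` with `"private": false` is standard
    CKAN; the URL and key used below are placeholders."""
    payload = json.dumps({"id": dataset_id, "private": False}).encode("utf-8")
    return urllib.request.Request(
        url=ckan_url.rstrip("/") + "/api/3/action/package_patch",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": api_key,  # CKAN user's API key
        },
        method="POST",
    )

req = make_publish_request(
    "https://odn.example.org/internal", "MY-API-KEY", "town-budgets-2015"
)
# urllib.request.urlopen(req)  # would actually send the request
print(req.full_url)
```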

Compared to what we wrote originally, it is still true that ODN is supposed to complement data catalogues. Imagine Organization ABC running an ODN instance which also provides a catalogue of all datasets published by that organization. That is nice and useful in itself, but it is neither suitable nor desirable as a replacement for, say, a state-wide or EU-wide data catalogue. Instead, such an ODN instance will “fit” into the hierarchy of data catalogues and provide dataset metadata on behalf of Organization ABC to anyone – including the nation-wide and EU-wide data catalogues – in an automated fashion, saving Organization ABC valuable time.

Integration functions, modular design, Open Source implementation

For the basic use-cases, the main focus is on the ability to integrate with various kinds of data sources: various formats (XLS, XML, CSV, etc.) and technologies (SQL, JDBC, SPARQL, etc.), accessed via the file system or remotely (HTTP, etc.), are supported “out of the box”.

In broader terms, thanks to ODN's Open Source implementation and its use of open standards, ODN can be enhanced (by almost anyone) with additional modules, incorporated into bigger information systems, integrated with the existing infrastructure used by data publishers, etc. It can even be modified.

Take, for example, ODN's Single Sign-On (SSO): thanks to midPoint, CAS and LDAP, it can be integrated with the existing user management, authentication and authorization systems organizations may already be using.

Note: Some concrete formats and APIs are not there yet, because for that we need more feedback from those trying ODN in their real environments. For example, for SQL we currently support only PostgreSQL, MySQL, MS SQL and Oracle; support for other databases might be added, pending feedback from users. See the section “Future” below.

Stable release

1.0 is the first stable release. This means that we are going to provide further upgrades in a way that does not disrupt your operations, i.e. backward compatible or (if that is not feasible) with an easy migration to the new release.

Future

While ODN 1.0 already provides a lot of basic functionality (and it can already help applications get built – see Building an application on Open Data with Spinque), there is still some work to do to make ODN better. With this release we're starting many pilots in various EU countries. Using feedback from those pilots, we will further refine ODN. Here, I also kindly ask you to give ODN a try and provide additional feedback to us.

Among many smaller things (more file formats – like JSON, more publication protocols – like FTP or BitTorrent, etc.), we have one bigger feature still in development: wizards. While you can do quite powerful ETL work in ODN, it is not truly easy to use for everybody. Thus, for common cases we will implement “wizards” which will allow even novice users to publish usable, high-quality Open Data with a minimum of skills. What are those “common cases”? Here's a slide from one of our presentations:

Using the data scheme from http://5stardata.info/, the most common case is taking 2* data from internal systems and transforming it to at least 3*. In practice, it means ODN being able to harvest data from internal systems in formats like CSV (and its many variants), XLS(X), XML (possibly with an XSD schema) and various kinds of SQL databases, and being able to publish that data in common Open Data formats (i.e. CSV, JSON or RDF, via API or file dumps).
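As a toy illustration of that 2* → 3* step (my own sketch, not ODN's implementation): read a CSV export in an unknown dialect – one of CSV's “many variants” – and re-publish it in an open, machine-readable format such as JSON.

```python
import csv
import io
import json

def csv_to_json(raw: str) -> str:
    """Convert 2-star tabular data (a CSV export in an unknown dialect)
    into 3-star Open Data (JSON). Minimal illustrative sketch."""
    # Handle CSV's "many variants": sniff the delimiter before parsing.
    dialect = csv.Sniffer().sniff(raw, delimiters=",;\t")
    rows = list(csv.DictReader(io.StringIO(raw), dialect=dialect))
    return json.dumps(rows, ensure_ascii=False, indent=2)

print(csv_to_json("town;population\nBratislava;432000\nKosice;239000\n"))
```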

Additionally, we will add features for quality assessment of the data. These features will help both data publishers and data users: publishers can use them to get hints about what to improve in published data, and users will be able to better assess, for example, to what extent the data is actually usable for certain purposes.
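As an example of the kind of quality indicator meant here (my own illustrative metric, not a committed ODN feature): completeness, the share of non-empty cells in a dataset. A low score hints to the publisher where data is missing, and tells the user how much to trust the dataset for their purpose.

```python
def completeness(rows):
    """Fraction of non-empty cells across all rows -- one simple data
    quality indicator a publisher or user could act on."""
    cells = [value for row in rows for value in row.values()]
    if not cells:
        return 0.0
    filled = sum(1 for value in cells if value not in (None, ""))
    return filled / len(cells)

dataset = [
    {"town": "Bratislava", "population": "432000"},
    {"town": "Kosice", "population": ""},  # missing value lowers the score
]
print(completeness(dataset))  # 3 of 4 cells filled -> 0.75
```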

But to get there – to get those additional features done – we first need to validate basic ODN functions “in the field”. We also need to verify that what we see as the “common publication use case” is truly common, and narrow down a concrete (and not too long) list of specific transformations, making sure that we implement what is truly needed and do not waste time implementing what is not.

Similarly, based on user needs, we would like to expand the list of supported platforms, adding support for other Linux distributions or other operating systems.

So again, I urge you to join our User Group and try ODN. Or at least get in touch with COMSODE and describe the scenario/problem you’re facing.

Peter Hanečák is a Senior Researcher and team leader at the EEA company. He is also an Open Data enthusiast.

2 Responses to Open Data Node 1.0 released

Hello Peter,
I would like to know if the full list of use cases mentioned here (4.6. Use Cases) in Deliverable D2.3, “Architecture and design documentation for COMSODE development tasks”, is available somewhere, because the URL mentioned requires authentication: https://team.eea.sk/wiki/display/COMSODE/Use+Cases

Thank you for the question. I've checked, and the use-case list in COMSODE Deliverable 2.3 is still the same as in the internal Wiki (barring some internal development notes present in the Wiki and missing from the Deliverable).

As of February 2017, Open Data Node 1.6.2 is available. Compared to the initial Open Data Node 1.0 release, we've fixed a lot of bugs and added some improvements, but the major features and use-cases are still the same (hence the continued use of 1.x versioning).