Monday, 1 December 2014

I was very honoured to be invited as a guest speaker at the 2nd Data Management Workshop, held at the University of Cologne on 28-29 November 2014.

It was a very interesting workshop, with many excellent national and international speakers. What was particularly good was its focus on interactions between the attendees - the coffee and lunch breaks were long, which gave everyone the chance to really look at the many posters submitted to the workshop and talk to the people presenting them. The workshop proceedings will also be published as a special issue on data management in the ISPRS International Journal of Geo-Information - I'm expecting further details to appear on the workshop website in due course.

I took about 8 pages of hand-scribbled notes from the talks, so I won't be inflicting them all on you. Instead I'll just pull out the highlights that jumped out at me. The talks themselves were videoed, and will be made available on-line too.

The workshop opened with a pair of presentations from Stefan Winkler-Nees and Brit Redohl, both from the German Research Foundation (DFG), discussing the mechanisms in Germany for funding data management activities. They seemed very keen to receive more applications for data management funding!

Kevin Ashley (Digital Curation Centre) was next, giving an overview of the data management landscape - highlighting the DCC guidance documents and Jisc's Research Data Spring, as well as the need for good research data management to help root out cases of fraud and aid data reuse. A key quote I jotted down was "Often your data tells stories that your publications do not."

Arnulf Christl (Metaspatial) gave an amusing and informative talk about open source software and what we can learn from it when it comes to open data. He made the very valid point that scientific data should be clearly licensed, as this allows attribution and credit to be given to the creators. He also showed a video, which everyone enjoyed!

Tomi Kauppinen (Aalto University School of Science) spoke about linked data and our need for online tools to visualise and assess data, as well as the fact that linked data makes data, and data about data, machine-processable.

Jane Greenberg (Dryad) gave an overview of the data publishing system in operation at Dryad, their guidance on data citation, and the costs involved in creating the Dryad metadata records. (This discussion of data publication was a theme that kept coming back throughout the workshop.)

Cyril Pommier (French National Institute for Agricultural Research, INRA) gave a talk about the data management difficulties in coupling phenotype data with plant genome studies, for research into crop security, adaptation to climate change, etc. (Being a physicist, a lot of the science went straight over my head, but what I found fascinating was that the data management problems being described were the same ones we get in atmospheric science, so from a data management point of view we may have more in common than not. Which made me think - how many of the solutions are applicable across domains? We need to find out!)

The second day of the workshop kicked off with a pair of archaeological talks. First up was Gerd-Christian Weniger (Neanderthal Museum), talking about making 3D scans of items from the Pleistocene period, including Neanderthal fossils. They use Confluence, a business wiki, as their repository software, as it allows easy upload and download of data. These scans, along with high-resolution surface scans of rock art and stone tools, allow research to be done without having to travel to wherever the original tool or fossil is held - opening up the artefacts for use in schools and teacher training.

Katie Green (Archaeology Data Service) gave a talk about how the ADS does what it does, touching on their workflows for ingest and data publication (with the journal Internet Archaeology, which is also publishing data papers). She talked about a Jisc project investigating the value of the ADS to the community (a related project looked at the BADC last year), for which a synthesis report is available.

Marjan Grootveld (Data Archiving and Networked Services) talked about how DANS operates, specifically their front office - back office model for dealing with researchers, where the front office provides guidance and information, while the back office deals with the technical aspects of storage and preservation. DANS provides training for front office staff, who can be embedded in university libraries and other locations. Another quote that resonated with me was: "Data management planning is more important than the plan".

Wolfram Horstmann (State and University Library of Göttingen) discussed data services and policies from universities, funding bodies and journals. He also differentiated between a "post hoc data library", which is strong in service reputation but weak in subject-specific expertise, and an "ad hoc data library", which has good subject-specific knowledge but often no recurrent funding. Of course, hybrids of the two exist.

Finally, Hans Pfeiffenberger (Alfred Wegener Institute and Earth System Science Data) finished off the workshop with a discussion about data publication, giving examples of lessons learned from data papers published in ESSD. He also showed us that none of these data publication issues are new - Kepler's laws were based on Tycho Brahe's data and observations, which Kepler only got access to after Brahe's death. ESSD requires authors to describe the provenance of the data, the methods used to create or collect it, and the limitations of the data, and to provide error estimates. Reviewers must look at the data and assess the consistency between the data and the article.

I'd like to thank the organisers again for inviting me to the workshop - and I hope to visit Cologne again sometime!