Friday, 2 October 2015

Last week was the 6th Plenary of the Research Data Alliance, held in Paris, France. It officially started on the Wednesday, but I was there from the Monday to take advantage of the other co-located events.

This workshop consisted of a quick-fire selection of presentations (12 of them!) all in the space of one afternoon, covering such topics as busting DOI myths; persistent identifiers other than DOIs; persistent identifiers for people (ORCIDs and ISNIs, including a look at Brian May's ISNI record, which links his research with his music); persistent identifiers for use in climate science; the International Geo Sample Number (IGSN) - persistent identifiers for physical samples; the THOR project - all about establishing seamless integration between articles, data, and researchers across the research lifecycle; and Making Data Count - a project to develop data-level metrics.

(I also learned that DOIs are assigned to movies, as part of their supply chain management.)

Questions were collected via a Google doc during the course of the workshop, and have all since been answered, which is very helpful! I understand that the slides presented at the workshop will also be collected and made available soon.

This was a day-long event featuring several parallel streams. Of course, I went to the stream on Research Data infrastructures for Environmental related Societal Challenges, though I had to miss the afternoon session because I needed to be at the RDA co-chairs meeting (providing an update on my Working Group and also discussing important processes, like what exactly happens when a Working Group finishes). Thankfully, all the slides presented in that stream are available on the programme page.

Unsurprisingly, a lot of the presentations at this workshop dealt with the importance of e-infrastructures to address the big changes we'll need to face as a result of things like climate change. There was also talk about the importance of de-fragmenting the infrastructure, across geographical, technological and domain boundaries (RDA being a key part of these efforts).

A common feature of this, and the other RDA meetings, was the analogy between data infrastructures and other infrastructures, like those for water or electricity. Users aren't worried about how the water or power gets to them, or how the pipes, agreements and standards are put in place. They just want to be able to get water when they turn the tap, and electricity when they flick a switch. Another interesting point was that there's a false dichotomy between social and technical solutions: what we really have is a technical solution with a social choice attached to it.

Common themes across the presentations were the sheer complexity of the data we're managing now, whether it's from climate science, oceanography or agriculture, and the need to standardise and to fill in the gaps in infrastructure that exist now.

As ever, the RDA plenaries are a glorious festival of data, with many, many parallel streams, and even more interesting people to talk to! It's impossible to capture the whole event, even with my pages of notes.

If I can pick out a few themes though, these are them:

Data is important to lots of people, and the RDA is a key part of keeping things going in the right direction.

Infrastructures that exist aren't always interoperable - this needs to be changed for the vast quantities of data we'll be getting in the future.

The RDA is all about building bridges, connecting people and creating solutions with people, not for them.

Axelle Lemaire, Minister of State for Digital Technology, French Ministry of Economy, Industry and Digital Technology, said that people say that data are the oil of the 21st century, but this isn't such a good comparison – better to compare it to light – the more light gets diffused, the better it is, and the more the curtains are open the more light gets in. She is launching a public consultation on a digital bill she's preparing and is looking for views from people outside of France - the RDA will distribute the information about this consultation at a later date.

It's interesting now that the RDA has matured to the point that several working groups are either finished, or will be finished by the next plenary (though there is still some uncertainty about what "finished" actually means). The 18-month lifespan of the working groups is enough time to build/develop something, but the actual time to get the community to adopt those outputs will be a lot longer. So there was plenty of discussion about what outputs could/should be, and how the adoption phase could be handled. I suspect that, even with all our discussions, no definite solution was found, so we'll have another phase of seeing what the working groups decide to do over the next few months.

This is of particular relevance to me, as my working group on Bibliometrics for Data is due to finish before the next plenary in March. We had a packed meeting room (standing room only!) which was great, and we achieved my main aim for the session, which was to decide what the final group outputs would be, and how to achieve them. Now we have a plan - hopefully the plan will work for us!

A key part of that plan is collecting information about what metrics data repositories already collect - if you are part of a library/repository, please take a look at this spreadsheet and add things we might have missed!

I went to the following working group and Birds of a Feather meetings:

We have a plan for our group outputs, which will basically map the landscape for data bibliometrics as it stands - identifying what needs to be done, and the other groups that are addressing aspects of this problem (which is a big one!).

Interesting stuff this, anthropologists and social scientists looking at how we deal with data as humans. Not directly relevant to me, but I think I'll keep half an eye on it purely out of personal interest.

There were a few demonstrations made, which showed off how far the group has come in developing a potential new service.

Obviously, when ingesting links from several places, standards and interfaces are needed!

Supporting RDA women networking breakfast

An interesting meeting, despite it being held in a corner of the main marquee, which made it really difficult to get a proper conversation going. RDA is about 1/3 female, which is good, but given that more than 50% of Internet users are female, we need to be careful of the human aspect of our work. It was also very good to see several male RDA members in attendance too - this is not just a women's issue!

Again, the fragmented landscape of repositories came up - we'll need to help people navigate it and find the best places for their data.

There was some discussion about commercial data repositories, and the threat they pose to domain/institutional ones. My thoughts (as part of a domain repository) - I'd rather have the data with minimal metadata in a commercial repository than lost on a CD in a drawer somewhere. And the commercial companies are putting pressure on us to up our game. If we're losing researchers to them because it's easier to put data in the commercial repositories, then we either have to make it easier to put data into ours, or really explain why the pain is worth it!

We had a lot of discussion about the structure of the Publishing Data Interest Group, now that most of the Working Groups under its umbrella are coming to an end. Personally, I think there's still a lot that this group can do - we haven't touched on issues like peer review of data for example, plus implementation and adoption of the working group outputs is going to take a while. But having a refresh of the group is probably a good thing too.

So, that was RDA Plenary 6. Next plenary will be held in Tokyo, Japan from the 1st to the 3rd of March 2016. In the meantime, we've got work to be getting on with!