
ELIXIR is working hard trying to find a common Key Performance Indicator vocabulary between research infrastructures

Published Date 11/07/17 10:54

After the e-IRG Workshop in St. Paul's Bay, Malta, we had the opportunity to talk with Rafael Jimenez, Chief Technical Officer (CTO) of the ELIXIR infrastructure. His role in ELIXIR is to facilitate the technical activities within the infrastructure. This includes the implementation of the workplan programme through different platforms that ELIXIR has for operating its technical activities, and also through the nodes that are the real machinery that the team has for implementing services within ELIXIR.

ELIXIR provides data services. Its programme is organised into five key areas: data, tools, interoperability, compute and training. The team provides services around those five key areas. For instance, the data-related services are data resources that belong to the nodes. What ELIXIR is trying to do is to integrate these core resources better and to encourage wider adoption of best practices for them. ELIXIR is a life science data infrastructure, and its target audience consists of researchers in the life sciences.

The presentation that Rafael Jimenez gave at the e-IRG Workshop was about monitoring the infrastructure: seeing how well the team is doing by looking at the KPIs, the Key Performance Indicators, and also trying to register and track them automatically. Sustainability is a very important topic for infrastructures, and for research infrastructures in particular. Research infrastructures like ELIXIR have to put a lot of work into sustainability and need a plan ready for sustaining their services and the infrastructure itself. A very important part of that is how to assess the different resources and services in order to arrive at a sustainable plan.

In ELIXIR the team has been working mainly on indicators for data resources. Across the five key areas that Rafael Jimenez mentioned, the team has also been working on indicators for training, interoperability and compute, but so far the most advanced work is on data resources. The team even has a workpackage that is completely dedicated to indicators for data resources. All the information is available on the ELIXIR website, and there is also a publication. These indicators aim to provide a way to evaluate the data resources in ELIXIR. The team tries to select those that are meaningful for researchers and that help the team understand the different stages of development of a data resource, so that it knows whether a resource is in a mature state or in a state that needs improvement. The indicators are very helpful for the team in assessing its own resources.

There are different types of data resources. For instance, the team has data resources that store experimental data. This could be experimental genomics, proteomics or metabolomics data. The team calls this type of repository an archive. The team also has knowledge bases. These are specialized repositories that collect information from different experimental repositories and try to build knowledge about specific biological entities. For instance, there is UniProt, a database of protein sequence information that collects information from different places. It is a very well curated resource that provides very important knowledge for researchers to understand proteins in general. It is very important for the team to understand the impact of these resources and their value. When one talks about key performance indicators, one does not just try to evaluate performance. Impact is probably correlated with performance, but not directly, so the team tries to evaluate different things, not only performance.

With regard to indicators, Rafael Jimenez thinks that one may look at the users first, because it is important to provide them with resources of better quality. Quality is one aspect, but the team is also looking at how it can increase the sustainability of resources. One also needs to look at other stakeholders. For instance, the funders want to know the resources and how they differ from one another. The team tries to provide that information as well, in a harmonized way.

Whenever the team writes grant proposals, it normally has to provide this information as it stands at the moment and as best it can. It would be nice if the team had a set of common indicators so that it could always provide the information in the same way. That would make it easy to evaluate and compare the different resources. This is what the team tries to accomplish by defining core indicators for data resources and exposing them in its data resources.

We wanted to know how the ELIXIR team is collecting those indicators. Is it by hand, with pencil and paper?

Rafael Jimenez said that this is a real challenge. There are different types of indicators: some will be easy to collect, others more difficult, and some are very sensitive. For instance, the number of entries in a database is easy to provide, but the number of downloads could be a little more sensitive. At the moment, the team is working on how to make that information available. The team needs a solution to expose this information in a programmatic way, so that it becomes easy to collect it programmatically, integrate it and monitor it for the data resources. The team has already identified the indicators that can help it to assess the resources. Now it is exploring methodologies for collecting this information and is at the stage of evaluating different ways to do that.
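As a rough illustration of what exposing indicators "in a programmatic way" could look like, the Python sketch below assumes that each data resource publishes a small JSON document with its indicator values. The field names (`resource`, `indicators`, `entries`, `downloads`), the sample values and the convention of reporting a sensitive indicator as `null` are all hypothetical and are not taken from ELIXIR. The sketch only shows how machine-readable records might be collected and integrated.

```python
import json

# Hypothetical indicator documents, as two resources might publish them.
# All names and numbers are invented for illustration.
PUBLISHED = [
    '{"resource": "archive-a", "indicators": {"entries": 1200, "downloads": 340}}',
    '{"resource": "knowledgebase-b", "indicators": {"entries": 560, "downloads": null}}',
]

def collect(documents):
    """Parse each JSON document and index the indicators by resource name."""
    collected = {}
    for text in documents:
        record = json.loads(text)
        collected[record["resource"]] = record["indicators"]
    return collected

def summarize(collected, indicator):
    """Sum one indicator over all resources, skipping withheld (None) values."""
    values = [ind.get(indicator) for ind in collected.values()]
    reported = [v for v in values if v is not None]
    return {"total": sum(reported), "reporting": len(reported), "resources": len(values)}

if __name__ == "__main__":
    data = collect(PUBLISHED)
    print(summarize(data, "entries"))
    print(summarize(data, "downloads"))
```

Keeping a withheld value distinct from zero lets a sensitive indicator stay private for one resource without distorting the totals for the others.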

Ideally, this should become an automatic system. The team is trying to understand how to collect that information. At the moment, the team might do some manual collection, but the aim for the future is to automate that as much as possible. The team has a project called Excelerate with several deliverables. One of the upcoming deliverables will describe how to develop different ways to collect the data. Not until the end of next year will the team have a proper strategy for collecting that information.

Rafael Jimenez realized that infrastructures are already proposing indicators and metrics for their services and infrastructures. There is an overlap in the metrics that data centres are defining and the metrics of the ELIXIR team. It would be nice to have a core set of metrics and to have a good overlap. That would be one suggestion for the ELIXIR team and for the research infrastructures: to have a common ground and more dialogue on how to organize things.

Asked whether he had compared the ELIXIR indicators with the e-IRG indicators, he answered yes. He has looked at the overlap between the indicators that have been defined in ELIXIR and the e-IRG indicators. There is an overlap, especially in the operational and technical indicators. There are also political and social indicators, and Rafael Jimenez thought that one could do a match at that level too. The ELIXIR team focuses on indicators for services, and on indicators for data and data resources. One can see that there is already an overlap. It is important to have a similar definition of those indicators so that, when the team collects information, it knows what it is collecting and can compare the different resources.

Rafael Jimenez realised that e-IRG takes into account not only services but also organisational information, with indicators for that organisational information as well. At the moment, ELIXIR is very focused on the services. If one can get indicators for the services, this is probably also the best way to evaluate the infrastructure. ELIXIR is looking at indicators for the organisation as well. Within ELIXIR, the team has selected indicators to evaluate different working groups. For instance, the team works on indicators to evaluate the performance of the coordinators' group. This group aims to exchange knowledge and tries to bring knowledge from the nodes to the platform and from the platform to the nodes. The coordinators have specific objectives and the KPIs are defined on the basis of those objectives. These kinds of KPIs are very different, but they also contribute to understanding whether the organisation is functional and is performing as it should.

There are two different levels: the organisational level and the service level. Both have to be taken into account. The work in e-IRG bridges both. What Rafael Jimenez would like to see is more alignment, at least at the service level.

Just to be sure, we wanted to clarify that when Rafael Jimenez talks about "nodes" he means national organisations and not computing nodes. Rafael Jimenez confirmed that the nodes he was talking about represent their Member States within the European Union.

Rafael Jimenez concluded by asking a question too. Something he had seen at the e-IRG Workshop is that there are currently different approaches to defining indicators and collecting data for them. As far as Rafael Jimenez understood, e-IRGSP5 partner Ad Emmen is also involved in some of those initiatives. Rafael Jimenez wanted to know how Ad Emmen saw the interaction with the different research infrastructures and the adoption of what they are doing in defining and collecting metrics for data.

Ad Emmen answered that he is indeed working on the indicators in e-IRG's e-IRGSP5 support project, together with the eInfraCentral project. The focus there is on the e-Infrastructure key performance indicators. Of course, there is a big overlap with the KPIs in research infrastructures. There should be some kind of exchange, not only with regard to IDs but also with regard to the way one defines the key performance indicators, because it is not only about the numbers. If one talks about definitions of users, one can already find hundreds of different definitions. One needs to define a common vocabulary and try to find a set of common key performance indicators that most people, from funders to users, can understand.
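To make the point about definitions concrete, the following Python sketch assumes that two infrastructures report a "users" figure under different local definitions. The vocabulary entries, infrastructure names, term names and values are hypothetical. The sketch only shows how a shared vocabulary could record which definition a number follows, so that values are compared only when the definitions match.

```python
# Hypothetical shared vocabulary: each common term carries one agreed definition.
COMMON_VOCABULARY = {
    "registered_users": "accounts registered with the service",
    "active_users": "distinct users with at least one request in the reporting period",
}

# Hypothetical mappings from each infrastructure's local term to a common term.
LOCAL_TO_COMMON = {
    ("infra-a", "users"): "registered_users",
    ("infra-b", "users"): "active_users",
    ("infra-b", "accounts"): "registered_users",
}

def normalize(infrastructure, local_term, value):
    """Translate a locally named indicator into the common vocabulary."""
    common = LOCAL_TO_COMMON.get((infrastructure, local_term))
    if common is None:
        raise KeyError(f"no common term for {infrastructure}/{local_term}")
    return {"term": common, "definition": COMMON_VOCABULARY[common], "value": value}

def comparable(a, b):
    """Two normalized indicators are comparable only if they share a common term."""
    return a["term"] == b["term"]

if __name__ == "__main__":
    a = normalize("infra-a", "users", 5000)
    b = normalize("infra-b", "users", 800)
    c = normalize("infra-b", "accounts", 4200)
    print(comparable(a, b))  # same local name "users", different definitions
    print(comparable(a, c))  # different local names, same common definition
```

In this sketch the two figures that are both called "users" locally are not treated as comparable, while "users" from one infrastructure and "accounts" from the other are, because they map to the same common term.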

Rafael Jimenez completely agreed. He expected that researchers could rely more on the work that e-IRG is doing and, vice versa, that e-IRG could rely on the part of the work that ELIXIR is doing.

Ad Emmen confirmed that it is all about serving communities and making transparent what is happening. All parties benefit from this collaboration.