
EUDAT Project Director Damien Lecarpentier on the added value of EUDAT for data storage and the success of the first DI4R Conference

Published Date 09/11/16 15:13

At the Digital Infrastructures for Research (DI4R) Conference in Krakow, Poland, we had the opportunity to talk with Damien Lecarpentier, Project Director of EUDAT, one of the organizers of the conference. EUDAT is a network of research organisations and computing and data centres that are working together to create and establish a collaborative data infrastructure. This infrastructure consists of a few data management services, but it also promotes a number of practices and aims to harmonize the way centres across Europe handle data management. EUDAT started as a project but is now moving towards a more sustainable organisation, and Damien Lecarpentier also heads its newly established secretariat. A number of partners have committed for ten years to sustaining the collaboration and supporting the infrastructure and its services.

The data services that EUDAT provides are aimed at research. Compared to commercial providers such as Google Drive, Dropbox and Microsoft Azure, the added value of EUDAT's services lies in their integration with the data centre environment. When EUDAT designs data management services, its partners prefer to couple or integrate them with a computing environment, because that is what makes them more interesting than a single, stand-alone commercial service.

The data services that these data centres provide on demand are scalable in terms of capacity. Depending on the agreements you have with a specific service provider, there is huge potential for customizing these services to the needs of the researchers. This is one of the main added values of public service providers: they are there to serve the users, the researchers, on a non-profit basis. This gives greater scope to customize the services and to co-design them with the research communities.

At the moment, there are 35 partners in the project. The agreement establishing the sustainable organisation has an initial set of 18 signatories, mainly the major supercomputing and data centres in Europe, together with a few research organisations. EUDAT is also partnering with over 30 research communities from all kinds of disciplines, ranging from life sciences, climate science and linguistics to earth and environmental sciences. EUDAT is currently running many pilots to integrate its services into these research infrastructures, many of which are still emerging and at different stages of maturity. EUDAT is organizing the planning and development phase and is discussing how the generic services that the partners are building can be integrated into the research infrastructures to support them.

EUDAT focuses on data management, but data storage and data management are not always appealing in themselves to research communities, who want a full portfolio of services. EUDAT is therefore building collaborations with EGI, PRACE and OpenAIRE to enrich the portfolio it is developing. There are already a few working collaborations with EGI and PRACE on the computing side. These organisations provision computing resources on a European scale, and EUDAT is very interested in interoperating with them so that researchers can store, manage, process and preserve their data across the different infrastructures.

In one of the DI4R conference presentations, Damien Lecarpentier mentioned that if you apply to PRACE for computing time, you can also indicate that you would like access to EUDAT storage for your project. This was introduced last year as part of the DECI Call. What EUDAT offers applicants is access to the EUDAT storage resources and environment once the computation on the PRACE system has finished. PRACE itself is not particularly interested in keeping the data after the computation. The data can be quite large, and the research community or the researcher often lacks the environment to keep it for a long period of time. That is where EUDAT adds value: the data can be moved to a EUDAT site where it is preserved and remains accessible to the researcher. This is possible because there is a strong overlap between PRACE and EUDAT centres; many PRACE centres are also involved in EUDAT, although different teams sometimes work on HPC and on data management. In some cases the data can simply be stored locally. In others, a researcher from Finland applies for PRACE computing time in Germany; once the computation is done, the data can be moved back to, for example, CSC in Finland, which is involved in EUDAT and can use EUDAT services for that purpose.

The main topics discussed at DI4R in Krakow were the collaboration between the different infrastructures, how to improve and foster it, and which direction to take. This is the first event of its kind, which is already quite an achievement, and it will be an even greater one if it continues. The main feedback the organizers received is that they should carry on with an event like this. Users, research communities and developers are very interested in having a single joint event, because the same people sometimes work across different infrastructures. The organizers still need to analyze the feedback more carefully, but there seems to be strong demand and interest in organizing it jointly. This makes a lot of sense: the question that usually arises is how to interoperate with different organisations such as PRACE, EUDAT and so on, and a single event makes it possible to answer this question and to meet the high demand for greater collaboration.

EUDAT has been financed by European Commission project funding. The first phase lasted three years under the Seventh Framework Programme (FP7). EUDAT is currently halfway through its second project, which is also financed by the European Commission, but it has now also established a membership organisation that any organisation providing ICT services can join. The membership fees support the secretariat, which coordinates the collaboration and the participation in different projects and manages the infrastructure itself.

The European Commission wants to know whether the money is well spent. Key Performance Indicators (KPIs) are the means of showing the Commission what you consider important and what you achieve within your project. This is a very topical issue, and all research infrastructures and e-infrastructures are working on their KPIs. The KPIs need to be taken very seriously. Sometimes a list of KPIs is written into the project proposal and forgotten six months later. That is not how they should be used. KPIs should serve to monitor strategy, impact and the achievement of objectives. One also has to be flexible and able to revise the strategy, since some KPIs turn out to be irrelevant or become obsolete. The difficulty is harmonizing the different KPIs across the infrastructures, and every infrastructure first has to do its own homework. Damien Lecarpentier is currently working on this for EUDAT. A list of KPIs has been established according to EUDAT's objectives in terms of data access, data availability and preservation, as well as trustworthiness, multi-disciplinarity, interoperability with other infrastructures and knowledge transfer. These KPIs are very important and help to monitor whether the project is on track to meet its objectives.

The next challenge is to harmonize these KPIs across the infrastructures, so that everybody counts things in the same way, follows a common approach and produces results that can be compared. The aim is not to compare the performance of the infrastructures themselves but to harmonize how that performance is interpreted. Some very interesting activities started during DI4R. A draft document proposing a KPI approach has been put forward, with the intention of holding a broader consultation with the infrastructures, the funders and the research communities in order to agree on a common KPI framework. This approach was presented in Bratislava. Damien Lecarpentier thinks it looks very promising, but it is not a simple exercise and it will take time. Even within a single infrastructure it takes time to agree on meaningful KPIs, and when comparing KPIs across infrastructures one needs to be sure that everyone is talking about the same thing and using the same definitions. This is going in the right direction, and it is very much in line with the work on the cross-infrastructure service catalogue, which also needs to be coupled with the KPIs.

The DI4R conference organizers are going to collect feedback from the participants to find out how it went. At the moment, the feeling is that people are happy with this kind of set-up. Right after the conference, the organizers will meet to share their views and analyze the feedback, and based on that they will decide whether to organize another conference. In principle, most if not all of the e-infrastructures are in favour of such an event, but there is a question of whether it should be held every year or every two years. The organizers may also need to define the target audience and the kinds of sessions they want more precisely. This was a successful first attempt, but a lot can still be improved for the benefit not only of the research communities but also of the e-infrastructures themselves.