On 15–16 May, the BioTransR conference, organised by the IMI-funded project eTRIKS, took place in Barcelona. eTRIKS provides collaborative projects with advice, services and a data management platform. It has supported over 60 projects in the past 4½ years.

BioTransR was conceived as a conference to bring together data scientists and translational researchers. To this end, it held a series of discussions on the landscape of life sciences IT tools and infrastructures, as well as talks aimed at translational researchers that emphasised case scenarios and how the tools are used.

All presentations and discussions have been recorded and are available below.

Scott Wagers, CEO and founder of BioSci Consulting, a life sciences consultancy that facilitates collaborations from formation to implementation while assuring impact and valorisation, introduces the session. He believes precision medicine to be the next turning point in medicine and stresses the need for data sharing.

Keith Elliston from the tranSMART Foundation gives a presentation on what the foundation has been doing with big data and what it can do for precision medicine. The tranSMART Foundation is a non-profit organization providing a global, open-source knowledge management platform for scientists to share pre-competitive translational research data. The platform enables scientists to develop and refine research hypotheses by investigating correlations between genetic and phenotypic data, and by assessing their analytical results in the context of published literature and other work.

Andreas Kremer explains that we should preserve data beyond a project's lifetime, as the aggregate of this data gives us the power to better understand the biology behind treatments. However, this will only be possible if the data is put into the right standard. One of the biggest challenges in the field of translational medicine is the integration of heterogeneous data generated in multi-center clinical trials and via multi-layer lab measurements, especially in clinical environments. Curation, harmonization and mapping of retrospective and prospective clinical data, omics data and imaging data are key to meeting this challenge.

Reinhard Schneider, Head of the Bioinformatics Core facility at the Luxembourg Centre for Systems Biomedicine (LCSB), discusses why it is important to preserve data beyond the lifetime of a project. He presents ELIXIR, the European infrastructure for life science information, which aims to provide access to the wealth of bioinformatics tools and biological data available. Research projects all over Europe generate valuable data, but this data usually remains local and is not used more broadly. ELIXIR offers the possibility to archive, integrate, analyse and exploit the large and heterogeneous datasets of modern life science research beyond the lifetime of individual research projects.

Benjamin Guillon works at CC-IN2P3. He looks at the infrastructure of data management and explains how quality data management translates into quality treatment. Using tranSMART as an example, he explains the advantages of open source technology. He says that the main challenges in providing open source technology are basic setups and distributed architectures, and he recommends a stronger focus on stability, confidentiality, data integrity and availability.

Paul Houston from CDISC discusses how start-ups can help improve data management. He says that, in recent years, he has seen a lot of support for standards. In the next 5–10 years, he expects the changes in data standardisation to include a roadmap for semantic interoperability. He predicts that data sharing will increase enormously in the next ten years.

Chris Marshall from BioSci Consulting discusses the value and use of data tools to gain efficiency and reduce redundant effort. He argues that data has always been around us, but technological advances now give us the opportunity to process much more sophisticated data.

Yi-Ke Guo is a Professor of Computing Science in the Department of Computing at Imperial College London. He is the founding Director of the Data Science Institute at Imperial College and leads the Discovery Science Group in the department. In this presentation he discusses how integrated platforms can be used to advance precision medicine.

Ibrahim Emam is a software developer and bioinformatician at Imperial College. He presents the eTRIKS harmonisation service, a repository with associated interfaces that will facilitate the configuration of study designs, the preparation of study data for system import and the transformation of study data into semantically consistent representations. Any project that requires data curation and standardization will benefit from this platform.

Stelios Pavlidis looks at the different data analysis tools that exist to enhance disease stratification. He explains in detail how gene set enrichment analysis works and how it looks at sets of genes to show statistically significant, concordant differences between two biological states.
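To make the idea of gene set enrichment analysis more concrete, here is a minimal sketch in Python. It is not taken from the talk: it assumes genes have already been ranked by their difference between the two biological states, uses an unweighted running-sum score, and estimates significance by drawing random gene sets of the same size, which is a simplification of the phenotype-label permutation used in the standard method. The gene names and function names are hypothetical.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    Walk down the ranked list, stepping up at each gene in the set and
    down otherwise; return the largest deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Share of random same-size gene sets scoring at least as extremely.

    Assumption: gene-set permutation is used here for brevity; it is a
    simpler stand-in for permuting the phenotype labels.
    """
    rng = random.Random(seed)
    observed = abs(enrichment_score(ranked_genes, gene_set))
    size = sum(gene in gene_set for gene in ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, size))
        if abs(enrichment_score(ranked_genes, random_set)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical ranking, most up-regulated gene first.
    ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
    print(enrichment_score(ranking, {"g1", "g2"}))
    print(permutation_p_value(ranking, {"g1", "g2"}))
```

A score close to +1 or -1 means the members of the set cluster at the top or bottom of the ranking, which is the "concordant difference" the method is designed to detect.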

Adriano Barbosa-Silva from the University of Luxembourg discusses data exploration using the eTRIKS/tranSMART platform. tranSMART is an open data warehouse capable of storing a multitude of data types, such as clinical, pre-clinical and omics data. Combined with strong visual analytics capabilities, it can significantly accelerate scientific progress by making data more accessible and hypothesis generation easier. It also has a growing community drawn from academia and pharma.

Alexander Mazein, Senior Researcher at the European Institute for Systems Biology and Medicine, discusses disease maps as a community effort to solve certain issues of data interpretation. He says that 11 of these disease maps have been developed during the last year and that they will soon make it possible to interpret most of the data. The idea is to move towards dynamic models and prediction.

Bertrand De Meulder discusses ways to reduce the effort needed to integrate, preserve and explore data by using clustering methodologies. He introduces Weighted Gene Co-expression Network Analysis (WGCNA), which focuses on exploring correlations between probe sets in gene expression data and comparing them with the available clinical data.
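The first step of a co-expression network analysis of this kind can be sketched in a few lines of Python. This is an illustrative sketch only, not the WGCNA package itself: it assumes an unsigned network in which the connection strength between two probe sets is the absolute Pearson correlation raised to a soft-thresholding power. The probe names, expression values and the power of 6 are hypothetical choices made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def soft_threshold_adjacency(expression, beta=6):
    """Weighted, unsigned adjacency: |correlation| ** beta per probe pair.

    `expression` maps a probe-set name to its values across samples.
    Raising to the power beta keeps strong correlations and pushes weak
    ones towards zero without applying a hard cut-off.
    """
    probes = list(expression)
    return {
        (p, q): abs(pearson(expression[p], expression[q])) ** beta
        for p in probes
        for q in probes
        if p != q
    }


if __name__ == "__main__":
    # Hypothetical expression values for three probe sets in four samples.
    expression = {
        "probe_a": [1.0, 2.0, 3.0, 4.0],
        "probe_b": [2.0, 4.0, 6.0, 8.0],
        "probe_c": [4.0, 1.0, 3.0, 2.0],
    }
    for pair, weight in sorted(soft_threshold_adjacency(expression).items()):
        print(pair, round(weight, 4))
```

In a full analysis, the weighted network would then be clustered into modules, and a summary profile of each module would be correlated with the clinical variables using the same `pearson` function.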

Keynote Facilitated Discussion

Niklas Blomberg, Director at ELIXIR, makes the case for open data, showcasing the exponential increase in the use of datasets once they have been made open access, and the economic value this more intensive use of data can produce for society. Niklas also demonstrates how open data opens up the marketplace, allowing SMEs to innovate and move industry forward, and how we can overcome the barriers to sharing data.

Keynote Facilitated Discussion

Alvar Agusti, Director of the Respiratory Institute at Hospital Clinic, Barcelona, gives a clinician's critical view of the risks of using data without the integral involvement of experts. We hear how clinical practice has evolved to the current state of the art, and how the limitations in clinical understanding that exist today are also inherent in the re-use of data. Alvar advocates for best research rather than fast research, and reveals how the devil is in the detail.

A fascinating panel discussion, facilitated by Scott Wagers, saw a range of topics and propositions debated, with challenges from the floor and a number of perspectives examined.

Jay Bergeron, eTRIKS coordinator, makes the business case for how developing data management infrastructure can help coordinate the sector and lead to more efficiency. He explores what types of elements drive open source projects. He stresses the need to include the community for the product to work.

Mike Barnes from Queen Mary University discusses how to better treat immune-mediated inflammatory diseases using big data. He explains that platforms such as tranSMART underpin most of the work he does. He also explains eMedLab, a virtual machine infrastructure for flexible scientific cloud computing across omics and clinical informatics applications.

Petr Holub discusses BBMRI-ERIC, a European entity that aggregates more than 600 biobanks. He discusses how to build up biobanks to help improve diagnostics, especially for cancer. Biobanks can ensure findability, accessibility, interoperability, reproducibility and reusability, as well as privacy and trust. There is a reasonable set of incentives for researchers to use biobanks. However, issues still arise over the quality of the data.

Bryn Williams-Jones is Founder and Chief Operating Officer at Connected Discovery Ltd. He presents Open PHACTS, a project funded by the Innovative Medicines Initiative. Its aim is to create an open knowledge infrastructure enabling facile integration of chemical and biological data to support drug discovery, while laying down a long-term legacy for service delivery. Bryn argues that biomedical data still lacks standards, so his team is now trying to create an ecosystem around this. He also recognises that most of the users are traditional chemists, so there is a need to build an interface that fits their needs. He stresses the need to engage a community and keep it running.

Anastassis Perrakis from the Netherlands Cancer Institute introduces INSTRUCT, a pan-European research infrastructure in structural biology that makes high-end technologies and methods available to users. Anastassis explains how INSTRUCT is promoting emerging talent by funding internships and giving awards for the best pilot projects. He uses infrastructures across many centres in Europe to develop simple solutions. He explains that integrating multi-scale data is needed to describe life better, and that macromolecules need to be isolated in order to start understanding structure at the molecular level.

Niklas Blomberg is Director at ELIXIR. ELIXIR is an intergovernmental organisation that brings together life science resources from across Europe. These resources include databases, software tools, training materials, cloud storage and supercomputers. ELIXIR’s core objective is to ensure that Europe can continue to handle a rapidly growing volume and variety of data from high-throughput experiments such as DNA sequencing. Proper management of this information promotes knowledge-based economic growth and facilitates the translation of research into innovations that meet global challenges in food security, energy and health. He talks about how challenging it is to compare data sets and how hard it is to guarantee compliance with standards between institutes and organisations that share data sets. He notes that most research activities in Europe are funded nationally, so these nationally funded activities form the core of the model.

David Henderson presents the issues surrounding data privacy and protection. He goes through the new EU Data Protection Directive, explains the differences between health data and private data, and stresses the need for solid regulations protecting patients.

Fabien Richard is the founder of HITS. He looks at federated data models. He describes the issues surrounding the secondary reuse of data for research purposes and argues that such reuse is not covered by the initial consent. For these reasons, he argues, there is a need for efficient, seamless and quick data sharing processes. There is also a need for better infrastructure to prevent data breaches, as health data is very sensitive.

Martin Romacker is Principal Scientist at Roche, Data and Information Architecture. He discusses the different terminologies used in the context of translational research. He argues that ontologies can help to speed up the process of data curation, as they can be used to build curation machinery. The machinery by itself is not able to understand many of the correlations, and therefore string-based mechanisms need to be applied so that data curation becomes more efficient.
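One possible reading of a string-based mechanism for curation is approximate matching of free-text terms against ontology labels. The sketch below illustrates that reading only; it is not the approach presented in the talk. It uses `difflib.get_close_matches` from the Python standard library, and the ontology labels, identifiers and similarity cut-off are all hypothetical.

```python
from difflib import get_close_matches

# Hypothetical ontology excerpt: preferred label -> made-up identifier.
ONTOLOGY_LABELS = {
    "asthma": "EXAMPLE:0001",
    "chronic obstructive pulmonary disease": "EXAMPLE:0002",
    "rheumatoid arthritis": "EXAMPLE:0003",
}


def map_term(raw_term, labels=ONTOLOGY_LABELS, cutoff=0.8):
    """Map a free-text term to an ontology identifier, or None.

    The term is normalised, matched exactly if possible, and otherwise
    matched to the most similar label above the similarity cut-off.
    Unmatched terms return None so a curator can review them by hand.
    """
    term = " ".join(raw_term.lower().split())
    if term in labels:
        return labels[term]
    candidates = get_close_matches(term, list(labels), n=1, cutoff=cutoff)
    return labels[candidates[0]] if candidates else None


if __name__ == "__main__":
    for raw in ["Asthma ", "astma", "Rheumatoid  Arthritis", "headache"]:
        print(repr(raw), "->", map_term(raw))
```

Returning `None` instead of forcing a match keeps the automated step conservative: the string matching handles the routine variants, and anything ambiguous is left for expert curation.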

Ferran Sanz describes methods for integrative knowledge management and exploitation in biomedicine, including multi-level and multi-scale modelling and simulation. He explains some of the platforms that use these methods, such as the IMI-funded eTOX project. eTOX combines historical toxicological data from the pharmaceutical industry to create a series of models that support toxicity prediction. Both the data and the models are integrated in eTOXsys, the platform developed in the project, which is a powerful system for accessing the eTOX data and the predictive models.