Welcome to the Association of Learned and Professional Society Publishers blog. It includes helpful posts to connect, inform, develop and represent the scholarly publishing community.

Friday, 8 September 2017

Spotlight on SourceData - shortlisted for the 2017 ALPSP Awards for Innovation in Publishing

Last but not least in our series of blogs on our 2017 Awards Finalists is EMBO – the creators of SourceData. We speak to Project Leader Thomas Lemberger to find out more:

Tell us a bit about your organisation

EMBO is an international organization that promotes scientific excellence in Life Sciences. It has over 1700 members elected from the leading researchers of Europe and beyond. The organization is funded by 29 member states to support scientists through events, networking opportunities, funding and fellowships for young researchers, and by shaping science policy. EMBO also publishes four journals reporting important discoveries from the global bioscience community: EMBO Journal, EMBO Reports, Molecular Systems Biology and EMBO Molecular Medicine.

What is the project that you submitted for the Awards?

SourceData is a technology platform made up of several tools that extract information about published figures and make scientific data more discoverable.

Through EMBO’s work at the intersection of research and publishing we realized there is a disconnect between the way research data is published in scientific papers and the way researchers typically want to interact with it. Most scientific papers report the results of carefully designed experiments producing well-structured data. Unfortunately, during the publishing process this data is typically summarised in text and graphs and “flattened down”, losing a lot of valuable information along the way. As a result, it can be very difficult for researchers to find answers to relatively simple questions because the data is inaccessible.

For example, it is currently very cumbersome for a scientist to find specific experiments where a certain small molecule drug has been tested on a specific cancer cell line, or to look at the results of a published experiment and find out whether similar data has been published elsewhere. These are the kinds of scenario where SourceData can help. SourceData goes to the heart of the scientific paper - the data - and extracts its description in a usable format that researchers can access and interrogate. It then goes on to link this data to results from other scientific papers that have been through the same process.

Tell us more about how it works and the team behind it

With SourceData, EMBO has developed a way to represent the structure of experiments. The principle of SourceData is rather simple: we identify the biological objects that are involved in the experiment and then we specify which objects were measured to produce the data and which, if any, were experimentally manipulated by the researchers. Despite its apparent simplicity, this method allows us to build a scientific knowledge graph that turns out to be a very powerful tool for searching and linking papers and their data.
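As a rough illustration of the principle described above, and not EMBO's actual implementation, the idea can be sketched in a few lines of Python. Everything here is an assumption made for the example: the `Panel` class, the `build_graph` and `find_panels` helpers, and the placeholder entries ("drug X", "protein Y" and so on) are hypothetical and do not come from SourceData itself.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Panel:
    """One figure panel, described by the objects it involves (hypothetical model)."""
    paper: str
    label: str
    manipulated: frozenset = field(default_factory=frozenset)  # perturbed by the researchers
    measured: frozenset = field(default_factory=frozenset)     # observed to produce the data


def build_graph(panels):
    """Map each (manipulated, measured) pair to the panels that tested it."""
    graph = defaultdict(list)
    for panel in panels:
        for source in sorted(panel.manipulated):
            for target in sorted(panel.measured):
                graph[(source, target)].append(panel)
    return graph


def find_panels(graph, manipulated=None, measured=None):
    """Return panels matching the given objects; None acts as a wildcard."""
    hits = []
    for (source, target), panels in graph.items():
        if manipulated in (None, source) and measured in (None, target):
            for panel in panels:
                if panel not in hits:
                    hits.append(panel)
    return hits


# Made-up placeholder entries, for illustration only.
panels = [
    Panel("paper A", "Fig 1A", frozenset({"drug X"}), frozenset({"protein Y"})),
    Panel("paper B", "Fig 3C", frozenset({"drug X"}), frozenset({"protein Z"})),
    Panel("paper C", "Fig 2B", frozenset({"gene W"}), frozenset({"protein Y"})),
]
graph = build_graph(panels)
print([p.paper for p in find_panels(graph, manipulated="drug X")])
print([p.paper for p in find_panels(graph, measured="protein Y")])
```

Each edge in this toy graph runs from a manipulated object to a measured one, so panels from different papers become linked whenever they share an object. A query such as "which experiments tested drug X?" is then a simple lookup on those edges.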

The development of SourceData has been a collaborative process involving the Swiss Institute of Bioinformatics, which provided its expertise in developing software platforms in the field of Life Sciences and in the curation of data. After this we worked with Wiley to implement SourceData within a publishing environment. Nature also contributed content to the initiative.

Why do you think it demonstrates publishing innovation?

SourceData transforms the way that researchers can interact with scientific papers by getting to the heart of the paper, the data, and putting it into a highly searchable form. It then takes this a step further by linking this data with relevant results from other scientific papers so that researchers can explore these connections. SourceData can give readers a new level of confidence in finding more of the research that is relevant to their questions. It can give scientists more opportunities to have their publications found and cited and can allow publishers to expose more of their content to interested readers by making it even easier to search and explore.

What are your plans for the future?

Our work to date has involved a lot of manual effort, so we are now working to automate this process. We are developing artificial intelligence algorithms using deep learning to extract the structure of an experiment from its description in natural language. Our vision is to provide access to our technology to as many publishers as possible and encourage the widespread adoption of SourceData. In doing so we hope to facilitate access to the data behind more and more journals over time and ultimately accelerate Science in the process.

Thomas Lemberger is leading the SourceData project and is passionate about the importance of scientific data and structured knowledge in publishing. Trained as a molecular biologist, Thomas is Deputy Head of Scientific Publications at EMBO and Chief Editor of the open access journal Molecular Systems Biology.