Tag: FAIR

A couple of weeks ago we took part in the May ELIXIR Bioschemas meeting, along with representatives from Google, the European Bioinformatics Institute (EBI) and other participating organizations from the UK and beyond.

To give some background, Bioschemas is based on schema.org, an initiative to produce schemas that can be embedded directly in websites to give more structure to data. Search engines can understand this more easily than plain text, and it’s the stuff that powers some of Google’s snippets (those box-outs you see in Google search results when you search for something popular). For example, let’s suppose I wanted to tell search engines more about my jazz event. This is what I would embed in the webpage for the event.
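As a minimal sketch of what such embedded markup can look like, the Python below builds a schema.org MusicEvent description and wraps it in a JSON-LD script element. The event name, date and venue are invented for illustration, and the helper name to_jsonld_script is our own, not part of any library.

```python
import json

# Hypothetical event details, invented purely for illustration.
jazz_event = {
    "@context": "https://schema.org",
    "@type": "MusicEvent",
    "name": "An Evening of Jazz",
    "startDate": "2018-07-01T19:30",
    "location": {
        "@type": "Place",
        "name": "Example Jazz Club",
        "address": "1 Example Street, Cambridge",
    },
}


def to_jsonld_script(data):
    """Wrap a dict as a JSON-LD <script> element ready to paste into a page."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_jsonld_script(jazz_event))
```

The printed element would sit in the HTML of the event page, where a crawler can read it without having to interpret the visible text.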

Bioschemas wants to do the same but for biological information (genes, proteins, samples and so on). So in InterMine, for the CHEY_BACSU protein report page in SynBioMine we might have something like this:
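A rough sketch of that idea follows, again generated from Python. The BiologicalEntity type is one of the proposed schemas mentioned below rather than an established schema.org type, the choice of properties is our assumption, and the report page URL is a placeholder rather than the real SynBioMine address.

```python
import json

# Assumption: "BiologicalEntity" is a proposed type, and the properties
# used here (name, identifier, url) are generic schema.org properties
# picked for illustration only.
protein = {
    "@context": "https://schema.org",
    "@type": "BiologicalEntity",
    "name": "CHEY_BACSU",
    "identifier": "CHEY_BACSU",
    # Placeholder URL, not the real SynBioMine report page address.
    "url": "https://example.org/synbiomine/report/CHEY_BACSU",
}


def to_jsonld_script(data):
    """Wrap a dict as a JSON-LD <script> element for a report page."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_jsonld_script(protein))
```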

A search engine (or a specialized life sciences search tool) can then crawl and aggregate the structures embedded in a wide range of life sciences websites (particularly in areas spread across lots of small sites, such as biological samples in biobanks). The goal is to make it considerably easier for scientists to find information relevant to their research without having to visit lots of sites individually.

The job of Bioschemas is to go through the existing schema.org schemas and decide which existing types we can use (such as Dataset) and what we need to propose as new schemas (such as BiologicalEntity). schema.org schemas are big bags of attributes with no cardinality constraints, as they need to satisfy a lot of different use cases. So another job of Bioschemas is to recommend which attributes to use and at what cardinality, both for data in general (Dataset, for example) and for specific life sciences entities, such as proteins and biological samples.
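To make the idea of attribute and cardinality recommendations concrete, here is a small sketch of checking a piece of markup against a profile. The profile contents are hypothetical and do not reproduce any actual Bioschemas recommendation; the function names are ours.

```python
# Hypothetical profile: property -> (minimum, maximum) number of values.
# A maximum of None means "no upper limit".
DATASET_PROFILE = {
    "name": (1, 1),
    "description": (1, 1),
    "url": (1, 1),
    "keywords": (0, None),
}


def count_values(value):
    """Count how many values a property holds (a list counts each item)."""
    if value is None:
        return 0
    if isinstance(value, list):
        return len(value)
    return 1


def check_profile(markup, profile):
    """Return a list of human-readable cardinality problems."""
    problems = []
    for prop, (minimum, maximum) in profile.items():
        found = count_values(markup.get(prop))
        if found < minimum:
            problems.append(f"{prop}: expected at least {minimum}, found {found}")
        elif maximum is not None and found > maximum:
            problems.append(f"{prop}: expected at most {maximum}, found {found}")
    return problems


# Example markup with a missing description.
dataset = {
    "@type": "Dataset",
    "name": "Example dataset",
    "url": "https://example.org/dataset",
    "keywords": ["genes", "proteins"],
}

print(check_profile(dataset, DATASET_PROFILE))
# → ['description: expected at least 1, found 0']
```

A plain schema.org type would accept this markup as it stands; it is the profile layered on top that flags the missing description.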

It’s time to celebrate. After some hectic weeks preparing and organizing the InterMine conference, we can now take a deep breath and get ready for our new BBSRC-funded project to make data more FAIR.

It will be a short rest: just enough time to check that we have everything we need to start this long journey, and to savour the excitement before departing towards FAIRness, a destination that will bring InterMine closer to compliance with the FAIR principles. InterMine has been at the forefront of getting research data into the hands of scientists for over 10 years, and we’re excited to support the formalisation of these principles.

As a team, we recognized without any doubt the need to implement the FAIR data principles, making biological data stored in InterMine instances more Findable, Accessible, Interoperable, and Reusable by both humans and machines, as well as the huge impact that this could have on the quality of the biological data served by InterMine. By implementing mechanisms that make the data stored in InterMine FAIR, we have a unique opportunity to make ALL the data collections served by nearly 30 public biological InterMine instances worldwide FAIR.

This is a great chance and we didn’t want to miss it!

Here are some important milestones we want to achieve along the journey:

Generate globally unique and stable URIs to identify InterMine data objects and register them in community bioinformatics repositories (for instance bio.tools and Identifiers.org) to make the data more findable and accessible.

Apply suitable ontologies to the core InterMine data model to make the semantics of InterMine data explicit and to facilitate data exchange and interoperability.

Provide an RDF representation of stored data, lists and query results, along with a bulk download of all InterMine data in RDF form, so that users can import InterMine resources into their local triplestore.

Provide the infrastructure for a SPARQL endpoint where users can perform federated queries over multiple data sets.
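To give a flavour of the URI and RDF milestones above, here is a minimal sketch that builds a stable URI for a data object and serialises a few statements about it as N-Triples, a line-based RDF format that triplestores can import. The base URI, the URI pattern and the model class URI are all hypothetical, and this is not how InterMine will necessarily do it; only the rdf:type, rdfs:label and dcterms:identifier predicates are existing vocabulary terms.

```python
from urllib.parse import quote

# Hypothetical base URI; a real deployment would choose its own.
BASE_URI = "https://example.org/intermine"

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DCTERMS_IDENTIFIER = "http://purl.org/dc/terms/identifier"


def object_uri(class_name, identifier):
    """Build a URI from the class name and a stable identifier (assumed pattern)."""
    return f"{BASE_URI}/{class_name.lower()}/{quote(identifier, safe='')}"


def iri(value):
    """Format a URI as an N-Triples IRI term."""
    return f"<{value}>"


def literal(text):
    """Format a string as an N-Triples literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialise (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


subject = iri(object_uri("Protein", "CHEY_BACSU"))
triples = [
    # Hypothetical class URI standing in for an ontology term.
    (subject, iri(RDF_TYPE), iri(f"{BASE_URI}/model/Protein")),
    (subject, iri(RDFS_LABEL), literal("CHEY_BACSU")),
    (subject, iri(DCTERMS_IDENTIFIER), literal("CHEY_BACSU")),
]

print(to_ntriples(triples))
```

Deriving the URI from an identifier that does not change between database builds is what keeps it stable; an internal database ID that is reassigned on each rebuild would not do.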