This is a very delayed blog post about the OntoMaton Add-on version we released earlier in the year. But better late than never: here we describe the new features we incorporated in the latest OntoMaton version.

OntoMaton is a widget that brings together ontology lookup and tagging within the collaborative environment provided by Google Spreadsheets. The original motivation for creating OntoMaton was to help users create well-annotated experimental metadata in the biosciences collaboratively, while keeping track of different versions. Google Spreadsheets provide such facilities for collaboration and versioning, so we combined them with the ontology search and tagging functionality offered by the NCBO BioPortal web services. BioPortal is a web-based repository of biomedical ontologies and terminologies, with functionality for searching and visualizing the ontologies and support for ontology-based annotation.

Earlier in 2014, BioPortal released a new API for its ontology term search and annotator services (see the BioPortal 4.0 release notes) and deprecated the old one.

Consequently, we upgraded OntoMaton to the latest versions of these services.

We also took the opportunity to incorporate searches across the Linked Open Vocabularies repository. Linked Open Vocabularies (LOV) is a repository of (RDFS or OWL) vocabularies used in the Linked Data Cloud, and thus, not restricted to bio-ontologies. This addition allows OntoMaton to be used for other use cases, relying on vocabularies outside the bio-domain.
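To give a feel for what searching the two repositories involves, here is a minimal Python sketch that only builds the request URLs. It is not OntoMaton's own code (the Add-on runs as a Google Apps Script). The base URLs and parameter names are our reading of the public documentation of each service and should be treated as assumptions, and build_search_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed endpoints, taken from the public documentation of each service.
BIOPORTAL_SEARCH = "https://data.bioontology.org/search"
LOV_TERM_SEARCH = "https://lov.linkeddata.es/dataset/lov/api/v2/term/search"


def build_search_url(source, query, api_key=None, ontologies=None):
    """Return the term search URL for the chosen source: 'bioportal' or 'lov'."""
    if source == "bioportal":
        params = {"q": query}
        if ontologies:
            # Restrict the search to a comma-separated list of ontology acronyms.
            params["ontologies"] = ",".join(ontologies)
        if api_key:
            # BioPortal requires an API key; it is left out here when not supplied.
            params["apikey"] = api_key
        return BIOPORTAL_SEARCH + "?" + urlencode(params)
    if source == "lov":
        return LOV_TERM_SEARCH + "?" + urlencode({"q": query})
    raise ValueError("unknown source: " + source)


if __name__ == "__main__":
    print(build_search_url("bioportal", "liver", ontologies=["OBI", "EFO"]))
    print(build_search_url("lov", "person"))
```

The sketch stops at building the URL on purpose: fetching and parsing the results depends on the response format of each service, which we do not describe here.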

For this new version, the installation procedure with Google Add-ons is as follows:

Open a Google Spreadsheet and select the Add-ons menu

Select Get add-ons, search for OntoMaton

By clicking on OntoMaton, you can find more information about it, including some screenshots.

Then you can install OntoMaton by clicking on the Free button or, if it is already installed, manage the installation by clicking on the Manage button.

You will need to authorise OntoMaton to view and manage your spreadsheets (as the Add-on will search over terms from your spreadsheets, incorporate links, etc.) and to connect to an external service (the REST services that OntoMaton relies upon).

After that, you will be able to use the OntoMaton functionality, accessible from the Add-ons menu.

And that’s it! You can start using OntoMaton for searching and tagging… The functionality is as before, except that when searching you need to select if you want to search BioPortal or LOV.

In the Bioinformatics publication, we showed some of the use cases for OntoMaton. More recently, OntoMaton has been:

used to create mappings from the ISA-Tab syntax to several ontologies in our linkedISA project

If you are interested in the OntoMaton source code, you can find it in its GitHub repository.

Finally, if you have questions or comments about OntoMaton, contact us (the ISA team) at isatools <AT> googlegroups.com (replacing <AT> with @). We would love to hear about how you are using OntoMaton!

We are happy to announce the release of OntoMaton, a tool which allows users to search for ontology terms and tag free text right in Google Spreadsheets. This post will serve to introduce you to the tool, how it works and how it can make it easier for users to use ontologies in a pervasive, powerful and collaborative environment, complementing existing work from our team in the creation of ISAcreator.

How it looks

OntoMaton is available from the Google Script Gallery and when installed provides a menu as shown below.

From the menu you can access the two resources that make up OntoMaton: ontology search and ontology tagging. There is also an ‘about’ option.

Ontology Search

Ontology Tagging

Behind the scenes: restricting the ontology search space

If a sheet named “restrictions” is in your spreadsheet, OntoMaton will consult it to determine whether the currently selected column/row name has a narrowed ontology search space. This makes searching BioPortal quicker and restricts the user’s result space, which makes selecting a term easier.
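The lookup described above can be sketched in a few lines of Python. This is an illustration only: the real layout of the “restrictions” sheet is not described in this post, so the two-column layout assumed here (field name, then a comma-separated list of ontology acronyms) and the function name restricted_ontologies are hypothetical.

```python
def restricted_ontologies(restrictions, column_name):
    """Return the ontology acronyms allowed for a column, or None if unrestricted.

    `restrictions` is a list of rows, each assumed to be
    [field name, "ACRONYM1, ACRONYM2"] (a hypothetical sheet layout).
    """
    wanted = column_name.strip().lower()
    for row in restrictions:
        if len(row) < 2 or row[0].strip().lower() != wanted:
            continue
        allowed = [cell.strip() for cell in row[1].split(",") if cell.strip()]
        # An empty list means the row gives no usable restriction.
        return allowed or None
    return None


if __name__ == "__main__":
    sheet = [
        ["Column Name", "Ontologies"],
        ["Organism", "NCBITAXON"],
        ["Assay Type", "OBI, EFO"],
    ]
    print(restricted_ontologies(sheet, "Assay Type"))
    print(restricted_ontologies(sheet, "Sample Name"))
```

Returning None for an unrestricted field lets the caller fall back to searching across all ontologies.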

Behind the scenes: extra information about the terms you select

For every term you select, its full details are recorded in a “terms” sheet. This makes it possible to use OntoMaton in any spreadsheet, and all provenance information for the selected ontology terms (including URIs, ontology source and version) will be immediately available when you expose your records to the linked data world!
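As a rough illustration of this bookkeeping, the Python sketch below appends one row per selected term and skips terms that are already recorded. The column names and the record_term helper are hypothetical; the post only tells us that URIs, ontology source and version are kept.

```python
# Hypothetical column layout for the "terms" sheet.
TERM_COLUMNS = ["label", "uri", "ontology_source", "ontology_version"]


def record_term(terms_sheet, label, uri, ontology_source, ontology_version):
    """Append a term's details to the sheet unless its URI is already recorded.

    `terms_sheet` is a list of rows; the first row holds the column names.
    Returns True if a new row was added, False otherwise.
    """
    if not terms_sheet:
        terms_sheet.append(list(TERM_COLUMNS))
    uri_index = TERM_COLUMNS.index("uri")
    if any(row[uri_index] == uri for row in terms_sheet[1:]):
        return False
    terms_sheet.append([label, uri, ontology_source, ontology_version])
    return True


if __name__ == "__main__":
    sheet = []
    # The URI, source and version below are placeholder values.
    record_term(sheet, "liver", "http://example.org/onto/TERM_1", "EXAMPLE", "1.0")
    record_term(sheet, "liver", "http://example.org/onto/TERM_1", "EXAMPLE", "1.0")
    for row in sheet:
        print(row)
```

Keying on the URI rather than the label means two terms with the same label from different ontologies are both kept.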

Installing

To install, create a new Google Spreadsheet, then go to the menu Tools > Script gallery. In the script gallery, search for ontology or ontomaton and you’ll get the following result pane.

Click on ‘install’ and this will install the scripts inside your spreadsheet. There is then one final installation step: click on Tools > Script manager and you’ll be presented with something like the image below.

OntoMaton contains lots of functions, but the only one you need to worry about in order to run the program is the onOpen function. Click this then click on run and the OntoMaton menu will be installed in your menu bar. From here you’ll be able to access the ontology search and ontology tagging functions.

OntoMaton inherently supports ISA-Tab files too. So if you have an investigation file it will automatically add ontology sources to the ONTOLOGY SOURCE REFERENCE block. Also, if you have Term Source Ref and Term Source Accession after a column, OntoMaton will automatically populate these columns for you.
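The column-filling behaviour can be pictured with the following Python sketch. It is a simplified stand-in rather than OntoMaton's implementation: the function name annotate_cell is hypothetical, and we assume the two companion columns, when present, sit directly after the annotated column and carry the names used in this post.

```python
def annotate_cell(headers, row, column, label, source_ref, accession):
    """Write a term label and fill the companion columns that directly follow it.

    Only the two columns immediately after `column` are considered, and they
    are filled only if their headers match (case-insensitively).
    """
    index = headers.index(column)
    row[index] = label
    fills = {"term source ref": source_ref, "term source accession": accession}
    for follower in range(index + 1, min(index + 3, len(headers))):
        key = headers[follower].strip().lower()
        if key in fills:
            row[follower] = fills[key]
    return row


if __name__ == "__main__":
    headers = ["Sample Name", "Characteristics[organ]",
               "Term Source REF", "Term Source Accession", "Protocol REF"]
    row = ["sample 1", "", "", "", ""]
    # Placeholder source and accession values.
    print(annotate_cell(headers, row, "Characteristics[organ]",
                        "liver", "EXAMPLE", "http://example.org/onto/TERM_1"))
```

If a column has no companion columns, only the label is written and the rest of the row is left alone.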

Also, the following table provides a quick review of available tools attempting to mix spreadsheets and access to vocabulary servers:

Tool                    Domain           Automated annotation  Ontology search/lookup  Versioning*  Collaboration
RightField              general          ✘                     ✓                       ✘            ✘
ISA creator             multiomics       ✓                     ✓                       ✘            ✘
Proteome Harvest PRIDE  proteomics       ✘                     ✓                       ✘            ✘
Annotare                transcriptomics  ✘                     ✘                       ✓            ✘
OntoMaton               general          ✓                     ✓                       ✓            ✓

* By versioning we refer to the management of user edits throughout the annotation process.

We hope you enjoy this new feature!

The ISA team

Addendum:

Safari 6 users: be aware that you will have to activate the ‘Develop’ menu from the Advanced item in the Safari ‘Preferences’ menu item. Once activated, go to the ‘Develop’ menu, navigate to the ‘User Agent’ item and select ‘Safari 5.1.7’ to enable the browser to work with Google Spreadsheet. (Thanks to rpyzh for reporting the issue, see here)

This is intended to be a constructive criticism of a resource which I believe to have the potential to be powerful and useful.

Any of you who have read Edward Tufte’s essay on Visual and Statistical Thinking: Displays of Evidence for Making Decisions will instantly recognise this question…compared to what? We see many examples in the biological world, and I’ll focus specifically on one resource here…the ArrayExpress Atlas. First, a disclaimer: I used to work in the group who developed this resource, and have aired my criticisms many years ago to no avail. And not only me, senior researchers have raised the same questions even before the resource was developed, but all suggestions have up to now been ignored.

Here, I will only give food for thought about what is presented in the Atlas since some people don’t seem to understand that what is presented doesn’t actually make much sense. This is mostly caused by a failure to answer the compared to what question…a particularly important question for a resource which is comparing gene expression levels would you not say?

Some examples:

The heatmap
A query on the resource, such as this will yield a result like so:

My first thought would be that this heat map is telling me that Fah was upregulated in liver 31 times and once in some obscure string seemingly encompassing every organ in the human body (I’ll get to my criticism about these factor representations later). Now, the second question that any self-respecting investigator would ask is: compared to what? Is this saying that it is upregulated compared to normal tissue, diseased tissue or all tissue across all organisms? Actually, we don’t know, and there is nothing to say what is being shown here. Moreover, what does it mean to say up- and downregulated? Surely it depends. You can’t just present discrete variables; one needs to show the statistical meaning of such suggestions, i.e. show the P value of the up/down regulations, since not all may be meaningful to a biologist/statistician even though they may well be to guys in the ArrayExpress Atlas team.

Another small point on this is that if this value is dependent on database contents rather than baseline expression levels (whatever they are supposed to be), then if my database contains more liver samples than anything else, and expression levels are calculated relative to this content, my results will be skewed. Either a disclaimer should be presented on the site, or they should make the comparison metrics used more obvious.
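To make the skew concrete, here is a small Python sketch with entirely made-up numbers (they are not taken from the Atlas). It compares raw "times upregulated" counts with the same counts divided by how many experiments the database holds for each tissue; the function name up_regulation_rates is hypothetical.

```python
def up_regulation_rates(up_counts, experiments_per_tissue):
    """Return, per tissue, the fraction of experiments reporting upregulation."""
    return {
        tissue: up_counts.get(tissue, 0) / total
        for tissue, total in experiments_per_tissue.items()
        if total > 0
    }


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the point.
    up_counts = {"liver": 31, "kidney": 3}
    experiments = {"liver": 310, "kidney": 6}
    for tissue, rate in up_regulation_rates(up_counts, experiments).items():
        print(tissue, up_counts[tissue], round(rate, 2))
```

With these invented figures the raw count makes liver look like the strongest signal, while the proportion points the other way, simply because the database holds far more liver experiments.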

Look at this graph, and tell me what the Y axis represents. First of all, even if what they are trying to represent was meaningful, it would still be pretty useless. Let me explain. They have split up variables which are supposed to be related into 3 different tabs, with variables which make NO sense. What does it mean to show time as a variable? Time of what? Sampling time, the length of time an organism was exposed to a compound…what? Exactly, nothing. It means nothing to show time like this. What does it mean to show dose as a seemingly independent variable? Dosage is no good without a compound. What does make sense, and can at least possibly allow one to ask the question “compared to what?”, is to show growth factor beta 1 at 5 ng/ml after 1 hour as one factor, and show the expression levels then (even though we still don’t know what the Y axis means). You can look at any experiment in the Atlas and find the same problems.

The cluster effect

Everyone, even those not in the realm of statistics, needs to understand the importance of the cluster effect, i.e. do I only get overexpression of one or more genes when another gene is over- or underexpressed? Transcription networks are indeed networks. There are feedback loops, both positive and negative, and a lot is known about these loops already. So why are these not taken into account when calculating statistics in the Atlas? For such cases, presenting the P-values of individual genes as if they were independent is not really enough; the clustering effects should be taken into account so as to adjust the P-values to more realistic sizes.

Summary

I have presented my thoughts on the ArrayExpress Atlas publicly and internally beforehand, but this is the first time I’m airing them in the public domain. I hope now that something is done to fix this resource, since I still believe it has the potential to be cool and really helpful.

For browsing and querying OLS and BioPortal seamlessly, ISAcreator and its associated configurator have done much to make life easier for their users.

ISAcreator configurator and its use for ontologies

ISAcreator Configurator is a tool used by community experts or curators to detail the Minimum Information (via MIBBI for instance) that users should enter to describe their experiment. As part of this, curators can define which fields (e.g. Sample Name) are required, whether or not they require ontology terms and if so, which ontologies (and parts of the ontologies) users should be pointed to.

ISAcreator Configurator – allows curators to define ‘checklists’: these are the fields required to describe an experiment from start to finish.

The ISAcreator configurator provides both the ability to browse ontologies and to search them, so as to allow curators to select which ontologies and which parts of these ontologies should be suggested to users whenever they annotate a field in the ISAcreator.

Browsing ontologies

Users can browse ontologies in OLS and BioPortal via an easy-to-use GUI. The browsing is done on the fly by accessing hierarchy web services from both OLS and BioPortal. An extensible code base means that other ontology resources (should they become available) can be added at any point.

Browse ChEBI from OLS

Browsing OBI (Ontology for Biomedical Investigations) from BioPortal

Search within ontologies

Users can search within ontologies and then locate the matching term in the entire ontology hierarchy. This functionality is provided to help curators find the appropriate branch of an ontology to restrict users to.

ISAcreator and how it makes ontology selection easier for users

ISAcreator is a tool used by the community who often don’t know much about ontologies or the minimum information required to describe an experiment. The ISAcreator configurator produces XML describing the minimum information and the ontologies to use so that users need not worry about what to enter.
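To illustrate the idea of a configuration file driving what users are asked for, the Python sketch below reads a deliberately simplified XML fragment. The element and attribute names are invented for this example and are not the real ISA configuration schema; load_fields is likewise a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A made-up, simplified configuration; NOT the actual ISA configuration format.
EXAMPLE_CONFIG = """
<configuration>
  <field header="Sample Name" required="true"/>
  <field header="Characteristics[organ]" required="false">
    <ontology abbreviation="EXAMPLE"/>
  </field>
</configuration>
"""


def load_fields(xml_text):
    """Map each field header to its required flag and suggested ontologies."""
    root = ET.fromstring(xml_text.strip())
    fields = {}
    for field in root.findall("field"):
        fields[field.get("header")] = {
            "required": field.get("required") == "true",
            "ontologies": [o.get("abbreviation") for o in field.findall("ontology")],
        }
    return fields


if __name__ == "__main__":
    for header, settings in load_fields(EXAMPLE_CONFIG).items():
        print(header, settings)
```

An editor that loads such a structure can decide, per field, whether to insist on a value and which ontologies to offer, without the user ever seeing the configuration.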

ISAcreator’s main menu

The user interface needs to be easy to use, and ontology selection should be seamless. The user need not know which ontology resource they are browsing, but ontologies should be presented to them in a way which makes selection straightforward. ISAcreator achieves this by automatically prompting users for ontology terms whenever the ISAcreator configurator has marked them as required.

Users are automatically prompted for Ontology terms

Users are presented with the full search results from both OLS and BioPortal in a standard form.

In summary, the functionality inside both ISAcreator and its associated configurator makes browsing and searching both OLS and BioPortal much simpler for users! An API with many advanced features, building upon the BioPortal and OLS services, will be released soon, as well as a standalone ontology browser and searcher!