
The most interesting board games can’t be played right out of the box. You can admire the board and the game pieces, but before the fun can begin you need to spend some time reading the instructions and understanding the strategy.

A little effort put into learning the game allows you to not only play it, but master it. The same can be said for Gene Ontology! Image by Arbitrarily0 from Wikimedia Commons

Gene Ontology (GO) annotations are a little bit like that. You can get interesting information very quickly by just reading the GO terms on the Locus Summary page of your favorite yeast protein in SGD. But if you look deeper and learn just a little bit more about GO, you’ll find that you can get so much more out of it.

If you’re a molecular or cell biologist, a geneticist, or a computational biologist (or are studying one of those fields), you’re probably already aware of GO. But still, you may be wondering, “Where did these annotations come from? What do those three-letter acronyms mean? How can this help me in my research?” This short and sweet article is a great place to start getting answers to these questions.

We recommend that everyone devote a few minutes to reading this brief article, even if you think you already understand GO. Based on the most frequent questions that we get from researchers who use GO annotations at SGD, we can distill it even further into these top three points as seen from an SGD perspective.

There are people behind these annotations. GO terms are assigned either by real, live humans called biocurators, or computationally using automated methods (each annotation is marked, so you can easily see which is which). At SGD, biocurators are Ph.D. biologists who read the yeast literature and capture experimental results as GO annotations; SGD biocurators are also involved in developing the structure of the GO. We try our best, but like all human beings, we are not infallible. So if you see an annotation that looks wrong or confusing, or if you think an area of the GO could better represent the biology, please contact us (sgd-helpdesk@lists.stanford.edu) to talk about it. The more expert help we can get, the better the GO and our GO annotations will be.

The details matter. Those three-letter codes that accompany each annotation mean something. Imagine you are deciding how to allocate your lab’s resources and a critical experiment will be based on a particular protein having a particular function. You see a GO annotation for that function and that protein, so you’re good to go! But wait a minute…

Those codes tell you what kind of evidence is behind the assignment of a GO term to a gene product. If that annotation has an IDA (Inferred from Direct Assay) evidence code, then the function was shown in an actual experiment, so you probably are good to go. On the other hand, if the annotation has an ISS (Inferred from Sequence or Structural Similarity) evidence code, then it was made solely based on resemblance to another protein. This is still valuable information, but you might not want to bet the farm (or the lab) on it.
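If you work with annotations in bulk, the same check can be done in a few lines of code. Here is a minimal sketch in Python, assuming each annotation has already been reduced to a (gene, GO term, evidence code) tuple. The gene names and terms are made up for illustration, and the function name `split_by_evidence` is our own invention. The set of experimental codes lists the classic ones defined by the GO Consortium; check the current GO documentation for the full list.

```python
# Classic experimental evidence codes defined by the GO Consortium.
# (Assumption: this set is enough for the illustration; newer codes exist.)
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def split_by_evidence(annotations):
    """Partition (gene, term, evidence_code) tuples into experimental and other."""
    experimental, other = [], []
    for annotation in annotations:
        evidence_code = annotation[2]
        if evidence_code in EXPERIMENTAL:
            experimental.append(annotation)
        else:
            other.append(annotation)
    return experimental, other


# Made-up records for illustration only; these are not real SGD annotations.
annotations = [
    ("geneA", "hypothetical function X", "IDA"),
    ("geneA", "hypothetical function Y", "ISS"),
    ("geneB", "hypothetical function X", "IMP"),
    ("geneC", "hypothetical function Z", "IEA"),
]

experimental, other = split_by_evidence(annotations)
print("Backed by an experiment:", experimental)
print("Worth a closer look first:", other)
```

Annotations in the second group are not wrong. They simply deserve a look at the underlying evidence before you plan an experiment around them.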

Dates are very important too. Both the annotations and the GO itself are constantly updated to keep up with new biological knowledge. Because of this, everything related to GO (from a single annotation shown on an SGD GO Details page to the downloadable files that contain all GO annotations or the ontology itself) is associated with the date it was created. So if you do any analysis using GO annotations, it's important to note the dates of both the annotation and ontology files that you used. This is especially important if you repeat a GO term enrichment analysis for a gene set over time. You should expect the results to change: significant enrichments tend to become more strongly supported, while marginally significant enrichments may not be reproduced.
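One way to keep track of dates is to save a small provenance record alongside every analysis. The sketch below assumes the annotations are tab-delimited lines in the GO Annotation File (GAF) layout, where header lines start with "!" and column 14 holds the annotation date as YYYYMMDD. The example rows, the ontology date, and the helper names (`annotation_dates`, `provenance`, `make_row`) are all hypothetical.

```python
from datetime import date, datetime


def annotation_dates(gaf_lines):
    """Yield the date (column 14, YYYYMMDD) of each annotation line."""
    for line in gaf_lines:
        # Skip blank lines and "!" header comments.
        if not line.strip() or line.startswith("!"):
            continue
        fields = line.rstrip("\n").split("\t")
        yield datetime.strptime(fields[13], "%Y%m%d").date()


def provenance(gaf_lines, ontology_date):
    """Summarize which data an analysis was based on."""
    dates = list(annotation_dates(gaf_lines))
    return {
        "ontology_date": ontology_date,
        "n_annotations": len(dates),
        "oldest_annotation": min(dates),
        "newest_annotation": max(dates),
    }


def make_row(evidence_code, yyyymmdd):
    """Build a mostly empty 15-column GAF-style row for the example."""
    fields = [""] * 15
    fields[6] = evidence_code   # column 7: evidence code
    fields[13] = yyyymmdd       # column 14: annotation date
    return "\t".join(fields)


# Made-up input; the ontology date is a placeholder you would take from
# the ontology file you actually downloaded.
example_lines = [
    "!example header comment",
    make_row("IDA", "20060314"),
    make_row("ISS", "20131102"),
]

record = provenance(example_lines, ontology_date=date(2013, 12, 1))
print(record)
```

Storing a record like this next to your enrichment results makes it much easier to explain why a rerun months later gives different numbers.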

Go deeper. GO is not just a list of terms. GO terms have defined relationships to each other, with some being broader (parent terms) and some more specific (child terms). If you really understand the structure of GO, you’ll be able to make much better use of the annotations.

For example, if you look for gene products in SGD annotated to the GO term “mitochondrion,” you’ll currently find 1055 of them¹. Does that mean that there are exactly 1055 proteins or noncoding RNAs known to be in yeast mitochondria? Noooo!

There are more than that, because the term “mitochondrion” has more specific child terms such as “mitochondrial matrix”; some proteins are annotated directly to those terms and not to the parent term. If you had used the original list of proteins annotated to “mitochondrion”, you’d be missing 92 gene products² that are so well-studied that their precise locations in the organelle are known! The structure of the GO allows you to gather all the gene products annotated to a term and to all its child terms (YeastMine has a template tailored to this kind of query).
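To see why the two counts differ, here is a toy version of the "term plus all its child terms" query in Python. It is only a sketch: the hierarchy is a heavily simplified slice of the ontology, the gene names are made up, and the functions `term_and_descendants` and `genes_for` are our own illustration, not how YeastMine works internally.

```python
from collections import deque

# Simplified parent -> children map (illustrative, not the full ontology).
children = {
    "mitochondrion": ["mitochondrial matrix", "mitochondrial envelope"],
    "mitochondrial envelope": ["mitochondrial inner membrane"],
}

# Direct annotations: term -> gene products annotated to exactly that term.
# Gene names are made up for illustration.
direct = {
    "mitochondrion": {"geneA", "geneB"},
    "mitochondrial matrix": {"geneC"},
    "mitochondrial inner membrane": {"geneB", "geneD"},
}


def term_and_descendants(term, children):
    """Return the term plus every term reachable below it (breadth-first)."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def genes_for(term, children, direct):
    """Collect gene products annotated to the term or any of its descendants."""
    genes = set()
    for t in term_and_descendants(term, children):
        genes |= direct.get(t, set())
    return genes


direct_only = direct["mitochondrion"]
with_children = genes_for("mitochondrion", children, direct)
missed = with_children - direct_only

print("Annotated directly:", sorted(direct_only))
print("Including child terms:", sorted(with_children))
print("Missed by the direct query:", sorted(missed))
```

The gene products in the last group are the ones you would overlook by querying the parent term alone, which is exactly the situation described above.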

As you can tell, there is much more to GO annotations than many people think. And as you dig deeper, you can use them in ever more sophisticated ways. It's a bit like the natural progression with a strategy board game such as Settlers of Catan. At first, even after reading the instructions, you are just trying to work through the game. But as you play more and more, you quickly learn where to build your roads, which islands to colonize, and so much more. So get out there and master GO. You’ll be glad you did.

¹ As of December 2013, using YeastMine template “GO Term -> All genes” (includes Manually curated and High-throughput annotation types).

² As of December 2013, using YeastMine template “GO Term Name [and children of this term] -> All genes” (filtered to exclude Computational annotation type so that only Manually curated and High-throughput annotation types are included).