I couldn't help but notice just how non-descriptive the gene names used in modern genetics are. I'm currently reading "The New Science of Evo Devo" by Sean B. Carroll, and here are some examples of the gene names it uses:

Fzrb

Krox 20

Hoxa2, Hoxb4

ZPA (strictly, the zone of polarizing activity, a region of the limb bud rather than a gene)

FGF8

sonic hedgehog

While these names identify genes uniquely, they do very little to express what the gene does and where, or how it is related to other genes (while FGF8 may be related to FGF7, its relationship to XYZ10 is not obvious).

I understand the need to identify genes uniquely, and the book is an example of just how hard it currently is to write about many genes at once. The author paints a picture of what's going on, but the gene names get in the way. Even where a gene has a semi-descriptive name, like "eyeless", the reader has to remember that it's actually the gene responsible for eye formation.

Are there any efforts underway to systematize gene names for a given organism, or to name genes in an expressive manner?

As a programmer, I write code for a living, and descriptive names make it easier to read someone else's code, to read about code, and even to discuss it with novices. For example:

initializeDataModel

createViewHierarchy

userDidSelectLayerAtIndex

Modern programming tools make descriptive names easier to use thanks to autocomplete: typing the first few letters of a name completes the rest. Even Google offers a list of autocomplete suggestions.

We are all familiar with the Internet, where biology.stackexchange.com/questionname resolves to a specific page. Stack Exchange is the site we are visiting, and Biology is a subset of that site. There are other biology websites, but biology.stackexchange.com uniquely identifies this one. The word "biology" in the address gives readers a general idea of what the site is about and relates it to other biology sites. Behind the scenes, the browser (via DNS) resolves the human-readable address to a numeric server address and fetches the right page. What if we named genes like

com.drosophila.eyeformation

com.chicken.limb.structure.ZPA

com.human.development.geometry/XYZ10

and whatever technology we use would actually resolve that descriptive name into a gene or a series of gene interactions?
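The resolution step proposed here can be sketched as a simple registry lookup. This is a minimal Python sketch under loud assumptions: the dotted names and the `REGISTRY`, `resolve`, and `browse` names are hypothetical, and only the gene symbols (eyeless, DRD1, TOP2A, PTGS1) come from this thread. `browse` mimics how a domain name relates a site to its neighbours, by returning everything under a shared prefix.

```python
from typing import Dict, List

# Hypothetical registry: descriptive, dot-separated name -> gene symbol(s).
# The namespaces are invented for illustration; the symbols are from the thread.
REGISTRY: Dict[str, List[str]] = {
    "com.drosophila.eyeformation": ["eyeless"],
    "com.human.receptor.dopamine.D1": ["DRD1"],
    "com.human.enzyme.topoisomerase.2alpha": ["TOP2A"],
    "com.human.enzyme.prostaglandin.synthase1": ["PTGS1"],
}


def resolve(name: str) -> List[str]:
    """Return the gene symbol(s) registered under an exact descriptive name."""
    return REGISTRY.get(name, [])


def browse(prefix: str) -> Dict[str, List[str]]:
    """Return every entry at or under a dotted prefix (e.g. 'com.human').

    Matching is on whole dot-separated components, so 'com.hum' matches nothing.
    """
    return {
        name: genes
        for name, genes in REGISTRY.items()
        if name == prefix or name.startswith(prefix + ".")
    }


if __name__ == "__main__":
    print(resolve("com.drosophila.eyeformation"))  # ['eyeless']
    print(sorted(browse("com.human.enzyme")))
```

The catch, raised in the answers, shows up immediately: a multifunctional gene such as sonic hedgehog would have to be registered under many descriptive names, and a gene of unknown function would have none.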

You are absolutely right, gene names are often horrid. I just wanted to point out that one of the genes on your list is actually quite descriptive: FGF8 is fibroblast growth factor 8; it doesn't get any better than that.
– terdon, May 7 '13 at 16:54

4 Answers

Many gene names are descriptive, e.g. DRD1: dopamine receptor D1, TOP2A: DNA topoisomerase 2-alpha, or PTGS1: prostaglandin G/H synthase 1. These are examples of genes that have a clearly defined main function.

The genes you listed are involved in development, and there, describing the function of a gene becomes much more difficult. For example, sonic hedgehog is involved in patterning many organs, from brain structures to teeth. How would you assign it a descriptive name?

The analogy to functions in a programming language does not work for many genes, because they were not designed to fulfill one function but evolved to carry out many different ones. For genes with many functions, we just need a unique name; the multitude of functions is catalogued elsewhere.

Maybe there's a higher level of abstraction, then, like gene interactions or cycles, that can be given descriptive names?
– Alex Stone, May 8 '13 at 1:39

@AlexStone: there are pathways that can be given a nice name, e.g. genome.jp/kegg/pathway.html, but even then a pathway will have more functions than its name can encapsulate.
– Michael Kuhn, May 8 '13 at 6:45

In theory, your idea seems reasonable, but I can see at least two problems with implementing it.

First, the situation is complicated by the fact that genes often carry out multiple functions, and a single label is often insufficient to annotate the complete functional repertoire of a gene. @MichaelKuhn made an excellent point with regard to the evolution of gene function. When one stops thinking of a gene (and its protein products) as something designed to fulfill a specific purpose, and instead sees it as something that has been adapted (often serendipitously), for better and for worse, to fulfill a particular set of functions, the difficulty of annotating gene function becomes much clearer.

Second, this idea is confounded by the fact that reliable functional annotations are really, really rare. Sure, lots of genes in human, yeast, and Drosophila have been extensively studied for decades, and many of their functions are well understood (at least we think they are). But for the vast majority of genes in the vast majority of organisms that have been studied, functional annotations are inferred from sequence similarity to previously annotated genes, without any additional experimental validation. Of course, every genome project would love to functionally characterize all of the genes in their pet genome, but this is really expensive, time-consuming, and low-throughput. For every extensively studied, well-annotated gene, there are literally hundreds of analogous genes in other organisms that have been annotated solely on the basis of sequence similarity to it. As disappointing as this is, it doesn't seem likely to change in the near future (unless there are drastic improvements in the throughput of gene function annotation). So any proposed nomenclature convention that does not address this reality is, in my opinion, quite inadequate.

Are there are any efforts underway to systematize or name genes for a given organism in an expressive manner?

As you observed, genes are named by their discoverers, and many gene names have anecdotes behind them. Since gene names have to be short, it is difficult to describe genes completely and systematically. But gene nomenclature can adopt a systematic classification, and this has been used extensively in yeast: genes are named by their chromosomal locations. (This was done in an age when cytogenetics was more developed than functional genetics, which explains why it happened this way.) Some genes were named after the processes they are involved in, for example cdc2, cdc48, etc. (cell division cycle).

This is of only peripheral relevance, but I think you have the history of yeast genetic nomenclature backwards. For decades, yeast genes were defined functionally and named accordingly. For example, the CDC genes (uppercase = dominant form, usually wild-type; lowercase = recessive form, usually mutant) were described by Lee Hartwell in 1970, about 25 years before the yeast genome was sequenced. At that point ALL of the ORFs/genes were given a systematic name (e.g. CDC1 is YDR182W). The aim, however, is still for every gene to be given a name relating to its function as well.
– Alan Boyd, May 7 '13 at 20:26

Thanks for the correction; I was unaware of that.
– WYSIWYG, May 7 '13 at 23:16
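The systematic yeast names mentioned in the comment above (e.g. YDR182W for CDC1) are themselves a compact, location-based nomenclature. As I understand the Saccharomyces Genome Database convention, a nuclear ORF name reads: Y (yeast), a letter for the chromosome (A = I through P = XVI), L or R for the chromosome arm, a three-digit ORF number counted outward from the centromere, and W or C for the Watson or Crick strand, sometimes followed by a letter suffix such as -A. The Python sketch below decodes that pattern; the `parse_systematic` and `OrfName` names are hypothetical, and mitochondrial ORF names, which follow a different pattern, are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional

# Chromosome letters A..P map to chromosomes I..XVI.
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]

# Y + chromosome letter + arm (L/R) + 3-digit ORF number + strand (W/C),
# with an optional "-A"-style suffix (assumed format, nuclear ORFs only).
SYSTEMATIC = re.compile(r"^Y([A-P])([LR])(\d{3})([WC])(?:-([A-Z]))?$")


class OrfName(NamedTuple):
    chromosome: str  # Roman numeral, e.g. "IV"
    arm: str         # "left" or "right" of the centromere
    number: int      # ORF number counted from the centromere
    strand: str      # "Watson" or "Crick"


def parse_systematic(name: str) -> Optional[OrfName]:
    """Decode a yeast systematic ORF name; return None if it doesn't match."""
    m = SYSTEMATIC.match(name.upper())
    if m is None:
        return None
    chrom_letter, arm, number, strand, _suffix = m.groups()
    return OrfName(
        chromosome=ROMAN[ord(chrom_letter) - ord("A")],
        arm="left" if arm == "L" else "right",
        number=int(number),
        strand="Watson" if strand == "W" else "Crick",
    )


if __name__ == "__main__":
    print(parse_systematic("YDR182W"))
    print(parse_systematic("CDC1"))  # None: a functional name, not a systematic one
```

The contrast with CDC1 illustrates the comment's point: the systematic name is unique and encodes location, while the functional name is what carries meaning for a reader.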

I think it seems like a good idea at first, but...
You have to take into account that, as has been said, genes may have multiple roles, sometimes very different ones, and for most genes the functions are still unknown.
So it's very difficult to classify a constantly changing, promiscuous set of things. Even the gene concept is changing (Hello, ENCODE :)).
To me it's a lot easier to keep names as short as possible, with some hint of their role (just the way they are now). If you're interested in the specific function, a quick database query using your gene list as input will give you the current description and/or function of each gene.
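The workflow suggested here, short symbols plus a database lookup, can be sketched in a few lines of Python. This is a hypothetical stand-in: `ANNOTATIONS` is a small local dictionary holding only descriptions quoted in this thread, where a real workflow would query an actual gene database, and `annotate` is an illustrative name, not a real API.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical local stand-in for a gene database; the descriptions are the
# ones quoted in this thread. A real workflow would query an actual database.
ANNOTATIONS: Dict[str, str] = {
    "DRD1": "dopamine receptor D1",
    "TOP2A": "DNA topoisomerase 2-alpha",
    "PTGS1": "prostaglandin G/H synthase 1",
    "FGF8": "fibroblast growth factor 8",
}


def annotate(symbols: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each short symbol with its current description, flagging unknowns.

    Simplification: matching is case-insensitive, although symbol case
    conventions actually differ between organisms (e.g. human FGF8, mouse Fgf8).
    """
    return [
        (symbol, ANNOTATIONS.get(symbol.upper(), "no annotation available"))
        for symbol in symbols
    ]


if __name__ == "__main__":
    for symbol, description in annotate(["FGF8", "DRD1", "XYZ10"]):
        print(f"{symbol}\t{description}")
```

Keeping the symbol short and stable while the description lives in the database means the annotation can be updated as functions are discovered, without renaming the gene.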