Notes from the life of a [data] scientist

What’s in a name?

Once upon a time, biologists were allowed to name things with whatever took their fancy at the time. Hence we have genes such as bride of sevenless and proteins like JAK (Just Another Kinase).

Well, that was all terribly amusing, but now we are paying the price. Those witty biologists failed to anticipate that one day we would have so much data that it would be useful if things were named systematically, so that we all know precisely which gene or protein people are referring to. You wouldn’t believe how difficult it is to answer a question like “which protein kinase phosphorylates my substrate?” using the current databases. By “which”, I mean the precise sequence records that correspond to a name like JAK, or CSK, or Prkaa1.

It’s good to see some correspondence about these issues in Nature this week:

6 thoughts on “What’s in a name?”

Well, if it’s in Nature, it must be a real problem then. Nature sometimes feels like Time magazine, making a big noise about news stories that happened months ago. Or maybe this is just a reflection of the scientific community’s overspecialization problem. In other words, why do I need to know other gene names? The only names I need are the ones that I study, and they are easier to remember if I call them names like “sonic hedgehog”, duh!

Anyway, when everyone gets their act together and decides to act in concert on the systematic naming of biological entities, we can then have a really fun discussion about which technology should be used to do the naming :)

This is one of those things that most people agree with but that somehow does not get implemented. I talked about this some two years ago with an editor at FEBS Letters, and I remember him saying that it was a good idea but that it would be hard to implement something that asks more of the authors. His point was along the lines that if you ask too much of the authors, they might go to a journal where it is easier (less of a hassle) to publish. I guess today I would counter this by suggesting that the new “feature” (a structured, machine-readable abstract) be marketed as a way to increase the visibility of research. The journal can say that by doing this the authors are making their research easier to access and therefore more broadly available.
In any case it is nice to see so many of the points that we have been talking about in Nodalpoint being discussed by a wider audience :).

I can see that people would be resistant to structured text when writing articles. However, it’s the only way to go if text mining is ever going to be of any use. There are so many subtle nuances in the way writers express themselves, even in the rigid format of a journal article, that text mining as it stands today will never extract all the information with 100% accuracy. As you say, the best approach would be to market the idea as something that benefits both author and community.

It’s related to, but perhaps distinct from, the problem of sequence or other “biological object” identifiers. So much of our time is spent just trying to map identifiers between different databases. I would really like to see a concerted effort to solve this.