Notes from the life of a computational biologist

A new twist on the identifier mapping problem

Yesterday, Deepak wrote about BridgeDB, a software package to deal with the “identifier mapping problem”. Put simply, biologists can name a biological entity in any way that they like, leading to multiple names for the same object. Easily solved, you might think, by choosing one identifier and sticking to it, but that’s apparently way too much of a challenge.
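To make the problem concrete, here is a minimal Python sketch of what identifier mapping amounts to. It is not BridgeDB's API, which this post does not show. Several database-specific identifiers for one gene, human TP53, are resolved to a single canonical name. The lookup table is written by hand for illustration. The database labels and the choice of the gene symbol as the canonical key are assumptions. A real mapping service would draw these relationships from curated sources rather than a hard-coded dictionary.

```python
from collections import defaultdict

# Hand-written illustration (an assumption, not data from BridgeDB):
# several identifiers for human TP53, keyed as (database, identifier)
# so that identical strings from different databases cannot collide.
SYNONYMS = {
    ("HGNC symbol", "TP53"): "TP53",
    ("Entrez Gene", "7157"): "TP53",
    ("Ensembl", "ENSG00000141510"): "TP53",
    ("UniProt", "P04637"): "TP53",
}


def to_canonical(source, identifier):
    """Return the canonical name for (source, identifier), or None if unmapped."""
    return SYNONYMS.get((source, identifier))


def group_by_canonical(records):
    """Group (source, identifier) pairs by canonical name; collect unmapped ones separately."""
    groups = defaultdict(list)
    unmapped = []
    for source, identifier in records:
        canonical = to_canonical(source, identifier)
        if canonical is None:
            unmapped.append((source, identifier))
        else:
            groups[canonical].append((source, identifier))
    return dict(groups), unmapped


if __name__ == "__main__":
    # "not-a-real-id" is a deliberately fake identifier used to show an unmapped record.
    records = [("Entrez Gene", "7157"), ("UniProt", "P04637"), ("Entrez Gene", "not-a-real-id")]
    groups, unmapped = group_by_canonical(records)
    print(groups)    # {'TP53': [('Entrez Gene', '7157'), ('UniProt', 'P04637')]}
    print(unmapped)  # [('Entrez Gene', 'not-a-real-id')]
```

Keying on (database, identifier) pairs rather than bare strings is a deliberate choice. A bare number such as 7157 means nothing until you know which database issued it.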

However, there are times when this situation is forced upon us. Consider this code snippet, which uses the Bioconductor package GEOquery via the RSRuby library to retrieve a sample from the GEO database:

2 thoughts on “A new twist on the identifier mapping problem”

“Biological databases should avoid potentially “troublesome” keys”
Good point – but hard to enforce.
I was parsing some BLAST-against-Swiss-Prot output a while ago… some gene descriptions contained characters including “\t” and “#”, as well as quotes. Yipeee!