Biological information about organisms is often annotated with a name. We refer to such information as name-bearing, or biocentric, content.

Information is being digitized and made available online at an increasing rate. Sources of biological information include primary specimen data from the world's museums and natural history collections. The digitization efforts of the Global Biodiversity Information Facility (GBIF) aim to provide access to billions of specimens, each of which is labeled with some form of a name.

Names are recorded in notebooks, articles, texts, and labels, on photographs and specimen jars, and in many other media and forms. When gene and protein sequences are submitted to GenBank, a name is included to specify the organism to which the sequence refers. Names are used to record observations and fishing survey data, and they appear in ecological studies. They annotate medical literature, research publications, and news stories.

Names occur so widely in these data objects that they would appear to be a logical candidate for keyword searches within data repositories. The problem is that organism names are neither fixed, stable, nor unique. Using them as query terms can return information unrelated to the organism you were looking for, and can miss information you wanted.
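
The two failure modes can be illustrated with a minimal sketch, assuming a repository that matches query strings against names exactly. The record list and the search_by_name helper are hypothetical and exist only for this illustration. The names themselves are real: Felis concolor is an older name for the cougar, Puma concolor, and Morus is a genus name used both for gannets (birds) and for mulberries (plants).

```python
# Toy records: the names are real, but the record contents are invented
# purely for illustration.
RECORDS = [
    {"id": 1, "name": "Puma concolor", "note": "cougar sighting"},
    {"id": 2, "name": "Felis concolor", "note": "cougar specimen label (older name)"},
    {"id": 3, "name": "Morus", "note": "gannet colony survey"},
    {"id": 4, "name": "Morus", "note": "mulberry leaf sample"},
]


def search_by_name(records, query):
    """Return the ids of records whose name matches the query string exactly."""
    return [r["id"] for r in records if r["name"] == query]


# One organism, two names: the query misses the record filed under the older name.
cougar_hits = search_by_name(RECORDS, "Puma concolor")

# One name, two organisms: the query returns records for both birds and plants.
morus_hits = search_by_name(RECORDS, "Morus")

print(cougar_hits)  # [1]
print(morus_hits)   # [3, 4]
```

In this sketch the first query misses record 2, which describes the same animal, while the second returns a mixture of unrelated organisms.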