Thanks for putting this out there for consideration, Eric. I
certainly agree the amount of effort they have invested in the issue
of using LSIDs as GUIDs for organism taxonomic information makes them
a very worthy example, and, as their work continues to progress, a
possible existence proof of the value LSIDs have to offer.
Being able to deal with species in a more systematic and semantically
granular manner is very important - and will be critical to using
formal semantically-driven information federation techniques to
better support translational research - e.g., enabling the creation
of software capable of placing findings from animal models of disease
in their proper, fine-grained, semantic context to make them useful
to clinical treatment of human disease. It's also critical to
phylogenetic analyses. Both of these issues can be handled now with
sufficient manual effort in a relatively narrow domain, but this is
not scalable and not the recommended plan for the future.
In general, it is helpful to be as specific as possible when
specifying the organism taxon, since that brings with it some
constrained definition of genotype. So, for instance, for the
available digital mouse brain atlases, I believe the most specific
one can be regarding taxon would be "Mus musculus" (ID: 10090 -
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10090&lvl=3&lin=f&keep=1&srchmode=1&unlock),
though it's possible the more specific subspecies Mus musculus
domesticus would fit
(http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10092&lvl=3&lin=f&keep=1&srchmode=1&unlock),
as many classical inbred strains are derived from this sub-species.
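
To make the point about recording the taxon as specifically as
possible a bit more concrete, here is a minimal Python sketch that
builds and parses Taxonomy Browser links from a taxon ID. The base URL
and the two IDs come from the links above; the function names are my
own illustration, and I am assuming the mode=Info and id query
parameters alone are enough for the page to resolve (the other
parameters in the links above are dropped).

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Base URL copied from the Taxonomy Browser links quoted above.
TAXONOMY_BROWSER = "http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi"

# Taxon IDs quoted in this message.
MUS_MUSCULUS = 10090
MUS_MUSCULUS_DOMESTICUS = 10092


def taxon_info_url(tax_id: int) -> str:
    """Build an info-page URL for a taxon ID (hypothetical helper)."""
    if tax_id <= 0:
        raise ValueError("taxon IDs are expected to be positive integers")
    # Assumption: only 'mode' and 'id' are needed to address the page.
    return TAXONOMY_BROWSER + "?" + urlencode({"mode": "Info", "id": tax_id})


def taxon_id_from_url(url: str) -> int:
    """Recover the taxon ID from a Taxonomy Browser URL (hypothetical helper)."""
    query = parse_qs(urlparse(url).query)
    return int(query["id"][0])


if __name__ == "__main__":
    print(taxon_info_url(MUS_MUSCULUS))
    print(taxon_info_url(MUS_MUSCULUS_DOMESTICUS))
```

Storing the numeric ID rather than the name string keeps the record
stable if a name is later revised, and moving to a more specific
taxon is just a matter of swapping one ID for another.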
As many on this list are aware, NCBI Taxonomy is in ubiquitous use in
the biomolecular informatics community and is included in UMLS.
Having said that, NCBI Taxonomy is NOT the last - or even the best -
effort to formally specify the extent of our current knowledge of
organism taxonomy and phylogeny. In fact, every page on the NCBI
Taxonomy site, such as the ones given above, includes the following
disclaimer at the bottom of the page:
"Disclaimer: The NCBI taxonomy database is not an authoritative
source for nomenclature or classification - please consult the
relevant scientific literature for the most reliable information."
Since the mid-1880s, the Zoological Record, in cooperation with the
British Museum, had been THE authority on this topic. That situation
began to change in the 1990s. The email clips below are from an email
I sent a few weeks ago to a colleague in response to a request for
info on the status of defining an agreed-upon, global, comprehensive
formal specification of organism taxonomy.
Cheers,
Bill
EMAIL 1:
To my knowledge, there are three basic projects working on the issue
of organism taxonomy with a view toward being globally and
phylogenetically comprehensive:
* Life science library & info scientists associated with university
science libraries, scientific field stations (especially for
agriculture & ecology), life science databases (such as ZR),
botanical gardens, and natural history museums around the world
==> Species 2000 (http://www.sp2000.org/)
* Researchers whose work involves some aspect of studying global
biodiversity
==> Global Biodiversity Information Facility (http://www.gbif.org/)
* Researchers who study the phylogenetics of organism comparative
anatomy (macro, micro, and biochemical) and behavioral ecology.
==> Tree of Life (http://tolweb.org/tree/)
They are all authorities in their own right - Species 2000, GBIF, and
ToL - but each from their own vantage point.
In some ways, GBIF and its constituent participants have been
around the longest, though GBIF the institution possibly hasn't been
around as long as the shared biodiversity information
aggregation/integration effort started by several of the participant
organizations.
GBIF
* Homepage: http://www.gbif.org/
* Wiki: http://wiki.gbif.org/gbif/wikka.php?wakka=HomePage
* Portal: http://www.asia.gbif.net/portal/index.jsp
* Darwin Core data element definitions: http://darwincore.calacademy.org
When you go to the Portal and browse the taxonomy, you see
attribution to sources for taxonomic names. This appears to be in
keeping with the following stated goal:
"Taxonomic names. GBIF developing an 'Electronic Catalogue of
Taxonomic Names'. This will provide access to authoritative
information about both scientific and common names for all organisms,
and will integrate data from a wide range of different organisations.
The portal already includes data for over 983,000 scientific names
and 253,000 common names from the Catalogue of Life Partnership
Annual Checklist. Some names are listed with the words 'Tentative
position in taxonomy'. This indicates that the name is only known to
the portal from specimen/observation records and should not be
treated as authoritative simply on the basis of being listed here."
Right now they have 176 data providers for taxonomic information
(http://www.gbif.org/DataProviders/providerslist?sortby=records),
many of which are linked to the Species2000 Project.
I also know GBIF has been looking to use semantic web technologies in
a big way and the LSID as a global identification system (resolvable
URIs for RDF triple resources).
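
For anyone who hasn't looked at one, an LSID is a URN of the form
urn:lsid:<authority>:<namespace>:<object>[:<revision>]. A rough Python
sketch of splitting one into its parts follows. The LSID type, the
parse_lsid function, and the example.org identifier are hypothetical
illustrations of mine, not taken from any of the projects mentioned
here, and the sketch only parses the string - it makes no attempt to
resolve the identifier or fetch RDF metadata.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Components of a Life Science Identifier (illustrative type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into parts."""
    parts = text.split(":")
    # Five fields without a revision, six with one.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The 'urn:lsid' prefix is treated as case-insensitive here.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


if __name__ == "__main__":
    # Hypothetical identifier; example.org is not a real LSID authority.
    print(parse_lsid("urn:lsid:example.org:taxon:10090"))
```

The authority component is what a resolver would use to locate the
service that returns the data and metadata for the identifier, which
is where the RDF comes in.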
The Tree of Life has always appealed to me as a bottom up effort of
current investigators whose research aims include a phylogenetics
component. It was the "brain child" of the Maddison brothers
(http://tolweb.org/tree/home.pages/homepeople.html) back in the mid-90s.
I've known of ToL since its relatively humble beginnings about a
decade ago as a collection of phylogeny web pages organized as a
phylogenetic tree graph. Back then, most of the nodes in the graph
were empty. Now they have an absolutely immense collection of domain
expert contributors and an ever-decreasing collection of blank nodes
(http://tolweb.org/tree/home.pages/participants.html).
Given the participants involved and their stated objectives
(http://tolweb.org/tree/home.pages/goals.html), their efforts on this
task really need to be somehow incorporated into any comprehensive,
semantically formal expression of organism taxonomy.
EMAIL 2:
I do think this is a critical issue for the medium- to long-term. My
sense is that about 10 years ago NCBI bit off the tractable part of
this problem, immediately addressing the needs of molecular biologists
in a manner that has proven exceedingly useful, much the way GO has
become a ubiquitous tool for many informatics tasks stretching well
beyond its original design goals - though in the area of microbes,
and particularly viruses, there are significant problems with NCBI
Taxonomy. Whenever such a thing happens - a tool gets pressed into
service for tasks not part of its original cornucopia of Use Cases -
there is a need to step back. Either you need to start recommending
the community not use the resource for that "new" purpose - as is
often the case for UMLS utilization - or considerable re-tooling
needs to be done.
The biodiversity group includes folks like ZR and the various natural
history museums, botanical gardens, and similar organizations
throughout the world who've been working on this issue of organism
taxonomy for a very long time - some for over a century. Few have
resources you'd want to use "as is" if the goal were to construct a
well-founded ontology. I'm particularly concerned with the high-level
structure of the "ontology" the TDWG is proposing (the Darwin Core -
http://darwincore.calacademy.org/). However, it is really ill-advised
to go it alone and ignore this body of work.
NCBI taxonomy - like GO - is in such ubiquitous use in the realm of
molecular & cellular biology, one can't throw it out either. Really
what should be done is those at NCBI who curate NCBI Tax., the GBIF
folks, AND the Tree of Life folks need to be brought together to work
on this problem. Otherwise, splintering of the efforts will cause
problems for us all in the future.
On Aug 28, 2006, at 1:02 PM, Eric Neumann wrote:
> I would like to point out the Taxonomic Databases Working Group
> (TDWG) and their work with trying to establish a system of Global
> Unique Identifiers (GUIDs).
>
> http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUID2Report
>
> At this point in time they are recommending (within their
> community) the use of LSIDs WITH metadata in the form of RDF.
>
> I would like to propose that we include this on the list of
> examples for the LSID/URI discussion in BioRDF (just added to
> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/URI_Best_Practices/LSID_Pros_%26_Cons).
> I think they have some
> great global examples of how to use such identifiers.
>
> Eric
>
>
> Eric Neumann, PhD
> co-chair, W3C Healthcare and Life Sciences,
> and Senior Director Product Strategy
> Teranode Corporation
> 83 South King Street, Suite 800
> Seattle, WA 98104
> +1 (781)856-9132
> www.teranode.com
>
Bill Bug
Senior Research Analyst/Ontological Engineer
Laboratory for Bioimaging & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA 19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)
Please Note: I now have a new email - William.Bug@DrexelMed.edu