The digitization of libraries had a clear initial goal: to permit anyone to read the contents of collections anywhere and anytime. But universal access is only the beginning of what may happen to libraries and researchers in the digital age. Because machines as well as humans have access to the same online collections, a complex web of interactions is emerging. Digital libraries are now engaging in online relationships with other libraries, with scholars, and with software, often without the knowledge of those who maintain the libraries, and in unexpected ways. These digital relationships open new avenues for discovery, analysis, and collaboration.

Daniel J. Cohen is an Associate Professor at George Mason University and has been involved in the development of the Zotero extension for the Firefox browser that enables users to manage bibliographic data while doing online research. Zotero [1] is one of many new tools [2] that are attempting to add a social dimension to scholarly information on the Web, so this should be an interesting talk.

November’s entity of the month at ChEBI is the antimalarial drug artemether. It accompanies release 62 of ChEBI, which is not just another incremental release but a more than twentyfold increase in the number of entities, thanks to the merging of data between an updated ChEBI [1] and ChEMBL [2]. As of release 62, ChEBI has over 455,000 entities, compared with just under 19,000 in release 61; see ChEBI news for details. The text below on artemether is reproduced from the ChEBI website, where content is available under a Creative Commons license:

Artemether (CHEBI:195280) is a lipid-soluble antimalarial for the treatment of multi-drug resistant strains of Plasmodium falciparum malaria. First prepared in 1979 [3], it is a methyl ether of the naturally occurring sesquiterpene lactone (+)-artemisinin, which is isolated from the leaves of Artemisia annua L. (sweet wormwood), the traditional Chinese medicinal herb known as Qinghao. However, because of artemether’s extremely rapid mode of action (it has an elimination half-life of only 2 hours, being metabolized to dihydroartemisinin which then undergoes rapid clearance), it is used in combination with other, longer-acting, drugs. One such combination, licensed in April of this year by the WHO, is Coartem in which the artemether is mixed with lumefantrine – a racemic mixture of a synthetic fluorene derivative known formerly as benflumetol – which has a much longer and pharmacologically complementary terminal half-life of 3–6 days, allowing the two drugs to act synergistically against Plasmodium.

The molecule of artemether is interesting because of its extreme rigidity, with very few rotatable bonds. Unlike quinine-class antimalarial drugs, it has no nitrogen atom in its skeleton. However, an important chemical feature (and one unique among drugs) is the presence of an O–O endoperoxide bridge, which is essential for its antimalarial activity. It is this bridge that is split in an interaction with heme, blocking the conversion of heme into hemozoin and thus releasing into the parasite both heme and a host of free radicals that attack the cell membrane.

Artemether is fully Rule-of-Five compliant and has recently also been under investigation as a possible candidate for cancer treatment [4,5].
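The half-lives quoted above give a rough sense of why artemether is paired with a longer-acting drug. The sketch below assumes simple first-order (exponential) elimination, which is a simplification of real pharmacokinetics. The function name `fraction_remaining` and the 24-hour time point are illustrative choices and do not come from the ChEBI text.

```python
def fraction_remaining(hours_elapsed: float, half_life_hours: float) -> float:
    """Fraction of a dose left, assuming first-order elimination (a simplification)."""
    if half_life_hours <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (hours_elapsed / half_life_hours)


# Half-lives as quoted in the text: 2 hours for artemether,
# 3 to 6 days (72 to 144 hours) for lumefantrine.
ARTEMETHER_HALF_LIFE_H = 2.0
LUMEFANTRINE_HALF_LIFE_H = (72.0, 144.0)

HOURS = 24.0  # illustrative time point, not from the source

artemether_left = fraction_remaining(HOURS, ARTEMETHER_HALF_LIFE_H)
lumefantrine_left = [fraction_remaining(HOURS, h) for h in LUMEFANTRINE_HALF_LIFE_H]

print(f"artemether after {HOURS:.0f} h: {artemether_left:.6f}")
print(f"lumefantrine after {HOURS:.0f} h: "
      f"{lumefantrine_left[0]:.2f} to {lumefantrine_left[1]:.2f}")
```

Under this assumption, far less than 1% of the artemether remains after one day, while most of the lumefantrine is still present.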

In preparation, I’ve been revisiting the OBO Foundry documentation, part of which establishes a set of principles for ontology development. Because these principles are fundamental to the whole effort, I’m wondering how they could be improved. We’ve been using one of the OBO ontologies, Chemical Entities of Biological Interest (ChEBI), in the REFINE project to mine data from the PubMed database. OBO ontologies like ChEBI and the Gene Ontology are crucial to making sense of the massive data sets that are now common in biology and medicine, so this is stuff that matters.

The ontology must be open and available to be used by all without any constraint other than (a) its origin must be acknowledged and (b) it is not to be altered and subsequently redistributed under the original name or with the same identifiers. The OBO ontologies are for sharing and are resources for the entire community. For this reason, they must be available to all without any constraint or license on their use or redistribution. However, it is proper that their original source is always credited and that after any external alterations, they must never be redistributed under the same name or with the same identifiers.

The ontology is in, or can be expressed in, a common shared syntax. This may be either the OBO syntax, extensions of this syntax, or OWL. The reason for this is that the same tools can then be usefully applied. This facilitates shared software implementations. This criterion is not met in all of the ontologies currently listed, but we are working with the ontology developers to have them available in a common OBO syntax.

The ontology possesses a unique identifier space within the OBO Foundry. The source of a term (i.e. class) from any ontology can be immediately identified by the prefix of the identifier of each term. It is, therefore, important that this prefix be unique.

The ontology provider has procedures for identifying distinct successive versions.

The ontology has a clearly specified and clearly delineated content. The ontology must be orthogonal to other ontologies already lodged within OBO. The major reason for this principle is to allow two different ontologies, for example anatomy and process, to be combined through additional relationships. These relationships could then be used to constrain when terms could be jointly applied to describe complementary (but distinguishable) perspectives on the same biological or medical entity. As a corollary to this, we would strive for community acceptance of a single ontology for one domain, rather than encouraging rivalry between ontologies.

The ontologies include textual definitions for all terms. Many biological and medical terms may be ambiguous, so terms should be defined so that their precise meaning within the context of a particular ontology is clear to a human reader.

The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.

The ontology is well documented.

The ontology has a plurality of independent users.

The ontology will be developed collaboratively with other OBO Foundry members.
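Two of the principles above, the unique identifier prefix and textual definitions for all terms, lend themselves to simple automated checks. The following is a minimal sketch under stated assumptions, not an OBO Foundry tool: the `Term` class, `id_prefix`, and `check_terms` are hypothetical names, and apart from CHEBI:195280 (taken from the text above, with a shortened definition) the example terms are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Term:
    identifier: str       # prefixed identifier, e.g. "CHEBI:195280"
    name: str
    definition: str = ""  # textual definition; empty when missing


def id_prefix(identifier: str) -> str:
    """Return the part before the first colon, e.g. 'CHEBI' for 'CHEBI:195280'."""
    prefix, sep, local_id = identifier.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a prefixed identifier: {identifier!r}")
    return prefix


def check_terms(terms, expected_prefix):
    """List violations of the identifier-prefix and textual-definition principles."""
    problems = []
    for term in terms:
        if id_prefix(term.identifier) != expected_prefix:
            problems.append(f"{term.identifier}: prefix is not {expected_prefix}")
        if not term.definition.strip():
            problems.append(f"{term.identifier}: missing textual definition")
    return problems


# Only the first identifier and name come from the text; the rest are hypothetical.
terms = [
    Term("CHEBI:195280", "artemether", "A lipid-soluble antimalarial."),
    Term("CHEBI:9999999", "hypothetical term without a definition"),
    Term("EX:0000001", "hypothetical term from another ontology", "An example."),
]

problems = check_terms(terms, "CHEBI")
for problem in problems:
    print(problem)
```

Checks like these only cover what can be tested mechanically; whether a definition is actually clear to a human reader still needs a curator.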

CC-licensed picture above of the Old Smithy (pub) by Loop Oh. Inspired by Michael Ashburner’s standing OBO joke (Ontolojoke) which goes something like this: Because Barry Smith is one of the leaders of OBO, should the project be called the OBO Smithy or the OBO Foundry? :-)

Abstract: The W3C Semantic Web for Health Care and Life Sciences Interest Group (HCLS) has the mission of developing, advocating for, and supporting the use of Semantic Web technologies for biological science, translational medicine and health care. HCLS covers hot topics including data integration and federation, bridging commonly used domain standards such as CDISC and HL7, and the applications of medical terminologies. This talk will introduce the HCLS and give an overview of the activities currently ongoing within its task forces, along with new developments and the recent Face2Face meeting. The role of information extraction and the current interest in Shared Identifiers will also be discussed.

Time: 14.00, Monday 11th May 2009
Venue: Atlas 1, Kilburn Building, University of Manchester, number 39 on the Google Campus Map

Abstract: Biochemical ontologies aim to capture and represent biochemical entities and the relations that exist between them in an accurate manner. A fundamental starting point is biochemical identity, but our current approach for generating identifiers is haphazard and consequently integrating data is error-prone. I will discuss plausible structure-based strategies for biochemical identity, whether at the level of the whole molecule or of some part thereof (e.g. residues, collections of residues, atoms, collections of atoms, functional groups), such that identifiers may be generated automatically and independently of any curator or database. With structure-based identifiers in hand, we will be in a position to more accurately capture context-specific biochemical knowledge, such as how a set of residues in a binding site is involved in a chemical reaction, including the fact that a key nitrogen atom must first be de-protonated. Thus, our current representation of biochemical knowledge may improve such that manual and automatic methods of biocuration are substantially more accurate.
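To make the idea of automatic, curator-independent identifiers concrete, here is a minimal sketch. It is not the strategy proposed in the talk. It assumes that a canonical structure string (for example a canonical SMILES or an InChI) has already been produced by some cheminformatics toolkit, and it simply hashes that string, much as an InChIKey is a hashed form of an InChI. The function name `structure_identifier` and the 16-character length are illustrative assumptions.

```python
import hashlib


def structure_identifier(canonical_structure: str, length: int = 16) -> str:
    """Derive an identifier from a canonical structure string by hashing it.

    The same structure string always yields the same identifier, no matter
    who computes it or which database it ends up in.
    """
    if not canonical_structure:
        raise ValueError("empty structure string")
    digest = hashlib.sha256(canonical_structure.encode("utf-8")).hexdigest()
    return digest[:length]


# "O" and "CO" are the SMILES strings for water and methanol.
water_id = structure_identifier("O")
methanol_id = structure_identifier("CO")
print(water_id, methanol_id)
```

The hash only guarantees that identical strings map to identical identifiers. Making sure that the same molecule, residue, or functional group always produces the same canonical string is the hard part, and this sketch leaves it out.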