HCLSIG/PharmaOntology/Resources

Existing Resources

Ontologies

SO-Pharm is a knowledge base that ties together genotype (especially SNPs), drug and phenotype. It directly imports Coulet's SNP ontology, ChEBI and DO in OWL. Other ontologies were translated to OWL. Notably, Coulet drew his own UML diagrams for clinical trials.

n-ary relations are dealt with by the addition of new classes. The property 'is composed of' is a subproperty of ro:has_part; its domain and range are each disjunctions of more than 10 classes, and it is distinct from the SNP property 'isComposedOf'.

In the paper, a 'demylenised_patient' is modelled as a subclass of person with a conjunction of further restriction classes, one of which is:

So, if the patient is enrolled in anything (a 'clinical trial' by the range axiom), then that trial can only (be)DefinedBy its involving mercaptopurine_treatment; he cannot also be enrolled in a dissimilar trial. This rule, if intended, might more reasonably be stated as a restriction on 'clinical trial', but that would contradict the comments in the ontology.
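The reading above is that of a universal ("only") restriction. A minimal Python sketch of that reading follows, under loud assumptions: the patients, trials and the set `mercaptopurine_trials` are hypothetical, and the check is closed-world over the listed facts, whereas an OWL reasoner works under the open-world assumption.

```python
# Hypothetical ABox: which trials each patient is enrolled in.
enrolled_in = {
    "patient_a": {"trial_mp"},
    "patient_b": {"trial_mp", "trial_other"},
    "patient_c": set(),
}

# Hypothetical extension of the filler class: trials defined only by
# their involving mercaptopurine_treatment.
mercaptopurine_trials = {"trial_mp"}


def satisfies_only(patient, fillers):
    """True if every trial the patient is enrolled in belongs to fillers.

    This mirrors a universal restriction: it holds vacuously when the
    patient is enrolled in nothing at all.
    """
    return all(trial in fillers for trial in enrolled_in.get(patient, set()))
```

Here `patient_a` and the unenrolled `patient_c` both satisfy the restriction, while `patient_b`, who is also in a dissimilar trial, does not.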

Implemented in OWL-DL, this suite of upper ontologies consists of BioTop, ChemTop and 4 bridge ontologies that tie in BFO-RO. The suite tries to be independent of granularity, and as such has had to eschew use of the subclasses of bfo:IndependentContinuant. There are bridges to GO, Cell Ontology and ChEBI. There are no imports.

Role, also a direct descendant of Thing, is broken out into seven subclasses, none of them formed by property restrictions. ChemicalRole has Catalytic- and Reagent- subclasses; DrugRole is subclassed to Therapeutic- and HealthRelated-. All these are organized into a tree enforced by disjointness.

The descendants of ProcessualEntity do not have disjointness axioms. Causing, Complicating, Disrupting and ManagingCare are all subclasses, not formed by restriction.

BioTop has a selection of generic properties aligned with RO. Notable additions are hasAbstractPart and qualityLocated.

This ontology was created with an early version of Protege and is free of any annotations; there are no URIs. 'property' is a class, among whose subclasses are dosage_form and intake_route. Individual 'capsule', and several others, are of type dosage_form. There are no comparable individuals of type intake_route. There are properties for Monograph IX class levels 1-3.

RO is a small hierarchy of properties for OBO ontologies. The world is divided into continuants and processes, where "The terms 'continuant' and 'process' are generalizations of GO's 'cellular component' and 'biological process' but applied to entities at all levels of granularity, from molecule to whole organism." http://genomebiology.com/2005/6/5/R46

The native format for RO is OBO instead of OWL, and this has a few consequences. In the OWL mapping, in addition to the changes for obo:is_a and obo:instance_of, some important property characteristics disappear. For example, obo:part_of is transitive, reflexive and anti-symmetric, as seen in the downloadable ro.obo. But in the downloadable ro.owl, transitivity is the only surviving property characteristic. OBO anti-symmetry (R(x,y) and R(y,x) implies x=y) is a weaker characteristic than OWL asymmetry (R(x,y) implies ~R(y,x)). The OWL 2 New Features and Rationale discusses the mapping.
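The difference between the two characteristics can be checked on a toy relation. The sketch below uses hypothetical entities and a hand-built `part_of` relation, not anything read from ro.obo or ro.owl; it only illustrates the definitions quoted above.

```python
def is_reflexive(rel, domain):
    return all((x, x) in rel for x in domain)


def is_transitive(rel):
    return all((x, z) in rel for (x, y) in rel for (w, z) in rel if y == w)


def is_antisymmetric(rel):
    # OBO anti-symmetry: R(x,y) and R(y,x) implies x = y
    return all(x == y for (x, y) in rel if (y, x) in rel)


def is_asymmetric(rel):
    # OWL asymmetry: R(x,y) implies not R(y,x)
    return all((y, x) not in rel for (x, y) in rel)


# Hypothetical toy domain.
things = {"nucleus", "cell", "organism"}
proper_part_of = {
    ("nucleus", "cell"),
    ("cell", "organism"),
    ("nucleus", "organism"),
}
# part_of adds the reflexive pairs to proper_part_of.
part_of = proper_part_of | {(x, x) for x in things}
```

The reflexive `part_of` is anti-symmetric but cannot be asymmetric, since every pair (x, x) violates asymmetry. The strict `proper_part_of` satisfies both, which shows that asymmetry is the stronger condition.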

The SDTM (Study Data Tabulation Model) "is built around the concept of observations collected about subjects who participated in a clinical study." Observations are broken out into Findings, Events and Interventions.

Eric Neumann has argued that, in addition to describing Observations, SDTM should hold high-level concepts of Study and Subject. There should be provision for holding unique URIs provided by NCI Thesaurus, and the types of observations should be augmented with further URIs to express refined descriptions. While these suggestions have gone unheeded, SDTM presumably will become the standard for submitting clinical trial data to the FDA. SDTM data might well be a valuable addition to the LOD cloud, or at least to the RDF accessible from the translational medicine ontology.

Given the above, what can be done? Could the translational medicine ontology use SDTM data?

Study and subject do not have their own domains; they are relegated to 'identifier variables'. These identifier variables, notably STUDYID and USUBJID (www2.sas.com/proceedings/forum2008/207-2008.pdf), are "keys" for study and subject, and would be properties with rdfs:domain Observation. Under OWL 1 there is no provision for an inverse functional datatype property, so there would be no way to 'pivot' on the USUBJID in order to make visible the subject's genotype and biomarkers.

With OWL 2 "easy keys" (pdf), however, we will now be able to isolate the subject, where he is uniquely determined by a combination of properties whose domain is sdtm:Observation. Instead of having Observation as the only large-scale class, we can now have both Study and Subject as well, as Neumann suggested. The only downside is that easy keys can be used only on named individuals in the graph; they cannot be used on bnodes. So, every member of both Study and Subject must be specified explicitly in the ABox component, which should not be too burdensome. In fact, these can now be URIs in the RDF, outside of SDTM itself.

So declarations of the domains, ranges and easy keys might look something like:
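As a rough illustration of how an easy key behaves, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `ex:` names, the single-property key on `ex:Subject`, and the convention that names starting with `_:` stand for bnodes. It is not SDTM vocabulary and not OWL syntax; it only mimics the inference that named individuals agreeing on all key properties are the same individual.

```python
from collections import defaultdict

# Hypothetical key declaration, in the spirit of OWL 2 HasKey:
# members of ex:Subject are identified by their ex:usubjid value.
KEYS = {"ex:Subject": ("ex:usubjid",)}

# Hypothetical ABox. Names starting with "_:" stand for bnodes.
TYPES = {
    "ex:subj1": "ex:Subject",
    "ex:patient42": "ex:Subject",
    "_:b0": "ex:Subject",
    "ex:subj2": "ex:Subject",
}
VALUES = {
    ("ex:subj1", "ex:usubjid"): "S-001",
    ("ex:patient42", "ex:usubjid"): "S-001",
    ("_:b0", "ex:usubjid"): "S-001",
    ("ex:subj2", "ex:usubjid"): "S-002",
}


def same_individuals(cls):
    """Group named individuals of cls that agree on every key property."""
    groups = defaultdict(set)
    for individual, individual_cls in TYPES.items():
        if individual_cls != cls or individual.startswith("_:"):
            continue  # keys apply to named individuals only, never bnodes
        key = tuple(VALUES.get((individual, prop)) for prop in KEYS[cls])
        if None not in key:
            groups[key].add(individual)
    # Only groups with more than one member yield a sameAs conclusion.
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Calling `same_individuals("ex:Subject")` groups `ex:subj1` with `ex:patient42`, which is the 'pivot' on the subject identifier. The bnode `_:b0` carries the same identifier but is left out, matching the limitation noted above.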

Real Entities are distinguished from the UtilityClasses. Disjointness is established for top subclasses under Entity. All 50 properties have domain and ranges. There are 18 universal restriction classes and 20 cardinality restrictions. The PhysicalEntity class is meant to accommodate instances in distinct states, such as post-translational changes to a protein, and which participate in interactions, whereas the EntityReference class unifies these physical states into a commonly used reference.

Pathway is an Entity, and PathwayStep is a UtilityClass. They can be related by the near-inverse properties pathwayOrder and stepProcess. nextStep establishes an ordering between PathwaySteps, and this is typically used to isolate steps in a biochemical pathway. Pathways need not be decomposed into a network of parts, but can instead be composed of a bag of interactions, and this is typically used for molecular reactions. cofactor, controlled and controller are notable subproperties of 'participant' to specify the nature of participants in interactions.
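To make the ordering role of nextStep concrete, here is a minimal Python sketch. It assumes a strictly linear chain of hypothetical steps (`step1` to `step3`) and hypothetical processes; real pathway data may branch, which this sketch does not handle.

```python
# Hypothetical data: nextStep links between PathwaySteps, and the
# stepProcess attached to each step.
next_step = {"step1": "step2", "step2": "step3"}
step_process = {
    "step1": "reactionA",
    "step2": "reactionB",
    "step3": "reactionC",
}


def ordered_processes(next_step, step_process):
    """Follow nextStep from the unique first step and list the processes."""
    steps = set(next_step) | set(next_step.values())
    starts = steps - set(next_step.values())  # steps with no predecessor
    if len(starts) != 1:
        raise ValueError("expected exactly one starting step")
    order = []
    current = starts.pop()
    while current is not None:
        order.append(step_process[current])
        current = next_step.get(current)
    return order
```

The alternative described above, a pathway held as a bag of interactions, would simply be an unordered set of processes with no nextStep links to follow.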

The Interaction class should probably be an owl:equivalentClass to its immediate subclasses. "Since it is a highly abstract class in the ontology, instances of the interaction class should never be created." The same is true of the SequenceLocation and Xref classes.

SBO classes give a vocabulary for describing the components of systems biology and their interactions. With the native source being OBO, there are no properties. Many of the classes look like they easily could be mapped or subclassed to other ontologies. But the most important of the major classes is 'mathematical expression'. For its subtree, the leaves and certain other classes bear annotation defs that hold MathML lambda expressions, especially for rate equations. The variables of these expressions are other SBO terms, and this ties the corpus of mathematically defined rate equations to their intended parameters.

There is some interest in bringing the Systems Biology Markup Language (SBML) into the HCLS.

Native representation is OBO. PATO is primarily a class hierarchy of 2000 classes, without disjointness but meant to express, I believe, a tree partition. Each class is a term for a quality of a phenotype, independent of the phenotype that bears it. There are only 17 properties, some of which participate in existential restrictions. The top class 'quality' has subclasses for qualities inhering in continuants and in processes. A third subclass will be obsolete, and houses concepts of intensity, magnitude and deviation from normalcy. From George Gkoutos ppt: "Qualities inhere to entities: every entity comes with certain qualities, which exist as long as the entity exists."

Notably, there are classes for lacking or having extra processual parts, for being dysfunctional, for being unnecessary or insufficient, for having a variety of dispositions of variabilities, and for being absent from an organism.

The native representation is OBO, and the ontology is meant to describe clinical abnormality. This is a hierarchy of 9000 classes without disjointness that is a graph instead of a tree. The three main partitions are graphs for organ abnormality, inheritance, and onset and clinical course. There are no properties.
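What separates a graph from a tree here is that a class may have more than one parent. A minimal Python sketch of that check follows, using hypothetical class names `A` to `D` rather than any real terms from the ontology.

```python
# Hypothetical is_a edges (child -> set of parents); not real ontology terms.
graph_parents = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
tree_parents = {"B": {"A"}, "C": {"A"}, "D": {"B"}}


def multi_parent_classes(parents):
    """Return the classes with more than one direct parent, sorted by name."""
    return sorted(cls for cls, ps in parents.items() if len(ps) > 1)
```

A hierarchy is a tree only if this list is empty. In the hypothetical graph, `D` sits under both `B` and `C`.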

Under inheritance, 'Autosomal dominant vs. multifactorial' is a subclass of Multifactorial. The clinical course vocabulary has categories of phenotypic variability that distinguish
'Highly variable phenotype and severity' from
'Highly variable phenotype, even within families'.