The Phenotype Research Coordination Network was funded by NSF to establish a network of scientists who are interested in comparing phenotypes across species and in developing the methods needed to make this possible.

Supported by a Phenotype RCN collaboration grant, Grant Godden and Pier Luigi Buttigieg met in May 2015 at the Rancho Santa Ana Botanic Garden (RSABG) in Claremont, CA, with the aim of enhancing the ontological representation of plant environments. Grant and Pier processed label data from more than one million plant specimen records hosted by iDigBio, using a combination of natural language processing and text-mining techniques to identify well-represented terms and phrases in “habitat” descriptions. Their interactions with RSABG collections staff, who work actively on specimen digitization and offered insights into the creation of the records that populate repositories like iDigBio, greatly enhanced the project and helped create a workable corpus. The preliminary results of the analyses were immediately informative, revealing gaps in the current coverage of the Environment Ontology (ENVO; Buttigieg et al., 2013).

Further work is planned to refine their computational pipeline and corpus, and to extend ENVO’s coverage of environments that the botanical community frequently samples. A brief publication reporting the process, findings, and results is in preparation.
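To make the term-counting idea concrete, here is a minimal sketch in Python. It is not Grant and Pier's pipeline: the habitat strings, the stopword list, and the set of "known labels" are all made-up placeholders (they are not iDigBio records or actual ENVO content), and real label data would need far more cleaning. It only shows how frequent words and phrases can be counted and then compared against a list of existing ontology labels to spot candidate gaps.

```python
import re
from collections import Counter

# Hypothetical habitat strings; real ones would come from specimen labels.
HABITATS = [
    "Rocky slope in chaparral, decomposed granite soil",
    "Chaparral on rocky slope near dry wash",
    "Dry wash with sandy soil, desert scrub",
]

# A tiny, assumed stopword list for illustration only.
STOPWORDS = {"in", "on", "with", "near", "the", "of", "and", "a"}

# Placeholder labels standing in for an ontology's existing terms.
PLACEHOLDER_LABELS = {"rocky slope"}


def tokenize(text):
    """Lowercase a habitat string and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_terms(records, n=1):
    """Count n-grams per record after dropping stopwords.

    Stopwords are removed before n-grams are built, so a phrase may
    span a removed word (e.g. "slope chaparral").
    """
    counts = Counter()
    for record in records:
        tokens = [t for t in tokenize(record) if t not in STOPWORDS]
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def uncovered_terms(counts, labels, min_count=2):
    """Return frequent terms that do not match any known label."""
    return sorted(
        term for term, c in counts.items()
        if c >= min_count and term not in labels
    )


unigrams = count_terms(HABITATS, n=1)
bigrams = count_terms(HABITATS, n=2)
candidate_gaps = uncovered_terms(bigrams, PLACEHOLDER_LABELS)
```

With these toy inputs, the phrases "rocky slope" and "dry wash" each occur twice, and only "dry wash" ends up in candidate_gaps because "rocky slope" is in the placeholder label set.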

Visualizing one or more trees/taxonomies with a non-trivial number of characters and taxa is a challenge that a number of projects are facing. The ETC project organized a workshop with information visualization experts, data providers (trees and characters), and end users to tackle the challenge together.
The meeting was organized by Hong Cui and hosted by Bertram Ludäscher at the National Center for Supercomputing Applications (NCSA), Urbana, IL, on May 11-13. Phenotype RCN participants Matt Yoder, Nico Franz and Martín Ramírez attended the meeting and posed visualization challenges. Much of the workshop was devoted to brainstorming on the challenge of representing a large dataset together with some kind of mapping on a tree, and often on two trees simultaneously. This is a familiar challenge for anatomy ontologists, who are trying to represent the interaction of phylogenetic trees, matrices and ontologies:

Right, the level of anatomical data available for different parts of the fin and limb can be visualized for taxa along the fin to limb transition (figure from Dececchi et al. in press 2015, Systematic Biology, doi: 10.1093/sysbio/syv031). Left, a phylogeny of spiders colored according to anatomical complexity, derived from the ontology (figure from Ramírez & Michalik 2014, doi: 10.1111/cla.12075).

The beautiful and clever examples presented by the visualization experts were inspiring. How can these gorgeous examples help us represent our complex data in intuitive visualizations? Filters, sort controls, heat maps, zoom panes, collapsing, expanding, and other tools had two effects on us: they made some of our challenges look feasible, and they helped refine our vague ideas into more precise challenges.

PathwayMatrix: Visualizing Binary Relationships between Proteins in Biological Pathways (Dang, Murray, and Forbes, 2015). PathwayMatrix can be used not only for biological pathway visualization, but also for character and taxon data. See: http://biovis.net/year/2015/papers/pathwaymatrix-visualizing-binary-relationships-between-proteins-biological-pathways

The ETC project will implement a few promising techniques as part of the ETC toolkit to invite comments and suggestions from broader communities. Stay tuned, and crunch your data into nice visualizations!
(post by Martín Ramírez)

Figure 1. SubClass hierarchy of upper-level classes in the Common Anatomy Reference Ontology (CARO, orange boxes), plus relations to critical external ontology terms from Basic Formal Ontology (BFO, grey boxes), Population and Community Ontology (PCO, yellow box), and Gene Ontology (GO, pink boxes). Terms in light purple boxes are found in multiple ontologies (CL=Cell Ontology). The term ‘Organism’ in the green box is not an ontology term but a class from the Darwin Core Vocabulary that is a subclass of the new CARO term ‘biological entity’. ‘Biological entity’ is a catch-all term for any material entity that is, is part of, or derived from an organism, virus, or viroid, or a collection of them.

Before the post-Thanksgiving haze had lifted, a small group of ontologists (Melissa Haendel, Chris Mungall, David Osumi-Sutherland, and Ramona Walls) converged on the lovely small town of Brownsville, Oregon to work on the Common Anatomy Reference Ontology (CARO), the Population and Community Ontology (PCO), and PATO, an ontology of biological qualities. This work was done within the context of the larger group of ontologies that make use of or are used by CARO (UBERON, GO, CL).

CARO is a relatively small upper ontology with ~165 classes and a few core relations that is used to link taxon-specific anatomy ontologies ranging from fruit flies to vertebrates to plants. The 1.0 release of CARO has been widely used, but usage has been quite inconsistent and sometimes incorrect. This is partly due to lack of clarity in some definitions, but also because it was written at a time when we lacked the tools to provide automated reports of incorrect usage.

PCO is a recently developed ontology focussing on populations, communities and the relationships between organisms. The definitions of organism types in CARO are critically important for this ontology, as are the biological qualities applying to groups of organisms in PATO.

PATO, an ontology of biological qualities, has been very widely used by the community brought together by the Phenotype RCN as well as in defining classes in a wide range of other ontologies used by this community (covering phenotypes, anatomy, cell types and populations). So far, PATO has had limited axiomatisation, but there are many obvious cases where axiomatisation could improve its integration with ontologies that use it, including the PCO and anatomy ontologies.

A major aim of our work on CARO at this meeting was to redraft textual definitions so that they could be understood by any competent biologist and to redraft logical definitions so that they could be used for automated classification and error checking. For both logical and textual definitions, we aimed to focus on distinctions that are important to biologists – either directly, or indirectly by making biologically useful queries possible. We also aimed to take into account new use cases that have arisen since CARO 1.0 was released, as a result of work on the PCO as well as on anatomy ontologies and the ontologies and tools that use them. In parallel with this work, we aimed to improve related axiomatisation of PATO.

Over two and a half days of leftover turkey, home-fermented vegetables, and farm-fresh eggs, we took care of operational issues such as repository maintenance, as well as more hard-core ontologizing. A highlight of the meeting was an informal gathering on Monday night when we were joined by Laurel Cooper and John Campbell from Oregon State University and Joe Fontaine from Murdoch University to discuss the intersections of ontologies, ecology, plant traits, and biodiversity.

Further development of PCO, including updating import files, testing ODPs for defining collections of organisms and species/organism interactions.

A pending beta release of CARO2.0 and plans for how to announce it.

Better formalization of PATO through general class inclusion axioms (GCIs) necessary for CARO and PCO.

A Jenkins job that reports on and verifies ontologies that use CARO (FBBT, PO, XAO, and ZFA).

A draft paper on CARO2.0.

One of the key use cases for anatomy ontologies is annotation of gene expression, and we wanted a way to help curators avoid the pitfall of annotating expression to the (immaterial) space that is part of a structure rather than the (material) structure that surrounds it. We propose a design pattern in which any structure that has an interior space (such as stomach) would be modeled using four classes: one for the entire structure (which includes both the surrounding structure and the space that is part of it), one for the space, one for the wall (which is just the surrounding structure without the space) and one for “wall region”. A wall region is any portion of the wall that spans the full thickness of the wall for its entire lateral extent, whereas the wall is the mereotopological sum of all wall regions. Following this pattern, an ontology that wished to include a stomach would have classes for “stomach”, “stomach lumen”, “stomach wall”, and “region of stomach wall”. We opted against including very general classes such as “wall” or “wall region” in CARO, and instead plan to document the pattern and provide a template for its use in anatomy ontologies.
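As a rough illustration of what a template for this pattern might look like, the Python sketch below generates the four class labels for a given structure and flags an expression annotation made to the space instead of the material structure. Everything in it is an assumption made for the example: the class and function names, the "part_of" triples, and the wording of the warning are hypothetical and are not taken from CARO or from any curation tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LumenPattern:
    """The four class labels suggested for a structure with an interior space."""
    structure: str    # entire structure: wall plus space
    lumen: str        # the (immaterial) space
    wall: str         # surrounding structure without the space
    wall_region: str  # any full-thickness portion of the wall

    def part_of_axioms(self):
        """Illustrative parthood triples; relation names are assumptions."""
        return [
            (self.lumen, "part_of", self.structure),
            (self.wall, "part_of", self.structure),
            (self.wall_region, "part_of", self.wall),
        ]


def lumen_pattern(structure):
    """Build the four labels following the naming used for 'stomach'."""
    return LumenPattern(
        structure=structure,
        lumen=f"{structure} lumen",
        wall=f"{structure} wall",
        wall_region=f"region of {structure} wall",
    )


def check_expression_target(term, pattern):
    """Return a warning if expression is annotated to the space, else None."""
    if term == pattern.lumen:
        return (f"'{term}' is an immaterial space; "
                f"consider '{pattern.wall}' or '{pattern.structure}'")
    return None


stomach = lumen_pattern("stomach")
warning = check_expression_target("stomach lumen", stomach)
```

Calling lumen_pattern("stomach") yields the labels "stomach", "stomach lumen", "stomach wall", and "region of stomach wall", matching the example in the text.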

One way of specifying structures, such as a stomach, that have a geometric component is through the use of GCIs in PATO. PATO includes a number of classes for qualities describing shape. Of these, lumenized, tubular, and saccular are the most relevant to CARO. We began adding GCIs to PATO of the form:

bearer_of some lumenized subClassOf ‘has part’ some lumen

bearer_of some unlumenized subClassOf not (‘has part’ some lumen)
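To show what these two axioms buy us, here is a toy Python sketch that applies them by hand to a few made-up class descriptions. It is only an illustration of the logic, not an OWL reasoner, and the class names and dictionary layout are hypothetical. The first rule adds an inferred lumen part to anything that bears the quality lumenized; the second flags anything that bears unlumenized yet has a lumen as part.

```python
# Hypothetical class descriptions: qualities borne and parts asserted.
CLASSES = {
    "stomach": {"bearer_of": {"lumenized"}},
    "solid cord": {"bearer_of": {"unlumenized"}},
    "mis-modelled duct": {"bearer_of": {"unlumenized"}, "has_part": {"lumen"}},
    "confused tube": {"bearer_of": {"lumenized", "unlumenized"}},
}


def infer_parts(info):
    """Apply: bearer_of some lumenized subClassOf 'has part' some lumen."""
    parts = set(info.get("has_part", set()))
    if "lumenized" in info.get("bearer_of", set()):
        parts.add("lumen")
    return parts


def find_conflicts(classes):
    """Apply: bearer_of some unlumenized subClassOf not ('has part' some lumen).

    A class conflicts if it bears 'unlumenized' but has a lumen part,
    whether that part was asserted directly or inferred by the first rule.
    """
    conflicts = []
    for name, info in classes.items():
        unlumenized = "unlumenized" in info.get("bearer_of", set())
        if unlumenized and "lumen" in infer_parts(info):
            conflicts.append(name)
    return sorted(conflicts)


stomach_parts = infer_parts(CLASSES["stomach"])
conflicts = find_conflicts(CLASSES)
```

In this toy data, "stomach" gains an inferred lumen, while "mis-modelled duct" and "confused tube" are reported as conflicts. In practice this kind of check would be left to an OWL reasoner working over the ontologies themselves.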

An open question remains on how to document these patterns (in CARO or as separate patterns). One possibility is for CARO to include abstract geometrical classes such as “anatomical tube” or “anatomical tube wall” and “tube lumen”.

The Alfred P. Sloan Foundation’s Digital Information Technology program has awarded $499K to Phoenix Bioinformatics to catalyze development of creative new user-based funding strategies for research databases. Phoenix was founded in 2013 by the staff of the Arabidopsis Information Resource (TAIR) to provide new support mechanisms as TAIR transitioned away from grant-based funding. Following its success with TAIR, Phoenix will be assisting other databases with their funding challenges and helping them find new ways to sustain their projects for the long term with community help. For more information about TAIR’s transition to sustainable funding or the newly funded Phoenix project please contact Phoenix Bioinformatics.

In case you missed it, our latest Phenotype RCN publication came out this week in PLoS Biology. In this perspective we argue for more investment in the infrastructure needed to make phenotypes more accessible. Check it out!

Abstract.—Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today’s data barriers and facilitate analytical reproducibility.

In late July, the Phenotype RCN and Phenoscape co-sponsored several speakers in the symposium “What should Bioinformatics do for EvoDevo?” co-organized by Günter Plickert, Mark Blaxter, Paula Mabee and Ann Burke. The symposium was part of the European Society for Evolutionary Developmental Biology (EED) meeting, held in Vienna. The organizers brought together speakers whose research and perspectives provided examples of how EvoDevo data integration is necessary for discoveries. Several speakers presented new insights into EvoDevo that were directly derived from sequencing genomes or transcriptomes. Others showed how species phenotypes, once represented using semantic methods, could be linked to genetic and developmental data, and described the research questions this allowed them to address. This well-attended symposium met its goals, which were to:

promote awareness of new and developing resources and methods as well as EvoDevo uses of existing ones.

promote discussions in the EvoDevo community that value input of bioinformatics to EvoDevo questions.

invite the audience to share their ideas of how to move the integration forward.

The excellent organization of this conference and the wonderful venue helped spark several new collaborations and grant proposals. Talks and speakers in this symposium included (full program found here):

Insights into the evolution and development of planarian regeneration from the genome of the flatworm, Girardia tigrina, presented by Sujai Kumar (University of Oxford, GBR)

From the wet lab to the computer and back: A stage specific RNAseq analysis elucidates the molecular underpinnings and evolution of Hydrozoan development, presented by Philipp Schiffer (University of Cologne, GER)

Insights into the evolution of early development of parthenogenetic nematodes by second generation sequencing, presented by Christopher Kraus (University of Cologne, GER)

Do sponges have true tissues? This fundamental question is just one of the controversial topics that Phenotype RCN team members encountered as they constructed a new ontology to describe the unique features of sponge anatomy. As you can see from the diagram below, the team opted to describe “functional layers” of sponge cells, re-using the CARO class ‘portion of tissue’ to contain these layers.

The recently published Porifera ontology (PORO) is an outcome of Phenotype RCN meetings that matched experts in creating ontologies with taxonomists seeking to improve phenotype descriptions and databases. Sponge biologists Bob Thacker, Cristina Díaz, Adeline Kerner, and Régine Vignes-Lebbe teamed up with information scientists Chris Mungall, Melissa Haendel, and Erik Segerdell to generate the ontology from an existing thesaurus of anatomical terms. The ontology is currently being used to allow natural language processing software to efficiently extract morphological characters from taxonomic monographs.

Several members of the Plant Working Group got together at Phoenix Bioinformatics in lovely Redwood City, California, at the end of September to write up results of the long-running Plant Phenotype Pilot Project (or PPPP as the cognoscenti call it). The first draft is affectionately known as the Plant Phenotype Pilot Project Preliminary Paper (or PPPPPP).

Despite their obsession with the letter P, working group members Carolyn Lawrence, David Meinke, Ramona Walls, Lisa Harper and Eva Huala (with the help of Anika Oellrich, Laurel Cooper, Pankaj Jaiswal and George Gkoutos, who were able to join via Skype) made good progress over the course of two and a half days on writing up sections describing the assembly and analysis of the phenotype dataset produced by the group. The dataset includes 6361 Entity-Quality statements describing mutant phenotypes associated with 2744 genes across six well-studied plant species (Arabidopsis, rice, maize, soybean, Medicago, and tomato).
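For readers unfamiliar with Entity-Quality (EQ) statements, the following Python sketch shows one simple way such a dataset could be represented and summarized. It is not the working group's actual data model: the gene names, entities, and qualities are invented placeholders (a real dataset would use ontology terms and gene identifiers), and the field names are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EQStatement:
    """One mutant phenotype: a quality observed on an anatomical entity."""
    species: str
    gene: str
    entity: str
    quality: str


# Hypothetical records for illustration only.
STATEMENTS = [
    EQStatement("Arabidopsis", "gene_A", "leaf", "decreased size"),
    EQStatement("Arabidopsis", "gene_A", "root", "decreased length"),
    EQStatement("maize", "gene_B", "leaf", "decreased size"),
    EQStatement("rice", "gene_C", "seed", "increased size"),
]


def summarize(statements):
    """Return (number of statements, number of distinct species/gene pairs)."""
    genes = {(s.species, s.gene) for s in statements}
    return len(statements), len(genes)


def genes_sharing_phenotype(statements):
    """Group genes by identical EQ pair, keeping pairs seen in >1 species."""
    by_eq = defaultdict(set)
    for s in statements:
        by_eq[(s.entity, s.quality)].add((s.species, s.gene))
    return {
        eq: sorted(genes)
        for eq, genes in by_eq.items()
        if len({species for species, _ in genes}) > 1
    }


n_statements, n_genes = summarize(STATEMENTS)
shared = genes_sharing_phenotype(STATEMENTS)
```

Here the toy data hold four statements about three genes, and the only phenotype shared across species is ("leaf", "decreased size"), linking gene_A in Arabidopsis with gene_B in maize. Exact string matching is used only to keep the sketch short; comparing phenotypes through ontology relationships is the point of using EQ statements in the first place.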

Other recent activities from the plant working group include submission of a grant proposal to NSF-ABI over the summer to fund the continuation of this work. The submission was made to “Advances in Biological Informatics” with funding sought from both the NSF and BBSRC under the “UK BBSRC-US NSF/BIO Lead Agency Pilot Opportunity” program.

The plant working group would like to thank the RCN for covering travel expenses for this and previous working group meetings; this funding has enabled the group to work together effectively despite being scattered over a wide range of institutions in the USA and UK.

Differences in the position and orientation of anatomical structures among species, or between mutant and wildtype organisms, are a frequently described source of phenotypic variation in the literature. For example, pectoral fins can be located anteriorly or posteriorly on a fish’s body, and their bases can be inclined vertically or horizontally. Although widely used in phenotypic descriptions, the application of spatial descriptors can differ across fields of biology, causing confusion when one attempts to compare anatomy across species. We developed the Biological Spatial Ontology (BSPO) to standardize the description of spatial terminology across taxa and to enable spatial reasoning.

Fig 1. Comparison of primary organismal axes designated in a diversity of species and their representation in BSPO

Fig 2. Organization of high-level spatial classes in BSPO and some of their children.

We describe the challenges in designating positional terminology across the wonderful diversity of body forms, including headless animals and those with the anus adjacent to the mouth:

Fig 3. An individual zooid of the colonial ectoproct Bugula.

The BSPO is currently used by projects that require positional representation in anatomy ontologies and phenotype annotation. Terms in the BSPO are most developed for animals and, to a lesser extent, plants. We welcome feedback from the community, particularly to improve the taxonomic coverage of spatial terms in the ontology.

A mini-symposium, entitled ANATOMY ONTOLOGIES: BIOINFORMATICS IN THE ANATOMICAL SCIENCES, was held at the Annual Meeting of the American Association of Anatomists (AAA) a few months ago. I was very pleased to assemble a wonderful group of speakers for the symposium, which was supported by grants from AAA and Phenotype RCN. My purpose for organizing the symposium was to introduce cutting edge tools of bioinformatics to my anatomy colleagues. Although the talks were as good as I could have hoped, the attendance at the meeting was a bit poor, likely because of the many concurrent sessions (see program). Here is the line-up for the six talks. Clicking on the title will take you to the abstract.