The Phenotype Research Coordination Network was funded by NSF to establish a network of scientists who are interested in comparing phenotypes across species and in developing the methods needed to make this possible.

Andy Deans

As you may recall, at a Spring 2013 meeting of the Phenotype RCN in Durham, NC, the Behavior Breakout group discussed the existence of multiple behavioral ontologies, including the gaps in existing ontologies (such as the Neuro Behavior Ontology, or NBO) that preclude their widespread use in behavioral ecology and other sub-disciplines in animal behavior. The group felt it could be possible to merge two existing behavioral ontologies – the NBO, developed to serve studies of animal models of human behavioral dysfunction, and the Animal Behavior Ontology or ABO, developed to serve the field of comparative animal behavior, including behavioral ecology and other sub-disciplines. If successful, the merger would facilitate the broader integration of behavioral studies: applied with basic, model organism with comparative investigations, mechanistic with evolutionary, and human with non-human animal questions. At the same time, it would also need to continue to serve the specialized needs of subfields.

In late summer 2014, a small group of animal behaviorists who were present at the 2013 meeting in Durham (Anne Clark, Sue Margulis, Peter Midford, Cynthia Parr) received NSF funding to hold two workshops to accomplish these goals.

Our first workshop, held August 2014 at Princeton University, convened over a dozen animal behaviorists with a broad range of expertise in comparative behavior to develop specific recommendations on how to integrate the basic terms and concepts of the two ontologies. Key outcomes included a list of proposed changes in parent-child relations in the NBO to emphasize function, and ABO term definition improvements that together could serve as the basis of integrating the two ontologies.

Our second workshop, supported in part by additional funding from the Phenotype RCN, was held at the Smithsonian’s National Museum of Natural History, Washington, DC, on October 24-25, 2015. Its specific goal was to start the process of merging the ABO and the NBO based on the first workshop’s recommendations. Attendees, in addition to the four organizers, were our local host Katja Schultz (Encyclopedia of Life), Elissa Chesler (The Jackson Laboratory), George Gkoutos (NBO developer, University of Birmingham), David Osumi-Sutherland (European Bioinformatics Institute, Virtual Fly Brain), Melissa Haendel (Oregon Health and Science University), and Reid Rumelt (Cornell University undergraduate working with Macaulay Library and Encyclopedia of Life).

The workshop began with presentations about the histories of NBO and ABO. NBO had its roots in a phenotype vocabulary supporting the EUMORPHIA project (see http://empress.har.mrc.ac.uk/ and http://www.europhenome.org/). Behavior terms were initially included in the Gene Ontology, but were also mapped to phenotype ontologies, such as the Mammalian Phenotype Ontology (MP) and the Human Phenotype Ontology, so as to enable the integration of data. The Neuro Behavior Ontology was created to concentrate effort specifically on behavior.

ABO was one of the first accomplishments of the EthoSource project, begun with an NSF-sponsored workshop in 2000 with the goal of developing integrated online resources for the discipline of Animal Behavior. Two NSF-sponsored Ontology Workshops followed in 2004-2005, at which an international group of animal behaviorists developed a basic metadata standard for the discipline, the ABO. The primary use of the ABO subsequent to 2005 was indexing an online ethogram repository, EthoSearch.org.

In our second blog post, we will summarize the progress we made in the October workshop, and outline our next steps.

One of the fundamental goals of biology is understanding the interactions of environment and phenotype, but this is a surprisingly difficult topic to study – not because of the concepts, but because of the data. Observations about environment and phenotype occur in separate data sets, and the terms used are far too idiosyncratic for automated integration. Several biological domains, including conservation and phylogenetics, could be advanced if these two data types could be easily merged on a large scale.

I led a recent paper, published in PeerJ, which suggests that the use of ontologies to standardize and link data about phenotypes and environments can enable scientific breakthroughs by increasing the scale and flexibility of research. This paper was a product of a workshop facilitated by the Phenotype RCN and supported by the National Science Foundation. My co-authors and I give several domain-specific use cases describing how ontologies can help advance four biological disciplines. We then discuss the challenges to be addressed, present some proof-of-concept analyses, and discuss existing ontologies. The summary contains three suggestions for increasing interoperability between phenotype and environment data.

We hope this paper provides you with an overview of the landscape of ontologies available for integrating environmental data, and inspires you to use them in relation to your own data. For more information about ontologies and semantics, a good first read is Semantic Web for the Working Ontologist by Dean Allemang and Jim Hendler.

Supported by a Phenotype RCN collaboration grant, Grant Godden and Pier Luigi Buttigieg met during May 2015 at the Rancho Santa Ana Botanic Garden (RSABG) in Claremont, CA, with the aim of enhancing the ontological representation of plant environments. Grant and Pier processed label data from more than one million plant specimen records hosted by iDigBio, using a combination of natural language processing and text-mining techniques to identify well-represented terms and phrases in “habitat” descriptions. Their interactions with RSABG collections staff, who work actively on specimen digitization and shared insights into the creation of records that populate repositories like iDigBio, greatly enhanced the project and helped create a workable corpus. The preliminary results of the analyses were immediately informative, revealing gaps in the current coverage of the Environment Ontology (ENVO; Buttigieg et al., 2013).

Further work is planned to refine their computational pipeline and corpus, and to extend ENVO’s coverage of environments that the botanical community frequently samples. A brief publication reporting the process, findings, and results is in preparation.
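For readers curious what the gap-finding step described above might look like in practice, here is a minimal Python sketch. It is not Grant and Pier's pipeline: the habitat strings, the stopword list, and the small set of "known" labels are all made up for illustration, and a real analysis would draw on the iDigBio records and the full ENVO term list.

```python
from collections import Counter
import re

# Hypothetical habitat strings, standing in for specimen label data.
HABITATS = [
    "Chaparral on dry rocky slope",
    "dry rocky slope above creek",
    "Oak woodland, sandy soil",
    "sandy soil in desert wash",
    "Desert wash near roadside",
]

# Hypothetical subset of ontology labels; not the real ENVO term list.
KNOWN_LABELS = {"chaparral", "woodland", "sandy soil", "creek"}

STOPWORDS = {"on", "in", "above", "near", "the", "of", "and"}


def tokenize(text):
    """Lowercase a description and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_phrases(text, max_len=2):
    """Yield 1- to max_len-word phrases that contain no stopwords."""
    tokens = tokenize(text)
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not any(tok in STOPWORDS for tok in gram):
                yield " ".join(gram)


def coverage_gaps(descriptions, known_labels, min_count=2):
    """Return frequent phrases that have no matching ontology label."""
    counts = Counter()
    for text in descriptions:
        # Count each phrase once per record so long labels do not dominate.
        counts.update(set(candidate_phrases(text)))
    return {
        phrase: n
        for phrase, n in counts.items()
        if n >= min_count and phrase not in known_labels
    }


gaps = coverage_gaps(HABITATS, KNOWN_LABELS)
```

In this toy corpus, "desert wash" turns up in two records but has no matching label, so it is reported as a candidate gap, while "sandy soil" is already covered and is left out.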

In case you missed it, our latest Phenotype RCN publication came out this week in PLoS Biology. In this perspective we argue for more investment in the infrastructure needed to make phenotypes more accessible. Check it out!

Abstract.—Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today’s data barriers and facilitate analytical reproducibility.

Regular RCN attendees Matt Yoder, István Mikó and Andy Deans attended ICIM3 in Berlin, Germany on August 3rd–7th. The congress brought together world leaders in invertebrate morphology for a week of presentations and discussion on the campus of the Humboldt University in Berlin. Logistics were flawless, with ample food and drink to wet interactions (e.g., endless beer and pretzels for the poster session!). The conference was truly a showcase of phenotypes and was fascinating from the standpoint of just seeing examples of life evolving. For those of us interested in semantically describing morphological diversity, the myriad approaches to representing morphology as data were extremely informative and indicative of the challenges we face.

In addition to generally absorbing the goings on, Yoder and Deans participated in an eMorphology symposium led by Lars Vogt, one of the PIs of MorphDBase. Deans presented on the state of semantic phenotype representation, with particular attention to its role in taxonomy (Deans ICIM3 slideshow), a follow-up to a presentation and panel discussion from the last ICIM (Deans ICIM2 slideshow). Yoder delivered a talk (http://dx.doi.org/10.6084/m9.figshare.1127970) on behalf of Jim Balhoff et al., on presence/absence inference utilizing Phenoscape KB. Balhoff has written tools that utilize inference to expand the knowledge provided by curators into much larger datasets asserting the presence or absence of anatomical features across taxa. These tools also find logical inconsistencies among curator-made statements, and are a great example of a practical approach to computing on phenotypes.
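To give a flavor of the presence/absence inference idea, here is a deliberately simplified Python sketch. It is not Balhoff's code and does not use the Phenoscape KB or an OWL reasoner. It assumes just two rules (presence of a part implies presence of the structures it is part of, and absence of a structure implies absence of its parts), and the part_of links and taxon names are invented for illustration.

```python
# Hypothetical part_of links (part -> whole); not taken from a real ontology.
PART_OF = {
    "fin ray": "fin",
    "fin": "appendage",
}


def wholes(structure):
    """List every structure that `structure` is transitively part of."""
    result = []
    while structure in PART_OF:
        structure = PART_OF[structure]
        result.append(structure)
    return result


def parts(structure):
    """List every structure that is transitively part of `structure`."""
    return [s for s in PART_OF if structure in wholes(s)]


def infer(assertions):
    """Expand curator assertions of (taxon, structure, is_present).

    Presence of a part implies presence of its wholes; absence of a whole
    implies absence of its parts. Returns the expanded set of statements
    plus any (taxon, structure) pairs inferred both present and absent.
    """
    inferred = set()
    for taxon, structure, is_present in assertions:
        inferred.add((taxon, structure, is_present))
        implied = wholes(structure) if is_present else parts(structure)
        for other in implied:
            inferred.add((taxon, other, is_present))
    conflicts = {
        (taxon, structure)
        for taxon, structure, is_present in inferred
        if is_present and (taxon, structure, False) in inferred
    }
    return inferred, conflicts


curated = [
    ("Taxon A", "fin ray", True),   # implies fin and appendage are present
    ("Taxon B", "fin", False),      # implies fin ray is absent
    ("Taxon B", "fin ray", True),   # contradicts the statement above
]
statements, conflicts = infer(curated)
```

Three curated statements grow into eight, and the two statements about Taxon B are flagged as inconsistent for both "fin" and "fin ray". That is the same kind of result described in the talk, at a much smaller scale.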

A meeting highlight was the opportunity to see the latest and greatest imaging technologies within a special symposium on advances in microscopy. Speakers highlighted advances in 3D and 4D imaging, with systems capable of generating massive datasets that easily rival the big-data world of genomics. Handling these data has become a science in itself. It was great to see open-source software and hardware(!) initiatives leading the field in this regard. Stephan Saalfeld’s talk on image alignment was amazing; a presentation similar to the one given at ICIM3 is available on YouTube. Pavel Tomancak’s description of light-sheet microscopy using OpenSPIM was also inspirational.

Arthropod phenotypes on display in the halls. Ready access to specimens and hand-blown glass models (see below) catalyzed several discussions about the evolution of form and function in this phylum. Photo by Andy Deans (CC BY 2.0).

Finally, the meeting was flush with opportunities for developing longer-term collaborations. The curators of MorphDBase and the recent initiative TaxonWorks spent significant time discussing the possibility of sharing a code-base and thus greatly extending their resources. We hope that this collaboration comes to fruition and that it becomes an important component of “phenotype-handling” in the future.

A special thanks to the Phenotype RCN PIs for supporting, in part, our attendance.

Glass models of invertebrates, on display at the Humboldt University, in Berlin. Photo by Andy Deans (CC BY 2.0).

Calling all Phenotype RCNers and anyone else who works with phenotype data – We want your name on a manuscript supporting a computable phenotypes future! (If you read and agree, of course.)

Over the past four years of sponsoring meetings, courses, and exchanges, we have, with your help and participation, developed a comprehensive understanding of where the phenotype community is at, what is needed for integration of phenotypes with other data, and a vision of the science that could be achieved with this integration. In this article, we attempt to educate researchers, granting agencies, and policy makers on the current ‘non-computable’ state of phenotypic data across various life science domains, and we try to motivate them to use, develop, and advocate for semantic methods. Because of the relevance of this work to most areas of biological sciences and because it relates specifically to creating interdisciplinary knowledge—and especially because it is open access—PLoS Biology is our target journal.

Please respond by 18 June 2014 (next Wednesday). We will post updates here on our blog.

‘Branching’ phenotypes are not easily recovered from free text (far right column), the format in which most organismal phenotypes are recorded. (top row) Bee setae are usually modified in a way that presumably facilitates pollen collection, a €153 billion ecosystem service. This relatively simple phenotype has been described in myriad ways. Photo of bumble bee covered in pollen by Thomas Bresson (source). Photo of seta interacting with pollen grain by István Mikó (source). (middle row) Plant trichomes take on many forms and likewise are described using many lexicons. Photo of Arabidopsis plants covered in hair-like structures (trichomes) by BlueRidgeKitties (source). Scanning electron micrograph of Arabidopsis trichome by Heiti Paves (source). (bottom row) In zebrafish larvae, angiogenesis starts with vessels branching to form a network (right image) that is referred to in disparate ways. Zebrafish embryo photo by MichianaSTEM (source). Zebrafish blood vessels image is Figure 5A from Alvarez et al. 2009.

Landscape at Catalina State Park, near Biosphere 2 in Arizona. A great place to observe arthropod phenotypes! Photo by Andy Deans (CC BY 2.0)

The Arthropod Working Group of the Phenotype RCN stayed an extra day at Biosphere 2, after the annual group summit meeting, so that we could take stock of our own progress and discuss future interactions. We’re a heterogeneous crowd, each working on a different taxon (non-Hexapod Pancrustacea, Araneae, Hymenoptera, Coleoptera), often on different systems (integument, circulatory, neuroanatomy, etc.), and with different motivations (taxonomy, gene expression, evolutionary questions, etc.). Our annual meeting is a chance to catch each other up on progress in our systems but also to discuss limitations and possible solutions. We’re also charged with developing a common anatomy ontology that bridges disparate lineages, some of which are represented in existing anatomy ontologies (e.g., see Costa et al. 2013 and Yoder et al. 2010). In attendance this year (L to R in photo below): Lars Vogt (Universität Bonn, Germany), Peter Grobe (Stiftung Zoologisches Forschungsmuseum Alexander Koenig Bonn, Germany), István Mikó (Penn State, USA), Stefan Richter (Rostock University, Germany), Martín Ramírez (Museo Argentino de Ciencias Naturales), Matt Yoder (Speciesfile, University of Illinois, USA), and, behind the camera, Andy Deans (Penn State, USA).

Rogues gallery of arthropod fanatics. Photo by Andy Deans (CC BY 2.0)

Wisely, we mostly steered clear of anatomical discussions—what’s this part here, and how do we define it?—which freed us up to talk about tools, progress, future proposals, and other news. That is, we had fewer tangents (and shouting) and more constructive conversations about collaboration. We captured most of the dialog in a Google doc (needs synthesis, for sure, and likely doesn’t capture ALL of our discussions, especially complex ideas articulated on the easel), but here are a few quick hits:

The MorphDBase project (Grobe & Vogt) recently received funding for further development, and there is now a lot of potential to integrate ontologies. We discussed ideas for annotations, workflows, and how our projects could interact more with this resource.

We talked about anatomical complexity more generally, especially in the context of essentialistic classes vs. those classes that are not so easy to define (cluster class). Our aim should be to develop user-friendly tools that make it easier to employ ontologies (i.e., that don’t require morphologists and taxonomists to overthink annotations or burden them with excessive evidence gathering).

The spider ontology (SPD) is being used in an ongoing effort to extract characters from the literature (Ramírez). The group discussed tools that could help facilitate this process (e.g., CharaParser) and continued development of the SPD (especially Web-based tools, like mx, that facilitate rapid, community development of ontologies).

The TaxonWorks project (Yoder) is looking for feedback regarding ontology tools. Should they integrate an ontology builder, à la mx? Perhaps one that interacts easily with Protégé (and the reasoners therein)? What about templates for certain kinds of taxonomic and phylogenetic characters? The user would plug in the anatomy and the phenotype, and TaxonWorks would write the semantics.

Of course there was also some groupthink about how to make progress towards our mandate: to build a common anatomy ontology for arthropods. More on that later, but the consensus is that we should develop system-based pieces of it separately, forging links between them later. This ontology cloud would be synthesized in a future manuscript.
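The template idea raised in the TaxonWorks discussion above (the user supplies the anatomy and the phenotype, and the software writes the semantics) can be illustrated with a small Python sketch. This is only a guess at what such a template might do, not TaxonWorks code. The function names are hypothetical, and the has_part / bearer_of pattern in Manchester syntax is one common convention that we are assuming here.

```python
def phenotype_expression(anatomy, quality):
    """Fill a fixed template with an anatomy label and a quality label.

    The output is an OWL class expression in Manchester syntax. The
    pattern (has_part / bearer_of) is one common convention and is an
    assumption here, not a documented TaxonWorks template.
    """
    for label in (anatomy, quality):
        if not label or "'" in label:
            raise ValueError(f"unusable label: {label!r}")
    return f"has_part some ('{anatomy}' and (bearer_of some '{quality}'))"


def character_states(anatomy, qualities):
    """Write one expression per state of a simple character."""
    return {q: phenotype_expression(anatomy, q) for q in qualities}


states = character_states("fore wing", ["black", "yellow"])
```

The appeal of a template like this is that the morphologist only picks terms. The logical pattern is fixed once, so every character written with it is structured the same way and can be checked by a reasoner later.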

It was an intense, 12-hour, pizza-fueled, beverage-driven marathon in an inspiring location. After what we universally felt was forward progress, though, we’re excited for the next round! Perhaps in Argentina, Martín …?

As a side note, it was a bit cool in Arizona in February, for most arthropods anyway, but I did see two very cool critters: a Scolopendra centipede, which was way too fast for me to photograph, and a Hadrurus scorpion, which I forgot entirely to photograph. So here’s a great image from Flickr that illustrates them both:

A bat fly (Diptera: Nycteribiidae) poses for the camera. It’s barely recognizable as a relative of the familiar Drosophila melanogaster (Diptera: Drosophilidae) and is radically different from the fish tongue-eating arthropod in the photo below. This photo by Gilles San Martin (CC BY-SA 2.0).

The Phenotype RCN Arthropod working group has focused mostly on the development and characterization of the Common Arthropod Anatomy Ontology (CAAO). Despite difficulties defining basic classes due to the immense differences in basic anatomical concepts (there are, after all, more than a million known species of arthropods, with almost as many different forms; see photos above and below), we made progress in the development of certain portions of CAAO. During 2013 the group established the basis of classes and relationships referring to anatomical structures of the arthropod integument. Establishing this system is especially crucial for disciplines targeting world species diversity, such as arthropod taxonomy and phylogenetics, where more than 90% of the applied characters are related to the outer layer of the integument, the cuticle. Efforts have also been made to develop the arthropod nervous system portion by adopting the relatively recently published relation system of Richter et al. (2010).
The development of CAAO is ongoing, but the focus of the working group has shifted a bit towards establishing outreach strategies for better utilization of available ontologies by domain experts, and towards tighter collaborations between research group members. We’re especially interested in seeding research that will become the basis of new funding. The main subjects of these new directions are:

Cooperation between working group members who develop tools that enhance ontology development and usage for domain expert communities [e.g., mx (Yoder 2014) and Morph·D·Base (Vogt and Grobe 2010)].

Making the ideas developed by individual working group members on applications of ontologies to different areas of arthropod research more widely accessible and understandable [e.g., ontology-based measure of structural complexity for phylogenies (Ramirez 2013); demarcating and differentiating basic categories of anatomical entities (Vogt 2010); and application of semantic models in species descriptions (Balhoff et al. 2013)].

Defining the role and place of homology concepts in arthropod ontologies [e.g., Szucsich and Wirkner 2007, Franz 2013].

During 2013, some members of the Arthropod working group were able to meet at two arthropod-specific meetings held in Germany: the Willi Hennig Society meeting in Rostock (August 3rd-7th) and the 6th Dresden Meeting on Insect Phylogeny in Dresden (September 27-29). Although these meetings made it possible to start cooperating in the areas mentioned above and to outline future directions for the working group, we also agreed that further meetings with more participants are needed to establish the planned collaborations. The annual RCN summit meeting on 21-23 February 2014 will be attended by most of the key personnel and will hopefully help the working group settle on a workflow for pursuing the new directions.

We’re looking for a few good interns! Photo by Gilles San Martin. (CC BY-SA 2.0)

The Frost Entomological Museum at Penn State seeks undergraduate summer (2013) interns to assist with projects related to insect phenotype data, especially in the context of systematics and evolution. Interns will be exposed to a broad array of biodiversity informatics tools, including ontologies, and will learn aspects of specimen collection, handling, and curation. Applications are due March 20, 2013. More information is available at http://bit.ly/FrostInterns

Asilomar State Beach, Monterey, California, USA. The beach served as our chairs, and the rocks were our whiteboard.

We had a short but highly successful meeting a couple weeks ago, at the Asilomar State Beach conference center in beautiful Monterey, California. Our working group, the AAO, met in parallel with the vertebrate anatomy (with a twist of anthropology), plant anatomy, sponge morphology, and informatics working groups, which allowed for a few useful group exchanges and productive collaboration. (And the setting was difficult to beat! See above.) The AAO set several ambitious goals for this meeting, and we made substantial progress towards a first draft of a common Arthropod Anatomy Ontology. Arthropod group accomplishments from this meeting include:

We moved closer to a common understanding of how we need to build a broadly useful common arthropod anatomy ontology. It can be difficult to reach consensus and understanding through Skype and email. The opportunity for a face-to-face was invaluable for our group as we move forward in this effort, especially since our group includes experts from multiple disciplines (primarily genomics, comparative morphology, and systematics) and is prone to unintentionally talking past one another.

We moved closer to having a shared environment to view, search, and safely edit a common ontology with integrated OWL reasoning. We had been developing two ontologies in parallel, one in Protégé and one in mx, and one goal of this working group meeting was to reconcile those ontologies and move to a single mechanism for ontology development. We’ll be moving forward with an extended version of mx (developed over the next several months, hopefully) that includes a continuous integration and testing environment and which will automate error checking and feed back inferred classifications to mx so that they are visible for viewing. Ideally this environment will be extended to allow queries.

We made some headway towards recruiting more experts and raising awareness of our effort. We have a short list of other experts to target for face-to-face meetings and ideas for how to bring this project to potential consumers through meeting talks and/or symposia. Stay tuned for more details!

First draft definitions for general classes of arthropod external anatomy and a first attempt at OWL formalization for them (our biggest accomplishment). We spent a lot of time on cuticular classes (e.g., sclerite, conjunctiva) and articulations, including the various parts of articulations—structures we’d call condyles, fossae, etc. Some classes were relatively uncontroversial, for example articular surface: A sclerite surface that makes movable direct contact with another articular surface, with the properties is_a sclerite surface and contacts articular surface. Other classes were much more challenging, and our discussion often veered into the term/class danger zone. ‘Joint’ was especially troublesome. See for example the ‘wing joint’ of pterygote insects, which is a highly complex anatomical entity (image below). Another controversial (reactive? radioactive?) class was ‘area’, from the Hymenoptera Anatomy Ontology (HAO): An anatomical structure that is delimited by material or immaterial anatomical entities. Is this class useful for Arthropoda (or even Hymenoptera)?

Proximal area of a sawfly forewing, where the wing meets the thorax. This is a highly complex region of the body, chock-a-block with sclerites, conjunctiva, muscles, etc. Is there a way to define ‘joint’ so that it satisfies this complex? Should it be called ‘joint’?

As mentioned above we currently have two draft arthropod anatomy ontologies—one built largely by David Osumi-Sutherland (FlyBase, Cambridge University) in OWL and the other built largely by István Mikó (HAO, Penn State) in mx as a generalized version of the HAO—which overlap substantially, especially in general classes for external anatomy. David has been reconciling these two versions in one OBO-format file, with the aim that this will be used to seed the common version in mx (see small test ontologies on GitHub if you’re interested).
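If you haven’t looked inside an OBO-format file before, here is a rough Python sketch of how the ‘articular surface’ class quoted above might be written out as a stanza. The helper function is hypothetical and the EX: identifiers are placeholders, not IDs from the HAO or from any draft of the common ontology. Only the definition text and the two properties (is_a sclerite surface, contacts articular surface) come from our meeting notes.

```python
def obo_term(term_id, name, definition, is_a, relationships=()):
    """Serialize one class as an OBO-format [Term] stanza.

    `is_a` is an (id, name) pair; `relationships` is a sequence of
    (relation, id, name) triples.
    """
    parent_id, parent_name = is_a
    escaped = definition.replace('"', '\\"')
    lines = [
        "[Term]",
        f"id: {term_id}",
        f"name: {name}",
        f'def: "{escaped}" []',
        f"is_a: {parent_id} ! {parent_name}",
    ]
    for relation, target_id, target_name in relationships:
        lines.append(f"relationship: {relation} {target_id} ! {target_name}")
    return "\n".join(lines)


# The identifiers below are placeholders, not real ontology IDs.
stanza = obo_term(
    term_id="EX:0000002",
    name="articular surface",
    definition=(
        "A sclerite surface that makes movable direct contact "
        "with another articular surface."
    ),
    is_a=("EX:0000001", "sclerite surface"),
    relationships=[("contacts", "EX:0000002", "articular surface")],
)
print(stanza)
```

Each class becomes a short block of tag-value lines, which is part of why a single OBO file is a convenient meeting point for two ontologies that were built in different tools.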

We’re considering a couple slight changes in our approach going forward: 1) moving to multiple meetings a month, rather than just one, and 2) establishing semi-independent subgroups that share anatomical interests, which could lead to more substantial growth/refinement of relevant areas (e.g., the nervous system or circulatory system). Email me, Andy Deans (adeans@psu.edu), if you are interested in the latest developments or to find out how to contribute. Thanks to all who were part of this working group: Frank Friedrich, István Mikó, David Osumi-Sutherland, Aaron Smith, Shaun Winterton, and Matt Yoder!