The Phenotype Research Coordination Network was funded by NSF to establish a network of scientists who are interested in comparing phenotypes across species and in developing the methods needed to make this possible.

Call for Participation:

Computable evolutionary phenotype knowledge: a hands-on workshop

The Phenoscape project is hosting a hands-on workshop on Dec 11-14, 2017, at Duke University in Durham, North Carolina.

Evolutionary phenotype data that is amenable to computational data science, including computation-driven discovery, remains relatively new to science. Therefore, use cases and applications that effectively exploit these new capabilities are only beginning to emerge. If you are interested in discovering, linking to, recombining, or computing with machine-interpretable evolutionary phenotypes, this is the workshop for you!

The event will bring together a diverse group of people to collaboratively design and work hands-on on targets of their interest that take advantage of, and promote reuse of, Phenoscape’s online evolutionary data resources and services. The event is designed as a hands-on unconference-style workshop. Participants will break into subgroups to collaboratively tackle self-selected work targets.

[This post was written by Anne Thessen and originally appeared at datadetektiv.com; we thank Dr. Thessen for sharing it with our community as well!]

One of the more difficult aspects of trying to apply “big data” thinking in ecology is the massive heterogeneity of terms. I stumble over this issue every time I work on a data set for the Encyclopedia of Life. The many different ways to describe the same habitat (among other things) and the varying granularity with which people describe habitats make it very difficult for data consumers to find, for example, all the beetles that live in the desert. It’s doubly difficult to go a step further and ask for traits, such as color, of beetles that live in deserts.

As a side note, that example is very similar to some use cases I published with several colleagues about ways to combine phenotype and environment data.

Right now, we can ask Google “How much does a narwhal weigh?” and get the answer because of the fine work my EOL colleagues and I have been doing on TraitBank (go ahead, try it), but we’ve still got a way to go before we can ask “What color are beetles that live in the desert?”. We have a plan, though, and it involves semantic technology, i.e. ontologies.

Biology already has many ontologies of varying quality available for use. Most of them can be found at OBO Foundry. Not all domains of biology have good ontologies available; ecology, for example, has been left out. That means there is no standard, machine-readable way of expressing which organisms are autotrophs, or nocturnal, or use camouflage, etc. Including terms such as these in an ontology is one of the many necessary steps before we can ask “Which organisms are nocturnal in an alpine forest habitat?” or, if we want to get more complicated, “Is there a relationship between the phylogeny of terrestrial, nocturnal organisms and latitude or elevation?”.
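To make the idea concrete, here is a minimal Python sketch of how a shared habitat hierarchy lets a query for a broad term match records described at a finer granularity. Everything in it is an assumption for illustration: the term names, the `IS_A` table, the sample records, and the `find` helper are hypothetical and do not come from any real ontology or from TraitBank.

```python
# Toy is_a hierarchy. All term names are hypothetical, not real ontology IDs.
IS_A = {
    "alpine forest": ["forest"],
    "montane forest": ["forest"],
    "forest": ["terrestrial habitat"],
    "desert": ["terrestrial habitat"],
    "terrestrial habitat": [],
}

def ancestors(term, is_a=IS_A):
    """Return the term itself plus all of its superclasses."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, []))
    return seen

# Hypothetical trait records: (organism, activity pattern, habitat).
RECORDS = [
    ("owl A", "nocturnal", "alpine forest"),
    ("beetle B", "nocturnal", "desert"),
    ("finch C", "diurnal", "montane forest"),
    ("moth D", "nocturnal", "montane forest"),
]

def find(activity, habitat, records=RECORDS):
    """Organisms with the given activity whose habitat is the query term or a subclass of it."""
    return [
        organism
        for organism, act, hab in records
        if act == activity and habitat in ancestors(hab)
    ]

print(find("nocturnal", "forest"))         # ['owl A', 'moth D']
print(find("nocturnal", "alpine forest"))  # ['owl A']
```

Without the hierarchy, a search for “forest” would miss both records, because neither one uses that exact word as its habitat.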

Building an ontology is a large, never-ending, hugely complicated task. One of my clients at the University of Colorado Boulder is the ClearEarth project. The goal of this project is to repurpose NLP and ML algorithms developed for biomedicine for use in geology and biology. These algorithms can read text and automatically generate ontologies. We’ve made a lot of progress annotating domain-specific text and will have some “auto-ontologies” by this summer. Very exciting! To support this effort and make sure the ontologies resulting from this project are meshed in with existing bio-ontologies, we are hosting an “ontology-a-thon” in Boulder this summer. Please take a look and apply if you are interested in participating. We don’t have a detailed agenda just yet, but the idea is to get ontology and ecology experts in one room to curate the auto-ontology. All expenses paid, but space is limited.

The Phenotype RCN is wrapping up after five years of innovation and community-building. So many great ideas have come out of this community that we’ve been asked to produce a book called Application of Semantic Technologies in Biodiversity Science that showcases the state-of-the-art in semantics for biodiversity, phylogeny, phenotypes, environments, and genomes. Would you like to participate? Please send your chapter idea described as a single paragraph and a list of potential co-authors to Anne Thessen via email annethessen@gmail.com. Anne will be editing the book to be published by IOS Press in Berlin as a part of a Semantic Web series edited by Pascal Hitzler. If you were at the 2016 Phenotype RCN meeting at Biosphere 2, you met him there. We need to get busy on the book, so please submit your chapter ideas within two weeks (by Oct 5).

This book will be an excellent product of the RCN and a great way to synthesize all the great ideas everyone has had over the years.

I attended the Pacific Symposium on Biocomputing (PSB) in Jan 2016. I presented a talk titled “Investigating the importance of anatomical homology for cross-species phenotype comparisons using semantic similarity.” This work explores the utility of including anatomical homology when computing semantic similarity of phenotype profiles. The majority of talks at PSB were focused on disease analytics and use of clinical phenotypes. There was a good balance of computer scientists and biologists at the meeting. An interesting session at the meeting was the social media session that was focused on large scale data analytics from sources such as Twitter and Instagram to track the spread of epidemics.

I also attended ICBO 2016 to present my work on the impact of annotation granularity on semantic similarity of phenotypes. In addition, I served on the program committee for ICBO 2016 and was one of the poster judges at the conference. The title of my talk was “Measuring the importance of annotation granularity to the detection of semantic similarity between phenotype profiles”. Considerable human effort and time are invested to curate phenotypes in great detail from biological and medical literature using standardized ontologies. However, it is unclear whether this level of detail is important for effectively measuring semantic similarity between phenotype profiles. In my work, I measured the statistical sensitivity of widely used semantic similarity metrics at varying levels of annotation granularity to test whether higher annotation granularity improves the sensitivity of those metrics.
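As a rough illustration of the question, the Python sketch below computes a Jaccard similarity over the ancestor closures of two phenotype profiles, then recomputes it after coarsening each annotation to its parent term. This is a toy under stated assumptions, not the method or metrics used in the talk: the `PARENT` table, the term names, and the `coarsen` helper are all hypothetical, and Jaccard is used only as one simple example of a similarity metric.

```python
# Toy single-parent ontology. All term names are hypothetical.
PARENT = {
    "pectoral fin absent": "pectoral fin phenotype",
    "pectoral fin reduced": "pectoral fin phenotype",
    "pectoral fin phenotype": "fin phenotype",
    "fin phenotype": "phenotype",
    "eye absent": "eye phenotype",
    "eye phenotype": "phenotype",
    "phenotype": None,  # root
}

def closure(profile):
    """All annotated terms in a profile plus every ancestor of each term."""
    terms = set()
    for term in profile:
        while term is not None:
            terms.add(term)
            term = PARENT[term]
    return terms

def jaccard(profile_a, profile_b):
    """Jaccard similarity of the two profiles' ancestor closures."""
    a, b = closure(profile_a), closure(profile_b)
    return len(a & b) / len(a | b)

def coarsen(profile, levels=1):
    """Replace each term by its parent `levels` times (the root stays as-is)."""
    terms = set(profile)
    for _ in range(levels):
        terms = {PARENT[t] or t for t in terms}
    return terms

x = {"pectoral fin absent"}
y = {"pectoral fin reduced"}
z = {"eye absent"}

print(jaccard(x, y))                          # 0.6
print(jaccard(coarsen(x), coarsen(y)))        # 1.0
print(jaccard(x, z), jaccard(coarsen(x), coarsen(z)))
```

In this toy case, the fine-grained annotations keep the “absent” and “reduced” profiles distinct (0.6), while the coarsened annotations make them identical (1.0). Whether that loss of resolution matters statistically in real data is exactly the question the work set out to test.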

Attending ICBO gave me the opportunity to present my work to a diverse group of scientists focused on varying ontological applications. I found that the conference featured scientists and researchers from a wide range of areas within biology, medicine, ecology, computer science and text-mining. Of particular interest to me were the BioCreative sessions which focused on a variety of natural language processing and text mining applications to extract knowledge from scientific literature.

Lastly, I would like to acknowledge support from the Phenotype RCN for conference travel.

Talks of special interest at Phenoday included Melissa Haendel on adding natural language synonyms for medical terms in the HPO, Wendy Chapman on the definition of “cough” (knowledge representation to support phenotyping from text), and Chris Mungall on a Bayesian approach to ontology structure inference with applications to the Disease Ontology (being in Orlando he used Mickey Mouse to illustrate his points on phenotyping, e.g., HP_0100024 is a conspicuously happy disposition associated with a chromosome 15q24 deletion, and MP_0001284 is absent vibrissae (aka no whiskers)).

Although Phenoday focused mostly on human health related phenotypes, related sessions during Bio-Ontologies SIG covered applications to other species. Seth Carbon described the Noctua annotation tool, which has a web-based configuration for associating genotypes to phenotypes, essentially a web-based reincarnation of Phenote. Chris Mungall also spoke in this session, this time on PhenoPackets and proposed data exchange standards for phenotype data.

David Osumi-Sutherland (along with Owen Randlett and Paul Sternberg) organized a workshop at The Allied Genetics Conference 2016 on Informatics Resources to Aid the Genetic Dissection of Neural Circuitry. While the name of the workshop doesn’t mention phenotypes, they certainly were an integral part of what is needed for this work. The workshop was a showcase of carefully detailed work in worm, zebrafish, and fly brains and circuits.

Contact Suzi if you would like more information about these conferences.

A contingent of Phenotype RCN participants recently attended the 7th International Conference on Biological Ontology (ICBO) and BioCreative 2016 held over a stretch of pleasantly sunny days on the campus of Oregon State University in Corvallis, Oregon (August 1-4, 2016). The theme of the meeting was Food, Nutrition, Health, and Environment for the 9 billion and the meeting brought together folks interested in applying ontologies to innovative research in diverse domains including environment, biodiversity, biomedical sciences, plant biology, and agriculture.

The conference started off with a day of workshops covering text-mining, visualization, medicine, and tutorials on tools, techniques and standards. (Links to the program and abstracts are available here: http://icbo.cgrb.oregonstate.edu/program). Talks and posters during subsequent sessions included a diverse mix of topics such as sustainability, obstetrics and neonatal health, trauma centers, social science, infectious disease, and biodiversity. Although the topics were wide-ranging in scope, a thread of common challenges emerged in working with ontology-based data, including the need for data harmonization/standardization, promoting shared resources, representation challenges for temporal or spatial reasoning, and improving descriptors/terminology.

The meeting ended with a panel discussion of the question “Have ontologies reached their peak?” This question was prompted by a noticeable decline since 2014 in PubMed papers matching the word “ontology” (and a marked increase in those matching “data mining”). The consensus of the panel was that while the publication of new ontologies in PubMed may have slowed, their use in biology was far from peaking. Rather, the community may have a more refined understanding of what an ontology is, which means fewer papers are being published that claim to be about ontologies.

Of particular interest to the phenotype community, here are the presentations given by recent RCN phenotypers:

James Balhoff, Wasila Dahdul, Prashanti Manda, and the Phenoscape team: The Phenoscape Knowledgebase: tools and APIs for computing across phenotypes from evolutionary diversity and model organisms

The Phenoscape project is recruiting a postdoc with training in bioinformatics and/or developmental biology who is interested in analyzing genomic and developmental data in relation to phenotypic data, with a focus on the vertebrate fin/limb.

The problem of how organismal phenotypes have evolved, are constrained, and acquire novelty, is one of the grand challenges in biology. The Phenoscape group has developed ontology-based methods for representing species phenotypes so that they can be integrated with model organism developmental and genetic data. The Phenoscape Knowledgebase (KB) contains over 500,000 vertebrate species phenotypes that are linked to ~16,000 genes associated with 320,000+ phenotypes and 37,000 genes with in situ expression data from model organisms (zebrafish, mouse, Xenopus, human). These data present a tremendous opportunity for integration with other data types to address questions about the evolution of phenotype.

We are seeking an individual with expertise in developmental biology and/or genomics, to (1) help evaluate results of bioinformatics methods being developed by Phenoscape and (2) leverage the Phenoscape Knowledgebase to study whole-organism phenotype and functional genomics in non-model organisms. The purpose of the methods is to improve prediction of the genetic basis of evolutionarily novel phenotypes by incorporating semantic similarity, homology, and phylogenetic propagation. Vertebrate fin and limb phenotypes and genes are enriched in the KB, and we are thus seeking candidates who ideally have knowledge of genes and networks involved in fin/limb development. Further, this position presents a unique opportunity to leverage the linked developmental and genetic data in the Phenoscape KB for large-scale analysis of patterns of phenotypic evolution.

The postdoc will work under the direction of Paula Mabee (University of South Dakota) in association with Todd Vision (University of North Carolina), as part of a distributed, multidisciplinary team that includes evolutionary and model organism biologists, computer scientists, and bioinformaticists. Ideally the applicant will be based in South Dakota (with opportunities to travel to other sites), but we will consider qualified applicants who are available remotely and/or half-time. The position is available immediately for an initial appointment of one year, with potential to renew.

Please send an email to Andy Deans (adeans@gmail.com) or Eva Huala (evahuala@gmail.com) if you are interested, indicating the meeting proposed, whether you are presenting, your current position (student, faculty, etc.), the amount of funds requested, and a 200-word statement regarding the value of the opportunity to you and the relationship to phenotype ontologies.

What are the challenges in building, visualizing and using the Tree of Life? How can we best utilize and build on existing phylogenetic knowledge and look ahead to address the challenges of data integration? Recently, fellow Phenoscaper Jim Balhoff and I attended the first FuturePhy workshop in Gainesville, Florida (February 20-22, 2016). The workshop brought together three taxonomically-defined working groups (catfish, beetles, barnacles) to build megatrees from existing phylogenetic studies, and identify and begin applying diverse data layers for their respective groups. Open Tree and Arbor personnel were on hand to discuss and help solve issues in data integration.

The catfish team (John Lundberg, Mariangeles Arce, Jim Balhoff, Brian Sidlauskas, Ricardo Betancur, Laura Jackson, Kole Kubicek, Kyle Luckenbill, and myself, Wasila Dahdul) included participants with expertise in catfish anatomy, phylogenetics (molecular and morphological), development, bioinformatics, and digital imaging. We were motivated to build on the work of the All Catfish Species Inventory to achieve a more complete understanding of catfish diversification by integrating published phylogenies, 2D and 3D images in various online repositories, and thousands of computable phenotypes for catfishes in Phenoscape.

We held several hands-on sessions on tree grafting (using Mesquite, R, and Arbor), data annotation (using Phenex), and tree submission to Open Tree. We also examined an automatically generated supermatrix for 18 published catfish matrices in the Phenoscape KB (generated using the OntoTrace tool), and prototype data visualizations for supermatrices developed by Curt Lisle in Arbor. We used Mesquite to manually create a draft megatree, and in parallel, uploaded trees to Open Tree, which automatically synthesized a megatree for catfishes. Our plan is to compare the output of manual tree-building in Mesquite with the automated tree from Open Tree.

Among the issues and priorities that emerged during the workshop were the need to include the authoritative Catalog of Fishes taxonomy in Open Tree and the need to allow the addition of unnamed or uncertainly identified taxa commonly used in matrices. We also discussed challenges in automated character consolidation across multiple studies, and the reuse of images across multiple online archives.

We left with a plan to continue tree building and data layer integration post-workshop, with the aim of publishing the catfish megatree (including the methods and remaining challenges) and the integration of data layers via interactions between Arbor, Open Tree, and Phenoscape.

Participants at the fifth and final summit meeting of the Phenotype RCN. Photo by Andy Deans (CC BY 2.0).

The Phenotype Research Coordination Network hosted its fifth and final summit meeting at the end of February at Biosphere 2, with 66(!) people in attendance. The focus was on data integration, and we were fortunate to have the FuturePhy project join us. Our program was packed, with a mix of panels, talks (we have links to slideshows), and breakout sessions that focused on proposal ideas. One frequent topic for discussion was the need to keep this network going, as there remains a clear need for outreach and mechanisms that foster collaborations on phenotype data. Several working groups also focused on large, international collaborations that would make phenotype tools, like ontologies, and phenotype data more accessible and sustainable—imagine something like GenBank but for phenotypes.

Another successful and compelling component of this meeting was the inclusion of many early career researchers and graduate students, who formed a cohesive network themselves. Their discussions and reports to the larger group identified broad needs and informed our collective ideas for future outreach directions.

The Phenotype RCN has been productive, impactful, and incredibly rewarding. We thank all who have been involved, especially meeting participants and our advisory board. While this phase (i.e., our original NSF-funded schedule) may be winding down, the network is robust and active. Stay tuned for further developments!