The NESCent Informatics program has two broad goals. The first is to provide support to the science sponsored by the Center. The second goal is to help build cyberinfrastructure that will enable evolutionary biologists to fully exploit the information-rich discipline that biology has become. This latter goal requires leveraging the energies and talents of the open source programming community to build extensible and interoperable software components for evolutionary analyses, and training the evolutionary biology community to fully realize the potential of these tools. This page describes several of NESCent's efforts to help build cyberinfrastructure.

Phyloinformatics Summer of Code

NESCent's application to the Google Summer of Code for 2014 was not successful. We do, however, have a number of ideas on our Phyloinformatics Summer of Code 2014 page that might be useful to prospective students and mentors, including links to some of the biology, math and science related organisations participating in this year's Summer of Code.

Cyberinfrastructure Summer Internships

In 2009 we ran, for the first time, the Cyberinfrastructure Summer Traineeship program for students and postdocs interested in informatics as applied to biodiversity, earth and environmental data. Four trainees gained collaborative open-source software development experience by helping to build a Virtual Data Center (VDC).

Hackathons

What is a hackathon? A hackathon is a hands-on software development meeting that allows programmers from different teams to intensively work on a common set of objectives and interact face-to-face. Of course, there's a Wikipedia article, too.

Population Genetics in R Hackathon

In March 2015 we held a hackathon at NESCent with the objective of fostering an interoperating ecosystem of scalable tools and resources for population genetics data analysis in the popular R platform. The event targeted interoperability, scalability, and workflow-building challenges among the many population genetics R packages that already exist. It allowed a diverse group of population genetics researchers, method developers, and people with other relevant areas of expertise to collaborate on code, documentation, use cases, and other resources that will aid their communities.

Tree-for-all Hackathon

The NSF-supported Open Tree of Life project has (1) gathered, encoded and annotated >4000 published phylogenies (“source trees”), (2) combined several taxonomic hierarchies into a reference taxonomy, and (3) used this information to generate a synthetic tree covering >2.5 million species. OpenTree provides access to all of this information via raw downloads, and also via queryable online interfaces that can be invoked by external software. However, tools that actually use these interfaces to deliver phylogenetic knowledge into the hands of scientists have not been developed yet.

To facilitate the development of tools that use its resources, Open Tree of Life, Arbor and the NESCent HIP working group jointly held a hackathon for testing, expanding and building upon the Open Tree of Life APIs. The Tree-for-all event was held September 15 to 19, 2014 at the University of Michigan. Details of the event and outcomes are on the hackathon GitHub repository.

Phylotastic: infrastructure for re-using megatrees

A new NESCent working group called Hackathons, Interoperability, Phylogenies (HIP) has so far staged two hackathons under the Phylotastic brand: one held June 4 to 8, 2012, at NESCent, and another held January 28 to February 1, 2013, at iPlant in Tucson, AZ. Participants built a web-services implementation of the pruning, grafting, name-reconciliation and other functionalities necessary for researchers to take advantage of emerging megatrees. See the Phylotastic page at the EvoIO wiki for more information.
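To give a sense of what "pruning" a megatree means in practice, here is a minimal sketch in Python. It is not the Phylotastic implementation: the nested-tuple tree representation, the function names prune and to_newick, and the toy taxa are all assumptions made for illustration. The idea is simply to keep the leaves a researcher asks for and to collapse any internal node left with a single child.

```python
# Illustrative sketch only; names and data structures are hypothetical,
# not taken from the Phylotastic web services.

def prune(tree, keep):
    """Restrict `tree` to the leaf names in `keep`.

    A tree is either a leaf name (str) or a tuple of subtrees.
    Returns the pruned tree, or None if no requested leaf is present.
    """
    if isinstance(tree, str):                      # leaf node
        return tree if tree in keep else None
    kept = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kept:
        return None                                # whole clade dropped
    if len(kept) == 1:
        return kept[0]                             # collapse unary node
    return tuple(kept)


def to_newick(tree):
    """Serialize a nested-tuple tree as a Newick string (no branch lengths)."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


# A toy stand-in for a megatree.
megatree = ((("Homo", "Pan"), "Gorilla"), ("Mus", "Rattus"))

pruned = prune(megatree, {"Homo", "Gorilla", "Mus"})
print(to_newick(pruned) + ";")   # prints ((Homo,Gorilla),Mus);
```

A real service would additionally need name reconciliation before pruning, since the names a user supplies rarely match the tree's labels exactly.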

GMOD Tools for Evolutionary Biology

This NESCent-sponsored hackathon will fill critical gaps in the capabilities of the Generic Model Organism Database (GMOD) toolbox that currently limit its utility for evolutionary research. This hackathon will focus on tools for

viewing comparative genomics data;

visualizing phylogenomic data; and

supporting population diversity data and phenotype annotation.

The event will take place November 8-12, 2010, at NESCent and bring together a group of about 30 software developers, end-user representatives, and documentation experts who would otherwise not meet. The participants will include key developers of GMOD components that currently lack features critical for emerging evolutionary biology research, developers of informatics tools in evolutionary research that lack GMOD integration, and informatics-savvy biologists who can represent end-user requirements.

This hackathon will provide a unique opportunity to infuse the community of GMOD developers with a heightened awareness of unmet needs in evolutionary biology that GMOD components have the potential to fill, and for tool developers in evolutionary biology to better understand how best to extend or integrate with already existing GMOD components.

Integrating diverse biological data with the historical process of evolution is a grand challenge for 21st century biology. The interoperability of data from diverse fields (e.g., genetics, ecology, biodiversity, biomedicine) requires a technology infrastructure based on formalized, shared vocabularies. Developing such vocabularies is a community project. The VoCamp format chosen to promote this notion is similar to a hackathon, but focuses on vocabulary and ontology development instead of writing software.

Evolutionary Database Interoperability Hackathon

Despite the rich and meticulously curated variety of online databases of phylogenetic data, their holdings are only available in incompatible formats lacking explicit semantics, and programmable APIs for querying the data are often not provided. The result is significant obstacles to interoperability and data integration. The hackathon brought together data and metadata experts and developers from a number of data providers with the developers of emerging standards for

an ontology resulting in formal and machine-interpretable semantics of evolutionary data and metadata (CDAO), and

a programmable web-service based interface for phylogenetic data providers (PhyloWS).

These standards, and many of the ideas for this hackathon, arose from, and are a continuation of, the activities of NESCent's Evolutionary Informatics Working Group. The event was therefore also the last meeting of the working group.

NESCent Hackathon on Comparative Methods in R

The NESCent-sponsored Hackathon on Comparative Methods in R took place December 10 to 14, 2007, at NESCent in Durham, North Carolina. The R statistics package has emerged as a popular platform for implementation of comparative phylogenetic methods. The objective of the hackathon was to encourage the open development of software that is interoperable with other packages, supports data and exchange standards, and can be transparently extended to accommodate new data types or formats. The event brought together nearly 30 participants, consisting of developers of comparative methods in R as well as users of comparative methods.

Outcomes of the event include better software tools for studying diversification rates, estimating divergence times, and modeling the evolution of continuous phylogenetic characters, as well as improved online documentation, shared libraries for basic phylogenetic data manipulation, and interoperability between R and the Mesquite package.

NESCent Phyloinformatics Hackathon: Lowering the Barrier

The Phyloinformatics Hackathon took place December 11 to 15, 2006, at NESCent in Durham, North Carolina. On day 1, we heard user stories and chose six of the use cases as top-priority targets for development. At the end of the day we assembled into toolkit-specific groups to devise toolkit-specific plans focusing on one or more targets. For the rest of the week, we worked. Descriptions of the outputs of the hackathon are still in progress.

The objective of this first NESCent phyloinformatics hackathon was (and continues to be) to leverage the Bio* open source software tools to provide the "glue" and lower the barriers for using phylogenetic tools within automated workflows. Details are outlined further in the formal proposal.

Call For Input

Informatics Initiatives

To ensure that the Center's Informatics program continues to be responsive to user needs, and to tap into the expertise and creativity of our community, we solicit short (2-6 pages) whitepapers on initiatives to be undertaken by the Center, including training, software development, hackathons, and coordination of data standards and ontology development.

Use-cases

Community input into use cases was a key element in guiding and focusing the work at the Phyloinformatics Hackathon on the most urgent or pervasive problems. We continue to invite input in various ways (see below) to help steer and validate future efforts and results of this and other activities. Input may take several forms:

actual data files (e.g., alignments, trees, other data) for use in testing

citations to published papers that pose challenging problems or that provide useful methods

your "wish list" for a phyloinformatics computing platform

The favored mechanism for providing input is to post your comments directly to this wiki. To do so, simply register at the wiki site and start editing, for example, the use case document (or feel free to create a new page). If you have a data file to use for testing, you may upload it to the wiki (and add a link describing it). Alternatively, you may send comments (and files) to Hilmar Lapp (please indicate if you don't wish to share your comments or data files on the wiki). You may also contact any of the organizers with questions or comments.