Researchers in NSF's AToL (Assembling the Tree of Life)
program (http://atol.sdsc.edu) aim to reconstruct the
evolutionary origins of all living things. Each of the program's 30+
projects generates and consumes large amounts of data.

pPOD is an NSF-funded collaborative project
(awards IIS 0629846, IIS 0630033, and IIS 0629702) dedicated to
developing tools that integrate AToL data across projects and make
that data interoperable within analysis pipelines.

AToL Projects' Data

The AToL projects include studies of bacteria, microbial eukaryotes,
vertebrates, flowering plants, and many other groups. The data generated
by these projects include:

Genotypic descriptions and their provenance;

Phenotypic descriptions and their provenance;

Specimens and their provenance including collection information, voucher deposition, etc.;

Interpretations of the primary measurements, including homology;

Estimates of phylogenies, and information about the methods employed;

Supertree construction, and information about the methods employed; and

Post-tree analyses such as character evolution hypotheses.

While data collection, storage, and dissemination within each AToL
project are well coordinated, there is a critical need for
infrastructure that integrates all AToL data sources with other
valuable resources, such as publication archives, morphological
character databases, and phylogenomics databases. Such integration
will allow a project to share some of its data with the community
(export) and to benefit from useful information retrieved from the
rest of the community (import).
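To make the export/import idea concrete, here is a minimal Java sketch of one very simple kind of mapping: a hypothetical FieldMapping class that renames attributes between a project's local schema and a shared core schema. The class name, the field names (taxon, museum, scientificName, voucherInstitution), and the record values are illustrative assumptions, not part of pPOD; real schema mappings must also handle joins, nesting, and value conversions, which this sketch ignores.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an attribute-renaming mapping between one project's
// local schema and a shared core schema. Not the pPOD mapping language.
final class FieldMapping {
    private final Map<String, String> localToCore;
    private final Map<String, String> coreToLocal = new HashMap<>();

    FieldMapping(Map<String, String> localToCore) {
        this.localToCore = new LinkedHashMap<>(localToCore);
        // Build the inverse; reject mappings that are not one-to-one.
        for (Map.Entry<String, String> e : localToCore.entrySet()) {
            if (coreToLocal.put(e.getValue(), e.getKey()) != null) {
                throw new IllegalArgumentException("two local fields map to " + e.getValue());
            }
        }
    }

    /** Export: rename mapped local fields to core names; unmapped fields are not shared. */
    Map<String, String> export(Map<String, String> localRecord) {
        return translate(localRecord, localToCore);
    }

    /** Import: rename core fields to local names; core fields this project lacks are skipped. */
    Map<String, String> importRecord(Map<String, String> coreRecord) {
        return translate(coreRecord, coreToLocal);
    }

    private static Map<String, String> translate(Map<String, String> rec, Map<String, String> names) {
        Map<String, String> out = new LinkedHashMap<>();   // keeps the input's field order
        for (Map.Entry<String, String> e : rec.entrySet()) {
            String target = names.get(e.getKey());
            if (target != null) out.put(target, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> names = new LinkedHashMap<>();
        names.put("taxon", "scientificName");
        names.put("museum", "voucherInstitution");
        FieldMapping m = new FieldMapping(names);

        Map<String, String> local = new LinkedHashMap<>();
        local.put("taxon", "Homo sapiens");
        local.put("museum", "EX-MUS");
        local.put("labNotes", "internal only");

        Map<String, String> shared = m.export(local);
        System.out.println(shared);                  // {scientificName=Homo sapiens, voucherInstitution=EX-MUS}
        System.out.println(m.importRecord(shared));  // {taxon=Homo sapiens, museum=EX-MUS}
    }
}
```

Fields without a mapping (labNotes in the example) are never exported, which is one simple way a project could share only part of its data while still importing the fields it understands.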

Core Technologies

We plan to develop and provide a reference implementation
for a core set of technologies that will enable interoperability,
i.e., both data and tool integration, following a three-pronged approach:

Develop an extensible core data model for phylogenetic data. The model will include extensible data structures and a query language, informed by research on efficiently querying phylogenetic data.

Develop schema mappings for peer-to-peer data integration and exchange, so that a project can join existing integration groups by providing mappings between the schema of its data and the core data model or one of its extensions.

Develop a scientific workflow system (lab notebook) that will allow research groups to connect the data integration components with local database access components and with analysis tools. This system will provide strong support for systematics-oriented provenance management, in anticipation of provenance becoming increasingly useful to future tools.
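As an illustration of the kind of data structure and query the first item refers to, the following is a minimal Java sketch of a rooted phylogenetic tree with a label index and two queries: the most recent common ancestor (MRCA) of two taxa, and the leaves of a clade. PhyloTree, Node, mrca, and cladeLeaves are hypothetical names chosen for this example; they are not the pPOD core data model or its query language.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a rooted tree with an MRCA query and a clade query.
// Not the pPOD API.
final class PhyloTree {

    /** One tree node; leaves carry taxon labels, internal nodes may be unlabeled. */
    static final class Node {
        final String label;                    // null for unlabeled internal nodes
        final Node parent;                     // null for the root
        final int depth;                       // number of edges from the root
        final List<Node> children = new ArrayList<>();

        Node(String label, Node parent) {
            this.label = label;
            this.parent = parent;
            this.depth = (parent == null) ? 0 : parent.depth + 1;
            if (parent != null) parent.children.add(this);
        }
    }

    final Node root;
    private final Map<String, Node> byLabel = new HashMap<>();   // label -> node

    PhyloTree(Node root) {
        this.root = root;
        index(root);
    }

    private void index(Node n) {
        if (n.label != null) byLabel.put(n.label, n);
        for (Node c : n.children) index(c);
    }

    /** MRCA: lift the deeper node to the same depth, then climb both until they meet. */
    Node mrca(String a, String b) {
        Node x = byLabel.get(a);
        Node y = byLabel.get(b);
        if (x == null || y == null) throw new IllegalArgumentException("unknown taxon");
        while (x.depth > y.depth) x = x.parent;
        while (y.depth > x.depth) y = y.parent;
        while (x != y) {
            x = x.parent;
            y = y.parent;
        }
        return x;
    }

    /** Leaf labels of the clade rooted at n, in left-to-right order. */
    List<String> cladeLeaves(Node n) {
        List<String> out = new ArrayList<>();
        collect(n, out);
        return out;
    }

    private void collect(Node n, List<String> out) {
        if (n.children.isEmpty()) {
            if (n.label != null) out.add(n.label);
            return;
        }
        for (Node c : n.children) collect(c, out);
    }

    public static void main(String[] args) {
        // Builds the tree ((Human, Chimp), Mouse).
        Node root = new Node(null, null);
        Node primates = new Node(null, root);
        new Node("Human", primates);
        new Node("Chimp", primates);
        new Node("Mouse", root);

        PhyloTree tree = new PhyloTree(root);
        System.out.println(tree.cladeLeaves(tree.mrca("Human", "Chimp")));  // [Human, Chimp]
        System.out.println(tree.cladeLeaves(tree.mrca("Human", "Mouse")));  // [Human, Chimp, Mouse]
    }
}
```

The MRCA query here walks parent pointers, so its cost grows with tree depth. On large trees, one common alternative is to precompute preorder/postorder interval labels, which make ancestor and clade-membership tests constant-time.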

This site hosts the core data model component of the pPOD project. Here you'll find source code, javadocs, test coverage reports, bug tracking, and all new publicly available documentation.

Share your experience

The ultimate goal of the project is to produce easy-to-use
tools. We plan to draw on our combined experience in distributed
database integration and workflow systems, as well as on the practical
experience of the AToL informatics and related communities. The project
is collecting suggestions, experiences and, eventually, use cases from
the community. If you would like to help, please post on the wiki:
Contribute