Genome 10K 'Alignathon' harnesses community effort to show relationships between species

Anticipating the need to make sense of the glut of whole-genome data arising from the Genome 10K project, the project's data analysis wing has announced a friendly competition to find more powerful ways to align whole genomes with one another, revealing both their similarities and the changes brought about by evolution.

The 'Alignathon' will engage the genomics community in exploring and improving the current tools to answer the question, "How are these species, these genomes, related?"

"The availability of whole-genome sequences for an increasing number of animal species makes it imperative to find the best alignment tools for use in the Genome 10K project and other large-scale projects anticipated in the near future," said biomolecular engineer David Haussler from the University of California, Santa Cruz, a Howard Hughes Medical Institute investigator.

Genome 10K is an international project to sequence the genomes of 10,000 vertebrate species.

The Alignathon is patterned after a similar Genome 10K endeavor, the Assemblathon, a collaborative competition designed to compare the strengths of different programs for assembling DNA fragments generated by next-generation sequencing technology into a complete genome sequence. The Assemblathon has harnessed the talents of the broader community to improve assembly methods and to devise new ways to measure and assess their performance.

In the Alignathon, groups of bioinformaticians will apply their preferred software algorithms to the problem of aligning whole genomes to one another. The goal is to find the regions that share a common evolutionary history.

Biomolecular engineering graduate student Dent Earl from UC Santa Cruz has designed a test suite of data for the Alignathon.

"With the Alignathon, we hope to reproduce the way the Assemblathon has pulled together the assembly community," Earl said.

The test data consists of three parts: two simulated sets (a four-species primate-like phylogeny and a five-species mammal-like phylogeny) and one set comprising the 12 fly genomes. Each genome in the sets is approximately 120 to 200 megabases (Mb) in size and has between four and seven chromosomes.

Earl said, "We hope that using genomes that are one-tenth the size of a typical mammal genome will allow participation by groups developing new methods that may not yet scale to full genome sizes."

To unite these sets, the Haussler team has created a testing framework that is both simple and extensible, so that participating groups can run analyses independently and repeatedly and submit their own evaluations to the project.

"Just like in the Assemblathon, participating groups will be included in the authorship of the paper that comes out of the project," Haussler said.