Bottom Line:
Genome assembly projects typically run multiple algorithms in an attempt to find the single best assembly, although those assemblies often have complementary, if untapped, strengths and weaknesses. We apply the metassembler algorithm to the four genomes from the Assemblathon competitions and show it consistently and substantially improves the contiguity and quality of each assembly. We also develop guidelines for meta-assembly by systematically evaluating 120 permutations of merging the top 5 assemblies of the first Assemblathon competition.

Abstract:
Genome assembly projects typically run multiple algorithms in an attempt to find the single best assembly, although those assemblies often have complementary, if untapped, strengths and weaknesses. We present our metassembler algorithm that merges multiple assemblies of a genome into a single superior sequence. We apply it to the four genomes from the Assemblathon competitions and show it consistently and substantially improves the contiguity and quality of each assembly. We also develop guidelines for meta-assembly by systematically evaluating 120 permutations of merging the top 5 assemblies of the first Assemblathon competition. The software is open-source at http://metassembler.sourceforge.net.
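The count of 120 is consistent with trying every possible merge order of five input assemblies, since 5! = 120. The sketch below only illustrates that arithmetic; the assembly labels are hypothetical placeholders, and it assumes each permutation corresponds to one distinct merge order.

```python
from itertools import permutations

# Hypothetical labels standing in for the top 5 assemblies (not the real names).
assemblies = ["asm_A", "asm_B", "asm_C", "asm_D", "asm_E"]

# Each permutation is one candidate order in which the assemblies could be merged.
orders = list(permutations(assemblies))

print(len(orders))  # 5! = 120 distinct merge orders
print(orders[0])    # the first order: inputs taken in their listed sequence
```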

Fig. 3: Assemblathon 2 metassembly contiguity and accuracy metrics. Assembly contiguity and accuracy metrics are shown at each merging step of all metassemblies for the three species. The x-axis represents the number of assemblies merged, with one being the initial input assembly. Ctg, contig; Scf, scaffold.

Mentions:
In all three species, the contiguity statistics are significantly improved by our metassembler algorithm (Fig. 3). Contig NG50 sizes increased by at least 3.9 kb and 4.4 kb for the fish and snake metassemblies, with a maximum increment of 3.98 kb and 13.8 kb, respectively. The largest increment in contig NG50 size was observed in the bird species, with an improvement ranging between 43.9 kb and 69.1 kb. Moreover, scaffold NG50 sizes improved by between 0.96 Mb and 1.4 Mb for the snake genome, and between 105 kb and 122 kb for the fish genome. For the bird species, a decrease of 3.9 kb is observed for the A2Z (Assemblathon 2 cumulative Z score) permutation, while a maximum increment of 1.9 Mb is observed for the Scf N50 order permutation. Furthermore, the assembly quality metrics either remain unchanged or show a tendency to improve. The REAPR-corrected NG50 sizes increase throughout the metassembly process, as does the percentage of error-free bases. The number of CEGMA genes found either increases (in the bird and snake assemblies) or decreases slightly because of poor secondary assemblies (in the fish genome). The complete results are available in Additional files 1, 5, 6 and 7.
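For readers unfamiliar with the metric, NG50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the estimated genome size. The following is a minimal sketch of that standard definition, not code from the metassembler software; the function name `ng50` and the toy lengths are illustrative assumptions.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of a set of contig or scaffold lengths.

    Sequences are taken from longest to shortest until their running
    total reaches half of genome_size; the length of the sequence that
    crosses that threshold is the NG50. Returns 0 if the sequences
    never cover half the genome.
    """
    half = genome_size / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


# Toy lengths (hypothetical, not taken from the paper), genome size 400.
before = [80, 70, 50, 40, 30, 20]  # six contigs before merging
after = [150, 90, 30, 20]          # 80+70 and 50+40 joined by a merge

print(ng50(before, 400))  # 50
print(ng50(after, 400))   # 90
```

In this toy case, joining contigs raises NG50 from 50 to 90 without changing the total assembled length, which is the kind of contiguity gain the increments above describe.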
