Figure 2.

De novo assembly of the planarian head regeneration transcriptome. (a) Schematic overview of the assembly strategies, using only 2 × 36-bp paired-end Illumina
reads (blue), only 454 reads (red), or an assisted assembly of Illumina reads using
transcripts previously assembled from 454 data as scaffolds (purple). Quality metrics
shown include the longest sequence in each assembly and the N50 length, defined such that 50%
of all bases are contained in transcripts at least as long as the N50. (b) Kernel densities of the length distributions for sequences assembled only from Illumina
data (blue), 454 data (red), Illumina data and 454 isotig scaffolds (purple), or for
transcripts computationally predicted by MAKER (green). For multi-isoform loci, only
the longest isoform was considered. (c) Kernel densities of ortholog hit ratios obtained by comparing sequences from the different
assemblies or computational prediction to the Schistosoma mansoni proteome using blastx. For multi-isoform loci, only the longest isoform was considered.
Colors as in (b). (d) Coverage of the 125 complete S. mediterranea cDNA sequences available from GenBank by the reciprocal best blat hit from each dataset. For multi-isoform
loci, only the longest isoform was considered. The boxplot indicates the 75th, 50th
(median) and 25th percentiles of cDNA coverage. In addition, individual points show
the full coverage distribution for all reciprocal best hits (454, n = 77; Illumina, n = 86; Illumina+, n = 75; MAKER, n = 60). (e) Fraction of sequences from the different assemblies that could be aligned over 90%
or 60% of their total length to a single genomic supercontig using blat. Colors as
in (b).
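
The N50 length defined in (a) can be illustrated with a minimal sketch. This is not the code used to produce the figure; the function name `n50` and the example lengths are hypothetical and serve only to make the definition concrete: transcripts are sorted from longest to shortest, and the N50 is the length of the transcript at which the running total first reaches half of all assembled bases.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    At least 50% of all bases lie in sequences that are at least
    as long as the returned value.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")

    # Walk from the longest sequence downwards, accumulating bases.
    ordered = sorted(lengths, reverse=True)
    half_of_total = sum(ordered) / 2

    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_of_total:
            return length


# Hypothetical transcript lengths (bp), not taken from the assemblies.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))
```

For these illustrative lengths the total is 1,500 bp, and the two longest transcripts (500 bp and 400 bp) together already hold 900 bp, so the N50 is 400 bp.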