Tristan Lefebure wrote:
> Hello,
> Regarding next-gen sequences and bioperl, in my
> experience, another issue is bioperl speed. For example, if
> you want to trim bad quality bases at the ends of 1E6 Solexa
> reads using Bio::SeqIO::fastq and some methods in
> Bio::Seq::Quality, well, you've got to be patient (but maybe
> I missed some shortcuts...).
This is my concern as well. Or, rather, is there actually a significant
set of users out there who are dealing with next-gen sequencing and
would consider using BioPerl for their work?
I'm working with all the 1000-genomes data at the Sanger, and we at
least are probably never going to use BioPerl for the work.
> A pure perl solution will be between 100 and 1000x faster...
> Would it be possible to have an ultra-light quality object
> with few simple methods for next-gen reads?
The fastq parser itself already seems pretty fast. The way to get the
speedup is to not create any Bio::Seq* objects and instead just return
the raw data directly. At that point it's not taking much advantage of
BioPerl. But certainly it could be done...
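
To make "just return the data directly" concrete, here is a rough
pure-Perl sketch. None of this is BioPerl API: next_fastq_record() and
trim_ends() are names I made up for illustration, and the code assumes
plain 4-line FASTQ records (no wrapped sequence or quality lines) and
that you know the quality offset of your files (I am assuming 64 for
Solexa-style files, 33 for Sanger-style ones). I have not benchmarked
it against Bio::SeqIO::fastq.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): read one 4-line FASTQ
# record from a filehandle and return plain strings, no objects.
# Assumes sequence and quality are each on a single line.
sub next_fastq_record {
    my ($fh) = @_;
    my $header = <$fh>;
    return unless defined $header;      # end of file: empty list
    my $seq  = <$fh>;
    my $plus = <$fh>;                   # the "+" separator line, ignored
    my $qual = <$fh>;
    die "truncated FASTQ record\n" unless defined $qual;
    chomp($header, $seq, $qual);
    die "bad FASTQ header: $header\n" unless $header =~ s/^@//;
    return ($header, $seq, $qual);
}

# Hypothetical helper: strip bases scoring below $min_q from both ends.
# $offset is the ASCII offset of the quality encoding (assumption:
# the caller knows whether the file uses 64 or 33).
sub trim_ends {
    my ($seq, $qual, $min_q, $offset) = @_;
    my $start = 0;
    my $end   = length($qual) - 1;
    $start++ while $start <= $end
        && ord(substr($qual, $start, 1)) - $offset < $min_q;
    $end-- while $end >= $start
        && ord(substr($qual, $end, 1)) - $offset < $min_q;
    my $len = $end - $start + 1;        # 0 if every base was trimmed
    return (substr($seq, $start, $len), substr($qual, $start, $len));
}

# Usage on a tiny in-memory example (made-up read, offset 64).
my $fastq = "\@read1\nACGTACGT\n+\nBBhhhhBB\n";
open my $fh, '<', \$fastq or die "cannot open in-memory file: $!";
while (my ($id, $seq, $qual) = next_fastq_record($fh)) {
    my ($tseq, $tqual) = trim_ends($seq, $qual, 20, 64);
    print "\@$id\n$tseq\n+\n$tqual\n";
}
close $fh;
```

With the made-up read above, the two low-quality bases at each end are
dropped and the record comes out as GTAC / hhhh. The point is only that
the inner loop touches nothing but strings; whether something like this
belongs in BioPerl as an "ultra-light" mode is a separate question.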