{- |
Module: Bio.Sequence.FastQ
Support the FastQ format that combines sequence and quality. See:
* <http://www.bioperl.org/wiki/FASTQ_sequence_format>
Of course, this is yet another vaguely defined pseudo-standard with
conflicting definitions. Of course Solexa had to go and invent a different,
but indistinguishably so, way to do it:
* <http://www.bcgsc.ca/pipermail/ssrformat/2007-March/000137.html>
* <http://maq.sourceforge.net/fastq.shtml>
Currently, we only support the non-Solexa FastQ, adding\/subtracting 33 for
the quality values.
As far as I know, FastQ is only used for nucleotide sequences, never amino acid.
-}moduleBio.Sequence.FastQ(-- * Reading FastQreadFastQ,hReadFastQ,parse-- * Writing FastQ,writeFastQ,hWriteFastQ,unparse)whereimportSystem.IOimportqualifiedData.ByteString.Lazy.Char8asBimportqualifiedData.ByteString.LazyasBBimportData.List(unfoldr)importBio.Sequence.SeqDatareadFastQ::FilePath->IO[SequenceNuc]readFastQf=(go.B.lines)`fmap`B.readFilefhReadFastQ::Handle->IO[SequenceNuc]hReadFastQh=(go.B.lines)`fmap`B.hGetContentshgo::[B.ByteString]->[SequenceNuc]go=map(eithererrorid).unfoldrparse-- | Parse one FastQ entry, suitable for using in 'unfoldr' over-- 'B.lines' from a fileparse::[B.ByteString]->Maybe(EitherString(SequenceNuc),[B.ByteString])parse(h1:sd:h2:sq:rest)=case(B.unconsh1,B.unconsh2)of(Just('@',h1name),Just('+',h2name))|h1name==h2name->Just(Right$Seqh1namesd(Just(BB.map(subtract33)sq)),rest)|otherwise->Just(Left$"Bio.Sequence.FastQ: name mismatch:"++showStanza,rest)_->Just(Left$"Bio.Sequence.FastQ: illegal FastQ format:"++showStanza,rest)whereshowStanza=unlines$mapB.unpack[h1,sd,h2,sq]parse[]=Nothingparsefs=letshowStanza=unlines(mapB.unpackfs)err=Left$"Bio.Sequence.FastQ: illegal number of lines in FastQ format: "++showStanzainJust(err,[])writeFastQ::FilePath->[Sequencea]->IO()writeFastQf=B.writeFilef.B.concat.mapunparsehWriteFastQ::Handle->[Sequencea]->IO()hWriteFastQh=B.hPuth.B.concat.mapunparse-- helper function for writingunparse::Sequencea->B.ByteStringunparse(Seqhsd(Justsq))=B.unlines[B.cons'@'h,sd,B.cons'+'h,BB.map(+33)sq]unparse(Seqh_Nothing)=error("Bio.Sequence.FastQ: sequence "++show(B.unpackh)++" doesn not have quality data!")