{- |
This module implements a \"flattened\" data structure for Blast hits,
as opposed to the hierarchical structure in "Bio.Alignment.BlastData".
The flat data type is useful in many cases where it is more natural
to see the result as a set of rows (e.g. for insertion in a database).
It would probably be more memory-efficient to go the other way
(i.e. from flat to hierarchical), by passing the current, partially
built 'BlastFlat' object down the stream of results and stamping
out a stream of completed ones. (See "Bio.Alignment.BlastXML.breaks"
for this week's most cumbersome use of parallelism to avoid the
memory issue.)
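
A minimal usage sketch, assuming a Blast XML report exists at the
hypothetical path @hits.xml@: it reads the report with 'readXML' and
prints the bit score and e-value of every match, one row per match.

> import Bio.Alignment.BlastFlat
>
> main :: IO ()
> main = do
>   rows <- readXML "hits.xml"   -- hypothetical input file
>   mapM_ (\r -> putStrLn (show (bits r) ++ "\t" ++ show (e_val r))) rows
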
-}

module Bio.Alignment.BlastFlat
    ( -- * The BlastFlat data type
      BlastFlat(..)
      -- * Read XML format
    , readXML
      -- * Convert from hierarchical to flat structure
    , flatten
      -- * Re-exports from the hierarchical module ("Bio.Alignment.BlastData")
    , B.BlastRecord, B.blastprogram, B.blastversion, B.blastdate, B.blastreferences
    , B.database, B.dbsequences, B.dbchars, B.results
    , B.Aux(..), B.Strand(..)
    ) where

import qualified Bio.Alignment.BlastData as B
import qualified Bio.Alignment.BlastXML as X
import Data.ByteString.Lazy.Char8 (empty)

-- | The BlastFlat data structure contains information about a single match
data BlastFlat = BlastFlat
    { query   :: !B.SeqId, qlength :: !Int    -- BlastRecord
    , subject :: !B.SeqId, slength :: !Int    -- BlastHit
    , bits    :: !Double,  e_val   :: !Double -- BlastMatch
    , identity :: (Int,Int)
    , q_from, q_to, h_from, h_to :: !Int
    , aux :: !B.Aux
    }

readXML :: FilePath -> IO [BlastFlat]
readXML f = return . concatMap (flatten . B.results) =<< X.readXML f

-- | Convert BlastRecords into BlastFlats (representing a depth-first traversal of the
--   BlastRecord structure.)
flatten :: [B.BlastRecord] -> [BlastFlat]
flatten = concatMap frecord
  where
    frecord r  = concatMap (fhit (bf0 { query = B.query r, qlength = B.qlength r })) $ B.hits r
    fhit f h   = map (fmatch f { subject = B.subject h, slength = B.slength h }) $ B.matches h
    fmatch f m = f { bits = B.bits m, e_val = B.e_val m, identity = B.identity m
                   , q_from = B.q_from m, q_to = B.q_to m
                   , h_from = B.h_from m, h_to = B.h_to m
                   , aux = B.aux m }
    bf0 = BlastFlat e 0 e 0 0 0 (0,0) 0 0 0 0 (B.Frame B.Plus 0)
    e = empty