{-# LANGUAGE OverloadedStrings #-}

-- An enumeratee for conversion from bytestring to individual FASTA entries is
-- provided. In addition, convenience functions for loading plain and
-- compressed files are available.

module Biobase.Fasta.Import where

import Data.ByteString.Char8 as BS
import Data.Iteratee.Iteratee as I
import Data.Iteratee.ListLike as I
import Data.Iteratee.Char as I
import Data.Iteratee.IO as I
import Data.Iteratee.ZLib
import Prelude as P
import Data.Monoid
import Data.List as L

import Biobase.Fasta



-- | This is the type of the conversion function from FASTA data to the data
-- 'z'. Make certain that all input is used strictly! BangPatterns are the
-- easiest way to do this. In order, the function expects the current FASTA
-- header, the starting position of the data segment within the FASTA entry,
-- the window size, the peek size, and finally the data segment itself.
--
-- If you need the conversion to run in constant space, do not use the
-- convenience functions; instead, replace the final conversion to a strict
-- stream by your own conversion (or output) function.

type FastaFunction z
  =  FastaHeader  -- ^ the ">" header
  -> StartPos     -- ^ where in the original sequence to start
  -> WindowSize   -- ^ how many characters we are looking at
  -> PeekSize     -- ^ this many characters are from the next window (peeking into)
  -> FastaData    -- ^ the actual sequence data
  -> z            -- ^ and what we return as result

-- | Starting position within the current FASTA entry.

type StartPos = Int

-- | Current header (the line starting with '>').

type FastaHeader = ByteString

-- | FASTA sequence data.

type FastaData = ByteString

-- | Window size: how many characters are processed at once.

type WindowSize = Int

-- | How many characters to peek forward into the next window.

type PeekSize = Int



-- * Conversion from FASTA to data of type 'z'.

-- | Takes a bytestring sequence, applies 'f' to each window of the stream, and
-- returns the results of type 'z'. Newline characters are removed from each
-- window before 'f' is applied.

rollingIter
  :: (Monad m, Functor m, Nullable z, Monoid z)
  => (StartPos -> WindowSize -> PeekSize -> FastaData -> z)
  -> WindowSize
  -> PeekSize
  -> Enumeratee ByteString z m a
rollingIter f windowSize peekSize = unfoldConvStream go 0 where
  go start = do
    yss <- roll windowSize (windowSize + peekSize)
    case yss of
      [ys] -> do
        -- drop line breaks so that 'f' only sees sequence characters
        let xs = BS.filter (/= '\n') ys
        let l  = BS.length xs
        return (start + l, f start windowSize peekSize xs)
      _ -> error "rollingIter: 'roll' did not return exactly one chunk"
{-# INLINE rollingIter #-}

-- | Outer enumeratee. See the two convenience functions for how to use it
-- (just like any enumeratee, basically).
--
-- The FASTA function 'f' processes small stretches of FASTA data. Its
-- arguments are the FASTA header, the start position, the window size, the
-- peek size, and the FASTA data; the header, start position, and data are
-- filled in by 'eneeFasta' and 'rollingIter'.
--
-- Next come the window size, i.e. how many characters to read at once,
-- followed by the number of additional characters to peek at.
--
-- The actual work is done by 'rollingIter'.

eneeFasta
  :: (Monad m, Functor m, Nullable z, NullPoint z, Monoid z)
  => FastaFunction z
  -> WindowSize
  -> PeekSize
  -> Enumeratee ByteString z m a
eneeFasta f windowSize peekSize = unfoldConvStream go "" where
  -- the accumulator is unused: every step reads its own header line
  go _ = do
    hdr <- I.takeWhile (/= 10)        -- 10 == '\n'
    is  <- joinI $ I.breakE (== 62)   -- 62 == '>'
           ><> rollingIter (f hdr) windowSize peekSize $ stream2stream
    return (hdr, is)
{-# INLINE eneeFasta #-}

-- * Convenience functions: final data is returned strictly.

-- | From an uncompressed file.

fromFile
  :: (Monoid z, Nullable z)
  => FastaFunction z -> WindowSize -> PeekSize -> FilePath -> IO z
fromFile ff windowSize peekSize fp
  = run =<< ( enumFile 8192 fp          -- read the file in 8192-byte chunks
            . joinI
            . eneeFasta ff windowSize peekSize
            $ stream2stream )
{-# INLINE fromFile #-}

-- | From a gzip- or zlib-compressed file.

fromFileZip
  :: (Monoid z, Nullable z)
  => FastaFunction z -> WindowSize -> PeekSize -> FilePath -> IO z
fromFileZip ff windowSize peekSize fp
  = run =<< ( enumFile 8192 fp          -- read the file in 8192-byte chunks
            . joinI
            . enumInflate GZipOrZlib defaultDecompressParams
            . joinI
            . eneeFasta ff windowSize peekSize
            $ stream2stream )
{-# INLINE fromFileZip #-}
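
-- * Example

-- | A minimal sketch of a 'FastaFunction', added for illustration only: the
-- name 'windowLengths' is hypothetical and not part of the original
-- interface. For every window it records the FASTA header, the start
-- position, and the number of sequence characters, capped at the window size
-- so that characters beyond it are not counted. A list is used as the result
-- type 'z' because lists are both a 'Monoid' and 'Nullable', which is what the
-- convenience functions require. Following the advice on 'FastaFunction', the
-- values are forced with 'seq' before the result is built.
--
-- Example usage (the file name is a placeholder):
--
-- > ws <- fromFile windowLengths 1000 100 "genome.fa"

windowLengths :: FastaFunction [(FastaHeader, StartPos, Int)]
windowLengths hdr start windowSize _peekSize xs =
  -- count at most 'windowSize' characters of the (newline-free) window
  let n = min windowSize (BS.length xs)
  in  hdr `seq` start `seq` n `seq` [(hdr, start, n)]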