Jason Stajich wrote:
> Brad - Thanks for the game parser updates and test files.
>
> I have some comments. One thing we've been kicking around with the latest
> bioperl release and the Bio::DB rewrites is that the expected behavior
> when a class is reading a data file of sequences is for it to only read
> one sequence at a time. The game code actually reads everything in at one
> time and does this multiple times depending on whether or not it is
> adding features or just reading in a primary seq.
>
These days sequences are getting bigger, not smaller. I think it is
absolutely worthwhile to try to avoid slurping in a whole file of
sequences whenever possible.
> It also expects only file names to be passed in, but I think file handles
> should be supported as well and that it should be using the
> $self->_filehandle method that all SeqIO classes subscribe to. This
> will not work with the current way of multiple passes on the document.
I agree. File handles as general streams should be supported. This way
you can pass in a socket or any emulation of a stream.
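
As a rough illustration of accepting either a file name or an already-open
stream, here is a minimal sketch in Python rather than Perl. It is not the
BioPerl API: the helper name open_stream is hypothetical, and the only
assumption about a "stream" is that it has a read() method (a file, the
result of socket.makefile(), an in-memory buffer, and so on).

```python
import io
from contextlib import contextmanager


@contextmanager
def open_stream(source):
    """Yield a readable binary stream for a path or an already-open stream.

    `open_stream` is a hypothetical helper for illustration only.
    """
    if hasattr(source, "read"):
        # The caller owns this stream (file, socket wrapper, BytesIO, ...),
        # so hand it back untouched and do not close it.
        yield source
    else:
        # Treat anything else as a file name; open it and close it on exit.
        with open(source, "rb") as handle:
            yield handle


if __name__ == "__main__":
    with open_stream(io.BytesIO(b"ACGT")) as stream:
        print(stream.read())
```

The parser code then only ever sees a stream, so it cannot rely on
reopening a file for a second pass.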
>
> There are some simple ways around this, one is to read everything from the
> stream/file and store it as one giant string and then re-pass this string
> as input to the SAX parser. This will use up a large amount of memory and
> break on very large files. The problem is that the SAX parser is
> expecting to read to the end of the document not to the end of a
> <seq></seq> block. Anyone else had a chance to look over this and think
> about it?
>
Unfortunately not. I also don't know the internals of the Perl SAX
interface, but in general SAX was designed precisely for one-pass,
chunk-by-chunk stream reading. Does the Perl SAX parser not adhere to
this concept, or are there peculiarities of the GAME DTD that prohibit
this?
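
To make the one-pass idea concrete, here is a minimal sketch using Python's
standard xml.sax module, not the Perl SAX interface, whose behavior I cannot
vouch for. It feeds the parser one chunk at a time and hands back each
sequence once its closing </seq> tag has been reported. The <name> and
<residues> child elements, the <game> wrapper, and the names SeqHandler and
iter_seqs are assumptions made for illustration, not the actual GAME DTD or
any existing API.

```python
import io
import xml.sax
from collections import deque


class SeqHandler(xml.sax.ContentHandler):
    """Collect each completed <seq> element as a (name, residues) pair."""

    def __init__(self):
        super().__init__()
        self.done = deque()   # records finished since the last drain
        self._record = None   # record currently being built, if any
        self._field = None    # child element whose text is being read

    def startElement(self, tag, attrs):
        if tag == "seq":
            self._record = {"name": "", "residues": ""}
        elif self._record is not None and tag in self._record:
            self._field = tag

    def characters(self, text):
        # Text may arrive in several pieces, so append rather than assign.
        if self._field is not None:
            self._record[self._field] += text

    def endElement(self, tag):
        if tag == "seq":
            name = self._record["name"].strip()
            residues = "".join(self._record["residues"].split())
            self.done.append((name, residues))
            self._record = None
        elif tag == self._field:
            self._field = None


def iter_seqs(stream, chunk_size=4096):
    """Yield (name, residues) pairs from a stream in a single pass."""
    parser = xml.sax.make_parser()
    handler = SeqHandler()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)    # parse only what has been read so far
        while handler.done:
            yield handler.done.popleft()
    parser.close()            # flush anything the parser still holds
    while handler.done:
        yield handler.done.popleft()


if __name__ == "__main__":
    doc = (b"<game><seq><name>a</name><residues>AC\nGT</residues></seq>"
           b"<seq><name>b</name><residues>GGCC</residues></seq></game>")
    for name, residues in iter_seqs(io.BytesIO(doc), chunk_size=16):
        print(name, residues)
```

Nothing here needs the whole document as a string or a second pass over
the input, so the same loop works on a file, a pipe, or a socket.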
Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------