-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi Chris,
Nice job. My only comment is that there's been a great deal of
consternation over the role of whitespace in GFF3 recently, and I am
thinking of changing the column delimiter back to strict tabs and
allowing spaces (but no tabs or other unescaped whitespace) in the
fields. I don't think this will affect your methods at all, but just
a heads up.
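
To make the proposed rule concrete, here is a minimal sketch in Python
(not BioPerl code; the function name split_gff3_line is made up for
illustration). It assumes the rule as described above: columns are
separated by tabs only, and a space inside a field is ordinary data.

```python
def split_gff3_line(line):
    """Split one GFF3 feature line into its nine columns.

    Assumption: columns are delimited by tab characters only, so any
    space inside a field is kept as part of that field.
    """
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError(
            "expected 9 tab-separated columns, got %d" % len(columns)
        )
    return columns


# The source column ("my source") contains a space but no tab.
line = "chr1\tmy source\tgene\t100\t200\t.\t+\t.\tID=gene1;Note=has spaces"
fields = split_gff3_line(line)
print(fields[1])
```

A line whose columns are separated by spaces instead of tabs would be
rejected by this sketch rather than silently mis-split.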
Lincoln
On Friday 05 March 2004 08:52 pm, Chris Mungall wrote:
> I have committed some new stuff to bioperl-live:
>
> the script seq/unflatten_seq will now generate GFF3 - the
> unflattener module is used to build the 'feature graph' connecting
> genes, transcripts, exons and CDSs together. This means we can have
> GFF3 for anything in genbank!
>
> As far as I'm aware, the only other sensible output formats to use
> here (ie formats that support feature graphs/containment
> hierarchies) are: chado, chaos, and the write-only asciitree.
>
> This feature graph is written out in the GFF3 using the ID and
> Parent tags. To do this there is an extra intermediate step - the
> bioperl FeatureHolderI hierarchy is traversed and ID/Parent tags
> are generated.
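
The traversal step described above can be sketched roughly as follows.
This is Python rather than BioPerl, and the nested-dict feature layout
and the name flatten_with_parents are assumptions made for illustration;
it also assumes every feature already carries an id.

```python
def flatten_with_parents(feature, parent_id=None):
    """Yield (id, type, parent_id) rows in depth-first order.

    Each feature is a dict with "id", "type" and an optional
    "children" list (a hypothetical stand-in for a feature holder).
    """
    yield feature["id"], feature["type"], parent_id
    for child in feature.get("children", []):
        # The current feature's id becomes the Parent of each child.
        for row in flatten_with_parents(child, feature["id"]):
            yield row


gene = {"id": "gene1", "type": "gene", "children": [
    {"id": "mRNA1", "type": "mRNA", "children": [
        {"id": "exon1", "type": "exon"},
        {"id": "cds1", "type": "CDS"},
    ]},
]}

rows = list(flatten_with_parents(gene))
for feature_id, feature_type, parent in rows:
    attributes = "ID=" + feature_id
    if parent is not None:
        attributes += ";Parent=" + parent
    print(feature_type + "\t" + attributes)
```

Only the attributes column is shown; the other GFF3 columns would come
from the feature's own coordinates.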
>
> Here is a description of the changes I have made:
>
> [unless you're a bioperl hacker you don't really need to read the
> rest of this]
>
> You can get the context of what I'm on about from this thread:
> http://bioperl.org/pipermail/bioperl-l/2003-December/014150.html
>
> Two new public methods:
>
> FeatureHolderI->set_ParentIDs_from_hierarchy
>
>   sets both ID and ParentID from FeatureHolder hierarchy
>
> SeqFeatureI->generate_unique_persistent_id
>
>   this is required by the above method
>
>   Lincoln wanted this to be private, but I think it has
>   to be called from outside
>
> FeatureHolderI->create_hierarchy_from_ParentIDs
>
>   the inverse of set_ParentIDs_from_hierarchy
>
> (note that I have put the implementation in the interface - in the
> absence of proper abstract classes, this was deemed the best thing
> to do in the previous discussion on this)
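
For the inverse direction (rebuilding a hierarchy from flat ID/Parent
pairs), a rough Python sketch follows. It is not the BioPerl
implementation; build_hierarchy and the dict layout are hypothetical,
and it assumes each feature has at most one parent and that every
parent id is present in the input.

```python
def build_hierarchy(rows):
    """Nest flat (id, parent_id) rows into a list of root feature dicts.

    Two passes are used so that a child may appear before its parent.
    A parent id missing from the input raises KeyError.
    """
    nodes = {fid: {"id": fid, "children": []} for fid, _ in rows}
    roots = []
    for fid, parent in rows:
        if parent is None:
            roots.append(nodes[fid])
        else:
            nodes[parent]["children"].append(nodes[fid])
    return roots


# Rows deliberately out of order: the exon precedes its parent mRNA.
rows = [("exon1", "mRNA1"), ("gene1", None), ("mRNA1", "gene1")]
roots = build_hierarchy(rows)
print(roots[0]["id"])
```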
>
> Modifications:
>
> SeqFeatureI->primary_id
>
>   This now maps to the tag_value 'ID' (ie the tag that GFF3 uses to
>   uniquely identify a feature).
>
> Minor modification
>
> Bio::Tools::GFF now allows the -noparse=>1 option
>
>   this is simply to stop the module waiting on input from STDIN
>   when used in write-mode (maybe there's a better way of doing this
>   but I didn't want to mess with this module)
>
> Proto-test
>
> t/FeatureHolder.x
>
>   This unflattens a genbank sequence and roundtrips it to chadoxml
>   via GFF3
>
>   This doesn't work yet - if you dump a splitfeature as GFF3 and
>   re-import it, it becomes two features. Any volunteers to help
>   fix this?
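
One possible direction for the split-feature problem, sketched in
Python: if every segment of a split feature were written with the same
ID, a reader could regroup the rows by ID. That shared-ID convention is
an assumption here, not a statement about what the current code writes,
and merge_split_rows is a made-up name.

```python
def merge_split_rows(rows):
    """Group (id, start, end) rows so rows sharing an id form one feature.

    Returns a dict mapping each id to its list of (start, end) segments,
    in the order the rows were read.
    """
    features = {}
    for fid, start, end in rows:
        features.setdefault(fid, []).append((start, end))
    return features


rows = [("cds1", 100, 200), ("cds1", 300, 400), ("exon9", 500, 600)]
merged = merge_split_rows(rows)
print(len(merged))
```

With this grouping the two cds1 rows come back as one feature with two
segments instead of two separate features.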
>
> Unique IDs in bioperl:
>
> In the discussion that preceded this, it seemed that people liked
> the idea of persistent unique IDs, but there were no suggestions as
> to how to go about it. This is inherently difficult with objects,
> but I borrowed a solution from relational modeling.
>
> A persistent unique ID is generated using
>
>   seq_id
>   primary_tag
>   start
>   end
>
> It is assumed that these are all set and comprise a "unique key"
> over features. Of course, there's no way to enforce this with
> objects. The generated ID is simply these values concatenated with
> : delimiters. You can think of this as a skolem function if you're
> that way inclined.
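
The ID scheme described above amounts to something like the following
Python sketch. It is a stand-in, not the BioPerl method itself;
make_feature_id and the example values are hypothetical.

```python
def make_feature_id(seq_id, primary_tag, start, end):
    """Join the four key fields with ':' to form a persistent id.

    All four values are assumed to be set; a missing one is an error
    because the result would no longer act as a unique key.
    """
    parts = (seq_id, primary_tag, start, end)
    if any(part is None for part in parts):
        raise ValueError("seq_id, primary_tag, start and end must all be set")
    return ":".join(str(part) for part in parts)


feature_id = make_feature_id("AE003644", "exon", 100, 200)
print(feature_id)  # AE003644:exon:100:200
```

Because the id is a pure function of the four fields, regenerating it
for an unchanged feature always gives the same string, and moving the
feature's coordinates gives a different one.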
>
> Another assumption is that seq_id is unique and persistent.
>
> Of course, if you're dealing with data that changes with time, then
> changing the coordinates of a feature will change its id. This is
> fine. If you want to use your own IDs rather than the generated
> ones, you can simply set the primary_id() field - or if you are
> using genbank files, add something like this
>
>   /ID=CG12345-RA
>
> to the feature.
>
> Stuff still to do:
>
> * fix GFF3 to deal with roundtripping splitlocations
>
> * A nest_features() method, as discussed in the previous email.
>   This is the opposite of set_ParentIDs_from_hierarchy(), for reading
>   in GFF3 (and then writing to a feature-graph compatible format, or
>   to a database such as biosql or chado).
>
> * Bio::SeqIO::GFF3
>
>   I know a lot of us would like this a lot - are there any plans to
>   implement this yet?
>
> * A GeneModel factory
>
>   This would take the output of the unflattener (a set of feature
>   graphs typed to SO) and make SeqFeature::Gene objects
>
> Cheers
> Chris
- --
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
iD8DBQFATLi20CIvUP7P+AkRAuErAKCc4iNS3cnVLpbkLAfpba176o29aQCgndia
SZzP/ANnxz7kvKmg+5Ovq9c=
=FM1X
-----END PGP SIGNATURE-----