Dear Mark,
a question via the GenBank newsgroup:
As per these announcements over the last year, the next release of
GenBank will allow records of unlimited length. Do you
expect this will have much of an impact in the next release? Or do
you anticipate these longer records to trickle in over time?
For example, does NCBI plan to re-release all bacterial genomes by
the next release of GenBank in this unified single-record form? Will
there be a public release of BLAST ahead of the next GenBank release
(so that those of us who maintain local BLAST servers for our
communities can have a BLAST server ready for this new data)?
cheers, and thanks for the wonderful work
your #1 fan,
f.
--
BF Francis Ouellette http://bioinformatics.ubc.ca/ouellette
On Sun, 25 Apr 2004, Mark Cavanaugh wrote:
> Greetings GenBank Users,
>
> GenBank Release 141.0 is now available via ftp from the National
> Center for Biotechnology Information (NCBI):
[...]
> Release  Date      Base Pairs   Entries
>
> 140      Feb 2004  37893844733  32549400
> 141      Apr 2004  38989342565  33676218
> 1.4 Upcoming Changes
>
> 1.4.1 **Sequence Length Limitation To Be Removed In June 2004**
>
> At the May 2003 collaborative meeting among representatives of GenBank,
> EMBL, and DDBJ, it was decided that the 350 kilobase limit on the sequence
> length of database records will be removed as of June 2004.
>
> Individual, complete sequences are currently expected to be a maximum
> of 350 kbp in length. One major reason for the existence of this limit is
> as an aid to users of sequence analysis software, some of which might not
> be capable of processing megabase-scale sequences.
>
> However, very significant exceptions to the 350 kbp limit have existed
> for several years: Phase 1 (unordered, unoriented) and Phase 2 (ordered,
> oriented) high-throughput genomic sequences (HTGS) generated by efforts
> such as the Human Genome Project; large dispersed eukaryotic genes with
> an intron/exon structure that spans more than 350 kbp; and sequences
> which result from assemblies of Whole Genome Shotgun (WGS) project data.
>
> Given these exceptions, and the technological advances which have made
> large-scale sequencing practical for an increasing number of researchers,
> the collaboration has decided that the 350 kbp limit must be removed.
>
> As of June 2004, the length of database sequences will be limited only
> by the natural structures of an organism's genome. For example, a single
> record might be used to represent all of human chromosome 1, which is
> approximately 245 Mbp in length.
>
> Software developers for some of the larger commercial sequence analysis
> packages were recently asked what timeframe would be appropriate for this
> change. Answers ranged from "immediately", to "several months", to "one year".
> So the one-year timeframe was selected, to provide ample time to implement
> changes which megabase-scale sequences may require.
>
> Some sample records with very large sequences have been made available
> so that developers can begin to test their software modifications:
>
> ftp://ftp.ncbi.nih.gov/genbank/LargeSeqs
>
> Many changes are expected after the removal of the length limit. For
> example, complete bacterial genomes (typically on the order of several
> megabases) will be re-assembled into single sequence records. The submission
> process for such genomes will become much more streamlined, since database
> staff will no longer have to split the genomes into pieces. BLAST services
> will be enhanced, so that hits reported within very large sequences will
> be presented in a meaningful context.
>
> All such changes will be discussed more fully in future release notes,
> the NCBI newsletter, and the GenBank newsgroup.
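
For anyone checking whether local tools cope with the megabase-scale records described in the announcement, here is a minimal Python sketch that streams a GenBank flat file and reports the sequence length of each record without holding a whole sequence in memory. It assumes the usual flat-file layout (a LOCUS line opening each record, an ORIGIN line before the sequence, and a line starting with // closing the record). The function names and the sample text are made up for illustration; the 350,000 bp constant is the old limit quoted above.

```python
import io
from typing import Iterator, TextIO, Tuple

# The pre-June-2004 limit on individual records, per the announcement.
LEGACY_LIMIT_BP = 350_000


def record_lengths(handle: TextIO) -> Iterator[Tuple[str, int]]:
    """Yield (locus_name, sequence_length) for each record in a flat file.

    Reads line by line, so memory use does not grow with sequence length.
    """
    name = None
    in_sequence = False
    length = 0
    for line in handle:
        if line.startswith("LOCUS"):
            fields = line.split()
            name = fields[1] if len(fields) > 1 else "unknown"
            in_sequence = False
            length = 0
        elif line.startswith("ORIGIN"):
            in_sequence = True
        elif line.startswith("//"):
            if name is not None:
                yield name, length
            name = None
            in_sequence = False
        elif in_sequence:
            # Sequence lines carry a position number and spaces; count letters only.
            length += sum(1 for ch in line if ch.isalpha())


def exceeds_legacy_limit(length: int) -> bool:
    """True if a record is longer than the old 350 kbp limit."""
    return length > LEGACY_LIMIT_BP


if __name__ == "__main__":
    # Tiny made-up record, only to show the calling convention.
    sample = (
        "LOCUS       TEST1   15 bp    DNA\n"
        "ORIGIN\n"
        "        1 gatcctccat ataca\n"
        "//\n"
    )
    for locus, n in record_lengths(io.StringIO(sample)):
        print(locus, n, "over old limit" if exceeds_legacy_limit(n) else "within old limit")
```

Pointing record_lengths at an open file handle for one of the sample large-sequence files should show which records go past the old limit, which is a quick way to pick test cases for a local pipeline.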
---
- gttaacaattaaagagtgtttatcgaaattcattatatagtggtttatatagaccacttc
-
- GenBank newsgroup see: http://www.bio.net/hypermail/genbankb/
- GENBANKB e-mail: messages sent to genbankb at net.bio.net
- subscribe: e-mail biosci-server at net.bio.net with: subscribe genbankb
- unsub: e-mail biosci-server at net.bio.net with: unsubscribe genbankb
- GenBank on the WWW, see: http://www.ncbi.nlm.nih.gov/Genbank/
- problems with GENBANKB? E-mail moderator: francis at bioinformatics.ubc.ca