The data files in this directory contain the GenBank accessions
used in trace-2-genome-alignment SNP discovery methods.
Files with the .qual extension, such as YYY.qual, contain
the quality scores for the corresponding sequences in the
FASTA file YYY.
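
As an illustration of the YYY / YYY.qual pairing, the following Python
sketch walks a FASTA file and its .qual file together. It is not part
of the release. It assumes the conventional .qual layout (a ">" header
line followed by whitespace-separated integer scores), that the first
word of each header is the sequence identifier, and that both files
list their records in the same order. The function names are
hypothetical.

```python
from typing import Iterator, List, Tuple


def read_records(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, body) pairs from a FASTA-style file.

    Body lines are joined with single spaces, so the same reader
    works for sequence files and for .qual files.
    """
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, " ".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, " ".join(chunks)


def paired_with_quality(fasta_path: str) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (id, sequence, scores) for a FASTA file and its .qual file."""
    qual_path = fasta_path + ".qual"  # YYY -> YYY.qual
    pairs = zip(read_records(fasta_path), read_records(qual_path))
    for (seq_header, seq_body), (qual_header, qual_body) in pairs:
        # Assumption: the identifier is the first word of the header.
        seq_id = seq_header.split()[0]
        if qual_header.split()[0] != seq_id:
            raise ValueError(f"record order differs at {seq_id}")
        sequence = seq_body.replace(" ", "")
        scores = [int(token) for token in qual_body.split()]
        if len(scores) != len(sequence):
            raise ValueError(
                f"{seq_id}: {len(sequence)} bases but {len(scores)} scores"
            )
        yield seq_id, sequence, scores
```

Because the records are streamed in step, only one record from each
file is held in memory at a time, which matters for files of this size.
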
The file checksum.txt contains the checksums of the files
in their uncompressed form.
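
Since the checksums refer to the uncompressed data, a file has to be
decompressed before its checksum is compared. The Python sketch below
shows one way to do that without writing the uncompressed file to disk.
This README does not state the checksum algorithm or the compression
format, so MD5 and gzip are assumptions here, and the function name is
hypothetical. Adjust both to match what checksum.txt actually contains.

```python
import gzip
import hashlib


def uncompressed_digest(path: str, algorithm: str = "md5") -> str:
    """Return the hex digest of the decompressed contents of a gzip file.

    Assumption: the file is gzip-compressed and the published checksum
    is a hash supported by hashlib (MD5 by default).
    """
    digest = hashlib.new(algorithm)
    with gzip.open(path, "rb") as handle:
        # Read in 1 MiB blocks so large files are never fully in memory.
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()
```

The returned string can then be compared with the corresponding entry
in checksum.txt.
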
The FASTA headers carry some information about each
sequence, such as clone and chromosome. Any field whose
value is unknown is marked with "XXX".
The files gb1*, gb2* and gb3* contain all the genomic sequences
from GenBank (both HTG and finished) that are longer than
10,000 nucleotides.
The files gbfinished* contain all the finished sequences
from GenBank. It can safely be assumed that each nucleotide
in these files has a quality score > 40.
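
To put the threshold of 40 in perspective, the Python sketch below
converts a quality score to an error probability. It assumes the scores
are Phred-scaled (Q = -10 * log10(P)), which is the usual convention
for base qualities but is not stated in this README. The function name
is hypothetical.

```python
def phred_to_error_probability(quality: float) -> float:
    """Convert a Phred-scaled quality score to an error probability.

    Assumption: quality = -10 * log10(probability of a wrong base call).
    """
    return 10 ** (-quality / 10)
```

Under that assumption, a score of 40 corresponds to roughly one
miscalled base in 10,000, and higher scores to fewer errors still.
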
The files nr* contain all sequences from GenBank whose
quality values are known.
For SNP discovery, please use the gbfinished* and nr* files.
All duplicates between the two sets have been removed.
This release, marked SEP-19, actually contains sequences from the
official SEP-4 freeze of GenBank.