The Report for the 20th International Collaborators Meeting

The three data banks of the International Nucleotide Sequence Database Collaboration (INSDC), namely DDBJ, EMBL-Bank/EBI, and GenBank/NCBI, hold an international collaborators meeting every year.

In 2007, the meeting was held at EMBL-Bank/EBI in the UK on 21-23 May.

DDBJ, EMBL-Bank, and GenBank each reported on their activities over the past year and discussed practical matters for maintaining and updating INSDC.
The outcomes of the meeting are summarized below.

Items Discussed and To Be Studied

INSDC web site

The three banks agreed to add some samples of standardized submissions to the contents of the INSDC web site.

Alternative assemblies

With a large number of draft sequence reads publicly available, scientists are asking whether they can submit assemblies of the reads to INSDC. We need to develop a policy on who can submit alternative assemblies and on what we would do with the data once they are submitted; for example, would we start a TPA-like database for alternative assemblies? The three banks will raise this question at the advisors' meeting.

GSC and MIGS

The Genomic Standards Consortium (GSC) supports the community-based development of standard datasets of information about complete genomes and metagenomes. It is currently working towards the 'Minimum Information about a Genome Sequence (MIGS)' specification. Overall, the three banks agreed that a cooperative approach to GSC activities was preferred over a competitive approach.

EST/GSS clone library ID

A registration system that assigns unique IDs to both academic and commercial EST and GSS libraries will be studied.

Controlled vocabulary in KEYWORDS line

The three banks agreed to use the following three keywords in common.

Two terms to describe the direction and location of an EST

"5'-end sequence (5'-EST)"

"3'-end sequence (3'-EST)"

A term to indicate an entry belonging to a full-length insert cDNA project

"FLI_CDNA"

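As an illustration only, the agreed keywords could be checked for in the text of a KEYWORDS line as sketched below. The three keyword strings are copied from the list above; the function names are hypothetical, and the sketch assumes that keywords are separated by semicolons and that the list ends with a period.

```python
# The three keywords agreed at the meeting, copied verbatim from the list above.
AGREED_KEYWORDS = frozenset({
    "5'-end sequence (5'-EST)",
    "3'-end sequence (3'-EST)",
    "FLI_CDNA",
})


def split_keywords(keywords_text):
    """Split the text of a KEYWORDS line into individual keywords.

    Assumption: keywords are separated by ';' and the list ends with '.'.
    """
    text = keywords_text.strip()
    if text.endswith("."):
        text = text[:-1]
    return [kw.strip() for kw in text.split(";") if kw.strip()]


def agreed_keywords_in(keywords_text):
    """Return the agreed keywords found in the line, in order of appearance."""
    return [kw for kw in split_keywords(keywords_text) if kw in AGREED_KEYWORDS]
```

For example, agreed_keywords_in("EST; 5'-end sequence (5'-EST).") would return only the 5'-EST term, because "EST" on its own is not one of the three agreed keywords.
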
Changes to the Feature Table Document: Features and Qualifiers

Unless otherwise specified, the following changes will take effect in October 2007 with the revision of the Feature Table Definition.

A variety of new types of RNA transcripts, such as "miRNA" and "siRNA", have been introduced in recent years. Because the number of non-protein-coding RNA families is quite likely to continue to expand, a new ncRNA feature that can flexibly accommodate them will be introduced.
Furthermore, the snRNA, snoRNA, and scRNA features will be merged into the ncRNA feature by December 2007.

To indicate the nucleotide region encoding the proteolysis tag peptide of tmRNA, a new qualifier, /tag_peptide, will be used for the tmRNA feature.

Format: /tag_peptide=<base_range>
Example: /tag_peptide=90..122
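A minimal sketch of how such a qualifier might be read is given below. It assumes that <base_range> is always written as <start>..<end> with 1-based, inclusive positions, as in the example above; the function names are hypothetical and are not part of the Feature Table Definition.

```python
import re

# Assumption: <base_range> is written as "<start>..<end>" with 1-based,
# inclusive positions, as in the example /tag_peptide=90..122.
_TAG_PEPTIDE = re.compile(r"^/tag_peptide=(\d+)\.\.(\d+)$")


def parse_tag_peptide(qualifier):
    """Return (start, end) from a /tag_peptide qualifier string."""
    match = _TAG_PEPTIDE.match(qualifier.strip())
    if match is None:
        raise ValueError(f"not a /tag_peptide qualifier: {qualifier!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        raise ValueError(f"start {start} is after end {end}")
    return start, end


def tag_peptide_length(qualifier):
    """Number of bases covered by the range, counting both end positions."""
    start, end = parse_tag_peptide(qualifier)
    return end - start + 1
```

With the example above, parse_tag_peptide("/tag_peptide=90..122") gives the pair (90, 122), which covers 33 bases.
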

"tmRNA" will be added to the permitted values of the /mol_type qualifier, which indicates on the source feature the in vivo molecule type of the sequence.

The value of the /specimen_voucher qualifier will become structured, consisting of an institution code, a collection code, and a specimen identifier; the existing unstructured values will also remain valid.

Format:
/specimen_voucher="[<institution_code>:[<collection_code>:]]<specimen_id>"
The value of the specimen_voucher qualifier takes one of three forms:
<specimen_id>
<institution_code>:<specimen_id>
<institution_code>:<collection_code>:<specimen_id>

If the value includes one or more colons, ":", it is 'structured'. Structured vouchers include institution codes (and optional collection codes), taken from a controlled vocabulary, that denote the museum or herbarium collection where the specimen resides.
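The three forms can be told apart by splitting the value on colons, as in the sketch below. The class and function names are hypothetical, the sketch does not check codes against the controlled vocabulary, and it assumes that any colon after the second belongs to the specimen identifier.

```python
from typing import NamedTuple, Optional


class SpecimenVoucher(NamedTuple):
    institution_code: Optional[str]
    collection_code: Optional[str]
    specimen_id: str

    @property
    def is_structured(self):
        # Per the text, a value that contains a colon is 'structured'.
        return self.institution_code is not None


def parse_specimen_voucher(value):
    """Split a /specimen_voucher value into its parts.

    Assumption: at most two colons act as separators; any further colon
    is kept as part of the specimen identifier.
    """
    parts = value.split(":", 2)
    if len(parts) == 1:
        return SpecimenVoucher(None, None, parts[0])
    if len(parts) == 2:
        return SpecimenVoucher(parts[0], None, parts[1])
    return SpecimenVoucher(parts[0], parts[1], parts[2])
```

Using made-up codes, parse_specimen_voucher("INST:COLL:12345") yields an institution code of "INST", a collection code of "COLL", and a specimen identifier of "12345", while parse_specimen_voucher("12345") is reported as unstructured.
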