Update released June 10, 2004: The
Escherichia coli K-12 strain MG1655 sequence and annotations have
been updated; see this announcement for further
information. In addition, an Excel spreadsheet
is available which summarizes the MG1655 update in terms of nucleotide
sequence corrections and the consequent protein sequence changes.

Contribute to annotation updates!

A wealth of new information has become available recently for the annotation
of Escherichia coli K-12 strain MG1655. Annotation of the genome
is an ongoing task that benefits from the work of all end-users of the
sequence. To this end, we have adopted the ASAP
relational database as the venue for maintaining and updating the annotations,
as well as enabling community input towards that goal. Please note that
while you are invited to become a registered annotator and contribute
to the information within ASAP, there is no requirement to register in
order to view the current MG1655 annotations -- simply log on as a guest.
Furthermore, the "Add a note for the curator" function allows even guest
users to suggest additional annotation updates and corrections. Finally,
we are working with other groups to correlate and reconcile the various
lists and databases containing E. coli genomic information (see,
for example, the sites listed at the E.
coli Database Portal).

Some annotations have already been updated within ASAP, including a number
of revised gene boundaries, gene names and known or predicted gene products.
In addition, several new genes have been added to the annotations, and
perhaps inevitably, several previously annotated genes have been deaccessioned.
Some of these changes have been previously reported as personal communications
(see
Serres, et al. 2001). Although ASAP holds the current annotations,
we will post summary information on this page from time to time
(last updated December 8, 2003).

The following genes have been added to the annotations
(also see RNA genes, below):

lend     rend     dir  bnum   type  gene  syn    product
16751    16903    -    b4412  CDS   hokC  gef    small toxic membrane polypeptide; component of addiction module
213925   214125   -    b4406  CDS   yaeP         conserved hypothetical protein
607059   607211   +    b4415  CDS   hokE         small toxic membrane polypeptide; component of addiction module
1268391  1268498  -    b4419  CDS   ldrA         small toxic polypeptide; component of addiction module
1268926  1269033  -    b4421  CDS   ldrB         small toxic polypeptide; component of addiction module
1269461  1269568  -    b4423  CDS   ldrC         small toxic polypeptide; component of addiction module
1489946  1490095  -    b4428  CDS   hokB  ydcB   small toxic membrane polypeptide; component of addiction module
1702575  1702700  +    b4409  CDS   blr          beta-lactam resistance protein
3697609  3697716  -    b4453  CDS   ldrD         small toxic polypeptide; component of addiction module
3718077  3718229  -    b4455  CDS   hokA  yiaZ   small toxic membrane polypeptide; component of addiction module
4190215  4190415  -    b4407  CDS   thiS  thiG1  sulfur carrier protein
4373895  4374020  +    b4410  CDS   ecnA         entericidin A (antidote to entericidin B); component of addiction module
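As a quick consistency check on the coordinates listed above, the length of each added CDS can be computed from its lend and rend values. The sketch below is illustrative only: it assumes that lend and rend are 1-based, inclusive genome coordinates, and the names ADDED_GENES, cds_length and codon_count are hypothetical, not part of ASAP or GenBank.

```python
# Rows copied from the table of added genes: (lend, rend, dir, bnum, gene).
ADDED_GENES = [
    (16751, 16903, "-", "b4412", "hokC"),
    (213925, 214125, "-", "b4406", "yaeP"),
    (607059, 607211, "+", "b4415", "hokE"),
    (1268391, 1268498, "-", "b4419", "ldrA"),
    (1268926, 1269033, "-", "b4421", "ldrB"),
    (1269461, 1269568, "-", "b4423", "ldrC"),
    (1489946, 1490095, "-", "b4428", "hokB"),
    (1702575, 1702700, "+", "b4409", "blr"),
    (3697609, 3697716, "-", "b4453", "ldrD"),
    (3718077, 3718229, "-", "b4455", "hokA"),
    (4190215, 4190415, "-", "b4407", "thiS"),
    (4373895, 4374020, "+", "b4410", "ecnA"),
]


def cds_length(lend, rend):
    """Length in nucleotides, assuming inclusive 1-based coordinates."""
    return rend - lend + 1


def codon_count(lend, rend):
    """Number of whole codons spanned; raises if the span is not a multiple of 3."""
    length = cds_length(lend, rend)
    if length % 3 != 0:
        raise ValueError(f"span {lend}..{rend} is not a multiple of 3")
    return length // 3


if __name__ == "__main__":
    for lend, rend, strand, bnum, gene in ADDED_GENES:
        print(bnum, gene, strand, cds_length(lend, rend), codon_count(lend, rend))
```

Under the inclusive-coordinate assumption, every span in the table is a whole number of codons, which is what one would expect of a CDS.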

Although over 85% of the genome consists of protein-encoding genes, other
genes encode RNAs that function without being translated into proteins.
These "RNA genes" are often referred to as noncoding or non-coding RNAs
(ncRNA); other designations include small RNA (sRNA), non-messenger RNA
(nmRNA), small non-messenger RNA (snmRNA), functional RNA (fRNA), and
the generic miscellaneous RNA (misc_RNA) used in GenBank. The best-known
RNA genes encode transfer RNAs (tRNA) and ribosomal RNAs (rRNA), but since
the late 1990s many new noncoding RNAs have been found to play significant
roles in the cell.

New annotations of RNA genes

In addition to 22 rRNAs and 86 tRNAs, a handful of misc_RNAs were already
annotated in our GenBank entry. In consultation with Susan Gottesman,
Gisela Storz, and Karen Wassarman, we have begun an effort to add a number
of other RNA genes to the annotations, initially in ASAP and eventually
in GenBank as well. The following table lists these RNA genes with their
assigned b-numbers, including those previously annotated (updated
October 30, 2003).

*The gene expression profiles of selected sRNAs were based on data from
Affymetrix E. coli antisense genome arrays. The chip design file
was modified to fit the newest annotation, and data were extracted with the
dChip software.

Beginning with our publication of the complete genome sequence (Blattner,
et al. 1997), we have assigned each gene (protein- or RNA-encoding)
a unique identifier consisting of a "b" followed by a number -- the so-called
b-numbers or Blattner numbers. These designations remain constant through further
updates, gene identifications, etc. It has come to our attention that
others have assigned b-numbers without consulting us; for example, yaeP
(b4406) has been designated B0189.1 in Swiss-Prot, and b4502 in the RefSeq
version of the genome sequence. In general, we will not track those designations,
just as we do not invent our own GenBank accession numbers, etc.
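For readers who handle identifiers from several sources, a simple format check can separate b-number-style identifiers from other designations such as the Swiss-Prot one mentioned above. This is only a sketch: the four-digit pattern is an assumption drawn from the examples on this page, and the function name is_b_number is hypothetical.

```python
import re

# Assumed pattern: lowercase "b" followed by exactly four digits,
# as in the examples on this page (b4406, b4412, ...).
B_NUMBER = re.compile(r"b\d{4}")


def is_b_number(identifier):
    """Return True if the identifier has the b-number format."""
    return B_NUMBER.fullmatch(identifier) is not None


if __name__ == "__main__":
    for name in ("b4406", "B0189.1", "b4502"):
        print(name, is_b_number(name))
```

A format check cannot tell who assigned an identifier: b4502 passes even though, as noted above, it was assigned in RefSeq without consultation.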

The provisional y-names for uncharacterized ORFs are based on a systematic
nomenclature described by Kenn Rudd (Rudd
1998). Briefly, the first three letters of a "y" name are based on
the map position of an ORF at the time the name was assigned, in a manner
analogous to the "z" naming system for transposon insertions. As with
b-numbers, the y-names are not reused if an ORF is given a new gene name
or if an ORF becomes defunct. According to the original scheme, once a
function was established for an E. coli gene the provisional y-name
would be abandoned and a new gene name chosen. Since the y-names are used
in the literature, ASAP retains them as synonyms when a gene is renamed.

The standard genetic nomenclature for E. coli is that of Demerec
et al. 1966, as subsequently amended through use, and as described
in Instructions to Authors for the Journal of Bacteriology. In order to
avoid chaos, we tend to defer to the E. coli Genetic Stock Center
(CGSC) database at Yale University
as the final authority on gene names.