CleanEST

Introduction

The EST division of GenBank, dbEST, is widely used in many
applications such as gene discovery and verification of exon–intron
structure. However, the use of EST sequences in the dbEST libraries
is often hampered by inconsistent terminology used to describe the
library sources and by the presence of contaminated sequences. Here,
we describe CleanEST, a novel database server that classifies dbEST
libraries and removes contaminants. We classified all dbEST
libraries according to species and sequencing center. In addition,
we further classified human EST libraries by anatomical and
pathological systems according to eVOC ontologies. For each dbEST
library, we provide two sets of cleansed sequences: ‘pre-cleansed’
and ‘user-cleansed’. To generate pre-cleansed sequences, we cleansed
sequences in dbEST by aligning them against well-known
contamination sources: UniVec, Escherichia coli, mitochondria and
chloroplasts (for plants). To provide user-cleansed sequences, we
built an automatic user-cleansing pipeline, in which sequences of a
user-selected library are cleansed on the fly according to
user-selected options. The server is available at
http://cleanest.kobic.re.kr/ and the database is updated monthly.
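
The cleansing step described above can be illustrated with a minimal sketch. It is not the CleanEST implementation: the text does not state which aligner, thresholds, or output form the server uses. The sketch assumes alignments are already available as 12-column BLAST-style tabular lines (query ID in column 1, percent identity in column 3, query start and end in columns 7 and 8), and it masks contaminated regions with N, which is one possible choice among several. The names Hit, parse_tabular_hits and cleanse, and the options min_identity, min_clean_length and sources, are hypothetical and stand in for user-selected options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    est_id: str      # identifier of the EST (query) sequence
    source: str      # contaminant source label, e.g. "UniVec" (assumed labels)
    identity: float  # percent identity of the alignment
    start: int       # 1-based inclusive start on the EST
    end: int         # 1-based inclusive end on the EST


def parse_tabular_hits(lines, source):
    """Parse 12-column BLAST-style tabular lines into Hit records."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        qstart, qend = int(cols[6]), int(cols[7])
        hits.append(Hit(cols[0], source, float(cols[2]),
                        min(qstart, qend), max(qstart, qend)))
    return hits


def cleanse(ests, hits, min_identity=90.0, min_clean_length=100, sources=None):
    """Mask contaminated regions with N and drop ESTs left too short.

    ests    : dict mapping EST id to sequence string
    sources : optional set of contaminant labels to apply (None = all)
    The default thresholds are illustrative, not CleanEST's settings.
    """
    cleansed = {}
    for est_id, seq in ests.items():
        masked = list(seq)
        for h in hits:
            if h.est_id != est_id or h.identity < min_identity:
                continue
            if sources is not None and h.source not in sources:
                continue
            # Convert the 1-based inclusive range to 0-based indices.
            for i in range(h.start - 1, min(h.end, len(masked))):
                masked[i] = "N"
        # Bases that are N (masked here or already ambiguous) do not count.
        clean_len = sum(1 for base in masked if base != "N")
        if clean_len >= min_clean_length:
            cleansed[est_id] = "".join(masked)
    return cleansed
```

In this sketch, a pre-cleansed set would correspond to calling cleanse with fixed defaults and all contaminant sources, while a user-cleansed set would pass the user's own thresholds and a chosen subset of sources.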