Influenza virus (IV) is the causative agent of several serious influenza
pandemics. Recently, a highly pathogenic avian influenza virus (AIV; H5N1)
has resulted in the death of more than 100 people and the slaughter of
millions of poultry in Asia, Europe, and Africa. The Beijing Institute of
Genomics (BIG), Chinese Academy of Sciences (CAS), has been sequencing
influenza viruses collected by scientists at different institutions in
different parts of China. We have sequenced isolates of AIV subtype H5N1
collected from 1997 to 2005 from different hosts found in China, such as
wild birds, poultry, waterfowl, and mammals. BIG also has an ongoing effort
to generate more AIV/IV sequences. Meanwhile, significant international
sequencing efforts have caused the number of IV sequences in public
resources to rise rapidly.

Frequent outbreaks of highly pathogenic avian influenza and the increasing
data for comparative analysis require a central database specialized in
influenza virus. We established the Influenza Virus Database (IVDB) to
integrate information and to create an analysis platform for genetic,
genomic, and phylogenetic studies of the virus. IVDB hosts complete genome
sequences of influenza A virus generated by the Beijing Institute of Genomics
and curates all other published influenza virus sequences after expert
annotation. To help users make efficient use of the data, our Q-Filter
system classifies and ranks all nucleotide sequences into 7 categories
according to sequence content and integrity. IVDB provides a series of tools
and viewers for comparative analysis of viral genomes, genes, genetic
polymorphisms, and phylogenetic relationships. A search system lets users
retrieve combinations of different data types by setting various search
options. To facilitate analysis of global viral transmission and evolution,
the IV Sequence Distribution Tool (IVDT) displays the worldwide geographic
distribution of viral genotypes and couples genomic data with
epidemiological data. BLAST, multiple sequence alignment tools, and
phylogenetic analysis tools are integrated for online data analysis.
Furthermore, IVDB offers instant access to pre-computed alignments and
polymorphism analyses of influenza virus genes and proteins and presents the
results as SNP distribution plots and minor allele distributions.

IVDB continues to make enhancements to its data quality, utility and
functionality, aiming to be a powerful information resource and an analysis
workbench for scientists working on IV genetics, evolution, diagnostics,
vaccine development, and drug design.

Data Sources & Data Curation

As an integrated IV information resource, IVDB contains both BIG's data and
data from public resources. BIG data have been submitted to NCBI and are
being processed now. Data from the NCBI Influenza Virus Resource (U.S.
National Institutes of Health) form the backbone of the database. Because
ISD (Influenza Sequence Database, Los Alamos National Laboratory, U.S.)
accepts direct submissions and contains some IV sequences that are absent
from NCBI's Influenza Virus Resource, IVDB integrates an additional 2,242
sequence segments from ISD. The current protein 3-D structural data are
mainly from PDB (Protein Data Bank). More 3-D data will be acquired from our
collaborators who specialize in protein 3-D modeling.

All data in IVDB have been curated manually, since data quality is of
crucial importance for analysis. The main curation steps are:

Examine annotations and source information from each sequence entry in
public databases.

Redefine host species (e.g. goose, coot instead of avian in general) and
sampling locations (not only continents and country/regions, but also
provinces/states of a nation), and store the information in searchable
fields.

Match protein sequences with nucleotide sequences, and further annotate
some predicted CDS and UTR.
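
As an illustration of how host and sampling location can become searchable fields, the sketch below splits a strain name written in the standard influenza nomenclature (type/host/location/isolate/year, with the host omitted for human isolates) into separate fields. This is not IVDB's curation procedure, which is manual; the function name `parse_strain_name` and the `StrainInfo` record are hypothetical names introduced only for this example.

```python
from typing import NamedTuple, Optional


class StrainInfo(NamedTuple):
    """Hypothetical record of the fields encoded in a strain name."""
    virus_type: str
    host: str              # "human" when the name omits the host field
    location: str
    isolate: str
    year: str              # kept as written (two or four digits)
    subtype: Optional[str]


def parse_strain_name(name: str) -> StrainInfo:
    """Split a name such as 'A/goose/Guangdong/1/96(H5N1)' into fields."""
    name = name.strip()
    subtype = None
    # An optional subtype such as (H5N1) may trail the name.
    if name.endswith(")") and "(" in name:
        name, _, tail = name.rpartition("(")
        subtype = tail[:-1].strip()
        name = name.strip()
    parts = [p.strip() for p in name.split("/")]
    if len(parts) == 5:
        virus_type, host, location, isolate, year = parts
    elif len(parts) == 4:
        # By convention, human isolates omit the host field.
        virus_type, location, isolate, year = parts
        host = "human"
    else:
        raise ValueError(f"unrecognized strain name: {name!r}")
    return StrainInfo(virus_type, host.lower(), location, isolate, year, subtype)


if __name__ == "__main__":
    print(parse_strain_name("A/goose/Guangdong/1/96(H5N1)"))
    print(parse_strain_name("A/Hong Kong/156/97 (H5N1)"))
```

A parser like this can only propose a first-pass host and location. Names that deviate from the convention raise an error and would still need the manual review described above.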

Furthermore, we use our in-house Influenza Virus Sequence Quality Filter
System (Q-Filter) to classify nucleotide sequences into 7 categories, which
helps users make efficient use of the data.

Tools & Analysis

As an IV analysis platform, IVDB provides, in addition to its powerful
search system, a series of tools and viewers for analyzing genomes, genes
and functions, polymorphisms, and phylogenetic relationships individually or
in a comparative context.

The tools include BLAST, multiple sequence alignment, a phylogenetic tree
builder, and the in-house IV Sequence Quality Filter System (Q-Filter) and
IV Sequence Distribution Tool (IVDT). IVDB also offers instant access to
pre-computed multiple sequence alignments and sequence polymorphisms at both
the nucleotide and protein levels and presents the results as SNP
distribution plots and minor allele distributions. (For more information,
see Help.)
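
To make the idea of a minor allele distribution concrete, the sketch below computes, for each polymorphic column of a nucleotide alignment, the second most common base and its frequency. It is a minimal illustration under stated assumptions, not the code behind IVDB's pre-computed results: the function name `minor_allele_frequencies` is hypothetical, the input is assumed to be equal-length aligned sequences, and gaps and ambiguous bases are simply ignored.

```python
from collections import Counter
from typing import List, Tuple


def minor_allele_frequencies(alignment: List[str]) -> List[Tuple[int, str, float]]:
    """Return (1-based position, minor allele, frequency) per polymorphic column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for pos in range(length):
        column = [seq[pos].upper() for seq in alignment]
        # Assumption: count only unambiguous bases; skip gaps and N.
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) < 2:
            continue  # column is not polymorphic
        # The minor allele is taken to be the second most common base.
        minor_allele, minor_count = counts.most_common()[1]
        frequency = minor_count / sum(counts.values())
        results.append((pos + 1, minor_allele, frequency))
    return results


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "ATGA", "ACGA"]  # made-up sequences
    for position, allele, freq in minor_allele_frequencies(toy_alignment):
        print(position, allele, f"{freq:.2f}")
```

Plotting the returned frequencies against position gives a simple SNP distribution along a gene. The same approach applies to protein alignments if the allowed alphabet is changed to amino acids.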

Authors of IVDB:

CHANG Suhua
ZHANG Jiajie
LIAO Xiaoyun

Dr WANG Jing
Dr YU Jun
ZHU Xinxing
WANG Dahai
WANG Zizhu

Thanks to RUAN Jue, YUAN Haifeng, and LI Zhao for technical support. We are indebted to all our collaborators, IV sample providers, and other IV sequence producers for their contributions and kind support to the IVDB database.