Meeting report

Over the past decade, the University of Tennessee (UT), Oak Ridge National Laboratory
(ORNL), and the Kentucky Biomedical Research Infrastructure Network (KBRIN) have collaborated
to share their extensive bioinformatics research and educational expertise to further
strengthen bioinformatics in the Tennessee and Kentucky region. One of the results
of these collaborations is joint sponsorship of an annual regional bioinformatics
summit that brings together researchers, educators, and students with interests in
bioinformatics from research and educational institutions in Kentucky, Tennessee,
and other states. These summits provide unique opportunities for enhancing collaborative
links in the region and for further integration of multidisciplinary research efforts
across institutions. As a result, a number of new collaborative research and educational
projects have been fostered across institutions. The Eighth Annual UT-ORNL-KBRIN Bioinformatics
Summit was held at Fall Creek Falls State Park in Pikeville, Tennessee from March
20–22, 2009. A total of 202 participants registered for the summit, with 94 from various
Tennessee institutions and 68 from various Kentucky institutions. Additional
participants came from universities and research institutions in other states and
countries, including the National Institutes of Health, Virginia Commonwealth University,
University of Cincinnati, Emory University, and the University of British Columbia.
Seventy-seven registrants were faculty, with an additional 62 student, 43 staff, and
20 postdoctoral level participants.

The conference program included three days of presentations. The first day was devoted
to workshops, including two Geospiza/Digital World Biology workshops, along with a
bioinformatics education workshop and a microarray analysis workshop. The last day and a half
were dedicated to scientific sessions in bioinformatics divided into three plenary
sessions: Medical and Translational Informatics, Systems Biology, and Next-Generation
Sequencing and Epigenetics.

Geospiza/Digital World Biology Workshops

Dr. Sandra Porter, president of Digital World Biology, kicked off the Summit with
two workshops from Geospiza and Digital World Biology. The first workshop, "Polymorphism/SNP
Discovery," focused on discovering single nucleotide polymorphisms (SNPs) using raw
Sanger sequencing trace files [1,2] and the associated Phred [3,4] quality values in regions with low quality scores. These SNPs can be visualized
with tools for viewing sequence chromatograms, such as Geospiza's FinchTV.
The SNP with dbSNP [5] entry rs671 was used as an illustrative case. This SNP is a single base difference
in the nucleotide sequence, from a G in the wild type to an A in the mutant, causing
a change from a glutamate to a lysine at amino acid position 487 of the aldehyde dehydrogenase 2
(ALDH2) protein. The resulting conformational changes in the ALDH2 protein structure leave an individual
who ingests alcohol unable to metabolize acetaldehyde efficiently
[6]. HapMap [7] information for this SNP indicates that 31% of the Han Chinese population is heterozygous
for this SNP. In fact, this allele is not typically found in any population outside
of Asia [8]. Dr. Porter discussed using NCBI's Cn3D [9] to view structural locations, to form hypotheses about the possible molecular interactions
of amino acids at specific locations, and to examine how these interactions can be affected by
polymorphisms. This discussion was illustrated with the wild-type and mutant ALDH2
structures (Protein Data Bank (PDB) [10] entries 1O05 and 1ZUM, respectively), where the change from a negatively to a positively charged amino acid
alters how the protein subunits of ALDH2 interact.
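
The Phred quality values mentioned above relate a score Q to an estimated base-call error probability P through Q = -10 log10(P). The following is a minimal Python sketch of that relationship and of flagging low-quality positions in a read. The function names, the threshold of 20, and the example quality values are assumptions made for illustration and are not taken from the workshop materials.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an estimated error probability P."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an estimated error probability P back to a Phred score Q."""
    return -10 * math.log10(p)


def low_quality_positions(qualities, threshold=20):
    """Return the 0-based positions whose Phred score is below the threshold."""
    return [i for i, q in enumerate(qualities) if q < threshold]


# Hypothetical per-base quality values for a short read (illustrative only).
example_qualities = [35, 40, 12, 38, 8, 41]
print(low_quality_positions(example_qualities))
print(phred_to_error_prob(20))
```

A candidate SNP that falls on one of the flagged positions would call for inspection of the chromatogram before being accepted.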

In the second Geospiza workshop, "Next Generation DNA Sequencing", Dr. Porter provided
an overview of three next-generation sequencing platforms (454 [11], Illumina [12], and SOLiD [13]), technologies that make DNA sequence information more widely available
at a reduced cost per run compared to the traditional Sanger sequencing technique.
The properties of each approach in terms of methodologies, data preparation, raw data,
analysis, and sequence types interrogated (i.e. genomes, transcriptomes, miRNAs, copy
number variants, SNP analysis, epigenetic methylation) were explained. Dr. Porter
also discussed the pipeline that Geospiza has in place for handling data from
sample to results. Use of next-generation sequencing data and RNA-Seq [14] for transcriptome analysis, as opposed to microarrays [14], is becoming an increasingly realistic option. One of the main advantages of such an approach
is that it becomes possible to study all possible isoforms and SNPs, including those
not previously described. Data from transcriptome studies [15,16] were analyzed using Geospiza's GeneSifter™ software.

Bioinformatics Education Workshop

Dr. Steven Jennings of the University of Arkansas-Little Rock led a discussion on
the state of bioinformatics education. Building upon his experience in creating a
Ph.D. program in Bioinformatics as well as serving on national society level bioinformatics
education committees, Dr. Jennings offered several insights into how a student interested
in bioinformatics should be trained. The question of how to train students often becomes
a struggle between breadth and depth of training. As Dr. Jennings pointed out, many
of the techniques students learn will be out of date within five years of graduation.
Therefore, bioinformatics training should emphasize producing independent thinkers
who are able to adapt to changing technologies. Dr. Jennings proposed
designing a bioinformatics program around a cube,
where each dimension represents topics in the fields of biology, computer science,
and mathematical modeling/computation. The intersection of these areas shows the difficulty
in producing a "one size fits all" program.

Statistical Analysis of Microarrays Workshop

Issues involved in analyzing microarray data from a statistical perspective were the
topic of the workshop provided by Dr. Arnold Stromberg from the University of Kentucky.
Dr. Stromberg has examined a number of research issues with microarrays, including
pooling samples [17]. The main focus of this workshop was to encourage researchers to reduce the number
of tests and gene lists used in order to ease the multiple-testing correction and reduce the false
discovery rate (FDR). A quadratic regression analysis technique was discussed that
allows researchers to classify the behavior of genes into one of nine basic patterns
over time [18]. Such an approach can be preferable to cluster analysis because it shows the actual behavior
of the gene(s) of interest. Dr. Stromberg suggested that the best approach to solving
issues with microarrays is to consider the experimental design from the outset, keeping
in mind three key questions: 1) What do you want to know? 2) What is the simplest
design that will do the job? 3) Can the design be modified to reduce variability?
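
The link between the number of tests and the FDR can be illustrated with the Benjamini-Hochberg procedure, one widely used FDR-controlling method. The report does not state which FDR method was discussed in the workshop, so the choice of procedure, the function name, and the p-values below are assumptions for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (illustrative only).
four_tests = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(four_tests))

# The same two smallest p-values when only two genes are tested.
two_tests = [0.01, 0.005]
print(benjamini_hochberg(two_tests))
```

In this sketch, the gene with a raw p-value of 0.01 receives a smaller adjusted p-value when two genes are tested than when four are, which is the motivation for limiting the number of tests at the design stage.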

Medical and Translational Informatics

This year's Medical and Translational Informatics session included a plenary presentation
by Bruce Aronow of the University of Cincinnati and Cincinnati Children's on "Integrative
Biology and Disease." Dr. Aronow presented his perspective on building upon systems
biology techniques to understand system dynamics across concepts, allowing for
a higher level of abstraction. Including databases of prior knowledge, such as
molecular, clinical, and phenomic sources, is key to understanding the underlying
biology. For instance, a pathway can be analyzed by first understanding how
miRNAs can knock down transcription factor expression, thereby altering gene expression
which in turn may affect a particular pathway. Dr. Aronow also discussed ontological models
for drug and disease correlation, which may lead to better
personalized medicine by connecting knowledge of drug
interactions with their effects on genes, gene products, and transcriptional
and translational control elements. The Systems Biology of Disease and Drug Ontology
(SBD) as well as GATACA and ToppGene [19] were discussed as tools that allow for a better understanding of disease.

A second plenary presentation entitled "Slim-Prim: A bioinformatics database bridging
basic and clinical science" was made by Ian Brooks of the University of Tennessee
Health Science Center. Slim-Prim is a HIPAA-compliant system for managing
information for either scientific laboratories or patient-care research. Slim-Prim
was initially developed for use by members of the University of Tennessee Health Science
Center's Clinical and Translational Science Institute (CTSI). At its base is an Oracle
data/knowledge management system. Built upon this core is a web-based API for building
forms for individual projects or patient studies. Each project can then be linked
to additional information such as patient history and biorepository information both
locally and in a federated fashion. Dr. Brooks discussed two such sources of electronic
health records currently housed in Slim-Prim: the Kids' Inpatient Database (KID) [20], which contains 7 million records, and the Mid-South eHealth Alliance (MSeHA) [21], a regional health information organization (RHIO) that produces electronic health records at a rate of 1.5 million records
per year. A web-based report generator, Knowledge Informatics for Science and Medical
Education and Training (KISMET), allows for access to local and national resources,
including caBIG [22], for more complete analysis of the Slim-Prim data. The main benefits of the Slim-Prim
system are that it is user-friendly, secure, versatile, and portable.

Systems Biology

The Systems Biology plenary session featured four speakers from Virginia Commonwealth
University (VCU). Dr. Michael Miles presented his research on genetic characterization
of robust ethanol-responsive gene networks in mouse prefrontal cortex [23-28]. His group's QTL mapping of genome-wide expression changes in response to ethanol
in mice pinpointed multiple genomic loci showing strong signals, indicating
a role for these loci in gene expression changes. A number of loci were suggested
to influence the regulation of gene networks in response to ethanol. Epistatic interactions
were observed for a number of loci, suggesting a role for DNA modification in the regulation
of gene expression in response to ethanol.

The presentation on systems vaccinology for Cryptosporidium, an important apicomplexan parasite, was made by Dr. Gregory Buck, head of the Center
for the Study of Biological Complexity at VCU. He summarized his research, which yielded
the genome sequences of C. hominis and C. parvum [29,30]. He further described how his group identified promising vaccine
targets through a joint strategy: comparative analysis of gene expression and
proteomics at different stages of the Cryptosporidium life cycle, genome analysis to identify predicted membrane- or surface-associated proteins,
secreted proteins, and other relevant candidates, and a combination of
experimental and in silico analysis [31-33].

Dr. Zhongming Zhao presented his research on gene networks and pathways in schizophrenia.
In his presentation, he discussed his bioinformatic approach to identify candidate
genes for schizophrenia by combining results from gene mapping studies including genome-wide
association analysis, linkage analysis, gene expression information, and literature
search, and by employing screening criteria of connectivity in the human protein-protein
interaction networks [34]. He also outlined his research on the role of microRNA interaction networks in schizophrenia
and the successful development of an online database for schizophrenia genes.

Dr. Ping Xu presented the final talk at this session, in which he described his integrative
study of streptococcal virulence by employing comparative genomics and systems biology.
He described the devastating effect of streptococcal infections and summarized a systematic
experimental genome-wide deletion analysis of each open reading frame in the Streptococcus sanguinis genome, which will lead to better understanding of the phenotypic role of each of
these genes [35,36].

Next-Gen Sequencing and Epigenetics

Robert Hanson of NIH/NIDDK was the first presenter in this session with a talk on
"Genetic and Epigenetic Studies of Type 2 Diabetes in American Indians." His presentation
included a discussion of the complexity of Type 2 diabetes, specifically in understanding
the role of potential epigenetic factors. These studies involved looking at the birth
weight of babies in addition to familial history in the American Indian population.
Genome-wide linkage analysis and association mapping studies indicate potential candidates
[37-57]. Some variants show significantly weaker effects in American Indians than in Europeans,
indicating the importance of epigenetics in terms of parent-of-origin effects and
interaction with the diabetic intrauterine environment.

Jarret Glasscock of Cofactor Genomics followed with a presentation titled "New aspects
of bioinformatics introduced by next-generation sequencing technologies." Dr. Glasscock
has been involved in early testing and characterization of many of the next-generation sequencing
platforms, including the Illumina, 454, and SOLiD technologies. His presentation covered
the possibilities these technologies now provide, including large-scale and single
nucleotide polymorphism discovery, gene expression quantification, and epigenetic
studies through bisulfite sequencing. An overview and comparison of these technologies
was given in terms of which types of studies are most suitable for each. Dr. Glasscock's
presentation led to an engaging discussion. These exciting technologies are rapidly
evolving and raise many interesting research questions, both about the data generated
and about methodologies for handling and annotating the data.

Educational Opportunities

Dr. Cynthia Peterson, the director of the UT/ORNL Graduate School of Genome Science
and Technology, presented an update on the educational opportunities at UT/ORNL. She
discussed the progress made with the SCALE-IT (Scalable Computing and Leading Edge Innovative
Technologies) program over the past year. In addition, she discussed the National
Institute for Mathematical & Biological Synthesis (NIMBioS), a one-of-a-kind institute
housed at the University of Tennessee. NIMBioS is the result of a $16 million National
Science Foundation award to the University of Tennessee, Knoxville that will draw
more than 600 national and international researchers each year to participate in working
groups, workshops, and conferences. Support for working groups, postdoctoral and sabbatical
fellowships, and graduate assistantships is available through NIMBioS
(http://www.nimbios.org/). PEER, the Program for Excellence & Equity in Research, was also discussed as an
avenue to increase the diversity of student populations in the STEM areas through
graduate fellowships, scientific training, and career skills workshops.

Poster session

Thirty-five posters were presented on Saturday afternoon during a two-hour poster
session. Abstracts (many of which appear in this supplement) were divided into the
general groupings of Bioimaging, Bioinformatics Infrastructure, Bioinformatics of
Health and Disease, Comparative Genomics, Databases, Functional Genomics, Gene Regulation,
Genomics, Machine Learning and Algorithms, Microarrays, Ontologies and Text Mining,
Proteomics, Structure and Function Prediction, and Systems Biology.

Five of these abstracts were included in the summit program as short platform presentations.
They included "A Framework for Layered Integration of Heterogeneous Data: A Case Study"
(Vida Abedi); "Motif Tool Manager: a web-based platform for motif discovery" (Vinhthuy
Phan); "The Ontological Discovery Environment: Integrating gene-centered data across
diverse experiments" (Jeremy Jay); "Transcriptional Profiling of CD4 T-cells Reveals
Abnormal Gene Expression in Young Prediabetic NOD Mice" (Dorothy Kakoola); and "Extensive
Parent-of-Origin Genetic Effects on Fetal Growth" (Ron Adkins). A sixth presentation
was given by Ramin Homayouni of ComputableGenomix on their GeneIndexer toolkit.

Future plans

The Bioinformatics Summit will rotate back to Lake Barkley State Park in western
Kentucky in the spring of 2010. Areas of interest will likely include the use of next-generation
sequencing technologies in research laboratories, clinical informatics, and integrative
systems biology.

Competing interests

The authors declare that they have no competing interests.

Acknowledgements

We would like to thank the additional Conference Program Committee members Nigel Cooper
(University of Louisville), Jack Dongarra (University of Tennessee), Ramin Homayouni
(University of Memphis), Michael Langston (University of Tennessee), Terry Mark-Major
(University of Tennessee-Memphis), Dan Masys (Vanderbilt University), Terry Moore
(University of Tennessee), Jay Snoddy (Vanderbilt University), Arnold Stromberg (University
of Kentucky), Bruce Whitehead (University of Tennessee Space Institute) and Rob Williams
(University of Tennessee-Memphis) for putting together a well-received scientific
program. In addition, we wish to thank Terry Mark-Major, Stephanie Dearing, and Michelle
Padgett for their extensive efforts in arranging the details
that allowed the meeting to proceed. We would also like to thank the staff of
Fall Creek Falls State Park, who helped to make our stay as pleasant and smooth as
possible. Funding for the UT-ORNL-KBRIN Summit is provided in part by The Kentucky
Biomedical Research Infrastructure Network (KBRIN), the University of Tennessee Molecular
Resource Center, The UT-ORNL Science Alliance, The University of Tennessee Center
for Integrative and Translational Genomics, and NIH grants P20RR16481 and R13LM009315.