“If successfully implemented, this work could
significantly expand the capacity of existing pipelines for
large-scale data analysis and scientific discovery, resulting in a
significant impact on the field,” he says. “The
expected outcome of this work will be a set of computational tools
of high utility for the microbiology community and
beyond.”

The research addresses two key issues currently facing the
metagenomics community.

Novel Method for Advanced Algorithms

Sun says accurate construction and annotation of operational
taxonomic unit (OTU) tables from millions of 16S rRNA sequences is
one of the most important yet most difficult problems in microbiome
data analysis.

Sun’s research proposes a novel method that constructs and
annotates OTU tables simultaneously by combining input and
reference sequences, reference annotations and the data’s
clustering structure within one analytical framework.

The method derives dynamic, data-driven cutoffs to identify OTUs
that are consistent not only with the data’s clustering structure
but also with reference annotations.
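
As a rough illustration of the idea, and not Sun’s actual method, the sketch below picks a clustering cutoff from the data rather than fixing it in advance. It assumes a toy setup: short equal-length sequences, a simple mismatch-fraction distance, single-linkage grouping, and a rule that no cluster may mix reference sequences carrying different annotations. All function names, sequences and labels are hypothetical.

```python
from itertools import combinations

def hamming_fraction(a, b):
    """Fraction of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def single_linkage(seqs, cutoff):
    """Group sequences linked by a chain of distances <= cutoff (union-find)."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if hamming_fraction(seqs[i], seqs[j]) <= cutoff:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

def is_consistent(clusters, labels):
    """True if no cluster mixes reference sequences with different annotations."""
    for members in clusters:
        seen = {labels[i] for i in members if labels[i] is not None}
        if len(seen) > 1:
            return False
    return True

def pick_cutoff(seqs, labels, candidates):
    """Return the largest candidate cutoff whose clusters stay annotation-consistent."""
    best = None
    for cutoff in sorted(candidates):
        if is_consistent(single_linkage(seqs, cutoff), labels):
            best = cutoff
    return best

# Toy data (assumed): two annotated reference sequences, three unannotated reads.
seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "CCCCCCCCCC", "CCCCCCCCCA"]
labels = ["genusA", None, None, "genusB", None]

cutoff = pick_cutoff(seqs, labels, [0.03, 0.1, 0.5, 0.95])
print(cutoff, single_linkage(seqs, cutoff))
```

In this toy case the loosest cutoff is rejected because it would merge the two annotated references into a single cluster, so the annotations, not a preset threshold, decide where the grouping stops.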

“When successfully implemented, our method will generally
address the computational needs of processing hundreds of millions
of 16S rRNA reads that are currently being generated by large-scale
studies,” Sun says.

Sequencing Technology Allows Unique Strategies

The second issue concerns developing novel methods to extract
pertinent information from massive sequence data, helping the
field shift from descriptive research to mechanistic studies.

“We are particularly interested in microbial community
dynamics analysis, which can provide a wealth of insight into
disease development unattainable through a static experiment design
and lays a critical foundation for developing probiotic and
antibiotic strategies to manipulate microbial communities,”
Sun notes.

Traditionally, system dynamics is approached through time-course
studies. However, because of economic and logistical constraints,
time-course studies are generally limited in the number of samples
examined and the time period followed.

“With the rapid development of sequencing technology, many
thousands of samples are being collected in large-scale studies.
This provides us with a unique opportunity to develop a novel
analytical strategy to use static data, instead of time-course
data, to study microbial community dynamics,” Sun says.

Lab Focuses on Machine Learning, Bioinformatics

“To our knowledge, this is the first time that massive
static data is used to study dynamic aspects of microbial
communities,” he adds. “When successfully implemented,
our approach can effectively overcome the sampling limitation of
time-course studies, and it opens a new avenue of research to study
microbial dynamics underlying disease development without
performing a resource-intensive time-course study.”

Collaborators on the research project are:

Robert J. Genco, PhD, DDS, Distinguished Professor of oral
biology in UB’s School of Dental Medicine

Jean Wactawski-Wende, PhD, dean of UB’s School of Public
Health and Health Professions, SUNY Distinguished Professor and
professor of epidemiology and environmental health