Michael Liou
Statistics PhD Student at UW-Madison

New Post-graduate Research Project

October 1st, 2016

I joined a new lab this fall semester at the University of Nebraska-Lincoln,
working with Dr. Clarke. In this project, we are trying to infer genomic
distances between various E. coli strains. This information could be useful for
identifying clusters of pathogenic E. coli and could help track down root
sources of contamination to minimize foodborne outbreaks. How? E. coli genomes
vary from location to location, and thus, determining a “distance” between the
genomes can help determine whether these E. coli samples are from the same
outbreak.
Since genome sequencing is now quick and cost-effective thanks to the rise of
“Next-Generation Sequencing” technologies, accurate algorithmic pipelines to
process that information are critical. That is where this project fits in.
Calculating a “distance” between two genomes is a complicated problem, depending
on how deep down the rabbit hole you want to go. For starters, an average
E. coli genome is about 5 million base pairs long, and bacterial genomes are
relatively dynamic in terms of large insertions and deletions.
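
To make the idea of a genomic “distance” a little more concrete, here is a
minimal sketch of one simple option: a Jaccard distance over k-mers (all
substrings of length k). This is only an illustration, not necessarily the
method used in this project. The function names, the choice of k = 4, and the
toy sequences are all made up for the example. Real genomes are millions of
bases long and would need a much larger k and a more careful treatment of
insertions and deletions.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a DNA sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard_distance(seq_a, seq_b, k=4):
    """Jaccard distance between two sequences: 1 - |A & B| / |A | B|,
    where A and B are the k-mer sets of the two sequences."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    if not union:
        # Both sequences are shorter than k; treat them as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Toy sequences, invented purely for illustration.
reference = "ATGCGTACGTTAGC"
variant = "ATGCGTACGTTAGG"        # one substitution at the end
insertion = "ATGCGTAAAAACGTTAGC"  # a short insertion in the middle

for name, seq in [("variant", variant), ("insertion", insertion)]:
    print(name, round(jaccard_distance(reference, seq), 3))
```

A distance of 0 means the two sequences share exactly the same k-mers, and a
distance of 1 means they share none. Note that even a short insertion changes
several k-mers at once, which hints at why large insertions and deletions make
the problem harder.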

Relatedly, the FDA already has a whole genome sequencing program designed to
help public officials identify and understand pathogens isolated from patients,
the environment or food.

I’m looking forward to the work I will be doing for this project. It will be a
mix of manipulating genomic data on high-performance computing clusters and
statistical work to validate the distances.