My research focuses on
designing and scaling machine learning methods for very large
datasets, specifically in the fields of semi-supervised
learning, natural language processing, and Bayesian models.

Background

I completed my undergraduate degree in Computer Science at
The University of Dayton in 2008. In 2007, I interned at GE Aviation
in Evendale, Ohio, where I automated systems for the detection
of production delays in the Supply Chain Management department.

My research experience outside of graduate school includes a summer research internship at
UIC working in distributed machine learning with
Dr. Robert Grossman and
Dr. Yunhong Gu,
where I designed and implemented large-scale clustering
algorithms to run over their
UDT protocol.
My earliest research position was as a research assistant for Dr. Kathryn Fischbach at
UTHSCSA's Alzheimer's research lab.

Summary:
When labeled data is scarce or expensive to collect,
semi-supervised learning (SSL) methods that utilize
unlabeled datasets can outperform standard supervised
learning algorithms. In this paper, we propose
a scalable SSL improvement to the classic Naive Bayes
classifier.
Modern SSL techniques typically require multiple
passes over the unlabeled data, which is often
infeasible on the web-scale corpora being produced
today. We show that improving baseline
estimates of word frequencies using unlabeled data
can improve Naive Bayes classifiers while
scaling to modern massive data sets.
In experiments with text topic classification and
sentiment analysis, we show that our method is both
more scalable and more accurate than SSL techniques
from previous work.
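
As a rough illustration of the idea in the summary, the sketch below makes a single pass over unlabeled text to estimate word frequencies, then uses those frequencies as the smoothing prior in a multinomial Naive Bayes classifier. This is a simplified stand-in and not necessarily the method from the paper: the smoothing formula, the function names (word_marginals, train, predict), and the alpha and floor parameters are all assumptions made for the example.

```python
import math
from collections import Counter


def word_marginals(unlabeled_docs):
    """Single pass over unlabeled documents: estimate P(w) for each word."""
    counts = Counter()
    for doc in unlabeled_docs:
        counts.update(doc.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def train(labeled_docs, marginals, alpha=10.0, floor=1e-6):
    """Multinomial Naive Bayes smoothed toward the unlabeled marginals.

    Assumed estimate (hypothetical, for illustration only):
        P(w | c) = (count(w, c) + alpha * P_u(w)) / (N_c + alpha)
    where P_u(w) comes from the unlabeled data and N_c is the number
    of labeled word tokens in class c.
    """
    class_docs = Counter()
    word_counts = {}
    for text, label in labeled_docs:
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    n_docs = sum(class_docs.values())
    classes = {}
    for label, counts in word_counts.items():
        classes[label] = {
            "log_prior": math.log(class_docs[label] / n_docs),
            "counts": counts,
            "total": sum(counts.values()),
        }
    return {"alpha": alpha, "floor": floor,
            "marginals": marginals, "classes": classes}


def log_likelihood(model, cls, word):
    """Log of the smoothed P(word | class); unseen words fall back to floor."""
    p_u = model["marginals"].get(word, model["floor"])
    numerator = cls["counts"][word] + model["alpha"] * p_u
    return math.log(numerator / (cls["total"] + model["alpha"]))


def predict(model, text):
    """Return the class label with the highest posterior log score."""
    scores = {}
    for label, cls in model["classes"].items():
        scores[label] = cls["log_prior"] + sum(
            log_likelihood(model, cls, w) for w in text.split()
        )
    return max(scores, key=scores.get)
```

Because the unlabeled corpus is only counted once, the cost of using it grows linearly with its size, which is the property the summary emphasizes. With ordinary add-one smoothing every unseen word gets the same prior mass; here, words that are common in the unlabeled text get more.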