Justin Sybrandt

Ph.D. Candidate: Machine Learning

Biography

Justin Sybrandt is a Ph.D. Candidate at Clemson University studying large-scale text mining applications.
A majority of his work focuses on hypothesis generation in medicine.
Justin works in the Algorithms and Computational Science Lab overseen by his advisor Ilya Safro.

Recent Posts

I’m here in Seattle, WA, attending the IEEE International Conference on Big Data, where I’ll be presenting two recent works. The first presents a new method to validate hypothesis generation systems. The second uses that method to determine the quality of input papers needed to make good conclusions. With two papers in the same conference, I will be giving a double-length talk! If you’re around, I’ll be at the end of the L12 session Wednesday morning.

I have the chance to present my work at the Google Ph.D. Intern Research Conference (PIRC).
This poster represents all of the work we have added to the Moliere project since our original paper last year.

Today in a class, we were asked to write an iterative solver for numerical equations.
Now, many students in the class did not have an optimization background, so for the benefit of everyone, I want to share a simple overview of this exercise and how to go about solving it.

The problem was stated as follows:

$$ M(a) = 2\times a + 14$$
$$ G(b) = b - 2 $$

And our goal was to find some solution $x$ such that $M(x) = G(x)$.
Additionally, we were supposed to do so iteratively, so just solving the system of equations was out of the question.
This is because our next exercise would have a different $M$ and $G$, so our code should work for arbitrary functions rather than just this pair.

For the sake of generality, my solution here will assume only that $M$ and $G$ are continuous; I will not assume we know their derivatives.
Additionally, I will be writing my code in Python, simply because I find that it is easier for anybody to understand.
Knowledge of Python, hopefully, won’t be necessary.
But first, let’s go over some aspects of the problem…
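
To make the setup above concrete, here is a minimal sketch of one derivative-free way to attack it: bisection on the difference $M(x) - G(x)$. This is only an illustration under stated assumptions, not necessarily the approach the full post takes. It assumes the two functions are continuous and that their difference changes sign somewhere near the starting point; the names `find_bracket`, `bisect`, and `solve` are hypothetical helpers invented for this example.

```python
def M(a):
    return 2 * a + 14


def G(b):
    return b - 2


def find_bracket(f, start=0.0, step=1.0, max_expansions=60):
    """Widen a symmetric interval around `start` until f changes sign across it.

    Hypothetical helper: it can fail if f never changes sign (for example,
    when the curves only touch), in which case it raises ValueError.
    """
    for _ in range(max_expansions):
        lo, hi = start - step, start + step
        if f(lo) * f(hi) <= 0:
            return lo, hi
        step *= 2
    raise ValueError("no sign change found; cannot bracket a solution")


def bisect(f, lo, hi, tol=1e-9, max_iter=200):
    """Halve [lo, hi] repeatedly, keeping the half where f changes sign."""
    f_lo, f_hi = f(lo), f(hi)
    # An endpoint may already be an exact solution.
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                   # sign change is in the left half
        else:
            lo, f_lo = mid, f_mid      # sign change is in the right half
    return (lo + hi) / 2


def solve(m, g):
    """Find x with m(x) == g(x) by locating a zero of m(x) - g(x)."""
    def diff(x):
        return m(x) - g(x)

    lo, hi = find_bracket(diff)
    return bisect(diff, lo, hi)


x = solve(M, G)
print(x, M(x), G(x))
```

For the example functions the two sides meet at $x = -16$, since $2x + 14 = x - 2$ there. Because `solve` only ever calls `m` and `g`, swapping in a different pair of continuous functions requires no other changes, as long as a sign change can be bracketed.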

Recent Publications

Drive-by Health Monitoring utilizes accelerometers mounted on commercial and civilian vehicles to gather dynamic response data that can be used to continuously evaluate the health of bridges faster and with less equipment than traditional structural health monitoring practices. Because vehicles and bridges create a coupled system, vehicle acceleration data contains information about bridge frequencies that can be used as health indicators.

The potential for automatic hypothesis generation (HG) systems to improve research productivity keeps pace with the growing set of publicly available scientific information. But as data becomes easier to acquire, we must understand the effect different textual data sources have on our resulting hypotheses. Are abstracts enough for HG, or does it need full-text papers? How many papers does an HG system need to make valuable predictions? How sensitive is a general-purpose HG system to hyperparameter values or input quality? What effect do corpus size and document length have on HG results?

The first step of many research projects is to define and rank a short list of candidates for study. Given the rapid pace of modern scientific progress, some turn to automated hypothesis generation (HG) systems to aid this process. These systems can identify implicit or overlooked connections within a large scientific corpus, and while their importance grows alongside the pace of science, they lack thorough validation. Without any standard numerical evaluation method, many validate general-purpose HG systems by rediscovering a handful of historical findings, and some wishing to be more thorough may run laboratory experiments based on automatic suggestions. These methods are expensive, time-consuming, and cannot scale. Thus, we present a numerical evaluation framework for the purpose of validating HG systems that leverages thousands of validation hypotheses.

Hypothesis generation is becoming a crucial time-saving technique which allows biomedical researchers to quickly discover implicit connections between important concepts. We discover these connections with our tool MOLIERE.

Since the Agile Manifesto, many organizations have explored agile development methods to replace traditional waterfall development. Interestingly, waterfall remains the most widely used practice, suggesting that there is something missing from the many “flavors” of agile methodologies. To investigate this, we examine seven of the most common practices and evaluate each against a series of criteria centered around product quality and adherence to agile practices. We find that no methodology entirely replaces waterfall and summarize the strengths and weaknesses of each. From this, we conclude that agile methods are, as a whole, unable to cope with the realities of technical debt and large-scale systems. Ultimately, no one methodology fits all projects.

By utilizing General Parallel File System (GPFS) policy scans, distsync finds changed files without navigating between directories. This allows our tool to synchronize large, out-of-date file systems more efficiently.