Managing distributed computing using tools such as Docker, Kubernetes, Luigi, and Spark.

Developing web applications with Ruby on Rails, Sencha Ext JS, and other frameworks.

Familiarity with a broad range of topics in evolution, ecology, genetics, genomics, physics, and applied mathematics.

Examples

As the Data Science Developer at ReverbNation, I collaborate with executives, product managers, and marketers to understand and predict our users' responses to our communications and services. Typically, my work begins with my colleagues' questions and intuitions, which I translate into descriptive statistics, graphs, hypotheses, and statistical models. For each analysis, I prepare a suitable data set, often from multiple data sources. Simple, well-chosen descriptive statistics and graphs can be highly instructive, but when appropriate, I also use more elaborate statistical and machine-learning methods. In every case, I strive to present the results in lucid, practical terms. My primary computational tools are SQL, including MySQL and PostgreSQL; Python, including NumPy, SciPy, pandas, Jupyter, and scikit-learn; and Docker, Kubernetes, Luigi, and Spark.

As a freelance software developer with my own company, Haygoodness L.L.C., I've developed a laboratory information management system (LIMS) for the Duke University Genome Sequencing Shared Resource, which serves customers both at Duke and around the world. This system, known as DUGSIM, enables customers to get self-serve estimates, request quotes from staff, place orders, download results, and receive invoices for DNA sequencing on several platforms (Illumina, Pacific Biosciences, etc.). It also enables staff members to prepare quotes, process orders, schedule and track library preparations and sequencing runs, distribute results, and issue invoices. It has been in use since mid-2013 and has handled over 4000 orders as of early 2017. DUGSIM is built with Ruby on Rails, Sencha Ext JS, MySQL, Sphinx, Apache, Phusion Passenger, and Ubuntu Linux.

As a Postdoctoral Fellow in the Duke University Biology Department, I conducted research in evolutionary genetics and genomics. For example, colleagues and I fitted (MLE, MCMC) statistical models to DNA sequences from non-protein-coding, putatively gene-regulatory regions of the human, chimpanzee, and macaque genomes. We found evidence for many adaptive changes in the human lineage, particularly in noncoding regions adjacent to coding regions for proteins involved in neural development and function (Haygood et al., 2007). Subsequently, we performed a meta-analysis of surveys for adaptive changes in the human lineage, and we found that neural-related genes were prominent in surveys of noncoding regions but not in surveys of coding regions (Haygood et al., 2010). These findings affirm a long-standing conjecture that human cognition evolved mainly through changes in gene regulation. My primary computational tools were Ruby, R, and C.

As a Quantitative Analyst at Hydrologic Consultants, Inc. (of Sacramento, CA, later acquired by Bookman-Edmonston Engineering, Inc., later acquired by GEI Consultants, Inc.) and Timothy J. Durbin, Inc., I analyzed hydrologic data and situations for several clients. For example, I applied statistical methods (ANCOVA, MAP estimation) to streamflow measurements in order to reveal trends in water use within the North Platte River watershed, despite climatic fluctuations. Other projects were less statistical and more mathematical. For example, I extended and applied proprietary numerical software (PDE solution via FEM) for modeling groundwater flow and solute transport in order to elucidate salt-water intrusion into an aquifer beneath Lompoc, CA. These analyses were implemented using FORTRAN, Excel, and Access.

Experience

Data Science Developer, ReverbNation, 2014–present. Applied statistics and machine learning in support of online services used by over four million musicians.

Q & A

Q1: Why did you leave academia?

A1: I didn’t have to. My position at Duke was “soft money” but in no immediate danger. I’d applied for several faculty jobs, done one interview, and scheduled another. Deciding to leave wasn’t easy, but after considering it for quite a while, I concluded that although I’d been a mostly happy and fairly productive student and postdoc, I’d almost surely be neither happy nor productive, in any sense that matters to me, as a faculty member.

The crux of the matter is that faculty members at major universities are now employed not so much to do research as to manage it and, above all, to get money for it. As Paul Graham observed, “Professors nowadays seem to have become professional fundraisers who do a little research on the side.” Ultimately, there are several reasons why, including a decline in federal research funding precipitated by the end of the Cold War, so-called tax revolts that have left state universities cash-strapped, and other trends in American society and government. Proximately, the driving force is that in many fields, the available dollars have been dwindling for years, at least per researcher, if not for the field as a whole. As the pie has gotten ever smaller, professional survival has demanded ever more strenuous efforts to get a piece of it. I foresaw a future in which however much I struggled to concentrate on science, my thoughts would be dominated by money and its concomitants, politics and bureaucracy.

And I foresaw that the resulting science — steered by me but largely done by my students and postdocs — would probably be, like most academic science, of little consequence. Thomas Merton remarked, “There is always a temptation to diddle around in the contemplative life, making itsy-bitsy statues.” It isn’t only in the contemplative life. Most academic research is of marginal interest even when it’s first published, let alone 10 or 20 years later. Many academic publications aren’t cited even a dozen times. Genuinely innovative thinking is never easy, but certain characteristics of academia make it harder. Money, politics, and bureaucracy are severely distracting. Moreover, as Stuart Rojstaczer observed, “With so little money available, funding agencies have become very cautious in the type of work they are supporting. They want ‘proven results’ [and] a ‘high probability of success’ for their money.” So they fund proposals that go just a little bit beyond what’s already been done.

I don’t consider myself to have abandoned science by leaving academia. Indeed, I’ve continued to collaborate and contribute. At present, I’m mainly occupied with commercial work, but I’m determined to return to basic research in due course. People who doubt the feasibility of high-quality basic research outside academia should recall that Charles Darwin was never a faculty member, Albert Einstein did much of his best work while employed by the Swiss patent office, etc. Of course, I’m not claiming to be the next Darwin or Einstein. I may never do any science of much interest outside academia. However, I think I stand a better chance outside than I would inside. That may well not be true of other people — some people are better at fighting off distractions, and some kinds of science need more institutional support — but I’m pretty sure it’s true of me.

Q2: Given your background, why haven’t you started a biotech company?

A2: I’ve thought about it but decided against it, at least for now. Biotech companies tend to need several years and several million dollars to develop a product. I’m not terrifically patient, and having to sell an idea to venture capitalists before even starting to realize it sounds an awful lot like the grant grind I left academia to get away from (see Q1).

Q3: Were you involved in that boating disaster in the Gulf of California?

A3: Yes. It was an ecological research expedition in March 2000. I was in a small boat that capsized in a wind-driven swell. Of my eight companions, five died, including the leader of the expedition. I could easily have died too, but with help from another survivor, I got to shore.