Dave Moore

I am currently at Google, working with the Bayesflow team to create tools for probabilistic modeling at scale. Previously I was a PhD student in computer science at UC Berkeley, advised by Stuart Russell. Before coming to Berkeley, I was an undergrad at Williams College, where I majored in CS and math and wrote a senior thesis with Andrea Danyluk.

My thesis project applied Bayesian inference to nuclear test monitoring: given seismic waveforms from a global network of stations, we want to infer the seismic events that plausibly explain the observed signals. Portions of this work were funded by the CTBTO and DTRA. My thesis on signal-based Bayesian monitoring of seismic events is now available! Or see our AISTATS paper for a shorter summary of this work.

More generally I'm interested in model-based machine learning: how can we build intelligent agents that understand the laws by which the world works and can exploit this knowledge to predict and plan in novel circumstances? Relevant topics include deep generative models, automated Bayesian inference and probabilistic programming, causal inference, and applications of ML in science and medicine. I'm also very interested in ensuring that intelligent systems contribute to human flourishing and help us lead our best lives: this includes systems that want to understand and optimize their users' values. An area that I haven't worked in, but would like to, is the application of insights from artificial agents to better understand human purpose, relationships, and mental health. If you're thinking about any of these things, or would like to be, please get in touch!

Selective conferences

Detecting weak seismic events from noisy sensors is a difficult perceptual task. We formulate this task as Bayesian inference and propose a generative model of seismic events and signals across a network of spatially distributed stations. Our system, SIGVISA, is the first to directly model seismic waveforms, allowing it to incorporate a rich representation of the physics underlying the signal generation process. We use Gaussian processes over wavelet parameters to predict detailed waveform fluctuations based on historical events, while degrading smoothly to simple parametric envelopes in regions with no historical seismicity. Evaluating on data from the western US, we recover three times as many events as previous work, and reduce mean location errors by a factor of four while greatly increasing sensitivity to low-magnitude events.

Gaussian processes have been successful in both supervised and unsupervised machine learning tasks, but their computational complexity has constrained practical applications. We introduce a new approximation for large-scale Gaussian processes, the Gaussian Process Random Field (GPRF), in which local GPs are coupled via pairwise potentials. The GPRF likelihood is a simple, tractable, and parallelizable approximation to the full GP marginal likelihood, enabling latent variable modeling and hyperparameter selection on large datasets. We demonstrate its effectiveness on synthetic spatial data as well as a real-world application to seismic event location.
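As an illustrative sketch (not the paper's implementation), the GPRF objective is a sum of local GP marginal likelihoods plus a pairwise correction term for each edge between coupled blocks. All function and variable names below are invented for the example, a simple RBF kernel stands in for whatever kernel the application calls for, and numpy is assumed available:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_logpdf(y, X, noise=1e-2, **kernel_params):
    """Exact GP marginal log-likelihood of observations y at inputs X."""
    K = rbf_kernel(X, X, **kernel_params) + noise * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))        # -0.5 * log|K|
            - 0.5 * len(y) * np.log(2 * np.pi))

def gprf_loglik(blocks, edges, **kernel_params):
    """GPRF objective: local terms plus pairwise corrections.

    blocks: list of (X_i, y_i) for each local GP.
    edges:  list of (i, j) index pairs of coupled blocks.
    """
    ll = sum(gp_logpdf(y, X, **kernel_params) for X, y in blocks)
    for i, j in edges:
        Xi, yi = blocks[i]
        Xj, yj = blocks[j]
        Xij = np.vstack([Xi, Xj])
        yij = np.concatenate([yi, yj])
        # log p(y_i, y_j) - log p(y_i) - log p(y_j)
        ll += (gp_logpdf(yij, Xij, **kernel_params)
               - gp_logpdf(yi, Xi, **kernel_params)
               - gp_logpdf(yj, Xj, **kernel_params))
    return ll
```

Note that each local and pairwise term can be computed independently, which is where the parallelism comes from; with exactly two blocks and one edge, the local terms cancel and the objective reduces to the exact joint likelihood.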

Workshops and other lightly-refereed venues

Parallel Chromatic MCMC with Spatial Partitioning
Jun Song and David A. Moore
AAAI Workshop on Distributed Machine Learning, San Francisco, February 2017.
[arXiv][abstract][bib]

We introduce a novel approach for parallelizing MCMC inference in models with spatially determined conditional independence relationships, for which existing techniques exploiting graphical model structure are not applicable. Our approach is motivated by a model of seismic events and signals, where events detected in distant regions are approximately independent given those in intermediate regions. We perform parallel inference by coloring a factor graph defined over regions of latent space, rather than individual model variables. Evaluating on a model of seismic event detection, we achieve significant speedups over serial MCMC with no degradation in inference quality.
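A toy sketch of the idea, under simplifying assumptions (a one-dimensional chain of regions with a two-coloring and a Gaussian auto-normal model; all names are invented for the example and this is not the paper's seismic model):

```python
import numpy as np

rng = np.random.default_rng(0)

def chromatic_gibbs_sweep(x, coupling=0.4, rng=rng):
    """One sweep of chromatic Gibbs sampling on a chain of regions.

    Regions of the same color are conditionally independent given the
    regions of the other color, so each inner loop could be dispatched
    to parallel workers without changing the stationary distribution.
    """
    n = len(x)
    for color in (0, 1):
        for i in range(color, n, 2):  # same-color regions: parallelizable
            neighbors = [x[j] for j in (i - 1, i + 1) if 0 <= j < n]
            # conditional: x_i | neighbors ~ N(coupling * sum(neighbors), 1)
            x[i] = rng.normal(coupling * sum(neighbors), 1.0)
    return x

x = np.zeros(8)  # one latent variable per spatial region
for _ in range(100):
    x = chromatic_gibbs_sweep(x)
```

The contribution of the paper is that the coloring is applied to a factor graph over *regions of latent space* rather than to individual variables, which is what makes the scheme applicable when, as in the seismic model, the variables themselves have no fixed graphical structure.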

We introduce a framework for modeling parameter symmetries in variational inference by explicitly mixing a base approximating density over a symmetry group. We show that this can be done tractably for the case of a Gaussian mixture over the orthogonal group under an isotropic variance assumption. Initial results show that inference with a symmetrized posterior avoids component collapse and leads to improved predictive performance.
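To illustrate the core construction (a minimal sketch, not the paper's method): mixing a base density q over a finite symmetry group G gives q_sym(z) = (1/|G|) Σ_g q(g z), which is invariant under the group by construction. The example below uses a sign-flip subgroup of O(2) and an isotropic Gaussian base density; all names are invented for the illustration:

```python
import numpy as np

def gauss_logpdf(z, mu, sigma):
    """Log-density of an isotropic Gaussian N(mu, sigma^2 I)."""
    d = z - mu
    return (-0.5 * np.sum(d * d) / sigma**2
            - len(z) * np.log(sigma * np.sqrt(2 * np.pi)))

def symmetrized_logpdf(z, mu, sigma, group):
    """log of (1/|G|) * sum_g q(g @ z), computed stably via log-sum-exp."""
    logs = [gauss_logpdf(g @ z, mu, sigma) for g in group]
    m = max(logs)
    return m + np.log(sum(np.exp(l - m) for l in logs)) - np.log(len(group))

I = np.eye(2)
group = [I, -I]  # sign-flip symmetry: a tiny finite subgroup of O(2)
mu, sigma = np.array([1.0, -0.5]), 0.7
```

By symmetry of the mixture, q_sym(z) = q_sym(-z) for every z, even though the base Gaussian itself is not symmetric about the origin.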

Gaussian process (GP) regression is a powerful technique for nonparametric regression; unfortunately, calculating the predictive variance in a standard GP model requires time O(n^2) in the size of the training set. This is cost-prohibitive when GP likelihood calculations must be done in the inner loop of the inference procedure for a larger model (e.g., MCMC). Previous work by Shen et al. (2006) used a k-d tree structure to approximate the predictive mean in certain GP models. We extend this approach to achieve efficient approximation of the predictive covariance using a tree clustering on pairs of training points. We show empirically that this significantly increases performance at minimal cost in accuracy. Additionally, we apply our method to "primal/dual" models having both parametric and nonparametric components and show that this enables efficient computations even while modeling longer-scale variation.
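The underlying intuition can be sketched in a few lines (a deliberately simplified illustration, not the tree algorithm itself): the GP predictive mean is a kernel-weighted sum k(x*, X) @ alpha over all training points, and when a group of points is far from the query, the whole group can be summarized by a single kernel evaluation at its center weighted by the group's summed alpha. All names below are invented for the example:

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """RBF kernel between a single point a and an array of points b."""
    return np.exp(-0.5 * np.sum((a - b) ** 2, axis=-1) / ls**2)

def exact_mean(xstar, X, alpha, ls=1.0):
    """Exact GP predictive mean: one kernel evaluation per training point."""
    return float(rbf(xstar, X, ls) @ alpha)

def clustered_mean(xstar, centers, alpha_sums, ls=1.0):
    """Approximate k(x*, X) @ alpha with one kernel evaluation per
    cluster, weighting each by the cluster's summed alpha -- the core
    idea behind tree-based approximation of the GP predictive mean."""
    return float(rbf(xstar, centers, ls) @ alpha_sums)
```

A k-d tree applies this recursively, expanding clusters only when they are close enough to the query to matter; the paper's extension does the analogous grouping over *pairs* of training points to approximate the predictive covariance.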

Localization tools such as the CTBTO's GA (Global Association) system work by analyzing detections of arriving phases; these detections constitute a discretized, thresholded summary of information from the underlying seismic waveforms. By contrast, our SIG-VISA (SIGnal-based Vertically Integrated Seismic Analysis) system operates directly at the level of the raw waveforms or envelopes, applying Bayesian inference to a generative probabilistic model of seismic traces to search for the event bulletin having the highest posterior probability given the observed signals. We exhibit a SIG-VISA prototype demonstrating improved sensitivity and localization performance compared to purely detection-based methods.

The SIG-VISA generative envelope model views the observed signal envelope as the composition of a background noise process plus a set of arriving phase envelopes; in its simplest form, each phase envelope consists of a parameterized template perturbed by an autoregressive process. We build station-specific models for the template shape parameters, including amplitude and coda decay rate across multiple frequency bands, as well as the parameters of the noise and signal perturbation processes. We show empirically that our signal-based model leads to increased sensitivity compared to a purely detection-based model, since by comparing the observation likelihoods under the signal and noise models we can extract statistical evidence from sub-threshold arrivals (or their absence) which would otherwise be ignored. This capability is especially valuable for faintly detected (low-magnitude and/or teleseismic) events.
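A toy version of this generative story can be simulated directly (a sketch under invented parameter values, not SIG-VISA's actual template family): a constant noise floor plus a parametric phase envelope, here a linear onset followed by exponential coda decay, perturbed multiplicatively by an AR(1) process so the envelope stays positive.

```python
import numpy as np

rng = np.random.default_rng(0)

def phase_template(n, onset, peak_amp, decay, rise=5):
    """Parametric envelope: linear rise to peak_amp, then exponential
    coda decay (a stand-in for the model's template form)."""
    t = np.arange(n, dtype=float)
    env = np.zeros(n)
    up = (t >= onset) & (t < onset + rise)
    env[up] = peak_amp * (t[up] - onset) / rise
    coda = t >= onset + rise
    env[coda] = peak_amp * np.exp(-decay * (t[coda] - onset - rise))
    return env

def ar1(n, phi=0.8, sigma=0.1, rng=rng):
    """Autoregressive perturbation process (in log-amplitude space)."""
    x = np.zeros(n)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + rng.normal(0, sigma)
    return x

n = 200
noise_floor = 0.05
signal = noise_floor + phase_template(n, onset=50, peak_amp=1.0, decay=0.05) * np.exp(ar1(n))
```

Under such a model, comparing the observation likelihood with and without a hypothesized phase arrival is what lets the system extract evidence from sub-threshold arrivals that a hard detector would discard.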

A further advance of SIG-VISA is the incorporation of nonparametric modeling using Gaussian processes (GPs). Known to the geophysics community as the mathematical foundation of kriging, GPs provide a probabilistic framework for predicting the attributes of future events based on past events in similar or nearby locations. SIG-VISA makes use of GP models for the template shape parameters, improving localization performance relative to simpler parametric models of the types used in previous systems (e.g. NET-VISA). Moreover, we are developing a nonparametric model for the template perturbations that captures the phenomenon of correlated waveforms from nearby events; this allows a waveform matching effect to fall out naturally from the probabilistic inference.
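The kriging behavior described above, confident predictions near historical events that revert smoothly to the prior far from any data, falls directly out of standard GP regression. The sketch below (invented data and hyperparameters, not SIG-VISA's trained models) predicts a hypothetical template parameter, e.g. a coda decay rate, from event location:

```python
import numpy as np

def rbf(X1, X2, ls=1.0, var=1.0):
    d2 = np.sum((X1[:, None] - X2[None, :]) ** 2, axis=-1)
    return var * np.exp(-0.5 * d2 / ls**2)

def gp_predict(Xtrain, y, Xtest, ls=1.0, var=1.0, noise=1e-4):
    """Standard GP posterior mean and pointwise variance."""
    K = rbf(Xtrain, Xtrain, ls, var) + noise * np.eye(len(y))
    Ks = rbf(Xtest, Xtrain, ls, var)
    mean = Ks @ np.linalg.solve(K, y)
    cov = rbf(Xtest, Xtest, ls, var) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)
```

Near a historical event the posterior mean tracks the observed parameter value with low variance; far from all historical seismicity the variance reverts to the prior, which is exactly the "degrade smoothly to a simple parametric model" behavior the system relies on.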

Our project has initiated, and will continue to develop and evaluate, a new Bayesian approach for nuclear test monitoring. We anticipate that the new approach will yield substantially lower detection thresholds, possibly approaching a theoretical lower bound that we hope to establish. We will also develop new techniques to implement such monitoring capabilities within a general-purpose Bayesian modeling and inference system that may eventually support a wide range of information-system needs for arms treaties.

In ongoing work that is moving towards possible deployment, we have completed a prototype seismic monitoring system based on a generative, vertically integrated statistical model linking hypothesized events to “detections” extracted from raw signal data by classical algorithms. On test data sets of naturally occurring events curated by human experts, our system exhibits roughly 60% fewer detection failures than the currently deployed automated system, SEL3, that forms part of the International Monitoring System.

The current phase of the project moves away from hard-threshold detections altogether. Instead, the generative model spans the full range from events to measured signal properties. Given the observed signal traces, the statistical inference algorithm attempts to maximize a whole-network statistical measure of the likelihood that an event – or collection of events – has occurred. Specialized techniques such as waveform matching and double differencing are realized within our framework as special cases of probabilistic inference; our initial experiments using 2D simulated data indicate that a full Bayesian analysis can provide more accurate absolute and relative locations than double differencing, while simultaneously estimating the velocity structure of the observed region.

As we move toward a full-scale implementation, the primary tasks will involve the development of accurate predictive models of waveform properties. These models will combine both parametric forms (for example, triangular envelopes in multiple frequency bands) and nonparametric forms based on previously observed waveforms from nearby events. Hybrid models will smoothly interpolate between these two forms depending on the distance of the hypothesized event from previously observed events.

Learning the relational structure of a domain is a fundamental problem in statistical relational learning. The deep transfer algorithm of Davis and Domingos attempts to improve structure learning in Markov logic networks by harnessing the power of transfer learning, using the second-order structural regularities of a source domain to bias the structure search process in a target domain. We propose that the clique-scoring process which discovers these second-order regularities constitutes a novel standalone method for learning the structure of Markov logic networks, and that this fact, rather than the transfer of structural knowledge across domains, accounts for much of the performance benefit observed via the deep transfer process. This claim is supported by experiments in which we find that clique scoring within a single domain often produces results equaling or surpassing the performance of deep transfer incorporating external knowledge, and also by explicit algorithmic similarities between deep transfer and other structure learning techniques.

If you're a Williams or Berkeley undergrad thinking about a career in AI, or in particular applying to CS grad schools, feel free to get in touch; I'm more than happy to talk about my experiences with the process!

I've started a blog to contain writings on CS and non-CS topics. I'm not sure yet how often it'll be updated.