Andrew McDavid, University of Rochester

Bulk gene expression experiments have relied on aggregating thousands of cells to measure the average expression in an organism. Advances in microfluidic and droplet sequencing now permit expression profiling in single cells. Studying cell-to-cell variation in this way reveals that individual cells lack detectable expression of transcripts that appear abundant at the population level, giving rise to zero-inflated expression patterns. To infer gene co-regulatory networks from such data, I propose a multivariate Hurdle model, which consists of a mixture of singular Gaussian distributions. I employ neighborhood selection with the pseudo-likelihood and a group lasso penalty to select and fit undirected graphical models that capture conditional independences between genes. Even under departures from this Hurdle model, the proposed method appears to be more sensitive than existing approaches.
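To make the role of the group lasso penalty concrete, the sketch below shows block soft-thresholding, the standard proximal operator of an unsquared Euclidean-norm penalty. It is only an illustration of why a group penalty removes or keeps all parameters linking a pair of genes together; it is not the fitting procedure from the talk. The function names, the gene labels, the two-parameter groups, and the value of `lam` are all hypothetical.

```python
import numpy as np


def group_soft_threshold(theta, lam):
    """Proximal operator of lam * ||theta||_2 (block soft-thresholding).

    The whole group is set to zero when its Euclidean norm is at most lam;
    otherwise every coordinate is shrunk by the same factor.
    """
    theta = np.asarray(theta, dtype=float)
    norm = np.linalg.norm(theta)
    if norm <= lam:
        return np.zeros_like(theta)
    return (1.0 - lam / norm) * theta


def select_neighbors(groups, lam):
    """Threshold each candidate neighbor's parameter group.

    `groups` maps a (hypothetical) gene name to the vector of parameters
    that tie it to the response gene. Returns the shrunken groups and the
    names whose group survived, i.e. the selected neighborhood.
    """
    shrunk = {name: group_soft_threshold(vec, lam) for name, vec in groups.items()}
    selected = [name for name, vec in shrunk.items() if np.any(vec != 0)]
    return shrunk, selected


# Hypothetical parameter groups for two candidate neighbors of one gene.
groups = {"geneB": [3.0, 4.0], "geneC": [0.3, 0.4]}
shrunk, neighbors = select_neighbors(groups, lam=1.0)
print(neighbors)
```

In a neighborhood-selection setting, a step like this would be applied repeatedly inside an optimizer for each gene's penalized pseudo-likelihood, and the surviving groups would define that gene's edges in the graph. How the per-gene neighborhoods are combined into a single undirected graph is a separate design choice not shown here.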