Abstract

DNA methylation is among the best-studied epigenetic mechanisms impacting
gene expression. While much attention has been paid to the proper normalization
of data produced by DNA methylation assays, linear models remain the current
standard for analyzing post-processed methylation data because of the ease
they afford in both statistical inference and scientific interpretation. We
present a new, general statistical algorithm for the model-free estimation of
the differential methylation of DNA CpG sites, complete with straightforward and
interpretable statistical inference for such estimates. The new approach
leverages variable importance measures, a class of parameters arising in causal
inference, in a manner that facilitates their use in obtaining targeted
estimates of the importance of each CpG site. The proposed procedure is
computationally efficient and self-contained, incorporating techniques to
isolate a subset of candidate CpG sites based on cursory evidence of
differential methylation and providing a multiple testing correction that
appropriately controls the False Discovery Rate in such multi-stage analysis
settings. We demonstrate the effectiveness of the new methodology through an
analysis of real DNA methylation data and introduce methyvim, a recently
developed R package (available via Bioconductor) that supports data analysis
with this methodology.