Joint Analysis of Methylation and Genotype Data

2017-10-30

1 Joint analysis of methylation and genotype data

Single nucleotide polymorphisms (SNPs) can create and destroy CpGs. As methylation occurs mostly at CpGs, such CpG-SNPs can directly affect methylation measurements.

Recall that enrichment-based methylation methods measure total methylation in the vicinity of a CpG. By creating or destroying a CpG, a CpG-SNP introduces variation in the total methylation around that CpG, which can greatly reduce the power to detect case-control differences.

RaMWAS can account for a possible effect of CpG-SNPs by testing for joint significance of \(\beta_1\) and \(\beta_2\) in the following model:
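One form consistent with this description, given here as an assumption rather than as the exact RaMWAS specification, is

\[ \mu_i = \beta_0 + \text{outcome}\,\beta_1 + \text{outcome}\cdot\text{SNP}_i\,\beta_2 + \gamma\,\text{cvrt} + \varepsilon \]

where \(\mu_i\) is the methylation score at the i-th CpG, \(\text{SNP}_i\) is the genotype at that CpG, and cvrt denotes the covariates. Under this assumed form, joint significance of \(\beta_1\) and \(\beta_2\) corresponds to an F-test between nested linear models. The base R sketch below runs such a test on simulated data for a single CpG; all variable names and effect sizes are illustrative, and it is not the RaMWAS implementation.

```r
set.seed(1)
n <- 200

# Simulated inputs for one CpG (all values are made up)
outcome <- rep(c(0, 1), each = n / 2)          # case-control status
snp     <- rbinom(n, size = 2, prob = 0.3)     # genotype coded 0/1/2
cvrt    <- rnorm(n)                            # one covariate

# Methylation score following the assumed model
mu <- 1 + 0.5 * outcome + 0.8 * outcome * snp + 0.3 * cvrt + rnorm(n)

# Null model: covariates only (beta_1 = beta_2 = 0)
fit_null <- lm(mu ~ cvrt)

# Full model: outcome plus outcome-by-genotype term, no SNP main effect
fit_full <- lm(mu ~ outcome + outcome:snp + cvrt)

# F-test for the two added coefficients together
joint   <- anova(fit_null, fit_full)
joint_p <- joint[["Pr(>F)"]][2]
print(joint)
```

The comparison has two numerator degrees of freedom, one for each tested coefficient.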

1.1 Input data

The SNP data must have the same dimensions as the CpG score matrix, i.e. it must be available for the same set of samples and the same set of locations. Data preparation may include finding the closest SNP for every CpG and excluding CpGs that have no SNP in their vicinity.
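As a rough illustration of this preparation step, the base R sketch below matches each CpG to its nearest SNP on a single chromosome and drops CpGs with no SNP within a chosen distance. The positions and the max_dist threshold are made up for the example; the text does not prescribe a particular distance.

```r
# Hypothetical positions on one chromosome
cpg_pos  <- c(100L, 250L, 900L, 5000L)
snp_pos  <- sort(c(120L, 240L, 1000L))   # must be sorted for findInterval
max_dist <- 150L                         # assumed vicinity threshold

nsnp <- length(snp_pos)

# Index of the last SNP at or before each CpG (0 if there is none)
left  <- findInterval(cpg_pos, snp_pos)
right <- left + 1L

# Distance to the neighbouring SNP on each side (Inf if no such SNP)
dist_left  <- ifelse(left >= 1L,
                     cpg_pos - snp_pos[pmax(left, 1L)], Inf)
dist_right <- ifelse(right <= nsnp,
                     snp_pos[pmin(right, nsnp)] - cpg_pos, Inf)

# Pick the closer side; ties go to the left neighbour
nearest      <- ifelse(dist_left <= dist_right, left, right)
nearest_dist <- pmin(dist_left, dist_right)

# Keep only CpGs with a SNP close enough
keep <- nearest_dist <= max_dist
data.frame(cpg_pos, nearest_snp = snp_pos[nearest], nearest_dist, keep)
```

With several chromosomes, the same matching would be applied separately within each chromosome.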

1.1.1 Create data matrices for CpG-SNP analysis

To illustrate this type of analysis we produce the following artificial files.

CpG_locations.* – filematrix with the locations of the CpG-SNPs.
It has two columns with integer values – chromosome number and location (chr and position).

Coverage.* – filematrix with the data for all samples and all locations.
Each row has data for a single sample. Row names are sample names.
Each column has data for a single location. Columns match rows of the location filematrix.

SNPs.* – filematrix with genotype data, matching the coverage matrix.

First, we load the package and set up a working directory. The project directory dr can be set to a more convenient location when running the code.
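A minimal sketch of this setup, followed by generation of the three artificial matrices described above, could look as follows. The sample and location counts, the 0/1/2 genotype coding, and the random values are assumptions made for illustration. The files are written with fm.create.from.matrix from the filematrix package, and only when that package is installed.

```r
# Working directory (a temporary location; change `dr` as needed)
dr <- file.path(tempdir(), "simulated_matrix_data")
dir.create(dr, showWarnings = FALSE)

set.seed(18)
nsamples   <- 20L    # assumed number of samples
nlocations <- 500L   # assumed number of CpG-SNP locations
sample_names <- paste0("Sample_", seq_len(nsamples))

# Locations: integer chromosome number and position, in genomic order
temp     <- cumsum(sample(400000L, nlocations, replace = TRUE))
chr      <- as.integer(temp %/% 1e7) + 1L
position <- as.integer(temp %% 1e7)
locmat   <- cbind(chr = chr, position = position)

# Coverage: one row per sample, one column per location
coverage <- matrix(runif(nsamples * nlocations),
                   nrow = nsamples, ncol = nlocations,
                   dimnames = list(sample_names, NULL))

# Genotypes: same shape as coverage, coded 0/1/2 (assumed coding)
snps <- matrix(sample(0:2, nsamples * nlocations, replace = TRUE),
               nrow = nsamples, ncol = nlocations,
               dimnames = list(sample_names, NULL))

# Write the filematrices if the filematrix package is available
if (requireNamespace("filematrix", quietly = TRUE)) {
    mats <- list(CpG_locations = locmat, Coverage = coverage, SNPs = snps)
    for (nm in names(mats)) {
        fm <- filematrix::fm.create.from.matrix(
            filenamebase = file.path(dr, nm),
            mat = mats[[nm]])
        close(fm)
    }
}
```

The columns of the coverage and genotype matrices line up with the rows of the location matrix, which is the correspondence the analysis relies on.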

1.2 SNP-CpG analysis

Let us test for association between CpG scores and the sex covariate (modeloutcome parameter), correcting for batch effects (modelcovariates parameter). We save the top 20 results (toppvthreshold parameter) in a text file.
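The parameter choices above can be sketched as follows. The covariate table is hypothetical, and the use of ramwasParameters and ramwasSNPs with a fileSNPs parameter and modelPCs = 0 reflects an assumption about the RaMWAS interface that should be checked against the package documentation. The analysis runs only if the package is installed and the data files are present in dr.

```r
# Directory assumed to already hold the CpG_locations.*, Coverage.*
# and SNPs.* filematrices
dr <- file.path(tempdir(), "simulated_matrix_data")

# Hypothetical covariates; the first column holds the sample names
nsamples <- 20L
covariates <- data.frame(
    sample = paste0("Sample_", seq_len(nsamples)),
    sex    = seq_len(nsamples) %% 2,
    batch  = rep(c(1, 2), each = nsamples / 2)
)

# Analysis settings named in the text, plus assumed extras
mwas_args <- list(
    dircoveragenorm = dr,        # location of the data filematrices
    covariates      = covariates,
    modeloutcome    = "sex",     # variable tested for association
    modelcovariates = "batch",   # variables corrected for
    modelPCs        = 0,         # assumed: no principal components
    toppvthreshold  = 20,        # keep the top 20 results
    fileSNPs        = "SNPs"     # assumed name of the genotype filematrix
)

# Run only when the package and the data are both available
have_pkg  <- requireNamespace("ramwas", quietly = TRUE)
have_data <- file.exists(file.path(dr, "SNPs.bmat"))
if (have_pkg && have_data) {
    param <- do.call(ramwas::ramwasParameters, mwas_args)
    ramwas::ramwasSNPs(param)
}
```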