I was wondering if someone could comment on, or point me in the right direction regarding, considerations when normalising Illumina 450k methylation data where there are large differences in global methylation status? In my experiment the same cell line is used across 24 samples, but with different treatments and timepoints. One of the treatments is decitabine, for example, which causes a marked global demethylation seen as a leftward shift in the global beta-value profile - see Figure 5A below for an example.

I have read around the area a bit, and much of the literature is concerned with exploring differences between cancer/normal or different tissues. This guide from Brent Pedersen was particularly helpful:

it seems that your dataset is a perfect example of when to use functional normalization. If there are large differences in global methylation status, functional normalization should be able to preserve them while removing unwanted technical variation. Functional normalization is a within-array normalization based only on the array control probes. Those, by design, are not associated with the biology of your samples, so global differences in methylation between samples should be conserved. Our implementation of functional normalization in minfi, preprocessFunnorm(), also applies the 'noob' background correction method (Triche et al., 2013), which significantly improves downstream analysis results.
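To make the comparison concrete, here is a minimal sketch of running both preprocessing routes and checking the global beta-value profiles. It assumes an RGChannelSet called RGset has already been loaded from IDAT files (e.g. with read.metharray.exp()) and that treatment is a factor describing the samples; both object names are placeholders.

```r
library(minfi)

## noob background correction only
MSet.noob <- preprocessNoob(RGset)

## noob + functional normalization (returns a GenomicRatioSet)
GRset.funnorm <- preprocessFunnorm(RGset)

## Compare global beta-value profiles before and after normalization;
## a treatment like decitabine should still show its leftward shift
densityPlot(RGset, sampGroups = treatment, main = "Raw")
densityPlot(getBeta(GRset.funnorm), sampGroups = treatment, main = "Funnorm")

## MDS plot to see how samples cluster after normalization
mdsPlot(getBeta(GRset.funnorm), sampGroups = treatment)
```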

Thanks very much for the reply Jean-Philippe. From my reading I thought your method would be appropriate, so I compared the MDS and density plots of the same dataset either completely un-normalised, or after running preprocessNoob() and preprocessFunnorm(). The groups seem to cluster more tightly after noob but spread out again after funnorm, and the shape of the density plot changes too. I wasn't quite sure how to interpret this, so it would be great to hear your thoughts.

I just looked at your RPubs report (pretty nice!) -- it seems indeed that you've got tighter clusters with noob. In my experience, when the sample size is small (n=19 in your study, correct?), noob by itself performs the best. However, you might want to try preprocessFunnorm() with different numbers of principal components (nPCs = 1, 2, ..., 5). Otherwise, I would use preprocessFunnorm() with the following parameters:

nPCs = 0, bgCorr = TRUE, dyeCorr = TRUE

which calls preprocessNoob() and performs a quantile normalization on the Y chromosome by sex.
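As a sketch, the call with those parameters would look like this (again assuming RGset is a placeholder RGChannelSet). With nPCs = 0, no control-probe principal components are regressed out, so the autosomal probes effectively get only the noob correction:

```r
library(minfi)

## noob background + dye-bias correction, no functional-normalization
## components removed; sex chromosomes still handled by funnorm
GRset <- preprocessFunnorm(RGset, nPCs = 0, bgCorr = TRUE, dyeCorr = TRUE)
```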

Thanks again Jean-Philippe, that's really useful insight. There are actually 24 samples; it's quite difficult to see with filled circles in the plot. I also found your BioC 2014 tutorial for minfi, which gave some information on the QC features; it seems that two of my samples fell below the expected line, but not by much. I'll try excluding these from the analysis.

A further question I have is how you would go about looking for differentially methylated regions when there is such a significant global demethylation. What I'd be looking for, in a sense, is any region that is differentially methylated more or less than the global shift. Do you think bumphunter could be used in this context in some way? I'm imagining you might add a constant or something to the model?

this is a good reminder that we need to update the vignette of minfi (it is more than outdated). The QC line was defined using blood samples with no global hypo/hyper methylation, and therefore is not relevant for your study -- for instance, most tumor samples fall below this line in my experience.
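For reference, the QC line in question comes from plotting the per-sample median methylated and unmethylated intensities. A minimal sketch, assuming MSet is a placeholder MethylSet (e.g. from preprocessRaw(RGset)):

```r
library(minfi)

## Median methylated/unmethylated log2 intensities per sample
qc <- getQC(MSet)

## Samples falling well below the reference line may be poor quality,
## but as noted above the line was calibrated on blood samples, so
## globally demethylated samples can fall below it without being bad
plotQC(qc)
```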

For the DMR analysis, I don't think there is a general answer to your question. It is hard to define the global shift between your samples, since it could be a combination of several small regions with large shifts and/or large regions of hypomethylation ("hypomethylation blocks"), etc. You might first want to see if there are large blocks of hypomethylation between your different treatments. In the devel version of minfi, there is a piece of code to do that: https://github.com/kasperdanielhansen/minfi/blob/master/R/blocks.R
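A rough sketch of that block analysis using minfi's cpgCollapse() and blockFinder(), assuming GRset is a GenomicRatioSet and treatment a two-level factor as before; the cutoff value here is purely illustrative, not a recommendation:

```r
library(minfi)

## Collapse nearby open-sea probes into larger clusters
collapsed <- cpgCollapse(GRset, what = "Beta")

## Search for large hypomethylation blocks between treatment groups
design <- model.matrix(~ treatment)
blocks <- blockFinder(collapsed$object, design = design,
                      what = "Beta", cutoff = 0.1)
```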

If you find blocks, then I would run bumphunter and see if you get DMRs outside of those blocks (you probably will).
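A minimal sketch of the bumphunter call on the normalized data, continuing with the same placeholder GRset and treatment; the cutoff of 0.2 on the beta scale and B = 0 (no permutations) are placeholders to illustrate the interface:

```r
library(minfi)

design <- model.matrix(~ treatment)

## coef = 2 tests the treatment term; candidate DMRs are regions where
## the smoothed difference exceeds the cutoff
dmrs <- bumphunter(GRset, design = design, coef = 2,
                   cutoff = 0.2, B = 0, type = "Beta")
head(dmrs$table)
```

DMRs found outside the hypomethylation blocks would then be the candidates for region-specific change beyond the global shift.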

Great, thanks, I'll take a look at that code. I just don't want to do a standard DMR analysis because I have a sense that everything will change! Regarding the minfi vignette, it would be really useful to have some additional insight on normalisation approaches. I didn't use minfi initially for my analysis (I used lumi instead) simply because it wasn't clear to me what a sensible approach was, whereas lumi seemed a bit more explicit. The information was there of course - I just had to read the papers, which I subsequently did - but it would still be really useful to give some sort of sensible overview and advice on what kind of normalisation to use when (perhaps using some of the comments here).

Happy to contribute/comment further on the documentation side if it helps - I can't write good enough R to develop packages but I can do documentation... :)