Minfi Methylation Analysis with Epiviz Integration

Morgan Walter, Héctor Corrada Bravo

2016-06-22

Introduction

The purpose of this workshop is to serve as a guide for analyzing 450k Illumina Human Methylation data to find differences in methylation at the base pair, and region levels. We will use previous research findings of differentially methylated locations, the TRIM29 gene and other transcription regulators between various solid tissue samples, including breast (BRCA), and colon (COAD), and lung (LUSC), and between normal and tumor breast (BRCA) samples are observed as guide.

We will use mainly the minfi and epivizr packages in Bioconductor. The minfi package is used to identify differentially methylated positions, regions, and blocks. The epivizr package is used to serve data generated in R to an Epiviz interactive visualization workspace. This protocol aims to walk through the analysis and visualization of Illumina Infinium Human Methylation 450k data from the Cancer Genome Atlas (TCGA) project.

Here is the general process to be followed:

Motivation

Research findings from Avraham et al. in 2014 conclude that the TRIpartite Motif-containing 29 (TRIM29), also called the ataxia telangiectasia group D-complementing (ATDC), has differing functions in various cancer types. In breast cancer, the expression of TRIM29 is less than the expression in normal breast tissue, and it acts as a tumor suppressor. The promoter of the gene is hypo-methylated in normal breast tissue, but hyper-methylated in other normal tissues, like the colon, lung, and endometrium, where it is oncogenic. In tumor breast tissues, the methylation increased and expression decreased when compared with normal breast tissues. In comparisons between other cancer types and their respective normal tissues, the opposite methylation pattern was observed1.

The researchers identified 110 genes that were differentially methylated in the breast compared to all other epithelial tissues tested. Of these 110 genes, a group 15 of them were DNA binding and transcription regulators, which were found to have a notable relevance to cancer. Their analysis of TCGA data for these genes, including ALX4, GATA5, MGMT, NEUROG1, SOX10, SREBP1, ST18, TRIM29, and TP73, revealed methylation differences at most of the corresponding loci between normal and tumor breast samples. The methylation pattern also varied between breast cancer types, including basal, luminal-A and luminal-B, and between different normal tissue types, identifying these gene regions as differentially methylated regions (DMRs). The TRIM29 gene has a CpG island at the transcription start site of the gene (between 120,008,000 and 120,009,000 on chromosome 11)1.

In breast tissues, it was found that this CpG island was hypo-methylated methylation, and in other tissues, including colon and lung, it has full methylation. These results were found analyzing Illumina-Infinium HumanMethylation27 array data. TRIM29 expression was higher in normal breast tissues than in other normal tissues tested. The main conclusion proposed by authors was that, “Epigenetic alterations affecting tissue-specific differentiation are the predominant mechanism by which epigenetic changes cause cancer1.” In this protocol we will recapitulate some of these findings and use interactive visualization to explore additional findings in this data.

Gathering and Preprocessing Data

The data used in this protocol is 450k Illumina Human Methylation data from the Cancer Genome Atlas (TCGA) project. In order to analyze the data and prepare it for visualization in Epiviz, it must first be loaded into R. We use four different types of 450k samples, as shown in the chart below, so that methylation analyses could be done between the three normal samples and the normal and tumor BRCA samples, separately.

Cancer Type

Tissue

# Normal Samples

# Tumor Samples

BRCA

breast

15

15

COAD

colon

15

0

LUSC

lung

15

0

The accompanying vignette data_preprocessing.Rmd contains information on how to download and preprocess data used in this workshop. The end result are objects in classes defined by minfi:

Data Input

Processing Function

Output

Raw data (IDAT files)

read.450k.exp

RGChannelSet

RGChannelSet

preprocessIllumina

MethylSet

MethylSet

mapToGenome

GenomicMethylSet

GenomicMethylSet

ratioConvert

GenomicRatioSet

For this workshop we have subsetted these to chromosomes 10, 11 and 20:

DMP Visualization

Epiviz workspaces and hosted data

For this analysis we will display our analysis data along with contextual data hosted on servers at the University of Maryland including gene annotation, CpG Island locations and gene expression from the Gene Expression Barcode project. These pre-existing workspaces are similar to those from Timp et al. in 2014 that included visualizations of large hypo-methylated blocks2 available at http//epiviz.github.io/timp2014. We’ll start by exploring a couple of those to get a sense of the type of visualizations and analyses we can perform.

We have created workspaces you can use for this workshop. Of course, you can also create your own following these steps:

Login into the app (you can use one of a number of authentication options)

Add the genes track, CpG Islands track, and gene expression heatmap

Type the chosen name of the workspace in space that reads “Save Workspace Name” and click “Save Workspace” button

Your workspace will now be saved under your account and will be accessible by others as the URL is specific to each workspace (e.g. http://epiviz.cbcb.umd.edu/?ws=<workspace_id>). The final workspace should look similar to this:

Starting Epiviz Workspace

Starting epivizr session

We can now start an epiviz connection to the workspace with hosted data to which we can add plots as we do our analysis. Start the epiviz workspace we provide (or the one you created yourself)

If you created your own workspace, provide it’s id as the workspace argument.

The app object serves as the interface between your R session and the epiviz interactive visualization application. We can use it to obtain information about available charts, settings, the current location in view. We can also use it to navigate to specific genomic locations as well. We will see these during the course of the workshop.

For example, to find out which chart types are available in the epiviz web app, we can use the app$chart_mgr$list_charts() method.

Differentially methylated position (DMP) analysis

Minfi’s main function to find differentially methylated positions is “dmpFinder”. It uses single probe analysis on CpGs to find methylation differences and a t-test (or F-test) between two or more phenotype groups phenotypes to determine CpG positions that are differentially methylated (in this case between BRCA normal and tumor samples).

Before we do this analysis, we may want to add a track showing promoter regions for genes since we may want to distinguish DMPs that occur within promoters. We will use Bioconductor annotation infrastructure to do this.

The promoters_track object serves as the interface to the specific track we just added to the epiviz workspace. We can change track settings through the object. Let’s first find out settings that are available to modify for this track. To do so we use the promoters_track$print_info().

Let’s change the track color to distinguish it from the CpG track we added previously: promoters_track$set(colors=c("#FFFFFF"))

Now we have a matrix with mean methylation for each group of samples which we will use to create our track. Let’s create a GRanges object that maps each of these values to the corresponding probes genomic location. Finally, add these values as a line track.

Another useful track to plot for these DMPs is the p-value resulting from the statistical test. We will plot that as a StackedLineTrack. First, we add the p-value to the GRanges object we are using.

dmp_gr$pval <--log10(dmp$pval)

This time, we are plotting things a bit differently. The plot method we have used so far does two things: first, it registers the data object we pass to it as a data source in epiviz, and then it adds a chart with the default chart type defined for that data type.

An important part of the epivizr design is that data and plots are separated: you can make multiple charts from the same data object without having to replicate data multiple times. This way, when you navigate in the app data queries are made by data object, not per chart, which leads to a more responsive application. Also, this allows users to create charts that include data from both the R session and data hosted remotely.

So, since we will be using a “StackedLineTrack” which is not a default chart type, we need to do these two steps in order.

Finally, let’s add an MA plot of BRCA methylation to visualize how methylation differences are distributed across probes. We can of course do this in R by adding methylation difference and average as additional columns to the “GenomicsRanges” object we are using. However, we can also exercise epiviz’s “Computed Measurement” feature to do this.

Use “Computed Measurements” to create plot with existing data with the x-axis showing the average methylation and the y-axis showing the difference in methylation between normal and tumor samples

Further instructions for creating MA plots in Epiviz can be found here: epiviz.github.io/computed-measurements.html

Now that we have added all of these tracks, let’s navigate to the TRIM29 gene location to check the results discussed above. You can zoom in to get a closer look at the methylation differences at the CpG island at the TRIM29 promoter.

app$navigate("chr11",119870000,120100000)

The ending DMP Visualization workspace should look like this:

Let’s close the connection to the epiviz app for the next section.

app$stop_app()

Differentially Methylated Region Analysis

Let’s start from the same initial workspace as before.

app <-startEpiviz(workspace="kr0MpYk4g0u",try_ports=TRUE)

and add the promoter regions and average methylation tracks we added previously.

Next, use Minfi’s bumphunter function to find differentially methylated regions, which are deviations in the methylation of relatively small CpG loci clusters at the gene promoter scale (1-2kb), between normal and tumor breast tissue. Bumphunting uses a cutoff value to determine the magnitude in methylation difference to identify differentially methylated regions (DMR).

Let’s add a track containg the resulting dmrs. We will color blocks in track depending on the kind of dmr we find. To do so we will create to datasources, one for hypo-methylated DMRs, and one for hyper-methylated DMRs.

Exercise: Add an MA plot of breast normal and tumor gene expression from the gene expression barcode to visualize differences in gene expression that are important. Use the same method as in the DMP Visualization section, but with gene expression data for BRCA normal and tumor types rather than DNA methylation data.

In order to compare the visualizations created in this protocol to the findings of prior research, let’s use the slideshow method in epivizr closer look at the genes of interest. In this case we will examine a subset DNA binding and transcription regulator genes, where differences in methylation have been found by Avraham et al. (ALX4, GATA5 and MGMT). Gene of Interest | Chromosome # | Start | End —————- | ———— | ——— | ——— ALX4 | 11 | 44282278 | 44331716 GATA5 | 20 | 61038553 | 61051026 MGMT | 10 | 131265454 | 131565783

Block finding

Let’s start with the same basic workspace again.

app <-startEpiviz(workspace="kr0MpYk4g0u",try_ports=TRUE)

The third analysis we will use from minfi will be based on finding larger regions of methylation differences (blocks) in breast cancer. As before, let’s add tracks with promoter regions and average methylation percentages.

In comparison to bumphunter, Minfi’s blockFinder function is used to find differences in methylation on a much larger scale than bump hunting (gene scale instead of promoter scale). Block finding uses the cpgCollapse function to create clusters of neighboring open sea loci, groups them into larger regions than bump hunting only would, and then uses bump hunting on the averages of these regions to detect large differentially methylated regions.