Snippets of Computational Biology, Bioinformatics, Productivity and the like

Thursday, 8 April 2010

R: heatmaps with gplots

I use heatmaps quite a lot for visualizing data: microarrays, of course, but also DNA motif enrichment, base composition and other things. I particularly like the heatmap.2 function of the gplots package. It has a couple of defaults that are a little ugly, but they are easy to remove. Here is a quick example:

First, let's make some example microarray data.

exampleData <- matrix(log2(rexp(1000)/rexp(1000)),nrow=200)

This draws two sets of 1000 values from an exponential distribution and takes the log2 of their ratio, giving a matrix with 200 rows and 5 columns that looks a bit like microarray fold changes. Really, though, this could be any matrix of numbers.
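Because the data are random, every run gives a different matrix. A minimal sketch of a reproducible version follows; the set.seed() call and its seed value are my own additions, not part of the original example. Since both draws come from the same exponential distribution, their log ratio is symmetric around 0, which is what makes it resemble fold changes.

```r
# Fix the random number generator so the example is reproducible
# (the seed value 42 is arbitrary)
set.seed(42)

# 1000 ratios of two independent exponential draws, log2-transformed,
# arranged as 200 rows ("genes") by 5 columns ("samples")
exampleData <- matrix(log2(rexp(1000) / rexp(1000)), nrow = 200)

dim(exampleData)                  # 200 5
summary(as.vector(exampleData))   # centred close to 0
```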

Next, I will plot only the most variable rows (genes, or whatever the rows represent). This step is optional, but it shrinks the plot so the rows are easier to see, and normally I only care about the things that are different.

evar <- apply(exampleData,1,var)

mostVariable <- exampleData[evar>quantile(evar,0.75),]

This calculates the variance of each row in the matrix, then makes a new matrix of only those rows whose variance is above the 75th percentile, i.e. the top 25% most variable rows.
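As a sketch of what the filter keeps (seeded data again, an assumption for reproducibility), the percentile cut-off on 200 rows leaves 50 rows, and ranking the variances with order() picks out the same rows when there are no ties. The drop = FALSE argument is an extra safeguard I have added so the result stays a matrix even if only one row survives.

```r
set.seed(42)
exampleData <- matrix(log2(rexp(1000) / rexp(1000)), nrow = 200)

# Variance of each row (MARGIN = 1 applies var() across rows)
evar <- apply(exampleData, 1, var)

# Keep rows whose variance exceeds the 75th percentile;
# drop = FALSE keeps a matrix even if a single row is selected
mostVariable <- exampleData[evar > quantile(evar, 0.75), , drop = FALSE]
nrow(mostVariable)  # 50 of the 200 rows

# Alternative by rank: indices of the 50 largest variances
top <- order(evar, decreasing = TRUE)[1:50]
mostVariable2 <- exampleData[top, , drop = FALSE]
```

The percentile version adapts automatically to matrices of any size, while the rank version is handy when you want a fixed number of rows in the plot.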

#install.packages("gplots")

library(gplots)

heatmap.2(mostVariable,trace="none",col=greenred(10))

Next, we load the gplots package (install it first if you do not already have it). We then simply pass the mostVariable matrix to the heatmap.2 function. The trace="none" option removes one of the defaults, a "trace" line drawn down each column, which I find distracting. The col=greenred(10) option uses another gplots function, greenred, which simply generates a 10-color scheme going from green to red via black. You could use any color scheme here, such as col=rainbow(10) or a scheme from RColorBrewer.
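If you want black to sit exactly at a log2 ratio of 0, one option is to pass symmetric breaks alongside the colors. The sketch below assumes this is what you want; it builds a green-black-red ramp with base R's colorRampPalette (similar in spirit to greenred(10), not necessarily identical) and only calls heatmap.2 if gplots is installed. heatmap.2 expects one more break than there are colors.

```r
set.seed(42)
exampleData <- matrix(log2(rexp(1000) / rexp(1000)), nrow = 200)
evar <- apply(exampleData, 1, var)
mostVariable <- exampleData[evar > quantile(evar, 0.75), ]

# A 10-color green -> black -> red ramp built with base R
cols <- colorRampPalette(c("green", "black", "red"))(10)

# Symmetric breaks around 0, so the middle of the scale (black)
# corresponds to no change; 10 colors need 11 breaks
lim <- max(abs(mostVariable))
brks <- seq(-lim, lim, length.out = length(cols) + 1)

if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(mostVariable, trace = "none", col = cols, breaks = brks)
}
```

Without explicit breaks the color range follows the data's own minimum and maximum, so black may not line up with 0 when the data are skewed.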

That is about it really for basic heatmaps.

For more advanced heatmaps, you can do other things such as adding color strips to the rows or columns to show groupings, for example:

Here we are generating the ordering of the rows ourselves, in this case by the sum of the absolute values of each row. Then we turn off the clustering of the rows and the row dendrogram, and get something like this:
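A minimal sketch of this kind of plot, not the original code: rows are ordered by the sum of their absolute values, Rowv = FALSE turns off row clustering, dendrogram = "column" drops the row dendrogram, and the ColSideColors and RowSideColors arguments of heatmap.2 add the color strips. The column groups ("control"/"treated") and the up/down row grouping are hypothetical labels invented for illustration.

```r
set.seed(42)
exampleData <- matrix(log2(rexp(1000) / rexp(1000)), nrow = 200)
evar <- apply(exampleData, 1, var)
mostVariable <- exampleData[evar > quantile(evar, 0.75), ]

# Order rows by the sum of absolute values, largest first
rowOrder <- order(rowSums(abs(mostVariable)), decreasing = TRUE)
orderedRows <- mostVariable[rowOrder, ]

# Hypothetical column groups: first two "control", last three "treated"
colGroups <- c("control", "control", "treated", "treated", "treated")
colStrip <- ifelse(colGroups == "control", "steelblue", "orange")

# Hypothetical row groups: mostly up (positive mean) vs mostly down
rowStrip <- ifelse(rowMeans(orderedRows) > 0, "darkred", "darkgreen")

if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(orderedRows,
                    Rowv = FALSE,           # keep our row order, no row clustering
                    dendrogram = "column",  # draw only the column dendrogram
                    trace = "none",
                    col = gplots::greenred(10),
                    ColSideColors = colStrip,   # one color per column
                    RowSideColors = rowStrip)   # one color per row
}
```

The strip vectors must have exactly one color per column and per row respectively; heatmap.2 reorders them along with the clustered columns.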