Sunday, June 3, 2012

Drawing heatmaps in R with heatmap.2

A heatmap is a colour-scaled image that represents the observed values of two or more conditions, treatments, populations, etc. The observations can be raw values, normalized values, fold changes or any other measure. Let's begin with an example. Say that I'm interested in the differential expression of the proteome of three cell lines. I have my data in a tab-delimited text file and will load it into RStudio (a prettier interface for R: http://rstudio.org/).
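As a minimal sketch of the loading step, assuming a tab-delimited file with a header row and the protein identifiers in the first column (the file contents, protein IDs and column names below are made up for illustration and are not my real data):

```r
# Hypothetical example: write a tiny tab-delimited file so the sketch is
# self-contained. With real data, point read.delim() at your own file instead.
example_file <- tempfile(fileext = ".txt")
writeLines(c(
  "Protein\tLineA_1\tLineA_2\tLineA_3\tLineB_1\tLineB_2\tLineB_3",
  "P1\t1520\t1498\t1611\t830\t799\t845",
  "P2\t312\t298\t305\t910\t955\t897",
  "P3\t5020\t4875\t5103\t4990\t5051\t4930",
  "P4\t77\t81\t69\t240\t231\t255",
  "P5\t1204\t1187\t1250\t610\t598\t633"
), example_file)

# read.delim() assumes a header row and tab separators by default.
# row.names = 1 turns the first column (protein IDs) into row names,
# so that only numeric columns remain in the data frame.
data <- read.delim(example_file, row.names = 1)

# Quick check of what was read: 5 proteins x 6 samples, all numeric.
str(data)
```

Keeping the identifiers as row names rather than as a column matters later, because as.matrix() on a data frame with a text column would produce a character matrix.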

My data consist of three cell lines (only two are shown) with three replicates each, where each row represents the expression value of a protein (only five are shown). The classical approach in R is the heatmap function, which outputs a yellow-orange heatmap with two dendrograms: one on the left side representing the rows (i.e., entities, genes, proteins, etc.), and a second one on the top representing the columns (i.e., conditions, treatments, etc.). Additionally, the names of the rows and the columns are written on the right side and at the bottom, respectively:

heatmap(as.matrix(data)) # The data must be a numeric matrix, not a data frame!

Some things can be improved in the above heatmap. You can see that the column names didn't fit in the figure, and there are so many row names that we had better leave them out. The heatmap function has many parameters that allow the user to improve the figure. You can see them by typing help(heatmap). Right now, we are interested in two of them: cexCol, which adjusts the size of the column names, and labRow, which sets the row labels (labRow=NA suppresses them). Try the following line and you will see how the figure changes.

heatmap(as.matrix(data), cexCol=0.7, labRow=NA)

Although heatmap is a good function, a better one exists nowadays: heatmap.2, from the gplots package. heatmap.2 allows further formatting of our heatmap figures. For example, we can change the colours to the common red-green scale, represent the original values or replace them with the row Z-score, add a colour key, and use many other options. Let's see!
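The options just listed could be combined roughly as follows. This is only a sketch on simulated values (the matrix mat and its row and column names are assumptions standing in for real data), and it skips the plot if gplots is not installed:

```r
# Simulated expression values: 10 proteins x 6 samples (stand-in for real data).
set.seed(1)
mat <- matrix(rnorm(60), nrow = 10,
              dimnames = list(paste0("P", 1:10),
                              paste0(rep(c("A", "B"), each = 3), "_", 1:3)))

# Red-green palette built with base R's colorRampPalette():
# green = low, black = middle, red = high.
red_green <- colorRampPalette(c("green", "black", "red"))(75)

# heatmap.2() lives in the gplots package: install.packages("gplots")
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(mat,
                    col = red_green,
                    scale = "row",          # plot row Z-scores, not raw values
                    key = TRUE,             # draw the colour key
                    trace = "none",         # drop the default trace lines
                    density.info = "none",  # no histogram inside the key
                    cexCol = 0.7)           # smaller column labels
}
```

Setting scale = "none" instead would colour the cells by the original values.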

However, plotting the raw intensities is not the right way to deal with this kind of data. When we work with high-throughput data, the first step is to log-transform the intensities and then apply a normalization method. We can do that with the following lines:
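A minimal sketch of these two steps, on simulated intensities. The normalization method is not specified above, so quantile normalization is used here purely as one common choice, written in base R; the helper quantile_normalize is hypothetical and assumes a numeric matrix without missing values:

```r
# Simulated raw intensities: 5 proteins x 6 samples (stand-in for real data).
set.seed(42)
raw <- matrix(rlnorm(30, meanlog = 10, sdlog = 1), nrow = 5,
              dimnames = list(paste0("P", 1:5),
                              paste0(rep(c("A", "B"), each = 3), "_", 1:3)))

# Step 1: log-transform the intensities.
log_int <- log2(raw)

# Step 2: quantile normalization (hypothetical helper, no NA handling).
# Every sample ends up with the same distribution: the mean of the
# sorted values across samples, reassigned according to each value's rank.
quantile_normalize <- function(m) {
  ranks  <- apply(m, 2, rank, ties.method = "first")  # rank within each column
  sorted <- apply(m, 2, sort)                         # sort each column
  means  <- rowMeans(sorted)                          # reference distribution
  out    <- apply(ranks, 2, function(r) means[r])     # map ranks to reference
  dimnames(out) <- dimnames(m)
  out
}

normalized <- quantile_normalize(log_int)

# After normalization, all samples share the same quantiles.
round(apply(normalized, 2, quantile), 2)
```

The normalized matrix can then be passed to heatmap or heatmap.2 in place of the raw data.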

A heatmap can be seen as an array of figures. The first figure is the heatmap itself, the second is the rows' dendrogram, the third is the columns' dendrogram, and the last is the colour key. In that sense, we can control the relative position of each figure using the layout parameter lmat, and also introduce blank spaces to tighten the figures by putting 0 (zeros) in the lmat matrix. Additionally, we can customize the heights and widths of the array by tuning the parameters lhei and lwid.

Two more things! First, when working with microarray data, the common colours are red and green, but the current fashion in proteomics is to use red and blue. Secondly, let's suppose that the first 250 proteins belong to the same family, the second 250 to another, and so on; and now we want to know if they actually group together in the dendrogram. We assign a colour to each group, and then match each protein in the heatmap with its corresponding colour using the parameter RowSideColors.

When a row side colour bar is introduced, it becomes the first element of the figure matrix, so the heatmap itself becomes the second element, the rows' dendrogram the third, the columns' dendrogram the fourth and the colour key the fifth. You can see all these changes in the heatmap below.
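The layout and side-colour ideas above can be sketched as follows. The data are simulated, with 4 made-up families of 10 proteins each instead of 250, and the family names, colours and the lhei/lwid values are arbitrary assumptions chosen for illustration:

```r
# Simulated data: 40 proteins in 4 hypothetical families, 6 samples.
set.seed(7)
group <- rep(c("family1", "family2", "family3", "family4"), each = 10)
mat <- matrix(rnorm(40 * 6), nrow = 40,
              dimnames = list(paste0("P", 1:40),
                              paste0(rep(c("A", "B"), each = 3), "_", 1:3)))
# Give each family its own A-versus-B shift so there is structure to cluster.
mat <- mat + outer(rep(c(-2, -0.7, 0.7, 2), each = 10), c(1, 1, 1, -1, -1, -1))

# One colour per family, then one colour per row (same order as the rows).
family_colours <- c(family1 = "orange", family2 = "purple",
                    family3 = "darkgreen", family4 = "grey40")
row_colours <- unname(family_colours[group])

# Red-blue palette: blue = low, white = middle, red = high.
red_blue <- colorRampPalette(c("blue", "white", "red"))(75)

# Figure matrix with a row side colour bar:
# 1 = side colour bar, 2 = heatmap, 3 = row dendrogram,
# 4 = column dendrogram, 5 = colour key, 0 = blank space.
lmat <- rbind(c(5, 0, 4),
              c(3, 1, 2))
lhei <- c(1.5, 4)        # one height per row of lmat
lwid <- c(1.5, 0.2, 4)   # one width per column of lmat

if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(mat,
                    col = red_blue,
                    scale = "row",
                    trace = "none",
                    density.info = "none",
                    RowSideColors = row_colours,
                    lmat = lmat, lhei = lhei, lwid = lwid)
}
```

Note that lhei needs one entry per row of lmat and lwid one entry per column, otherwise heatmap.2 stops with an error. If the families cluster together, the side bar shows contiguous blocks of a single colour.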

Finally, if we have a table with the fold change values for each comparison, and a second table containing the p-values of those fold changes, we can plot both in a single heatmap where the colours represent the fold change values and the number over each coloured square gives its respective p-value. In my heatmap below, the numbers over the squares represent the p-values on a discrete scale, where -1 means non-significant, 0 means significant and 1 means highly significant. To write the p-values over the heatmap, we use the parameter cellnote, and if we want them in black, we need to set notecol = "black". Additionally, since we want to plot the real fold change values and not their Z-scores, we set scale = "none". Look at the following lines.
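A sketch of this last heatmap, on simulated fold changes and p-values. The comparison names and the cut-offs used to build the discrete scale (p < 0.01 for highly significant, p < 0.05 for significant) are assumptions for illustration only:

```r
# Simulated log fold changes and p-values: 8 proteins x 3 comparisons.
set.seed(3)
comparisons <- c("B_vs_A", "C_vs_A", "C_vs_B")
fc <- matrix(rnorm(8 * 3), nrow = 8,
             dimnames = list(paste0("P", 1:8), comparisons))
pval <- matrix(runif(8 * 3, min = 0, max = 0.1), nrow = 8,
               dimnames = dimnames(fc))

# Discrete significance scale (assumed cut-offs):
#  1 = highly significant (p < 0.01)
#  0 = significant        (0.01 <= p < 0.05)
# -1 = non-significant    (p >= 0.05)
sig <- ifelse(pval < 0.01, 1, ifelse(pval < 0.05, 0, -1))

red_blue <- colorRampPalette(c("blue", "white", "red"))(75)

if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(fc,
                    col = red_blue,
                    scale = "none",        # colour by the real fold changes
                    cellnote = sig,        # numbers written over the cells
                    notecol = "black",     # colour of those numbers
                    trace = "none",
                    density.info = "none",
                    margins = c(8, 5))     # extra room for column labels
}
```

The cellnote matrix must have the same dimensions as the plotted matrix; heatmap.2 reorders it together with the rows and columns.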

The best tutorial for drawing heatmaps, for sure! For newbies in R, like me, it's not so trivial to understand the power of functions or their interactions, and you explained it very simply. Thanks, it just helped me a lot to proceed with my thesis. Leonardo

I have completed an Agilent microarray data analysis in R. Now, following "Drawing heatmaps in R with heatmap.2", I am getting a problem: heatmap(as.matrix(data.frame)) Error in as.vector(x, mode) : cannot coerce type 'closure' to vector of type 'any'. Please help me in rectifying this error.

I have only 2 columns (samples). It seems that the colour is determined only by the relative expression level, so the result has only two specific colours, red and green, with no lighter or darker shades of red and green. Please help me to correct this.
