What can epigenomics do for you?

In 1990, with the initial launch of a 15-year project to map and sequence the human genome, a new era of science began. However, even after its successful and early completion in 2001 [1], no one could have foreseen how, only a few years later, genome sequencing would explode to become a widely applied multi-purpose tool whose applications include the mapping of epigenetic modifications and the complete assessment of both coding and non-coding RNA transcripts. The game changer behind this explosion was the transition from the classic electrophoretic Sanger sequencing method, which had limited scalability, to image-based massively parallel 'sequencing-by-synthesis' platforms.

It had already become clear in the early days of the post-genome era, before these technological breakthroughs, that there were additional layers to the primary sequence waiting to be uncovered, and a small number of pilot epigenome projects, including the Human Epigenome Consortium (HEC), were launched [2, 3]. While on the right track, these early projects lacked the sequencing capacity required to tackle the multidimensional space of the epigenome. This obstacle was overcome in 2006, with the introduction of next-generation sequencing platforms, and the NIH was commendably fast to capitalize on these developments by implementing both its ENCODE and 'Roadmap Epigenomics' projects. ENCODE aimed to utilize the newly generated epigenome maps to assist in discovering and assigning functional elements in the genome, while the Roadmap Epigenomics Program aimed to create reference maps for the majority of normal, primary cell types [4, 5]. The success of these projects has helped to popularize epigenomics and has proved somewhat contagious, with additional consortia, such as the recently funded BLUEPRINT (European) and DEEP (German), arriving on the scene; the International Human Epigenome Consortium (IHEC) now coordinates international efforts.

The core technologies used in these projects, and in general across the field, have stabilized over the years and standards are now largely agreed upon. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) [6, 7] remains the standard assay for determining transcription factor binding, as well as for mapping the genome-wide distribution of histone modifications. Continued efforts to increase sensitivity and resolution have resulted in some recent technical improvements to the basic ChIP-seq method, in the form of nano-ChIP-seq [8] and ChIP-exo [9], respectively. By contrast, dozens of assays exist for DNA methylation [10], although most genome-wide studies are focused on just a few of these [11–13]. As costs continue to decrease, methods are converging on whole-genome bisulfite sequencing [14], which had previously been prohibitively expensive. As with exome sequencing, the subject of Genome Biology's 2011 special issue [15], the driving force behind the ongoing explosion in epigenome studies, and data, has been an increase in sequencing capacity at reduced cost.

What can epigenomics do for you?

Epigenome data are very powerful and have multiple applications that extend beyond a simple map of a particular mark or modification in a given cell type. Below, I will highlight a few selected examples of these applications, although this is a far from exhaustive list.

Genome annotation

Mammalian genomes are large and complex. Understanding such genomes is not trivial and comparative genomics based on the primary DNA sequence alone, while powerful [16], cannot provide all the answers. As demonstrated several years ago [7], and further highlighted by the many recent ENCODE publications [17], chromatin signatures enable efficient and precise genome annotation of regulatory elements, and can pinpoint functional or cell type-specific regions of interest.

Cell identity

It has become abundantly clear over the past years that epigenomic maps provide more information than can be gained from gene expression data alone [6, 7, 18–20]. While genes are either expressed or not, chromatin states can add further refinement to a gene's activity status, such as whether it is primed or poised, and can also describe varying degrees of repressed states that would all look the same by any gene expression measure. The precise chromatin state of these loci can have clear consequences for how they behave in both normal development and disease.

Disease

As highlighted by many studies of human disease, including several in this issue [21–27], epigenomic maps can be utilized to trace the origin of cells, dissect affected pathways and identify predictive biomarkers. Epigenome data have also proved to be powerful in helping to pinpoint disease-relevant regulatory elements through epigenome-wide association studies, or 'EWAS', especially when integrated with data from genome-wide association studies [17, 28].

Challenges

Several of the reports in this special issue expand the catalog of user-friendly tools for the visualization of epigenomic datasets [27, 29–31], and much work has previously been done elsewhere in this area, including the development of advanced epigenome browsers [32, 33]. Making data even more accessible will be critical if the field is to continue its rapid growth and strengthen its impact. As the number of epigenomic datasets grows into the thousands and tens of thousands, it will be essential to plan ahead so that sufficient standards are met and the data can be navigated in well-curated, high-quality databases. An additional challenge in data integration is that much of the full complexity of the epigenome lies in uncharted waters, with many known modifications remaining unmapped and other modifications, such as hydroxymethylation [34], moving to center stage. This dynamism in the types of data being produced is sure to generate increasing demand for new, refined bioinformatic tools.

Conclusions

The overall impact of the growing number of epigenomes, including the NIH Roadmap Epigenomics Project reference maps, will likely be underestimated, much as the impact of the reference genome is today. Almost every study uses the reference genome sequence, whether it is to design primers, target constructs or align sequencing reads, yet those studies rarely acknowledge the reference genome, because it is simply there to be used.

I predict that, as investigators become more accustomed to epigenome browsers and to utilizing the existing data for various purposes, the reference set of epigenomic maps will also become a routine resource used in many studies. Applications would include providing a quick overview of a gene or locus of interest; helping to refine a hypothesis; assisting primer design by narrowing down the exact region of dynamic regulation; forming the basis of reporter assays by selecting with precision the functional elements of an upstream region; and so forth.

This special issue covers epigenomics over a wide range of organisms, systems and methods, all of which provide an informative sampling to illuminate the possibilities for future studies in this expanding and exciting field.