Summarizing DNA Methylation Data at the Gene Level

Recently, high-throughput microarrays have been developed to measure DNA methylation genome-wide. Current arrays produce probe-level datasets, in which each probe measures methylation at a specific genomic position. A single gene is typically associated with tens of probes. Integrating DNA methylation data with other biological datasets, such as gene expression, usually requires a gene-level dataset, so the probe-level methylation values of each gene must be summarized into a single gene-level value. However, there is currently no gold standard for converting probe-level data to gene-level data.
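To illustrate what summarizing probe-level values can mean, the sketch below collapses probes to genes by taking the mean or the median of each gene's probe values. It assumes that a probe-to-gene annotation is already available as a dictionary. The function `summarize_to_gene_level`, the probe IDs and the gene names are hypothetical. Mean and median are only two possible summaries, not necessarily the ones the project will compare.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_to_gene_level(probe_values, probe_to_gene, method="mean"):
    """Collapse probe-level methylation values into one value per gene.

    probe_values:  dict mapping probe ID -> methylation value (e.g. a beta value in [0, 1])
    probe_to_gene: dict mapping probe ID -> gene name (assumed, hypothetical annotation)
    method:        "mean" or "median" (two illustrative summaries among many possible)
    """
    summarizers = {"mean": mean, "median": median}
    if method not in summarizers:
        raise ValueError(f"unknown method: {method}")

    # Group each probe's value under the gene it is annotated to;
    # probes with no gene annotation or a missing value are skipped.
    by_gene = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None and value is not None:
            by_gene[gene].append(value)

    # Reduce each gene's list of probe values to a single gene-level value.
    summarize = summarizers[method]
    return {gene: summarize(values) for gene, values in by_gene.items()}


if __name__ == "__main__":
    # Toy data with hypothetical probe IDs and gene names.
    probe_values = {"p1": 0.1, "p2": 0.3, "p3": 0.8, "p4": 0.6, "p5": 0.4}
    probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_A",
                     "p4": "GENE_B", "p5": "GENE_B"}
    print(summarize_to_gene_level(probe_values, probe_to_gene, "mean"))
    print(summarize_to_gene_level(probe_values, probe_to_gene, "median"))
```

The mean is sensitive to a single extreme probe (such as `p3` above), while the median is more robust. Differences like this are one reason the choice of summary can affect downstream integration.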

In this project, we assess several methods for converting probe-level DNA methylation datasets to gene-level data. The method whose gene-level values give the highest correlation between genome-wide methylation and gene expression profiles will be selected as the best approach.
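One way to implement this selection criterion is to compute a Pearson correlation across genes between the gene-level methylation values produced by each candidate method and the gene expression values of the same sample, then keep the method with the strongest correlation. This is a minimal sketch, and it rests on several of our own assumptions. Correlation is computed only over genes present in both profiles. Methods are ranked by absolute correlation, because promoter methylation is often negatively associated with expression. The helper names `pearson`, `score_method` and `select_best_method` are hypothetical and do not come from the project description.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def score_method(gene_methylation, gene_expression):
    """Correlate gene-level methylation with expression over shared genes (assumption)."""
    genes = sorted(set(gene_methylation) & set(gene_expression))
    xs = [gene_methylation[g] for g in genes]
    ys = [gene_expression[g] for g in genes]
    return pearson(xs, ys)


def select_best_method(candidates, gene_expression):
    """Return (best method name, all scores).

    candidates: dict mapping method name -> dict of gene -> gene-level methylation.
    Ranking by absolute correlation is our assumption, since methylation and
    expression are often negatively correlated.
    """
    scores = {name: score_method(meth, gene_expression)
              for name, meth in candidates.items()}
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores


if __name__ == "__main__":
    # Toy, made-up values for three hypothetical genes.
    expression = {"GENE_A": 5.0, "GENE_B": 2.0, "GENE_C": 8.0}
    candidates = {
        "mean": {"GENE_A": 0.4, "GENE_B": 0.7, "GENE_C": 0.1},
        "median": {"GENE_A": 0.3, "GENE_B": 0.5, "GENE_C": 0.2},
    }
    print(select_best_method(candidates, expression))
```

With real data, the same comparison would run over thousands of genes, and it could be repeated across samples to check whether the ranking of methods is stable.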

Students will have the opportunity to obtain high-throughput datasets from biological databases and to implement programs that analyze them.