
Cardiovascular disease (CVD), the leading cause of death worldwide, results from interactions among genetic, environmental, and lifestyle factors. Numerous efforts have been made to relate DNA sequence variation to CVD phenotypes, but less attention has been given to the functional consequences of these DNA variants, such as changes in mRNA/miRNA transcription, DNA methylation/histone modification, and protein and metabolite levels. Because the vast majority of DNA variants each explain only a tiny proportion of interindividual variation in CVD traits, much of their influence on CVD is likely exerted through downstream effects on the “expressed genome”. With the advent of high-throughput “omic” technologies, the elements of the expressed genome can now be assessed systematically. The purpose of the AHA Statement on the Expressed Genome1 is to summarize applications of omics both to identify biomarkers for clinical diagnosis and prediction and to elucidate CVD mechanisms. The statement also discusses issues to be addressed in future omic studies, with a major focus on a range of CVD traits including coronary artery disease (CAD), stroke, heart failure, and arrhythmias.

Transcriptomics refers to the application of high-throughput methods, such as microarray analysis, to study the transcriptome; it encompasses measures of the expression levels of RNA transcripts as well as microRNAs (miRNAs) and long intergenic non-coding RNAs (lincRNAs) under specific conditions or in a specific cell type. Comparing transcriptomes allows the identification of genes that are differentially expressed between disease and normal samples, across distinct cell types, or in response to different treatments, such as a drug or another perturbation. For example, gene expression analyses of mRNAs derived from whole blood and myocardium, used to study risk factors for CAD, have identified several pathways related to inflammation, a process well known to be involved in atherosclerosis2. Although informative, transcription of DNA into RNA is just one entry point into the expressed genome; epigenetics, proteomics, and metabolomics represent additional features of the expressed genome that can affect disease mechanisms.
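The comparison of disease and normal transcriptomes can be illustrated with a minimal sketch that ranks genes by a per-gene Welch's t statistic computed on log2-scale expression values. The gene names, values, and function names are hypothetical; published analyses typically rely on dedicated statistical frameworks with normalization and multiple-testing correction, which are deliberately omitted here.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for the difference in means between groups a and b."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def differential_expression(disease, control):
    """Rank genes by |t| comparing disease with control samples.

    disease, control: dicts mapping a gene name to its log2 expression values.
    Returns a list of (gene, log2_fold_change, t), largest |t| first.
    """
    results = []
    for gene, d in disease.items():
        c = control[gene]
        # On the log2 scale a difference of means is a log2 fold change
        # (a ratio of geometric means on the original scale).
        log2_fc = mean(d) - mean(c)
        results.append((gene, log2_fc, welch_t(d, c)))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


# Hypothetical log2 expression values, three samples per group.
disease = {"GENE_A": [8.1, 8.4, 8.3], "GENE_B": [5.0, 5.2, 4.9]}
control = {"GENE_A": [6.9, 7.1, 7.0], "GENE_B": [5.1, 4.9, 5.0]}

for gene, lfc, t in differential_expression(disease, control):
    print(f"{gene}: log2FC = {lfc:.2f}, t = {t:.1f}")
```

In this toy example GENE_A, which is consistently higher in the disease group, ranks first, while GENE_B shows no meaningful difference.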

Epigenetics refers to potentially heritable changes in gene expression (e.g., switching genes on or off) that do not involve changes to the underlying DNA sequence; its mechanisms include DNA methylation and histone modifications, as well as regulation by transcription factors, miRNAs, and lincRNAs. Epigenetic changes are involved in many normal cellular processes, but they can also have damaging effects that contribute to disease. As with other omics, scalable technology permits the assessment of individual candidate genes as well as more comprehensive genome-wide (or epigenome-wide) association studies. Unlike static genetic variation, epigenetic features change over time and are influenced by numerous factors, including age, environment/lifestyle, and disease state. It is therefore difficult to know whether observed changes are causal for an associated disease trait or merely a downstream consequence of the disease process. Mendelian randomization, which uses a genetic proxy (an instrumental variable) to interrogate the causal relation between an epigenetic modification and an outcome, is a valuable approach to address this issue3.
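The logic of Mendelian randomization can be sketched with the simplest single-instrument estimator, the Wald ratio. If a genetic variant G affects an exposure X (for example, a methylation level at a variant that is an mQTL) and influences the outcome Y only through X, the causal effect of X on Y can be estimated as the slope of Y on G divided by the slope of X on G. The simulated data, effect sizes, and function names below are hypothetical, and the sketch assumes the standard instrumental-variable conditions hold, which real studies must justify. It only shows that the ratio can recover an effect that a naive regression, distorted by an unmeasured confounder U, overestimates.

```python
import random


def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def wald_ratio(g, x, y):
    """Single-instrument MR estimate: slope of Y on G divided by slope of X on G."""
    return ols_slope(g, y) / ols_slope(g, x)


def simulate(n=50_000, true_effect=0.3, seed=1):
    """Simulate hypothetical data.

    G: allele count (allele frequency 0.3); X: exposure, e.g. a methylation level;
    Y: outcome; U: unmeasured confounder that raises both X and Y.
    """
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = sum(rng.random() < 0.3 for _ in range(2))  # 0, 1 or 2 alleles
        u = rng.gauss(0, 1)
        xi = 0.5 * gi + u + rng.gauss(0, 1)
        yi = true_effect * xi + u + rng.gauss(0, 1)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


g, x, y = simulate()
print(f"naive slope of Y on X: {ols_slope(x, y):.2f}")  # biased upward by U
print(f"Wald ratio estimate:   {wald_ratio(g, x, y):.2f}")  # near the true 0.3
```

Because G is, by assumption, unrelated to U, the ratio isolates the part of the X and Y association that flows through X; with weak or pleiotropic instruments this guarantee breaks down.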

Proteomics uses high-throughput approaches to analyze protein abundance in a tissue, a cell, or a fluid compartment such as plasma or urine. Dynamic changes in protein levels reflect the cell’s responses to internal and external stimuli and provide a snapshot of the cell in action. Identifying reliable protein biomarkers of CVD would be enormously valuable for diagnosis, treatment, and prevention4. Because coronary plaque and myocardial tissue are rarely available for proteomic characterization, most protein biomarkers of CVD are measured in plasma, serum, or whole blood.

Metabolomics is the systematic identification and quantification of small molecules (typically <1,500 Daltons) and biochemical intermediates (metabolites). As the products of enzymatic reactions, these compounds are the omic layer most closely linked to disease phenotypes, and they provide a biochemical fingerprint of the underlying physiological and pathological state. For example, metabolomic studies have identified associations of branched-chain amino acids, lysophosphatidylcholines, and fatty acids with atherosclerotic CVD5.

Many omic studies have yielded discrepant results, even with similar study designs, techniques, and outcomes6. Replication studies require considerable time, expense, and effort, but they are critical steps toward validating and interpreting the findings of initial discovery studies. This was the focus of a 2012 Institute of Medicine (IOM) report7, which was prompted in part by the erroneous application to clinical practice of omics results that lacked external validity. In response to the IOM report, the National Cancer Institute developed a set of criteria for determining the readiness of an omics test for use in patient care.

The AHA statement on the expressed genome does not consider the integration of different omic elements, which will be critical in future CVD research. Complex diseases are affected by alterations in multiple aspects of the genome and its expressed elements, from causal genetic variants and environmental effects on DNA methylation to downstream effects on the transcriptome, proteome, and metabolome. Consequently, recent efforts have focused on integrating genetic variants with transcriptomics (expression quantitative trait loci, eQTLs)8, miRNA expression (miR-eQTLs)9, DNA methylation (DNA methylation quantitative trait loci, mQTLs)10, and proteomics (protein quantitative trait loci, pQTLs)11 to study CVD and its risk factors. Integrating these multidimensional omic data will provide new insights into disease mechanisms and enable more precise and personalized approaches to diagnosis, treatment, and prevention.

The characterization of the expressed genome via high-throughput omics has already shed new light on heart disease and stroke, and major advances will continue. However, research into the many contributions of the genome and expressed genome to the pathogenesis of CVD is still in its infancy. As more data are generated across many tissues and multiple omics platforms, novel methods for data integration will be needed to analyze vast amounts of multidimensional data12. The ultimate success of such approaches will require collaboration among molecular and computational biologists, biostatisticians, mathematicians, computer scientists, and bioinformaticians, and should lead to a radically improved understanding of cardiovascular disease processes as a prelude to the development of more effective treatments.

References

1. Musunuru K, Ingelsson E, Fornage M, Liu P, Murphy AM, Newby LK, Newton-Cheh C, Perez MV, Voora D, Woo D; on behalf of the American Heart Association Committee on Molecular Determinants of Cardiovascular Health of the Council on Genomic and Precision Medicine and Council on Epidemiology and Prevention; Council on Cardiovascular Disease in the Young; Council on Cardiovascular and Stroke Nursing; Council on Cardiovascular Surgery and Anesthesia; Council on Clinical Cardiology; and Stroke Council. The expressed genome in cardiovascular diseases and stroke: refinement, diagnosis, and prediction: a scientific statement from the American Heart Association [published online ahead of print July 31, 2017]. Circ Cardiovasc Genet. doi: 10.1161/HCG.0000000000000037.