Abstract

Environmental exposures filtered through the genetic make-up of each individual alter the transcriptional repertoire in organs central to metabolic homeostasis, thereby affecting arterial lipid accumulation, inflammation, and the development of coronary artery disease (CAD). The primary aim of the Stockholm Atherosclerosis Gene Expression (STAGE) study was to determine whether there are functionally associated genes (rather than individual genes) important for CAD development. To this end, two-way clustering was used on 278 transcriptional profiles of liver, skeletal muscle, and visceral fat (n = 66/tissue) and atherosclerotic and unaffected arterial wall (n = 40/tissue) isolated from CAD patients during coronary artery bypass surgery. The first step, across all mRNA signals (n = 15,042/12,621 RefSeqs/genes) in each tissue, resulted in a total of 60 tissue clusters (n = 3958 genes). In the second step (performed within tissue clusters), one atherosclerotic lesion (n = 49/48) and one visceral fat (n = 59) cluster segregated the patients into two groups that differed in the extent of coronary stenosis (P = 0.008 and P = 0.00015). The associations of these clusters with coronary atherosclerosis were validated by analyzing carotid atherosclerosis expression profiles. Remarkably, in one cluster (n = 55/54) relating to carotid stenosis (P = 0.04), 27 genes in the two clusters relating to coronary stenosis were confirmed (n = 16/17, P < 10^−27 and P < 10^−30, respectively). Genes in the transendothelial migration of leukocytes (TEML) pathway were overrepresented in all three clusters, referred to as the atherosclerosis module (A-module). In a second validation step, using three independent cohorts, the A-module was found to be genetically enriched with CAD risk by 1.8-fold (P<0.004).
The transcription co-factor LIM domain binding 2 (LDB2) was identified as a potential high-hierarchy regulator of the A-module, a notion supported by subnetwork analysis, by cellular and lesion expression of LDB2, and by the expression of 13 TEML genes in Ldb2-deficient arterial wall. Thus, the A-module appears to be important for atherosclerosis development and, together with LDB2, merits further attention in CAD research.

Sixty-six gene-expression profiles (15,042 RefSeqs each) from liver, skeletal muscle, and visceral fat and 40 from atherosclerotic aortic wall were clustered by a coupled two-way approach. First, the RefSeqs were clustered according to their average probe signal values on the chip (mRNA level, see figure “clustering”), resulting in 11 skeletal muscle, 20 visceral fat, 15 liver, and 14 atherosclerotic arterial wall clusters, together representing 4007 RefSeqs/3958 genes. Second, clustering within each tissue cluster was performed to sort patients by mRNA levels. Clusters that sorted the patients according to the extent of coronary stenosis were considered further. To validate these atherosclerosis-related clusters, we performed cluster analysis of 25 gene-expression profiles of carotid atherosclerosis lesions. Of eight clusters representing 903 RefSeqs/894 genes, one segregated patients according to intima-media thickness (IMT). The extent of overlap between this cluster relating to carotid atherosclerosis and the two clusters relating to coronary atherosclerosis was used as the confirmatory measure. Genetic enrichment and functional gene classifications were then assessed by bioinformatic and TRANSFAC analyses. Animal and cell models were used for functional validation of individual genes.
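The two-step logic described above can be illustrated with a minimal sketch. It is not the study's algorithm (that is described in Text S1): a greedy correlation grouping stands in for the gene-clustering step, and a split at the mean cluster signal stands in for the patient-clustering step. The function names, the 0.9 correlation threshold, and all toy values are assumptions made only for illustration.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equally long signal vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def cluster_genes(profiles, threshold=0.9):
    """Step 1 (stand-in): group genes whose signals correlate with a cluster's seed gene."""
    clusters = []  # each cluster is a list of gene names; the first entry is the seed
    for gene, signal in profiles.items():
        for cluster in clusters:
            if pearson(signal, profiles[cluster[0]]) >= threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters

def split_patients(profiles, cluster):
    """Step 2 (stand-in): split patients in two by their mean signal over the cluster's genes."""
    n_patients = len(profiles[cluster[0]])
    scores = [mean(profiles[g][i] for g in cluster) for i in range(n_patients)]
    cutoff = mean(scores)
    low = [i for i, s in enumerate(scores) if s <= cutoff]
    high = [i for i, s in enumerate(scores) if s > cutoff]
    return low, high

# Hypothetical toy data: three genes measured in six patients, plus a stenosis score per patient.
profiles = {
    "geneA": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "geneB": [2.0, 2.3, 1.9, 6.1, 6.0, 5.8],
    "geneC": [5.0, 1.0, 4.0, 2.0, 5.5, 1.5],
}
stenosis = [10, 12, 9, 40, 38, 45]

clusters = cluster_genes(profiles)
low, high = split_patients(profiles, clusters[0])
# A cluster is of interest when the two patient groups differ in stenosis score.
diff = mean(stenosis[i] for i in high) - mean(stenosis[i] for i in low)
```

In a real analysis the group difference would be assessed with a statistical test rather than a raw difference of means, and the clustering would be run separately for each tissue.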

The cluster was defined by related mRNA levels (indicated by average probe signals on the arrays) and identified as one of 14 atherosclerotic arterial wall clusters by the second step of coupled two-way clustering of mRNA profiles from STAGE patients (Text S1). Columns represent individual patients, and rows individual RefSeqs with corresponding gene symbols and mRNA ratios of the two patient groups. Above heat map: individual patient numbers. Below heat map: bars indicating individual stenosis score, together with means ± SD and average ratios in each group and P-values for comparing groups. EVA1 is represented by two RefSeqs.

The cluster was defined by related mRNA levels (indicated by average probe signals on the arrays) and identified as one of 20 visceral fat clusters by the second step of coupled two-way clustering of mRNA profiles from STAGE patients (Text S1). Columns represent individual patients, and rows individual RefSeqs with corresponding gene symbols and mRNA ratios of the two patient groups. Above heat map: individual patient numbers. Below heat map: bars indicating individual stenosis score, together with means ± SD and average ratios in each group and P-values for comparing groups. Red highlighting indicates genes also found in the cluster in Figure 2.

The cluster was defined by related mRNA levels (indicated by average probe signals on the arrays) and identified as one of eight carotid stenosis clusters by the second step of coupled two-way clustering of mRNA profiles from Carotid Stenosis patients (Text S1). Columns represent individual patients, and rows individual RefSeqs with corresponding gene symbols and mRNA ratios of the two patient groups. Below heat map: bars indicating individual IMT together with means ± SD and average ratios in each group and P-values for comparing groups. Red highlighting indicates genes also identified in the clusters in Figure 2 and Figure 3. EVA1 is represented by two RefSeqs.

(A) Venn diagrams showing overlaps of genes in the A-module (three clusters related to extent of atherosclerosis) (Figure 2, Figure 3, Figure 4). Seven genes were found in both the atherosclerotic arterial wall and visceral fat clusters (P = 10^−10), 17 in the atherosclerotic arterial wall and carotid stenosis clusters (P = 10^−30), and 16 in the visceral fat and carotid stenosis clusters (P = 10^−27). Six genes were found in all three clusters (P = 10^−23). The union of all three clusters represented 128 genes. (B) A gene regulatory network inferred by co-expression of A-module genes using genome-wide expression data from the atherosclerotic arterial wall, carotid stenosis tissue, and visceral fat. Network edges are supported by at least two of the datasets, resulting in a total of 49 nodes. Marked in black are nodes (genes) with known regulatory activity, which are prioritized by the algorithm (Text S1). Marked as diamonds are 24 genes present in intersections between at least two of the clusters in Figure 5A (n = 27). (C) The TEML pathway. Marked in red are eight genes in the A-module that perfectly matched genes in the TEML pathway (P = 6.6×10^−5). Marked in blue are 15 genes in the A-module that were associated with the TEML pathway according to Panther family annotation in DAVID. For a list of all genes in the TEML pathway and Panther families see Table S7 and Table S8, respectively. (D) The P-value distribution of 484 eSNPs (SNPs with allele distribution affecting gene expression) in the A-module indicating association with CAD according to a recent GWAS, the WTCCC study [10].
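As an aside on panel (A), one common way to judge whether two gene clusters share more genes than expected by chance is a one-sided hypergeometric test. The sketch below assumes that test and assumes a background of 12,621 genes (the number of genes on the array given in the Abstract). The caption does not state which test or background the authors used, so the result need not reproduce the reported P-values. The function name `overlap_pvalue` is hypothetical.

```python
from math import comb

def overlap_pvalue(overlap, size_a, size_b, background):
    """P(X >= overlap) when size_b genes are drawn at random from `background`
    genes, of which size_a belong to the other cluster (hypergeometric tail)."""
    upper = min(size_a, size_b)
    favourable = sum(
        comb(size_a, k) * comb(background - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(background, size_b)

# Cluster sizes taken from the Abstract: 48 genes (atherosclerotic lesion cluster)
# and 59 genes (visceral fat cluster), sharing 7 genes.
# The background of 12,621 genes is an assumption.
p = overlap_pvalue(overlap=7, size_a=48, size_b=59, background=12621)
```

Exact integer arithmetic via `math.comb` avoids the underflow that a floating-point product of factorials would risk at these sizes. The choice of background strongly affects the result, so it should match the gene universe from which the clusters were actually drawn.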