Effects of cis and trans genetic ancestry on gene expression in African Americans.

Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America. aprice@hsph.harvard.edu

Abstract

Variation in gene expression is a fundamental aspect of human phenotypic variation. Several recent studies have analyzed gene expression levels in populations of different continental ancestry and reported population differences at a large number of genes. However, these differences could largely be due to non-genetic (e.g., environmental) effects. Here, we analyze gene expression levels in African American cell lines, which differ from previously analyzed cell lines in that individuals from this population inherit variable proportions of two continental ancestries. We first relate gene expression levels in individual African Americans to their genome-wide proportion of European ancestry. The results provide strong evidence of a genetic contribution to expression differences between European and African populations, validating previous findings. Second, we infer local ancestry (0, 1, or 2 European chromosomes) at each location in the genome and investigate the effects of ancestry proximal to the expressed gene (cis) versus ancestry elsewhere in the genome (trans). Both effects are highly significant, and we estimate that 12 ± 3% of all heritable variation in human gene expression is due to cis variants.
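The first analysis, relating a gene's expression level to each individual's genome-wide proportion of European ancestry, can be illustrated with a minimal sketch. It assumes a plain least-squares line with an intercept, which may differ from the model actually fitted in the study. The data are simulated: the sample size, the ancestry distribution, the effect size, and the names `fit_line`, `theta`, and `expr` are all hypothetical.

```python
import random


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Simulated stand-in data for one gene (NOT the study's data).
rng = random.Random(1)
n_samples = 100        # hypothetical number of individuals
true_effect = 1.0      # hypothetical expression shift per unit of ancestry

# Hypothetical genome-wide European ancestry proportions in [0, 1].
theta = [rng.betavariate(2, 8) for _ in range(n_samples)]
# Normalized expression = ancestry effect + noise (arbitrary noise level).
expr = [true_effect * t + rng.gauss(0.0, 0.1) for t in theta]

slope, intercept = fit_line(theta, expr)
print(f"estimated ancestry effect for this gene: {slope:.3f}")
```

Here the fitted slope plays the role of a per-gene ancestry effect: the expected difference in normalized expression between an individual of entirely European ancestry and one of entirely African ancestry, under the linear assumption stated above.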

Gene expression differences between CEU and YRI are validated in AA samples.

The y-axis shows the difference in normalized gene expression due to ancestry estimated from AA samples (ã_{g,AA}), and the x-axis shows the difference in normalized gene expression due to ancestry estimated from CEU and YRI samples (ã_{g,CEU+YRI}) (see the full paper for details of normalization). (A) We plot each of the 4,197 genes separately. (B) To aid visualization, the 4,197 genes are averaged into bins of 20 genes according to their values of ã_{g,CEU+YRI}; binning does not affect the slope of the plot. The slope of each plot, 0.43, is our estimate of the parameter c.
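The binning in panel (B) and the slope estimate can be sketched as follows. This is an illustration on simulated data, not the study's data or code: it assumes an unweighted least-squares fit with an intercept, and the noise level and the helper names `ols_slope` and `bin_means` are hypothetical. Only the gene count (4,197), the bin size (20), and the slope value used to generate the data (0.43) come from the legend.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of y on x (model includes an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def bin_means(xs, ys, bin_size=20):
    """Sort points by x, then average x and y within consecutive bins.

    The last bin may hold fewer than bin_size points.
    """
    pairs = sorted(zip(xs, ys))
    bx, by = [], []
    for start in range(0, len(pairs), bin_size):
        chunk = pairs[start:start + bin_size]
        bx.append(sum(p[0] for p in chunk) / len(chunk))
        by.append(sum(p[1] for p in chunk) / len(chunk))
    return bx, by


# Simulated stand-in values (NOT the study's data).
rng = random.Random(0)
true_c = 0.43
x = [rng.gauss(0.0, 1.0) for _ in range(4197)]       # plays the role of the x-axis
y = [true_c * xi + rng.gauss(0.0, 1.0) for xi in x]  # plays the role of the y-axis

slope_raw = ols_slope(x, y)
bx, by = bin_means(x, y)
slope_binned = ols_slope(bx, by)
print(f"per-gene slope: {slope_raw:.3f}, binned slope: {slope_binned:.3f}")
```

Because the bins are formed along the x-axis, averaging removes mostly vertical scatter, so the two slopes should be nearly identical in this simulation while the binned plot is much easier to read.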

Plots are analogous to those in the preceding figure, except that genetic (SNP) data were used instead of gene expression data. (A) We plot a random subset of 4,197 markers, for visual comparison to panel (A) of the preceding figure. (B) We average into bins of 20 markers. The slope of each plot, 0.96, is our estimate of the parameter c.