Abstract

Transcriptome-wide association analysis is a powerful approach to studying the genetic architecture of complex traits. A key component of this approach is to build a model to predict (impute) gene expression levels from genotypes from samples with matched genotypes and expression levels in a specific tissue. However, it is challenging to develop robust and accurate imputation models with limited sample sizes for any single tissue. Here, we first introduce a multi-task learning approach to jointly impute gene expression in 44 human tissues. Compared with single-tissue methods, our approach achieved an average 39% improvement in imputation accuracy and generated effective imputation models for an average 120% (range 13%-339%) more genes in each tissue. We then describe a summary statistic-based testing framework that combines multiple single-tissue associations into a single powerful metric to quantify overall gene-trait association at the organism level. When our method, called UTMOST, was applied to analyze genome wide association results for 50 complex traits (N_total=4.5 million), we were able to identify considerably more genes in tissues enriched for trait heritability, and cross-tissue analysis significantly outperformed single-tissue strategies (p=1.7e-8). Finally, we performed a cross-tissue genome-wide association study for late-onset Alzheimer's disease (LOAD) and replicated our findings in two independent datasets (N_total=175,776). In total, we identified 69 significant genes, many of which are novel, leading to novel insights on LOAD etiologies.

Copyright

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.