About This PhD Project

Project Description

Rationale

Genome-wide association studies (GWAS) have identified a large number of genetic variants, called single nucleotide polymorphisms (SNPs), that are associated with disease and with other phenotypes such as height. However, for the majority of complex diseases, such studies allow only very weak prediction of disease status. This is partly because rare SNPs are observed in very few people, so studies have insufficient statistical power to detect their effects.

At the same time, methods for predicting the influence of SNPs on functional processes such as protein folding have become increasingly informative. It is natural to combine these two sources of information, and Bayesian methodology provides a principled way to do so. In this approach, functional predictions act as a prior on which SNPs are likely to influence a particular functional pathway associated with disease. This should substantially reduce the amount of data required to infer the effect of a rare SNP, and so breathe new life into GWAS.
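To make the idea of a functional prior concrete, the sketch below shows how the same association evidence for a SNP yields a different posterior probability depending on its functional score. It is an illustration only and not the methodology the project will develop. It swaps in a standard normal-approximation Bayes factor computed from a GWAS effect estimate and its standard error. The function names, the logistic mapping from score to prior probability, and all numerical values (`intercept`, `slope`, `prior_sd`, the example effect size) are assumptions introduced here.

```python
import math


def approx_bayes_factor(beta_hat, se, prior_sd):
    """Approximate Bayes factor for association (H1) against no association (H0).

    Assumes beta_hat ~ N(beta, se^2) and, under H1, beta ~ N(0, prior_sd^2).
    """
    v = se ** 2                # sampling variance of the effect estimate
    w = prior_sd ** 2          # prior variance of the true effect under H1
    z = beta_hat / se          # usual GWAS z-score
    shrink = w / (v + w)
    return math.sqrt(1.0 - shrink) * math.exp(0.5 * z * z * shrink)


def prior_from_functional_score(score, intercept=-9.0, slope=4.0):
    """Hypothetical logistic map from a functional score to a prior probability.

    The intercept and slope are illustrative assumptions, not fitted values.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def posterior_probability(beta_hat, se, score, prior_sd=0.2):
    """Posterior probability of association: prior odds times Bayes factor."""
    prior = prior_from_functional_score(score)
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * approx_bayes_factor(beta_hat, se, prior_sd)
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Same (made-up) association evidence, three different functional scores.
    for score in (0.0, 0.5, 1.0):
        p = posterior_probability(beta_hat=0.45, se=0.15, score=score)
        print(f"functional score {score:.1f}: posterior probability {p:.3f}")
```

Under these assumptions, a SNP with a higher functional score needs less association evidence to reach a given posterior probability, which is the sense in which an informative prior can reduce the amount of data required.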

Aims & Objectives

This study will develop new Bayesian methodology to link functional predictions of SNPs with GWAS. These approaches are computationally costly if performed naively, so a key objective is to import approaches from machine learning that make such a GWAS practical for large studies. The methodology will be applied to the ALSPAC and UK10K datasets to learn new genetic associations in a number of key diseases, with the aim of eventually analysing participants in the Genomics England 100,000 Genomes Project.

Methods

We will construct simple Bayesian models for GWAS, incorporating different types of functional predictions into the prior structure. Once these models have been tested on appropriately subsampled data, we will develop variational inference approaches (Wang & Blei 2013) to speed up the computation to the required scale, and produce a software package. Students can choose whether to pursue further applications in epidemiology or to extend the methodology to more complex cases.
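As a rough illustration of the kind of model and objective described here, one possible formulation is a sparse regression in which the prior probability that SNP j is included depends on its functional score. This is an assumption made for exposition, not the project's specified model, and it is written for a quantitative trait; disease status would need a binary likelihood instead. All symbols (the genotypes x_ij, effects β_j, inclusion indicators γ_j, functional scores f_j, hyperparameters a, b and τ², and the approximating distribution q) are introduced here for illustration. Variational inference replaces exact posterior computation with maximisation of the lower bound in the last line over a tractable family of distributions q.

```latex
\begin{align}
y_i &\sim \mathcal{N}\Big(\textstyle\sum_j x_{ij}\beta_j,\ \sigma^2\Big), \\
\beta_j \mid \gamma_j &\sim \gamma_j\,\mathcal{N}(0,\tau^2) + (1-\gamma_j)\,\delta_0, \\
\gamma_j &\sim \mathrm{Bernoulli}(\pi_j), \qquad
\pi_j = \frac{1}{1+\exp\{-(a + b f_j)\}}, \\
\log p(y \mid X) &\ge \mathbb{E}_{q}\big[\log p(y,\beta,\gamma \mid X)\big]
  - \mathbb{E}_{q}\big[\log q(\beta,\gamma)\big].
\end{align}
```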