Statistical Genetics

Summary

Following the genomics revolution that was triggered by the effort to sequence a human genome, which peaked around 2001, modern genetics has become ever more data-driven and more statistical. Perhaps the most active area of statistical genetics in recent years has been the development of methods to analyse data from genome-wide association studies (GWAS). Human GWAS involve large numbers of genetic markers (typically half a million or more) typed in large numbers of individuals (usually at least several thousand and often now tens of thousands). A primary goal of the analysis is to identify genomic regions at which the marker alleles vary systematically with phenotype, for example by showing different frequencies in cases and controls. Such a pattern suggests that different alleles at that locus have different phenotypic effects, or in other words that the locus contributes to genetic mechanisms underlying the phenotype. However, identifying the directly relevant alleles and delineating the mechanism of effect remain highly challenging. GWAS in animal and plant breeding typically use smaller numbers of individuals, but this is compensated for by closer relatedness among them, as well as by more controlled environments, which make it easier to identify genetic effects. Prediction of phenotype from genome-wide markers has recently gained prominence as an analysis goal in human genetics, borrowing tools from animal and plant breeding where prediction has always been a major focus. A related topic of current interest is the dissection of heritability using markers rather than pedigrees.
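
As a rough illustration of what "different frequencies in cases and controls" means at a single marker, the sketch below applies a standard 2x2 allelic chi-square test to allele counts. It is only one of several possible single-marker tests and is not taken from the lectures; the function name and the example counts are hypothetical.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """2x2 allelic chi-square test (1 degree of freedom) for one biallelic marker.

    case_counts and control_counts are (reference_alleles, alternative_alleles)
    tuples of allele counts (two alleles per diploid individual).
    Returns (chi-square statistic, p-value).
    Hypothetical helper for illustration only.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if min(margins) == 0:
        raise ValueError("a row or column total is zero; the test is undefined")
    # Closed-form Pearson chi-square statistic for a 2x2 table.
    chi2 = n * (a * d - b * c) ** 2 / (margins[0] * margins[1] * margins[2] * margins[3])
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For made-up counts of (600, 400) alleles in cases and (500, 500) in controls, the statistic is about 20.2. With half a million markers tested, a Bonferroni-style threshold of 0.05 / 500,000 = 1e-7 would apply, and this marker would fall short of it, which shows why the large sample sizes mentioned above are needed.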

The term “genomics” originally referred to the use of genome-wide genetic data, contrasting with classical genetics that focused on individual loci. However, “genomics” has also come to refer to large-scale analyses of data “downstream” from the DNA, most often mRNA. Subsequently many other “omics” terms were coined, such as transcriptomics, proteomics, metabolomics and epigenomics. There is an increasing drive towards “data integration” that simultaneously uses data from different omics technologies. In these lectures we will mainly focus on DNA-based analyses. DNA has a special status because it is essentially fixed across cell types and throughout an individual’s lifetime, and so direction of causality can be inferred more easily than for other potential risk factors.

Beyond studies of human disease and traits of economic interest in animals and plants, GWAS can be applied to samples of bacterial and viral pathogens, or model organisms central to studies of biological mechanisms. Statistical genetics plays a key role in many other fields, such as population genetics studies aimed at illuminating the demographic history of populations. These can inform us about the migrations and population sizes of our human ancestors or those of other organisms, for example species of interest in conservation genetics and wildlife management. Statistical analyses also underpin many studies in evolutionary genetics, for example those aimed at understanding mechanisms of selection, identifying loci that are affected by selection and estimating selection coefficients. In addition to these, we will also review statistical genetics issues in forensic genetics, particularly for the evaluation of DNA profile evidence, and more recently in the prediction of phenotype from an anonymous DNA sample.

Inference of relatedness among two or more individuals is a central theme, because relatedness is a long-established proxy for sharing of genes, which is relevant to genotype-to-phenotype analyses (including GWAS), conservation genetics and forensics. Genome-wide genetic data offer new ways to measure and use relatedness, which will also be covered in the lectures.
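
One way to measure relatedness from genome-wide markers, sketched below, is a genomic relationship matrix built from standardised allele counts. This is a minimal sketch of one commonly used estimator, not necessarily the one used in the lectures. It assumes genotypes coded as 0, 1 or 2 copies of an allele, estimates allele frequencies from the sample itself, and skips monomorphic markers; the function name is hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Marker-based relatedness for all pairs of individuals.

    genotypes: list with one entry per individual, each a list of allele
    counts (0, 1 or 2) at the same markers in the same order.
    Returns an n x n nested list whose entry [j][k] is the average over markers of
    (x_j - 2p)(x_k - 2p) / (2p(1 - p)), with p the sample allele frequency.
    Hypothetical helper for illustration only.
    """
    n = len(genotypes)
    n_markers = len(genotypes[0])
    grm = [[0.0] * n for _ in range(n)]
    used = 0
    for i in range(n_markers):
        column = [individual[i] for individual in genotypes]
        p = sum(column) / (2 * n)  # sample allele frequency (an assumption)
        if p == 0.0 or p == 1.0:
            continue  # monomorphic marker carries no information here
        variance = 2 * p * (1 - p)
        centred = [x - 2 * p for x in column]
        for j in range(n):
            for k in range(j, n):
                grm[j][k] += centred[j] * centred[k] / variance
        used += 1
    if used == 0:
        raise ValueError("no polymorphic markers")
    # Average over the markers used and fill in the lower triangle.
    for j in range(n):
        for k in range(j, n):
            grm[j][k] /= used
            grm[k][j] = grm[j][k]
    return grm
```

Because allele frequencies are estimated from the same sample, the entries are relative to the sample average, so pairs less related than average receive negative values. Matrices of this kind are one route to the marker-based heritability analyses mentioned in the summary.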