Abstract

More than a decade after the sequencing of the human genome, a deluge of genome-wide population data is generating a portrait of human genetic diversity at an unprecedented level of resolution. Genomic studies have provided new insight into the demographic and adaptive history of our species, Homo sapiens, including its interbreeding with other hominins, such as Neanderthals, and the ways in which natural selection, in its various guises, has shaped genome diversity. These studies, combined with functional genomic approaches, such as the mapping of expression quantitative trait loci, have helped to identify genes, functions, and mechanisms of prime importance for host survival and involved in phenotypic variation and differences in disease risk. This review summarizes new findings in this rapidly developing field, focusing on the human immune response. We discuss the importance of defining the genetic and evolutionary determinants driving immune response variation, and highlight the added value of population genomic approaches in settings relevant to immunity and infection.

Introduction

Studies of the history of the genus Homo have entered a golden age with the advent of genome-wide, deep-sequencing approaches. Surveys of genetic variation, such as the Exome Variant Server (Fu et al., 2013), the Exome Aggregation Consortium (Lek et al., 2016), the 1000 Genomes Project (Auton et al., 2015), and the Simons Genome Diversity Project (Mallick et al., 2016), provide invaluable resources for the clinical interpretation of genetic variants identified in disease patients, and for dissecting the genetic architecture of morphological and physiological traits (Fig. 1, A and B). Genomic studies have also increased our understanding of the demographic history of our species (Veeramah and Hammer, 2014; Nielsen et al., 2017), including the contribution of interbreeding with archaic hominins, such as Neanderthals and Denisovans (Racimo et al., 2015), and the extent to which natural selection has acted on the human genome (Fan et al., 2016). This work has, in turn, provided insight into the mechanisms by which deleterious variants are removed from the population (Lohmueller, 2014; Henn et al., 2015) and the potential of humans to adapt to the broad range of environments they occupy (Jeong and Di Rienzo, 2014; Fan et al., 2016).

Approaches to dissect the genetic basis of phenotypic traits and diseases. (A) Clinical genetics aims to identify rare mutations associated with severe clinical phenotypes. The genealogy illustrates the Mendelian inheritance of an autosomal recessive disorder, where one copy of the segregating disease mutation has no phenotype (gray), whereas two copies lead to the manifestation of the severe disorder (black). (B) Genetic epidemiology aims to dissect the genetic and environmental factors contributing to complex disease phenotypes by using population-based approaches. The Manhattan plot shows six loci associated with susceptibility to leprosy by a GWAS in the Chinese population (Liu et al., 2015). (C) Population genetics explores how past human demography, including admixture with archaic hominins, and natural selection have affected the genetic diversity of human populations. It represents a powerful approach to identify common variants associated with phenotypic traits that have played a major role in human survival and adaptation, including resistance to infectious diseases. (D) Cellular genomics allows the investigation of the genetic basis of intermediate molecular phenotypes, such as gene expression, which can be linked to ultimate organismal traits. The identification of genetic variants affecting gene expression in cis or trans, through the mapping of eQTL, helps to pinpoint cellular and molecular mechanisms involved in complex phenotypic traits, including immunity to infection.

Over the last decade, genomic scans of natural selection in the human genome have consistently identified genes and functions relating to immunity and host defense as recurrent targets of selection (Barreiro and Quintana-Murci, 2010; Quintana-Murci and Clark, 2013; Fumagalli and Sironi, 2014; Karlsson et al., 2014). There is also growing evidence to suggest that admixture with ancient hominins introduced advantageous immune variants into the modern human population (Racimo et al., 2015). Dissection of the action of selection on immune genes in settings relevant to immunity and infection, and investigations of the effects of genetic variants on responses to immune activation, through the mapping of expression quantitative trait loci (eQTL), are the key to identifying evolutionarily important determinants of host immune responsiveness and of the mechanisms underlying the diversity of immune phenotypes and disease risk (Fig. 1, C and D; Casanova et al., 2013; Fairfax and Knight, 2014).

This review focuses on the crucial contribution of genome-wide approaches to the understanding of human phenotypic diversity, through improved knowledge of the demographic and adaptive history of our species, particularly with respect to pathogen pressures, and of the admixture of Homo sapiens with archaic, now-extinct hominins, which led to the acquisition of advantageous variants. We specifically highlight recent advances in studies of the genetic, nongenetic, environmental, and evolutionary factors driving the inter-individual and inter-population phenotypic variability of immune responses, providing insight into the mechanisms underlying host immunity to infection and susceptibility to immune-related diseases.

Reconstructing the genomic history of Homo sapiens

Genomic surveys support an origin of modern humans in sub-Saharan Africa ∼200,000 yr ago, with subsequent dispersal out of Africa ∼40,000–80,000 yr ago, followed by the rapid colonization of South Asia, Australia, Europe, and East Asia (Novembre and Ramachandran, 2011; Veeramah and Hammer, 2014; Nielsen et al., 2017). Humans ultimately reached distant locations, such as the Americas, ∼15,000–35,000 yr ago, and the remote islands of Oceania, which were settled as recently as ∼1,000–4,000 yr ago (Fig. 2). African populations present the highest levels of diversity worldwide, and the diversity of non-African populations decreases with increasing distance from Africa (Abecasis et al., 2010, 2012; Campbell et al., 2014), attesting to the occurrence of bottlenecks and founder events during their migrations across the globe (Jakobsson et al., 2008; Henn et al., 2012). Recent studies based on whole-genome sequencing have shown that all contemporary non-Africans are descended from a single population that left Africa less than 60,000 yr ago (Malaspinas et al., 2016; Mallick et al., 2016). Modern Papua New Guineans present genomic traces of an ancient dispersal out of Africa ∼120,000 yr ago (Pagani et al., 2016), a dispersal that remains to be confirmed, but which clearly made little contribution to modern Eurasians.

Major human migrations across the globe and presence of archaic ancestry in the genomes of modern human populations. Red arrows indicate the major migrations of Homo sapiens to colonize the world after the out of Africa exodus. Blue arrows indicate some more recent migratory events (<5,000 yr ago), and dashed arrows represent the historical migrations related to the Trans-Atlantic slave trade. Approximate geographic areas of modern human populations presenting Neanderthal or Denisovan ancestry are shaded in blue and orange, respectively, according to the Simons Genome Diversity Project (Mallick et al., 2016). This map does not present a comprehensive evaluation of the populations presenting archaic ancestry, as only populations for which genetic profiles are available are plotted. The Neanderthal ancestry observed in American populations does not reflect in situ admixture with Neanderthals but, instead, their varying levels of European ancestry.

Genetic studies have also revealed substantial population structure (i.e., subdivision) within continents. In sub-Saharan Africa, population structure is correlated with geography, language family, and mode of subsistence (Henn et al., 2012; Campbell et al., 2014; Patin et al., 2014), whereas, in Europe, genetic diversity is strongly driven by geography, and population structure is detectable within even very small geographic areas (Novembre et al., 2008). The quantification of population structure has important implications for studies of disease, as it must be carefully controlled for in genome-wide association studies (GWAS). Genomic studies can also quantify major admixture events between populations, such as those between the ancestors of African-Americans (Bryc et al., 2010; Montinaro et al., 2015). Decomposing the genomes of admixed populations into segments of different ancestry facilitates the identification of variants associated with disease risk, particularly for traits with different prevalence in the parental populations of admixed groups (Smith and O’Brien, 2005).

Natural selection has driven human adaptation to environmental cues

During their spread across the globe, humans encountered a highly diverse set of climatic, nutritional, and pathogenic conditions, to which they had to adapt. Phenotypic traits that increased their chances of surviving and reproducing in such environments are due, at least in part, to genetic variation and are thus transmitted across generations. The dissection of the legacy of past selection in the human genome has proved crucial for the identification of genes underlying the broad morphological and physiological diversity observed across human populations, and for increasing our understanding of the genetic architecture of adaptive phenotypes (Vitti et al., 2013; Jeong and Di Rienzo, 2014; Key et al., 2014b; Fan et al., 2016).

Forms of natural selection

Although natural selection acts on phenotypes, it is primarily the genetic variation underlying a phenotype that is inherited, so selection ultimately acts on genotypes. Selection can be classified according to whether an allele is disfavored (purifying selection) or favored (positive selection). Purifying selection removes deleterious alleles and is the most pervasive form of selection (Eyre-Walker and Keightley, 1999; Kryukov et al., 2007). Conversely, positive selection acts upon advantageous mutations and can follow different evolutionary models (Pritchard et al., 2010). According to the classic sweep model, positive selection acts upon a newly arising advantageous mutation, which increases in frequency. Selection on standing variation instead favors an allele that is already segregating in the population, and polygenic adaptation involves the simultaneous selection of variants at many loci, each of which makes a small contribution to fitness. Genetic adaptation may also occur through balancing selection, which preserves functional diversity through heterozygote advantage, frequency-dependent selection, or pleiotropy (Charlesworth, 2006; Key et al., 2014b). Each form of selection leaves specific molecular signatures around the selected region (Fig. 3 A), which are detected by an ever-increasing number of statistical methods that, depending on the nature of the genetic data analyzed, can indicate the time range over which selection occurred (Box 1).
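The contrast between a classic sweep and selection on standing variation can be illustrated with a deterministic toy model. The sketch below is not taken from the studies cited above: it assumes a simple genic selection recursion in which carriers of the favored allele have relative fitness 1 + s, it ignores genetic drift entirely, and the population size `N`, the selection coefficient `s`, and the starting frequencies are hypothetical values chosen for illustration.

```python
def next_frequency(p, s):
    """One generation of genic selection (assumed model, no drift).

    Carriers of the allele have relative fitness 1 + s, so the allele
    frequency p becomes p(1 + s) / (1 + s p). A negative s mimics
    purifying selection against a deleterious allele.
    """
    return p * (1 + s) / (1 + s * p)


def generations_to_reach(p0, s, target=0.99, max_gen=100_000):
    """Number of generations for the allele to rise from p0 to `target`."""
    p, gen = p0, 0
    while p < target and gen < max_gen:
        p = next_frequency(p, s)
        gen += 1
    return gen


N = 10_000   # hypothetical diploid population size
s = 0.01     # hypothetical selective advantage

# Classic sweep: the advantageous mutation starts as a single copy.
sweep_generations = generations_to_reach(1 / (2 * N), s)

# Selection on standing variation: the allele already segregates at 5%.
standing_generations = generations_to_reach(0.05, s)

print("classic sweep:", sweep_generations, "generations")
print("standing variation:", standing_generations, "generations")
```

Under these assumptions the allele that is already segregating reaches high frequency in fewer generations than the new mutation, which is one reason why the two models are expected to leave different signatures around the selected site.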

Natural selection and archaic introgression affect human genome diversity. (A) Schematic representation of how different selective regimes and archaic introgression affect population genetic diversity. Purifying selection removes deleterious alleles (black dots) from the population. In turn, genetic variants conferring a selective advantage (e.g., disease resistance) increase in frequency, or are maintained, in the population through positive and balancing selection, respectively. Different models of positive selection are represented: the classic sweep, where a new mutation rapidly increases in frequency until fixation (red dots); selection on standing variation, where selection acts on a preexisting genetic variant in the population (orange dots); and polygenic adaptation, where multiple genetic variants located in different genomic regions simultaneously increase in frequency (green dots). Balancing selection is represented here by heterozygote advantage (blue dots). Archaic introgression illustrates ancient gene flow between archaic hominins (purple segments) and, in this case, the ancestors of modern Europeans. The traces of this archaic introgression are today detectable in the genomes of Europeans. (B) Genetic diversity and proportion of deleterious alleles in populations with different demographic regimes. A population that underwent a past expansion (left) presents higher levels of genetic diversity, i.e., higher absolute numbers of benign (white dots) and deleterious (black dots) variants, than a population that experienced a bottleneck followed by an expansion (right). However, the proportion of deleterious variants relative to benign variants is lower in the expanding population because of the increased efficiency of purifying selection in removing deleterious alleles. (C) Some molecular signatures of positive selection. The FST statistic (top) measures genetic distances between populations; positive selection, which acts locally, tends to increase the genetic differentiation of particular loci, resulting in high FST values. The distribution of FST values, calculated for genome-wide SNPs across all chromosomes, is presented here. The dashed red line corresponds to the 99th percentile of the distribution, and loci above this line should be considered as strong candidates for positive selection. The degree of haplotype homozygosity (bottom) is another feature commonly used to detect advantageous variants. Positive selection can lead to an excess of homozygosity (green line) of the haplotypes carrying the positively selected allele (red line), with respect to the other haplotypes (blue line). This pattern is the result of the rapid increase in frequency of the advantageous allele, with recombination not having enough time to break down the haplotype on which the selected mutation arose. (D) Comparison of archaic introgression scores between two classes of genes (top); class B genes (e.g., innate immunity genes) present higher archaic ancestry than class A genes (e.g., the rest of the genome). The TLR6-1-10 gene cluster presents a high degree of archaic ancestry (bottom); archaic haplotypes (blue lines) surrounding this genomic region are represented here in European individuals (adapted from Deschamps et al., 2016).
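The FST outlier scan described in panel C could be sketched as follows. This is a minimal illustration, assuming biallelic SNPs, two equally weighted populations, and Wright's heterozygosity-based definition of FST; the function names and the toy allele frequencies are hypothetical, and published scans typically rely on estimators that correct for sample size.

```python
def fst(p1, p2):
    """Wright's FST for one biallelic SNP from two population frequencies.

    FST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of
    the pooled population and H_S the mean within-population value.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:          # monomorphic in both populations
        return 0.0
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def nearest_rank_percentile(values, q):
    """q-th percentile (integer q) using the nearest-rank definition."""
    ordered = sorted(values)
    rank = max(1, (q * len(ordered) + 99) // 100)   # ceiling of q*n/100
    return ordered[rank - 1]


def outlier_loci(fst_values, q=99):
    """Indices of SNPs whose FST lies above the q-th percentile."""
    threshold = nearest_rank_percentile(fst_values, q)
    return [i for i, value in enumerate(fst_values) if value > threshold]


# Hypothetical allele frequencies: 99 weakly differentiated SNPs and one
# SNP that is common in population 1 and rare in population 2.
freq_pairs = [(0.30 + 0.001 * i, 0.32 + 0.001 * i) for i in range(99)]
freq_pairs.append((0.95, 0.05))

fst_values = [fst(p1, p2) for p1, p2 in freq_pairs]
outliers = outlier_loci(fst_values)
print("candidate loci:", outliers)
```

An empirical percentile cutoff flags the most differentiated loci but does not by itself establish selection, because demographic events can also produce extreme FST values.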

Box 1. Methods to detect natural selection at different time scales

An increasing number of statistical tests are available to detect the occurrence, intensity, and time frame of selection acting on genomes. Frequency-based methods measure nucleotide diversity at a given locus, as well as deviations of its allele frequency spectrum from standard neutral expectations. These methods, including Tajima’s D and derivatives, generally detect selection events that occurred in the past ∼250,000 years: negative values of these tests indicate an excess of rare alleles, compatible with negative or positive selection, whereas positive values reflect an excess of intermediate-frequency alleles, a pattern compatible with balancing selection (Kreitman, 2000; Nielsen, 2005).
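As an illustration of a frequency-based test, the sketch below computes Tajima's D for a small set of aligned haplotypes, using the standard constants from Tajima's original formulation. The input encoding (strings of "0" and "1" with no missing data) and the toy haplotypes are assumptions made for this example, and no significance test is included.

```python
import math


def tajimas_d(haplotypes):
    """Tajima's D for equal-length haplotypes coded as '0'/'1' strings.

    Returns None when there are no segregating sites.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("at least four sequences are needed")
    length = len(haplotypes[0])

    # Number of '1' alleles at each site; keep segregating sites only.
    counts = [sum(h[i] == "1" for h in haplotypes) for i in range(length)]
    segregating = [k for k in counts if 0 < k < n]
    S = len(segregating)
    if S == 0:
        return None

    # Mean number of pairwise differences (nucleotide diversity, pi).
    pi = sum(2 * k * (n - k) for k in segregating) / (n * (n - 1))

    # Constants that depend only on the sample size.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between two estimators of diversity, scaled by its
    # estimated standard deviation.
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


# Hypothetical samples: only singletons versus only intermediate alleles.
rare_alleles = ["100000", "010000", "001000", "000100", "000010", "000001"]
intermediate = ["111111"] * 3 + ["000000"] * 3

d_rare = tajimas_d(rare_alleles)
d_intermediate = tajimas_d(intermediate)
print("excess of rare alleles:", d_rare)
print("excess of intermediate alleles:", d_intermediate)
```

The first toy sample gives a negative value and the second a positive one, matching the interpretation given above.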

Methods based on population genetic differences, such as FST and its derivatives, allow the identification of variants that have been favored locally (i.e., frequency increase of the advantageous allele in a population-specific manner) in the past ∼75,000 years by pinpointing marked population differences in the frequency of these alleles (Holsinger and Weir, 2009). In turn, methods based on linkage disequilibrium generally measure the level of haplotype homozygosity associated with the selected allele, with respect to that associated with the corresponding neutral allele at a specific site. These include the long-range haplotype (LRH), the integrated haplotype score (iHS), and the cross population extended haplotype homozygosity (XP-EHH) tests; significant values of these tests highlight the rapid increase in frequency of the causal advantageous mutation, and its associated variants, supporting an event of positive selection occurring in the past ∼25,000 years (Sabeti et al., 2002; Voight et al., 2006; Wang et al., 2006). In the context of recent selection, a novel method, the singleton density score, allows the detection of adaptive events within the last ∼2,000 years, based on the low number of rare mutations present on the haplotype carrying the advantageous allele (Field et al., 2016).
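The haplotype homozygosity on which the LRH, iHS, and XP-EHH tests build can be sketched as the extended haplotype homozygosity (EHH): the probability that two randomly chosen haplotypes carrying a given core allele are identical over the interval between the core site and a more distant site. The code below is a simplified illustration with hypothetical phased haplotypes; it does not implement the integration and standardization steps of iHS or XP-EHH.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, core_allele, upto):
    """EHH between site index `core` and site index `upto`.

    Only haplotypes carrying `core_allele` at the core site are used.
    Returns None if fewer than two haplotypes carry that allele.
    """
    lo, hi = sorted((core, upto))
    carriers = [h[lo:hi + 1] for h in haplotypes if h[core] == core_allele]
    if len(carriers) < 2:
        return None
    # Pairs of identical haplotypes divided by all possible pairs.
    identical_pairs = sum(comb(c, 2) for c in Counter(carriers).values())
    return identical_pairs / comb(len(carriers), 2)


# Hypothetical phased haplotypes; the core SNP is at index 2.
# Carriers of the derived allele ('1') share a long common haplotype,
# whereas carriers of the ancestral allele ('0') are diverse.
haplotypes = [
    "11111", "11111", "11111", "11110",   # derived core allele
    "10011", "01001", "00000", "11010",   # ancestral core allele
]

ehh_derived = ehh(haplotypes, core=2, core_allele="1", upto=4)
ehh_ancestral = ehh(haplotypes, core=2, core_allele="0", upto=4)
print("EHH derived:", ehh_derived)
print("EHH ancestral:", ehh_ancestral)
```

A markedly higher EHH on the background of one allele than on the other is the pattern that haplotype-based tests look for.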

Due to the moderate to high false discovery rate of some of these methods, together with the difficulty in identifying the causal selected variants, composite methods that combine multiple statistics into a single score, such as the composite likelihood ratio (CLR) and the composite of multiple signals (CMS), have been developed to increase power and minimize the detection of false-positive signals (Nielsen et al., 2005; Grossman et al., 2010, 2013).

More generally, the performance of methods to detect selection depends on several parameters, such as the age of the advantageous mutation, its selection coefficient, and the demographic history of the population studied, which affect allele frequencies and other aspects of the genetic data to different extents, thus influencing the interpretation of the results and the significance of the statistical tests. The extent to which currently used methods to detect selection are well powered and robust to these factors remains an active field of research in population genetics. Further details of these intraspecies methods, together with methods analyzing interspecies divergence to detect negative and positive selection, such as the ratio of substitution rates at non-synonymous and synonymous sites (dN/dS), Hudson-Kreitman-Aguadé (HKA), and McDonald-Kreitman tests, have been extensively reviewed elsewhere (Kreitman, 2000; Nielsen, 2005; Sabeti et al., 2006; Nielsen et al., 2007; Vitti et al., 2013).

Removal of deleterious mutations by purifying selection

Recent studies have shed new light on the ways in which population history can modify the mode in which selection removes deleterious variants (Lohmueller, 2014; Henn et al., 2015; Simons and Sella, 2016). Non-African populations have a higher proportion of deleterious variants of essential genes than Africans, a pattern consistent with less efficient purifying selection to purge deleterious alleles from small populations (Lohmueller et al., 2008; Gutenkunst et al., 2009; Tennessen et al., 2012). Population history appears to have a negligible impact on the mean burden of individual deleterious variants (Fu et al., 2014; Simons et al., 2014; Do et al., 2015), but it has been consistently shown that genetic drift affects the frequency of weakly deleterious mutations more strongly in bottlenecked than in large populations (Fig. 3 B). Indeed, the number of homozygous deleterious genotypes carried by individuals increases with distance from Africa (Henn et al., 2016), and founder populations, such as French-Canadians, and bottlenecked populations, such as the Finns, have larger proportions of deleterious variants, including loss-of-function variants and complete gene knockouts, than expanding populations, such as French or other European populations (Casals et al., 2013; Lim et al., 2014).
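One way to see why purifying selection is less efficient in small populations is through the fixation probability of a new mutation. The sketch below is an assumption-laden illustration rather than a method used in the studies cited: it applies Kimura's diffusion approximation for an additive mutation, takes the effective population size to be equal to `N`, and uses hypothetical values of `N` and of the selection coefficient `s`.

```python
import math


def fixation_probability(s, N):
    """Fixation probability of a new mutation (initial frequency 1/(2N)).

    Kimura's diffusion approximation for an additive mutation in a
    diploid population whose effective size is assumed to equal N.
    A negative s denotes a deleterious mutation.
    """
    if s == 0:
        return 1 / (2 * N)          # neutral expectation
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * N * s))


s = -0.001   # hypothetical weakly deleterious mutation

# Fixation probability relative to a neutral mutation, for a small
# (bottlenecked) and a large population.
for N in (1_000, 10_000):
    relative = fixation_probability(s, N) / fixation_probability(0, N)
    print(f"N = {N}: relative fixation probability = {relative:.3g}")
```

With these hypothetical values, the weakly deleterious mutation behaves much more like a neutral one in the smaller population, consistent with the stronger effect of genetic drift described above.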

Genomic traces of admixture between archaic and modern humans

Genomic studies of ancient DNA have revealed that our species, Homo sapiens, interbred with other human forms, such as Neanderthals and Denisovans, that were present in Eurasia ∼30,000–50,000 yr ago. This interbreeding resulted in a phenomenon known as archaic introgression; the legacy of this ancient admixture is observed in the genomes of modern human populations (Figs. 2 and 3 A; Kelso and Prüfer, 2014; Vattathil and Akey, 2015). Whole-genome sequences from ancient specimens have revealed the presence of DNA of Neanderthal ancestry accounting for ∼2% of the genome in Europeans to ∼4% in Asians (Green et al., 2010; Prüfer et al., 2014). DNA introgressed into modern humans from Denisovans is found mostly in Australo-Melanesians, in whom it may account for up to 6% of their genomes, and, to a lesser extent, in South East Asians (Reich et al., 2010, 2011; Meyer et al., 2012). These estimates are averages across the modern human genome, and specific regions of the genome may have degrees of Neanderthal ancestry as high as 64% in Europeans and 62% in Asians (Sankararaman et al., 2014).

However, there has been strong selection against archaic introgression, particularly among protein-coding genes, probably caused by its deleterious effects in modern humans (Sankararaman et al., 2014; Vernot and Akey, 2014; Fu et al., 2016). Identifying regions in the human genome strongly depleted of archaic ancestry can thus identify functions that may have contributed to the uniqueness of some modern human traits (Vernot et al., 2016). For example, a region devoid of Neanderthal ancestry has been identified around the forkhead box protein P2 (FOXP2) gene, mutations of which are associated with language disorders (Konopka and Roberts, 2016). The AMY1 gene, encoding an amylase enzyme responsible for starch digestion, also lies in a Neanderthal desert. Unlike modern humans, who carry multiple copies of AMY1, Neanderthals and Denisovans had only one copy of this gene, suggesting that the production of larger amounts of salivary amylase for starch digestion has been of great benefit to modern humans (Perry et al., 2015).

Despite the widespread signature of purifying selection against archaic alleles, modern humans have also acquired advantageous alleles via admixture with ancient hominins, through adaptive introgression (Kelso and Prüfer, 2014; Racimo et al., 2015; Vattathil and Akey, 2015). Genes involved in functions relating to keratin filaments, sugar metabolism, muscle contraction, and oocyte meiosis have been targeted by adaptive introgression from Neanderthals (Sankararaman et al., 2014). For example, EPAS1, associated with hemoglobin concentration and response to hypoxia, displays a high degree of Denisovan ancestry in Tibetans, suggesting that this population acquired advantageous alleles for life at high altitude through ancient admixture (Huerta-Sánchez et al., 2014). Whatever the potential benefits of archaic introgression in the past, alleles of Neanderthal origin have been shown to be associated with several neurological, dermatological, and immunological phenotypes, indicating an influence of ancient admixture on current disease risk in humans (Simonti et al., 2016). These studies show how the detection of introgression from archaic hominins can shed light on the impact of ancient hybridization on the morphological and physiological variability of modern human populations.

Purifying selection and essentiality for immune responses

Genes evolving under purifying selection, and thus under strong constraints, are generally involved in mechanisms essential for host defense, and the variation of these genes can lead to severe disorders. Microbial sensors, such as endosomal TLRs and many NLRs, adaptors, such as MYD88 and TRIF, and effectors, such as some type-I IFNs and IFN-γ, have been shown to evolve under strong purifying selection, attesting to the unique, essential nature of the mechanisms in which they are involved (Quintana-Murci and Clark, 2013). Consequently, rare mutations of highly constrained genes, such as TLR3, TRIF, MYD88, STAT1, TRAF3 and the genes of the IFN-γ pathway, are associated with life-threatening diseases, such as HSV-1 encephalitis, pyogenic bacterial infections, and Mendelian susceptibility to mycobacterial disease (Pérez de Diego et al., 2010; Boisson-Dupuis et al., 2012; Casanova et al., 2013). More generally, a recent genome-wide study has shown that innate immunity genes have evolved under stronger purifying selection than the rest of the genome, with those associated with autosomal-dominant forms of primary immunodeficiencies subject to the strongest degree of purifying selection (Deschamps et al., 2016). These findings support the hypothesis that constrained genes are of major biological relevance in host survival and should be given priority in genetic studies aiming to identify new genetic etiologies of severe, infectious disease phenotypes.

A recent study has estimated the time frame of selection targeting innate immunity genes, and found that most selective events occurred ∼6,000–13,000 yr ago (Deschamps et al., 2016), corresponding to the period at which human societies adopted agriculture, a major transition in human lifestyle that likely modified human exposure to pathogens and led to genetic adaptation. The detection of immune genes targeted by local adaptation can increase our understanding of the mechanisms involved in host resistance to infectious challenges at different time scales, the nature of which may also differ between human populations.

Present deleterious effects of past selection events

Environmental factors may change over time, and genetic variants that were inherited from our ancestors and favored by past selection can become detrimental in modern societies, increasing the risk of inflammation or autoimmunity (Barreiro and Quintana-Murci, 2010; Sironi and Clerici, 2010; Brinkworth and Barreiro, 2014). For example, the prevalence of celiac disease, an autoimmune disorder caused by gluten intolerance, differs considerably between human populations, with individuals of European ancestry particularly affected (Kang et al., 2013). Population genetic analyses have shown that the high frequency of several risk alleles of genes associated with celiac disease, such as IL12A, IL18RAP, and SH2B3 (Hunt et al., 2008), in Europeans results from past positive selection events (Barreiro and Quintana-Murci, 2010; Zhernakova et al., 2010). Functional analyses of the SH2B3 risk variant have suggested that this otherwise deleterious variant was subject to positive selection in the past because it confers greater protection against infection (Zhernakova et al., 2010). However, the pleiotropic nature of SH2B3 (i.e., regulating the structural organization and development of platelets and endothelial cells), like that of many immune genes, suggests that alleles that currently increase the risk of inflammatory or autoimmune disorders may have conferred greater reproductive success in the past via a broader range of beneficial phenotypes (Brinkworth and Barreiro, 2014). The case of celiac disease neatly illustrates the tradeoff between past selection and current maladaptation, and highlights the value of population genetics for elucidating the mechanisms contributing to pathological inflammation and its heterogeneous distribution across populations.

The immune legacy of archaic hominins in our genomes

There is increasing evidence to suggest that immune genes have not only been an important substrate of selection, but also that their diversity in modern humans has been affected by archaic introgression (Racimo et al., 2015). Innate immunity genes generally display a higher degree of Neanderthal ancestry than the rest of the genome, and genes encoding sensors, such as the OAS and the TLR6-1-10 clusters, transcription factors, such as IRF6, and effector molecules, such as the restriction factors IFITM1-3 and the cytokines IL17A and IL17F, present the highest Neanderthal introgression scores at the genome-wide level (Fig. 3 D; Deschamps et al., 2016). In some cases, there is also evidence for selection acting on the archaic introgressed segments (i.e., adaptive introgression). For example, the HLA region, which evolves under balancing selection in humans, harbors functional variants that were probably introgressed from Neanderthals and Denisovans (Abi-Rached et al., 2011). Other candidates for adaptive introgression include STAT2, which is involved in the interferon response to viral infection; OAS1, which has been implicated in innate immune responses to viruses; and the TLR6-1-10 cluster, encoding proteins responsible for the sensing of microbial components on the cell surface (Mendez et al., 2012, 2013; Sankararaman et al., 2014; Dannemann et al., 2016; Deschamps et al., 2016; Sams et al., 2016).

The signals of adaptive introgression detected suggest an advantage for modern human survival, but recent findings have suggested that there may be a tradeoff for archaic introgression. For example, a nonsynonymous variant of the ZNF365D gene present in ∼26% of Europeans and absent from Africans was inherited from Neanderthals and is associated with a higher risk of Crohn’s disease (Sankararaman et al., 2014). Likewise, variants of TLR6-1-10 inherited from Neanderthals and Denisovans and present in Europeans and Asians have been associated with greater susceptibility to allergies (Dannemann et al., 2016). These studies collectively demonstrate that admixture with archaic hominins introduced variants into the gene pool of modern Eurasians some 30,000–50,000 yr ago, increasing the diversity of the modern human immune repertoire. These variants were initially neutral or advantageous for modern humans, but some are today associated with disease phenotypes.

Understanding immune response phenotypes through regulatory variation

Studies of the effects of selection on diversity at immune gene loci increase our understanding of the biological relevance of the functions concerned, complementing immunological, clinical, and epidemiological genetic studies (Casanova et al., 2013). Nevertheless, little is known about the relationship between genetic variation and immune phenotype diversity, and the nature of the immunological mechanisms under selection remains largely unexplored. The contribution of host genetic variants to variation in immune phenotypes, including those related to disease, is increasingly being documented by GWAS (Vannberg et al., 2011; Parkes et al., 2013; Abel et al., 2014). However, the multiple variants associated with immune phenotypes tend to have small individual effect sizes, and the identification of causal functional variants remains challenging (Manolio et al., 2009). In recent years, analyses of regulatory variants of gene expression (eQTL) have proved to be of considerable biomedical value (Montgomery and Dermitzakis, 2011), establishing links between intermediate phenotypes, such as gene expression, and organismal traits, such as immunity to infection (Fig. 4 A; Fairfax and Knight, 2014). Furthermore, GWAS have shown that susceptibility to common immune diseases is primarily controlled by noncoding, probably regulatory, variants (Hindorff et al., 2009; Nicolae et al., 2010; Schaub et al., 2012; Fraser, 2013; Pickrell, 2014). However, eQTL studies have also shown that the genetic control of gene expression varies considerably across tissues and cell types (Dimas et al., 2009; Price et al., 2011; Powell et al., 2012), reinforcing the need to consider the cell-context specificity of eQTL in studies aiming to improve our knowledge of the genetic basis of phenotypic traits or diseases.
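At its simplest, eQTL mapping tests whether expression of a gene varies with genotype at a SNP. The sketch below assumes an additive model in which expression is regressed on the number of copies of one allele (0, 1, or 2); the expression values and genotypes are hypothetical, and real analyses also adjust for covariates such as population structure and correct for the large number of SNP-gene pairs tested.

```python
def eqtl_fit(genotypes, expression):
    """Least-squares fit of expression on allele dosage (0, 1, or 2).

    Returns the slope (expression change per allele copy) and the
    fraction of expression variance explained by genotype (r squared).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e)
              for g, e in zip(genotypes, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype and expression must both vary")
    return sxy / sxx, sxy ** 2 / (sxx * syy)


# Hypothetical expression values (log scale) for nine donors.
expression = [5.0, 5.2, 4.8, 6.1, 5.9, 6.0, 7.0, 7.2, 6.8]

# Hypothetical genotypes at two SNPs for the same donors.
snp_unrelated = [0, 1, 2, 0, 1, 2, 0, 1, 2]   # no relation to expression
snp_eqtl = [0, 0, 0, 1, 1, 1, 2, 2, 2]        # expression rises with dosage

slope_unrelated, r2_unrelated = eqtl_fit(snp_unrelated, expression)
slope_eqtl, r2_eqtl = eqtl_fit(snp_eqtl, expression)
print("unrelated SNP: slope", slope_unrelated, "r2", r2_unrelated)
print("eQTL SNP: slope", slope_eqtl, "r2", r2_eqtl)
```

Only the second SNP explains a substantial share of the variation in expression, which is the pattern expected of an eQTL.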

Principles of eQTL mapping for investigating immune response variation. (A) General workflow to map the genetic basis of immune response variation in human populations; transcriptional responses, to various infections or immune stimulations, of primary immune cells from healthy donors are defined to subsequently assess the correlation between genotype variation and gene expression phenotypes, through eQTL mapping. (B) Plots representing eQTL; in contrast with SNP1, which does not behave as an eQTL (left), SNP2 is detected as an eQTL because its genotypes correlate with gene expression variation (middle). SNP3 behaves as a response eQTL; genotypes are correlated with gene expression variation only in stimulated conditions, indicating gene–environment interactions (right). (C) The fine mapping of the TLR1 genomic region (left) detected the SNP rs5743618 as the best trans-eQTL in Europeans (Quach et al., 2016), whose derived C allele is associated with the expression patterns of multiple genes upon Pam3CSK4 monocyte stimulation (right). The C allele presents signatures of local adaptation in Europeans (EUR), where it is present at very high frequency, whereas it is virtually absent in African (AFR) and East-Asian (EAS) populations.

Two subsequent seminal eQTL studies provided mechanistic insights into the genetic basis of immune variation (Fairfax et al., 2014; Lee et al., 2014). The expression of hundreds of genes after the stimulation of monocytes from individuals of European descent with LPS or IFN-γ was shown to be under genetic control. The genes concerned include nodal genes and effector molecules involved in pathways such as those involving the products of the TLR4-related TIRAP, TRAF6, FADD, and MAP kinase genes; the inflammasome-related CASP1 and PYCARD genes; genes encoding downstream cytokines, such as IL6, CXCL9, and IFNB1; and many IFN-γ pathway genes (Fairfax et al., 2014). The other study focused on responses to LPS, influenza, or IFN-β in dendritic cells from individuals of various ethnic backgrounds (Lee et al., 2014). The authors identified a large number of response eQTL, and showed that some of the regulatory variants involved altered the binding of transcription factors released only upon immune activation, such as STAT1 and STAT2, which are released upon IFN-β treatment, providing some insight into the mechanisms underlying the gene–environment interactions detected. In some cases, response eQTL overlap variants associated with susceptibility to infectious and autoimmune diseases, as illustrated by CARD9 and Crohn’s disease, NOD2 and leprosy, or IRF7 and systemic lupus erythematosus (Fairfax et al., 2014; Lee et al., 2014). These findings support the notion that disease-risk alleles can display activity only after some sort of immune activation (Gaulton et al., 2010; Gregory et al., 2012), and highlight the value of response eQTL mapping for investigations of the mechanisms underlying complex disease phenotypes.

Genetic determinants of population variation in immune responses

Analyses of eQTL on lymphoblastoid cell lines from various human populations worldwide provided proof of concept that genetic variation accounts for differences in gene expression among ethnic groups (Spielman et al., 2007; Stranger et al., 2012). Using primary monocytes and T lymphocytes, it has been shown that cis-eQTL are largely shared across populations, with only a small number of them being population specific (Raj et al., 2014). Although this study focused on gene expression at the steady state, two recent studies have determined the degree, and underlying genetic control, of population differences in the response to immune stimulation (Nédélec et al., 2016; Quach et al., 2016). They defined transcriptional responses to immune challenges through RNA sequencing, and mapped eQTL in macrophages from African-Americans and European-Americans exposed to Listeria monocytogenes and Salmonella Typhimurium (Nédélec et al., 2016), and in monocytes from Africans and Europeans exposed to TLR ligands (activating TLR4, TLR1/2, and TLR7/8) and influenza A virus (Quach et al., 2016). Despite differences in the experimental settings, both these studies found that the genes for which the response to immune stimulation differed strongly between populations were enriched in genetic control, with the regulatory variants concerned presenting different allele frequencies in populations of different ancestries.
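To make the notion of regulatory variants "presenting different allele frequencies" between populations concrete, the sketch below computes a simple fixation index (F_ST) for one biallelic variant in two populations. This is offered as one common summary of allele frequency differentiation, not as the statistic used in the studies cited above; the function name and the input frequencies are hypothetical, and the formula assumes equal population weights and ignores sample-size corrections.

```python
def fst_two_populations(p1, p2):
    """Wright-style F_ST for one biallelic variant in two equally weighted populations.

    p1, p2: frequencies of the same allele in each population (0 to 1).
    F_ST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the
    pooled population and H_S is the mean within-population heterozygosity.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        # Variant is fixed for the same allele in both populations: no differentiation.
        return 0.0
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t


# Hypothetical frequencies, chosen only to show the range of the statistic.
print(fst_two_populations(0.5, 0.5))  # identical frequencies
print(fst_two_populations(0.1, 0.9))  # strongly differentiated variant
print(fst_two_populations(0.0, 1.0))  # fixed for different alleles
```

Values near 0 indicate similar frequencies in the two populations, whereas values approaching 1 indicate a variant that is common in one population and rare in the other.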

Individuals of African descent generally display a stronger response to bacterial infection, particularly for inflammatory response genes, than individuals of European descent (Nédélec et al., 2016). A large fraction of European individuals carry a mutation at the TLR1 locus that impairs NF-κB activity (Barreiro et al., 2009) and acts as a population-specific trans-eQTL (Fig. 4 C). This TLR1 mutation is associated with a large network of genes displaying decreased levels of expression in response to immune activation only in Europeans (Quach et al., 2016). Altogether, these studies identified a large number of genes displaying considerable differences between populations, especially in the context of their responses to immune activation, revealing mechanisms that might account for differences in the clinical manifestations of immune-related diseases between individuals or ethnic groups.

There is growing evidence to suggest that regulatory variants play a major role in population adaptation (Schaub et al., 2012; Fraser, 2013; Pickrell, 2014). The enrichment of immune-responsive regulatory variants in population-specific signals of positive selection thus facilitates the identification of genes and phenotypes of evolutionary importance that have contributed to human survival. Examples of population differences in immune responses resulting from local adaptation driven by regulatory variants include the expression of HLA-DQA1, associated with susceptibility to celiac disease; ERAP2, involved in susceptibility to Crohn’s disease; CCR1, limiting leukocyte recruitment and preventing inflammatory responses; and TLR1, associated with markedly lower levels of inflammatory response gene expression (Nédélec et al., 2016; Quach et al., 2016). The positive selection signature detected at the TLR1 trans-eQTL is consistent with attenuated TLR1-mediated signaling being advantageous to Europeans, highlighting the benefits of avoiding strong inflammatory responses that may be harmful to the host (Quach et al., 2016).

Introgression from Neanderthals also contributed to the diversification of transcriptional responses to infection in human populations. The genetic segments introgressed appear to have preferentially introduced regulatory variants into European genomes, with effects on steady-state expression and responses to TLR7/8 stimulation and influenza virus (Quach et al., 2016). Furthermore, several eQTL have been identified as potential candidates for adaptive introgression, in which the archaic variants confer greater adaptation through the regulation of gene expression. Examples include genes such as DARS, which is associated with neuroinflammatory and white matter disorders (Nédélec et al., 2016); the OAS locus, at which archaic variants appear to be associated with diverse flavivirus resistance phenotypes (Sams et al., 2016); and PNMA1, which harbors a response eQTL for influenza virus and encodes a protein that interacts with the viral protein PB2 and stimulates interferon production (Quach et al., 2016).

All these studies support the notion that variation in gene expression has been an important vehicle for human adaptation (Fraser, 2013), and clearly demonstrate that selection and archaic admixture have had a significant impact on present-day inter-population differences in immune responses, at least in terms of transcriptional variability. However, only a partial understanding of the factors driving immune response variability can be gleaned from analyses based solely on eQTL, as these studies provide no information about the effects of genetic variants on other molecular phenotypes, such as protein production (protein quantitative trait loci, pQTL). Indeed, it has been shown that while a substantial fraction of genetic variants influence gene expression at all levels, from mRNA to steady-state protein abundance, the effects of eQTL are attenuated at the protein level, and some pQTL have little or no effect at the mRNA level, suggesting that they affect posttranscriptional gene regulation (Battle et al., 2015). Likewise, one recent study identified genetic variants regulating cytokine production in response to various microbial stimuli, and showed that some of these variants co-localize with regions presenting signatures of positive selection (Li et al., 2016). Further investigations are required, but, by and large, the studies published to date have provided invaluable resources, by identifying genetic variants affecting intermediate molecular phenotypes in response to immune challenges in different cell types. They have also increased our understanding of the molecular and cellular mechanisms underlying host immunity to infection and susceptibility to disease.

Exploration of the epigenetic and environmental factors affecting the diversity of immune responses, across individuals and populations, is currently a major challenge, as a large fraction of the variation in immune responses cannot be attributed to genetic factors (Barreiro et al., 2012; Fairfax et al., 2014; Lee et al., 2014; Çalışkan et al., 2015; Nédélec et al., 2016; Quach et al., 2016). In this context, a recent study has shown a major impact of host nongenetic factors, such as age and gender, and environmental variables, such as annual seasonality, on the production of inflammatory cytokines in healthy donors upon cellular activation by 19 microbial and nonmicrobial stimuli; for example, the production of IFN-γ and IL-22 was significantly lower in elderly individuals (Ter Horst et al., 2016). Likewise, it has been shown that the composition and function of gut microbial communities also contribute to interindividual variation in cytokine responses to immune stimulation, providing a set of microbial-derived mediators that influence immune phenotypes in healthy individuals (Schirmer et al., 2016). Furthermore, there is growing evidence to suggest that other factors, such as social environment, can also affect immune response variation and, therefore, individual physiology and disease risk (Tung et al., 2012). In this context, a recent study has sought to decipher the influence of social status on immune system functions, using female rhesus macaques as a model (Snyder-Mackler et al., 2016). The authors showed that social rank affects immune cell proportions, gene expression levels, and responses to immune stimuli, with a stronger proinflammatory and antibacterial response in low-status individuals. Together, these studies highlight the need to further investigate, using integrative approaches, the respective effects of host genetic and nongenetic factors, together with environmental parameters and lifestyle variables, on immune response variation, to improve our understanding of the different factors contributing to health disparities between individuals and populations.

Conclusion and perspectives

The collection of sequencing data for thousands of individuals from different populations worldwide, including some archaic hominins, has provided new insight into the demographic and selection history of our species, revealing the admixture of early Eurasians with other types of humans, such as Neanderthals and Denisovans, from whom they acquired advantageous alleles. Interestingly, genomic traces of archaic introgression have also been detected in Africans (Hammer et al., 2011; Lachance et al., 2012), raising the exciting possibility that other unknown archaic groups may have contributed to human genetic diversity. Sequencing additional samples from ancient hominins and from multiple human populations, and the development of robust statistical models and computational methods for detecting selection, will further deepen our knowledge of the contribution of archaic hominins to the diversity of human traits and complex diseases and will help to identify the functions and mechanisms that have contributed to human adaptation and survival over time.

The study of genetic variants with regulatory effects on gene expression (eQTL) has also provided insight into the genetic and evolutionary determinants of population phenotypic diversity, particularly for immune-related diversity (Fairfax and Knight, 2014). However, because the regulation of gene expression is highly dependent on cellular context (Dimas et al., 2009; Nica et al., 2011; Fairfax et al., 2012), it is essential to map eQTL, under basal conditions or after challenge, in a vast array of innate and adaptive cell types, to obtain a complete portrait of the genetic control of immune responses. In this context, the GTEx project explores the landscape of gene expression across 54 different tissues, providing the richest catalog of tissue-specific and shared eQTL (GTEx Consortium, 2015). The extension of this across-tissue rationale to multiple populations from different ethnic backgrounds will, together with population genetic analyses, help to provide a comprehensive picture of the immunological mechanisms underlying host adaptation to pathogen pressure and the maintenance of homeostasis.

An expansion of current efforts to map the genetics of gene expression onto other immune phenotypes, such as protein production, metabolite levels, and heterogeneity in immune cell populations, together with the influence of the microbiome, is increasing our understanding of the genetic, epigenetic, and environmental determinants of variation in immune responses and immunity to infection (Liston et al., 2016; Brodin and Davis, 2017). For example, studies analyzing twin cohorts, as well as a population from Sardinia, have dissected the heritable and nonheritable status of immune phenotypes, such as cell population frequencies and protein expression levels (Orrù et al., 2013; Brodin et al., 2015; Roederer et al., 2015), and found that age increases interindividual variation of these parameters, owing to increased exposure to pathogens, nonpathogenic microbes, or vaccination history (Brodin et al., 2015). Likewise, a recent study has characterized the genetic, epigenetic, and transcriptomic landscape of several major immune cell types and found that transcriptional variation at the majority of genes is a result of the presence of cis-regulatory genetic variants, but epigenetic influences were also observed for a small subset of biologically relevant genes (Chen et al., 2016). Along these lines, ongoing efforts, such as those of the Immune Variation (ImmVar) Project, the Human Functional Genomics Project, and the Milieu Intérieur Consortium, are focusing on the dissection of how interactions between host genetic and epigenetic variation and a large number of variables, including age, sex, commensal microbiota, nutrition, latent infections, and history of vaccination, drive the variation of human immune responses. Collectively, these studies constitute useful, comprehensive resources to investigate the interplay between genetic and nongenetic factors in modulating the plasticity of the immune system (Liston et al., 2016; Brodin and Davis, 2017).

In the context of how evolution has acted on immune phenotypes, it is important to keep in mind that potentially advantageous immune traits can be transmitted across generations not only through genetics but also via some environmental and cultural factors. These include, for example, nutritional and physical activity behaviors, smoking, mate choice, and access to medical care. Thus, another way to explore the evolution of human populations and the influence of selection is to use phenotypic approaches in the natural setting (Linnen and Hoekstra, 2009; Stearns et al., 2010). Focusing on phenotypic data collection, it has been reported that traits related to morphology (e.g., weight and height), physiology (e.g., cholesterol and systolic blood pressure), and life history (e.g., age at first and last births) have been targets of selection (Stearns et al., 2010). Future studies should develop comprehensive phenotypic databases as a complement to genetic studies to assess how the environmental and cultural changes brought about by human populations can affect the evolution of our species at the genetic and phenotypic levels over time.

In conclusion, the integration of all of these datasets into a clinical, epidemiological, and population genetics framework will provide new insight, and probably a few surprises, concerning the history and immune responses of the genus Homo, and the ways in which our genetic and nongenetic makeup, together with changes in our environment and cultural behaviors, influence phenotypic variation in both health and disease.

Acknowledgments

We wish to thank the members of the Human Evolutionary Genetics Laboratory for discussions and critical reading of the manuscript.

Research in the Quintana-Murci laboratory was supported by the Institut Pasteur, the Centre National de la Recherche Scientifique (CNRS), the French Government’s Investissement d’Avenir program, Laboratoire d’Excellence Integrative Biology of Emerging Infectious Diseases (grant no. ANR-10-LABX-62-IBEID), the Agence Nationale de la Recherche (ANR) grants IEIHSEER (ANR-14-CE14-0008-02) and TBPATHGEN (ANR-14-CE14-0007-02), and the European Research Council under the European Union’s Seventh Framework Program (FP/2007–2013)/ERC Grant Agreement No. 281297.

This article is distributed under the terms of an Attribution–Noncommercial–Share Alike–No Mirror Sites license for the first six months after the publication date (see http://www.rupress.org/terms/). After six months it is available under a Creative Commons License (Attribution–Noncommercial–Share Alike 4.0 International license, as described at https://creativecommons.org/licenses/by-nc-sa/4.0/).