Mapping the landscape of host-pathogen coevolution: HLA class I binding and its relationship with evolutionary conservation in human and viral proteins.

Abstract

The high diversity of HLA binding preferences has been driven by the sequence diversity of short segments of relevant pathogenic proteins presented by HLA molecules to the immune system. To identify possible commonalities in HLA binding preferences, we quantify these using a novel measure termed "targeting efficiency," which captures the correlation between HLA-peptide binding affinities and the conservation of the targeted proteomic regions. Analysis of targeting efficiencies for 95 HLA class I alleles over thousands of human proteins and 52 human viruses indicates that HLA molecules preferentially target conserved regions in these proteomes, although the arboviral Flaviviridae are a notable exception where nonconserved regions are preferentially targeted by most alleles. HLA-A alleles and several HLA-B alleles that have maintained close sequence identity with chimpanzee homologues target conserved human proteins and DNA viruses such as Herpesviridae and Adenoviridae most efficiently, while all HLA-B alleles studied efficiently target RNA viruses. These patterns of host and pathogen specialization are both consistent with coevolutionary selection and functionally relevant in specific cases; for example, preferential HLA targeting of conserved proteomic regions is associated with improved outcomes in HIV infection and with protection against dengue hemorrhagic fever. Efficiency analysis provides a novel perspective on the coevolutionary relationship between HLA class I molecular diversity, self-derived peptides that shape T-cell immunity through ontogeny, and the broad range of viruses that subsequently engage with the adaptive immune response.
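As a rough illustration of the targeting-efficiency idea described above, the following minimal sketch scores one allele against one proteome as the rank correlation between per-peptide binding scores and per-peptide conservation scores. The abstract does not state which correlation coefficient is used, so the choice of Spearman rank correlation here is an assumption, as are the function names and the toy values. Binding scores are assumed to increase with binding strength; if IC50 values were used instead, the sign of the result would flip.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def targeting_efficiency(binding_scores, conservation_scores):
    """Hypothetical efficiency score: Spearman correlation between binding
    strength and conservation over the peptides of one proteome."""
    if len(binding_scores) != len(conservation_scores):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(binding_scores),
                   average_ranks(conservation_scores))


# Toy, made-up values for five peptides (not data from the study).
binding = [0.9, 0.2, 0.7, 0.1, 0.5]
conservation = [0.8, 0.3, 0.9, 0.2, 0.4]
print(round(targeting_efficiency(binding, conservation), 3))
```

Under this sketch, a positive score means the allele tends to bind conserved regions and a negative score means it tends to bind nonconserved regions.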

Allele efficiency scores for OMIM human proteins by HLA supertype groups. HLA efficiency scores of 95 HLA alleles, grouped by supertype, are shown for a set of 4,761 human proteins that form the OMIM database. HLA-A alleles have higher efficiency scores than HLA-B alleles, with the exception of the B58 supertype.

Heat map distribution of allele efficiencies for human viruses and human proteins (x axis) by HLA supertype families (y axis). The matrix shows efficiency scores computed for each of the 95 HLA alleles studied across 52 human viruses and a set of human proteins. Each entry in this efficiency matrix represents the efficiency score of a specific HLA allele (y axis) for a specific viral proteome. HLA alleles were grouped by supertype, and human viruses were grouped by viral family and by Baltimore classification. Average efficiency scores over a large set of human proteins are presented in the bar to the left of the matrix. Distinct patterns of targeting efficiency can be observed both for HLA alleles (grouped by supertype or locus) and for different viral groups and families. UC, unclassified alleles that have not been assigned to supertypes; HSV-1, herpes simplex virus type 1; EBV, Epstein-Barr virus; CMV, cytomegalovirus; KSHV, Kaposi's sarcoma-associated herpesvirus; SARS-CoV, severe acute respiratory syndrome coronavirus; HTLV-1, human T-cell leukemia virus type 1; ssRNA, single-stranded RNA; RT, reverse transcriptase.

Correlations between allele efficiency scores for human proteins and human viruses. (A) Correlation (Spearman rank) between allele efficiency scores for human proteins, cytomegalovirus (herpesvirus), and dengue virus (flavivirus) (each representing one column in Fig. ). (B) Frequency distributions of correlation coefficients between proteomes derived from randomized HLA alleles (n = 10,000) compared with the actual values from panel A, which are indicated with arrows. (C) Correlation matrix of efficiency scores for human and viral proteomes (the three scores from panel A appear as dots of the corresponding intensity in this matrix). The extent to which HLA efficiency scores are correlated among human viruses, and between viruses and self-peptides (extreme left column), is represented according to Spearman rank correlation coefficient values. For abbreviations, see the legend of Fig. .
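The comparison in panel B, where an observed correlation is set against a distribution obtained from randomized alleles, can be approximated with a simple permutation scheme. The sketch below is not the randomization used in the study, whose details are not given in this legend. It substitutes a plain label-shuffling null, in which the allele order of one proteome's efficiency column is permuted before recomputing the Spearman correlation. All names and the toy scores are hypothetical.

```python
import random


def ranks(values):
    """1-based ranks with ties assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def permutation_null(col_a, col_b, n_perm=10_000, seed=0):
    """Null distribution from shuffling the allele order of col_b.
    Assumption: label shuffling stands in for the paper's randomization."""
    rng = random.Random(seed)
    shuffled = list(col_b)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(spearman(col_a, shuffled))
    return null


def empirical_p(observed, null):
    """Two-sided empirical P value with the usual +1 correction."""
    extreme = sum(1 for r in null if abs(r) >= abs(observed))
    return (extreme + 1) / (len(null) + 1)


# Made-up efficiency scores for eight hypothetical alleles in two proteomes.
proteome_a = [0.12, 0.05, 0.20, -0.03, 0.15, 0.08, 0.01, 0.18]
proteome_b = [0.10, 0.02, 0.17, -0.06, 0.11, 0.09, -0.01, 0.14]

observed = spearman(proteome_a, proteome_b)
null = permutation_null(proteome_a, proteome_b, n_perm=1000)
print(round(observed, 3), empirical_p(observed, null))
```

Applying the same `spearman` function to every pair of columns of an allele-by-proteome efficiency matrix would give a correlation matrix of the kind described for panel C.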

Targeting efficiencies for dengue virus (serotype 2, whole proteome) for all 95 analyzed HLA alleles. Each dot marks the efficiency score of a single HLA allele. Alleles are sorted by locus. Blue bars represent average locus efficiencies. HLA alleles previously associated with hemorrhagic fever are marked by squares, and those associated with protection are marked by diamonds. Differences between the two groups were found to be significant (P = 0.05).

Targeting efficiencies for HIV-1 Gag protein for all 95 analyzed HLA alleles. Each dot marks the efficiency score of a single HLA allele. Alleles are sorted by locus. Blue bars represent average locus efficiencies. HLA alleles previously associated with slow HIV disease progression are marked by triangles, and those associated with rapid disease progression are marked by squares. Alleles that have been associated with protection from infection are marked by diamonds.