Related article:

Abstract

The T cell receptor (TCR) repertoire is diverse, thus allowing recognition of a wide range of pathogens by T cells. In humans, the study of the formation of TCR repertoires is problematic because of the difficulty in performing investigations in vivo. In this issue of the JCI, Khosravi-Maharlooei and colleagues describe a new humanized mouse model that allows direct investigations on this topic. Using high-throughput and single-cell TCR–complementarity-determining region 3 β (TCR-CDR3β) sequencing, the authors were able to demonstrate that human thymic selection is a major driver of TCR sequence sharing, also implicating a preferential selection of shared cross-reactive CDR3βs during repertoire formation.

A functional TCR repertoire requires great diversity to recognize a wide range of pathogens. Extrapolations of early sequencing results on a few hundred T cell receptors (TCRs) led to an estimate of approximately 10⁶ different TCR β chains in human blood, each pairing, on average, with at least 25 different α chains (1). Recent studies have used next-generation TCR sequencing to capture the diversity of TCR repertoires. The first such study estimated that normal human blood contains 3 × 10⁶ to 4 × 10⁶ unique TCR β chains (2), while a later study provided a minimal estimate of 100 × 10⁶ unique TCRβ sequences in naive CD4+ and CD8+ T cell repertoires of young adults (3). Another study estimated that there are 40 × 10⁶ to 70 × 10⁶ unique TCRβ sequences and 60 × 10⁶ to 100 × 10⁶ TCRα sequences in human pediatric thymi (4). However, a surprising degree of repertoire overlap has been observed between individuals in studies of peripheral T cells (5). In silico modeling suggested a role for recombination bias in generating shared sequences in mice (6). In a human system, we have investigated the role of thymic selection in the generation of shared sequences.

Diversity of the TCR repertoire is formed at different levels. First, different Vβ, Dβ, and Jβ genes recombine to generate TCR β chains. Similarly, different Vα and Jα genes recombine to generate TCR α chains (7). Exonuclease removal of exposed residues and addition of random nucleotides at the junctional sites mediated by the enzyme terminal deoxynucleotidyl transferase (TdT) further diversifies the TCR repertoire (8). Finally, α and β chains combine to form functional TCRs. Studies in mice have shown that thymocytes undergo different selection processes that shape the TCR repertoire after the formation of functional TCRs (9, 10). This repertoire is further altered in the periphery by the expansion and deletion of certain clones (11). Our knowledge of the formation of the human TCR repertoire is limited by access to human thymus samples and the inability to manipulate variables in vivo. Humanized mice generated by transplanting immunodeficient mice with human thymus and hematopoietic stem cells (HSCs) provide a unique opportunity to investigate the factors involved in the formation of a human TCR repertoire. The few studies in the literature that have used humanized mice to investigate the human thymus T cell repertoire (12–15) are limited to V-J gene usage and complementarity-determining region 3 (CDR3) length distribution analysis (spectratyping). Here, we performed high-throughput and single-cell TCR immunosequencing of human thymocyte subsets generated in human thymi and the periphery of replicate humanized mice to determine whether exposure to the same antigens for positive and negative selection in thymus tissue from the same human donor would lead to the formation of similar TCR repertoires in different individuals. We also used this model to investigate the impact of positive and negative selection on the human TCR repertoire and to examine the factors that determine V-J pairing. 
Although initial repertoire formation was highly stochastic and differed markedly among biological replicates, we demonstrate preferential selection of a set of “public” sequences that can be selected by disparate HLA alleles. Comparison with known allo–cross-reactive and type 1 diabetes–associated (T1D-associated) autoreactive TCRs supports the interpretation that these are highly cross-reactive TCRs that are preferentially selected and shared between different individuals.

Kinetics of human cell development in humanized mice and the histology of grafted thymi. To study the formation of the human thymic and peripheral TCR repertoire, we generated 3 batches of mice. As shown in Figure 1A, the first experiment consisted of 3 mice (1autoA, 1autoB, and 1autoC) that were generated by transplantation with the same fetal liver HSCs and autologous fetal thymus. Therefore, they had the same genetic background, and selection took place in the same thymus. The second batch consisted of 6 mice that were generated by transplantation with the same fetal liver HSCs (different from experiment 1). Mice designated 2autoA, 2autoB, and 2autoC received an autologous fetal thymus, while the mice designated 2alloA, 2alloB, and 2alloC received an allogeneic fetal thymus, so that thymic selection occurred in a different thymus, whereas the thymocyte genetic backgrounds were the same as those of the other 3 mice (Figure 1B). Experiment 3 consisted of 2 mice (3autoA and 3autoB) that were thymectomized and transplanted with the same fetal liver HSCs and autologous fetal thymus. We also analyzed and sequenced peripheral CD4+ and CD8+ cells in these mice, in addition to thymic single-positive CD4+ (SP-CD4) and SP-CD8 cells. The mice in the first experiment were euthanized 14 weeks after transplantation, whereas the mice in the second and third experiments were euthanized 20 and 22 weeks after transplantation, respectively. Supplemental Figure 1A (supplemental material available online with this article; https://doi.org/10.1172/JCI124358DS1) shows the gross appearance of the spleen, lymph nodes (LNs), and the grafted thymus under the kidney capsule of a representative humanized mouse at the time of harvest. Supplemental Figure 1B shows H&E staining of a representative grafted thymus and a thymus from a 13-year-old child.
Cortical (hypercellular) and medullary (hypocellular) areas and Hassall’s corpuscles in the medullary areas are noticeable in the H&E stains. Immunofluorescence staining of a representative grafted thymus and a thymus from a 13-year-old child stained for HLA-DR, cytokeratin 8 (CK8) and CK14 is shown in Supplemental Figure 1C. HLA-DR+ cells that were not stained for CKs were HSC-derived antigen-presenting cells (APCs) that were mainly concentrated in medullary areas. We further characterized the APCs in grafted thymi by flow cytometric analysis (FCM). B cells (CD19+), monocytes (CD14+), and DCs (CD11c+) collectively constituted approximately 30% of the double-negative (CD4–CD8–) cells in grafted thymi (Supplemental Figure 1D).

The kinetics of the peripheral appearance of human immune cells (hCD45+), B cells (CD19+), and T cells (CD3+), as well as the T cell naive/memory phenotype are shown in Supplemental Figure 1, E–H. The majority of T cells in peripheral blood at weeks 14–16 were naive.

Our method for constructing humanized mice included several measures to eliminate preexisting thymocytes and their progeny from the transplanted fetal thymic tissue. These measures included freezing and thawing the thymus tissues as described previously (16), pipetting up and down to physically release thymocytes, and injecting 2 weekly doses of a depleting anti-CD2 antibody as described previously (16). To assess the role of cells carried in the thymic tissue in producing peripheral and intrathymic T cell populations in this model, we generated a batch of mice with allogeneic fetal HSCs and thymus tissue. The fetal thymic cells were HLA-A3–, whereas the fetal HSCs were HLA-A3+. Twenty-four weeks after transplantation, we euthanized the animals and evaluated the origin of T cells in grafted thymi and peripheral lymphoid tissues. Approximately 3% of double-positive (DP) and SP-CD8 thymocytes and 2% of SP-CD4 cells were thymus graft derived (HLA-A3–) (Supplemental Figure 1I). Approximately 0.5% of CD4+ and CD8+ cells in the spleen were thymus graft derived (Supplemental Figure 1J). Therefore, the majority of T cells in the grafted thymi and spleens of these animals were derived from the HSCs that were given intravenously.

Effect of selection on diversity. The cell counts of grafted thymi in addition to the sorted cell numbers are summarized in Supplemental Table 1. For each sample, we obtained template counts, clonality scores, and unique clone counts at the nucleotide level (for both productive rearrangements and nonproductive rearrangements that include frame shifts or premature stop codons) and the aa level. These data are shown in Supplemental Table 1. Template counts for CD69– DP cells were lower than expected from the number of cells, probably reflecting the rearrangement of TCRβ after acquisition of the DP phenotype in a significant fraction of cells (17). Clonality (a normalized measure of inverse diversity based on CDR3β sequences) in all thymic samples was very low, demonstrating production and selection of a highly diverse repertoire in the human thymus grafts. Clonality scores are typically much higher for both CD4+ and CD8+ T cells in human peripheral blood, most markedly for CD8+ T cells, presumably reflecting antigen-driven expansions (18). Accordingly, clonality of peripheral CD4+ and CD8+ cells was markedly higher than that of thymic SP-CD4 and SP-CD8 cells in experiment 3 (Figure 1F). Although only some differences achieved statistical significance, all thymocyte subsets (CD8+ SP, CD4+ SP non-Tregs, CD4+ Tregs) showed increased clonality scores for aa compared with nucleotide sequences (Figure 1, D and E, and Supplemental Table 1). Collectively, these results show the effect of selection on narrowing the TCR repertoire, since selection is applied to TCR protein and multiple productive nucleotide sequences can produce the same peptide sequence.
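To make the clonality score concrete, the following minimal Python sketch assumes the commonly used definition of clonality as 1 minus the normalized Shannon entropy of the clone frequency distribution. The text above describes clonality only as "a normalized measure of inverse diversity", so this formula is an assumption, and the function names `clonality` and `collapse_to_aa` are hypothetical. The second function illustrates why aa-level clonality can exceed nucleotide-level clonality: several nucleotide clones may collapse into a single aa clone.

```python
import math
from collections import Counter

def clonality(template_counts):
    """Return 1 - normalized Shannon entropy of a clone-count distribution.

    Assumed definition (not spelled out in the text): 0 means all clones
    are equally frequent; 1 means a single clone makes up the sample.
    """
    counts = [c for c in template_counts if c > 0]
    if len(counts) <= 1:
        return 1.0  # a single clone (or an empty sample) is treated as fully clonal
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return 1.0 - entropy / math.log(len(counts))

def collapse_to_aa(nt_to_aa, nt_counts):
    """Sum the template counts of nucleotide clones that encode the same aa CDR3."""
    aa_counts = Counter()
    for nt_seq, count in nt_counts.items():
        aa_counts[nt_to_aa[nt_seq]] += count
    return aa_counts
```

Calling `clonality` on the values of `collapse_to_aa(...)` and on the original nucleotide counts gives the two scores that are compared per sample; with real data the counts would come from the sequencing output rather than toy dictionaries.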

In experiment 2, in which CD69– and CD69+ DP thymocytes were sequenced in addition to the 3 SP subsets, each animal showed very low clonality scores for the CD69– DP cell population. With the exception of 1 animal (2alloB), we were unable to detect a positive selection–induced increase in clonality in the CD69– to CD69+ transition. However, a comparison of CD69– DP cell populations and all 3 SP cell subsets (SP-CD4 non-Treg [referred to hereafter as SP-CD4 for simplicity], SP-CD8, and CD4+ Tregs [referred to hereafter as Tregs]) revealed an increase in aa sequence clonality (Figure 1E and Supplemental Table 1). Collectively, these data demonstrate a narrowing of the T cell repertoire due to thymic selection.

Compared with the original fetal thymus, clonality scores were lower in the grafted thymi for SP-CD8 and Treg populations (Figure 1G), demonstrating greater diversity of the thymocytes generated from human HSCs used to construct humanized mice than in the original fetal thymus, which had a gestational age of 17 weeks, when thymic development and generation of a fully diversified repertoire are not complete. It has been previously reported that the TCR repertoire of mouse neonates (day 1 after birth) is much narrower than that of adult mice because of the lack of random nucleotide insertions in the CDR3s (19). As shown in Figure 1H, TdT was not expressed in DP cells of fetal human thymus. However, it was expressed in DP cells of postnatal thymi as well as grafted human thymus in humanized mice, thus explaining the greater diversity of TCR repertoires in grafted human thymi in our study compared with that in the fetal human thymus.

Role of stochastic rearrangement and selection in TCR repertoire formation. To obtain an understanding of the impact of stochastic TCR rearrangement versus background genetics on the TCR repertoire, we compared repertoires generated under the same conditions from the same progenitor pool across identical as well as allogeneic, extensively HLA-mismatched (Supplemental Table 2) thymi by measuring the Jensen-Shannon divergence (JSD) and the number of shared CDR3β TCR sequences, as the shared CDR3β fraction quantifies sharing of unique sequences and JSD additionally accounts for the frequency of shared sequences. In experiment 1, even though all 3 mice received the same HSCs and thymus from the same human fetal donor, their TCR repertoires at the level of CDR3β were highly divergent at both the nucleotide and aa levels (Figure 2A). In experiment 2, in which 6 mice received the same HSCs, with 2autoA, 2autoB, and 2autoC mice receiving autologous thymus tissues and 2alloA, 2alloB, and 2alloC mice receiving allogeneic thymus tissues, we observed a similarly high divergence among all thymi in different cell populations (Figure 2B). Furthermore, there was no difference in divergence between pairs of mice whose T cells developed in the same thymus versus those whose T cells developed in the allogeneic thymi (Supplemental Table 3). In addition, the observed divergence between mice for both aa and nucleotide repertoires was significantly higher than the baseline generated from repeated undersamplings of identical repertoires for all thymic subpopulations, as determined by both JSD (Supplemental Figure 3A) and the shared CDR3β fraction (Supplemental Figure 3B). All of these findings emphasize the highly stochastic nature of TCR repertoire formation at the level of CDR3β.
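The Jensen-Shannon divergence used for these comparisons can be sketched as follows. This is a minimal illustration of the standard base-2 definition, in which 0 indicates identical frequency distributions and 1 indicates no overlap; the dictionary-based input format and the function names are assumptions made for the example and are not taken from the analysis pipeline.

```python
import math

def to_frequencies(counts):
    """Convert {sequence: template count} into {sequence: frequency}."""
    total = sum(counts.values())
    return {seq: c / total for seq, c in counts.items()}

def jensen_shannon_divergence(counts_a, counts_b):
    """Base-2 JSD between two repertoires given as {sequence: count} dicts."""
    p = to_frequencies(counts_a)
    q = to_frequencies(counts_b)
    jsd = 0.0
    for seq in set(p) | set(q):
        p_i = p.get(seq, 0.0)
        q_i = q.get(seq, 0.0)
        m_i = 0.5 * (p_i + q_i)  # mixture distribution; positive for every seq in the union
        if p_i > 0:
            jsd += 0.5 * p_i * math.log2(p_i / m_i)
        if q_i > 0:
            jsd += 0.5 * q_i * math.log2(q_i / m_i)
    return jsd
```

Because the divergence weights each sequence by its frequency, two repertoires that share only rare sequences remain close to 1, whereas the shared CDR3β fraction counts every unique shared sequence equally.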

Repertoire divergence between animals in each experiment. (A–C) JSD scores at nucleotide and aa levels for each cell population in experiment 1 (n = 3 comparisons), experiment 2 (n = 15 comparisons), and experiment 3 (n = 1 comparison), respectively. JSD scores for each possible pair of mice were calculated and are presented as box-and-whisker plots, which show the median, range, and interquartile range, as well as outliers (except experiment 3, for which only 1 comparison per cell subset is shown, because there were only 2 mice). (D) JSD aa scores across different cell populations in experiment 2 for all sequences versus the 100 most frequent sequences. (E) JSD aa scores for TCR repertoires of different cell populations from grafted thymi of the 6 mice in experiment 2 compared with the original autologous fetal thymus. *P < 0.05, **P < 0.01, and ***P < 0.001, by paired t test with Bonferroni’s multiple testing correction for all comparisons.

In all experiments, and in every thymocyte and peripheral subset, the divergence was lower at the aa level compared with the nucleotide level, and the JSD decreased for selected (CD69+ DP and SP cell populations) compared with unselected (CD69– DP) cell populations at the aa but not the nucleotide level (Figure 2, A–C). This finding suggested that, despite the stochastic nature of repertoire formation, thymic selection in identical thymi results in selection of some shared sequences between individuals.

We compared the fraction of CDR3βs that were shared between every possible pair of mice for each thymocyte population. As shown in Figure 3, the fraction of shared CDR3βs between paired mice in all 3 experiments was less than 4% of all thymic TCRβs for each mouse. The proportion of shared CDR3βs was highest at the aa level and also was higher at the productive/nucleotide level compared with the nonproductive/nucleotide level (Figure 3, A–C, and Supplemental Table 4). In addition, the proportion of shared CDR3βs increased significantly during transition from the CD69– DP to the CD69+ DP stage, indicating positive selection for these shared CDR3βs. The proportion of shared CDR3βs further increased for the sorted mature (CD3hiCD5hi) SP-CD4 and -CD8 cell populations, which had completed positive selection and partially undergone negative selection, compared with the positively selected CD69+ DP cell population (Figure 3B). CDR3β sharing was even higher in peripheral CD4+ and CD8+ cell subsets compared with thymic SP-CD4 and SP-CD8 samples in experiment 3 (Figure 3C), possibly due to repertoire narrowing via completion of negative selection and post-thymic selection and expansion of certain clones. Pairwise divergence analyses by JSD yielded similar results, with significant decreases in DP-CD69+ compared with DP-CD69– cells and further decreases in the mature SP cell populations (SP-CD4, SP-CD8) (Figure 2B) and peripheral CD4+ and CD8+ cell populations (Figure 2C). Together, these findings demonstrate that both positive and negative selection of human thymocytes increase CDR3β overlap between individual T cell repertoires. However, the highest proportion of sequences shared between 2 replicate thymic repertoires found in the SP-CD4 subset accounted for only 3.5% of the repertoire.
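The pairwise sharing comparison can be sketched as below. The sketch assumes that the shared fraction for an ordered pair (A, B) is the number of unique sequences in A that are also found in B, divided by the number of unique sequences in A; this denominator is an assumption inferred from the description of sharing as a percentage of each mouse's own repertoire, and the function names are hypothetical. Because the denominator differs between the two directions, each pair of mice yields 2 comparisons.

```python
from itertools import permutations

def shared_fraction(repertoire_a, repertoire_b):
    """Fraction of unique sequences in A that also occur in B (asymmetric)."""
    unique_a = set(repertoire_a)
    if not unique_a:
        return 0.0
    return len(unique_a & set(repertoire_b)) / len(unique_a)

def pairwise_shared_fractions(repertoires):
    """Shared fraction for every ordered pair of mice.

    `repertoires` maps a mouse label to an iterable of CDR3 sequences
    (nucleotide or aa, depending on the level being compared).
    """
    return {
        (a, b): shared_fraction(repertoires[a], repertoires[b])
        for a, b in permutations(repertoires, 2)
    }
```

With 3 mice this produces 6 ordered comparisons, and with 6 mice it produces 30.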

Proportion of shared CDR3βs between animals and experiments. (A–C) Box-and-whisker plots (dot plot for experiment 3 because of the smaller sample size) comparing proportions of shared CDR3βs between each asymmetric mouse pair in experiment 1 (n = 6 comparisons), experiment 2 (n = 30 comparisons), and experiment 3 (n = 2 comparisons) for each cell population at the nucleotide nonproductive, nucleotide productive, and aa levels. (D) Box-and-whisker plot distributions of the proportion of shared CDR3βs comparing all versus the top 100 sequences by frequency in experiment 2. (E) Comparisons in both directions between each pair of mice in experiment 2, depending on whether the mice received the same (autologous thymus, n = 12 comparisons) or a different thymus (allogeneic thymus, n = 18 comparisons). (F) Distributions of the proportion of shared CDR3βs between each pair of mice within and across experiments. Exp, experiment. (G) Ratio of unique CDR3β nucleotide sequences per aa sequence (Nt/aa ratio) in shared versus unshared sequences for each pair of mice in experiment 2. Supplemental Table 5 shows the mean Nt/aa ratio for each subset and P values comparing different subsets. Box-and-whisker plots show the median, range, interquartile range, and outliers. ***P < 0.001, by paired t test with Bonferroni’s multiple testing correction.

As another readout of the effect of selection, we compared aa sequence convergence of nucleotide sequences for shared and unshared CDR3βs. For each pair of mice in experiment 2, we measured the number of unique CDR3β nucleotide sequences corresponding to each aa sequence shared between the same cell populations in both mice (shared) compared with the number of unique nucleotide sequences corresponding to each aa sequence present in at least 1 of the mice but not shared between both (unshared). Although the average nucleotide-per-aa sequence ratio was close to 1 for unshared sequences, it was significantly higher for the shared CDR3βs in all cell populations, indicating preferential selection of shared aa sequences (Figure 3G and Supplemental Table 5). Within the population of shared CDR3βs, this ratio was significantly higher in DP-CD69+ cells compared with that in DP-CD69– cells and in SP cell populations (except Tregs) compared with the ratio in DP-CD69+ cells, indicating selection for the shared sequences (Supplemental Table 6).
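The nucleotide-per-aa comparison can be illustrated with a short sketch. It assumes that, for a given pair of mice, the unique nucleotide sequences encoding each aa sequence are pooled across both animals before counting; the text does not state whether pooling was done this way, so treat this as one plausible reading. The function name and input format are hypothetical.

```python
from collections import defaultdict

def nt_per_aa_ratio(mouse_a, mouse_b):
    """Mean number of unique nucleotide sequences per aa sequence.

    mouse_a, mouse_b: dicts mapping CDR3 nucleotide sequence -> aa sequence
    for the same cell population in two mice.
    Returns (ratio for aa sequences shared by both mice,
             ratio for aa sequences found in only one mouse).
    """
    nts_by_aa = defaultdict(set)
    for repertoire in (mouse_a, mouse_b):
        for nt_seq, aa_seq in repertoire.items():
            nts_by_aa[aa_seq].add(nt_seq)  # pooled across the pair (assumption)

    aa_a, aa_b = set(mouse_a.values()), set(mouse_b.values())
    shared = aa_a & aa_b
    unshared = (aa_a | aa_b) - shared

    def mean_ratio(aa_set):
        if not aa_set:
            return float("nan")
        return sum(len(nts_by_aa[aa]) for aa in aa_set) / len(aa_set)

    return mean_ratio(shared), mean_ratio(unshared)
```

A ratio above 1 for shared aa sequences means that distinct rearrangements converged on the same peptide, which is the signature of selection acting at the protein level.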

The JSD between different mice was lower among the 100 most frequent sequences compared with all sequences for each of the 5 selected and nonselected cell populations, indicating greater overlap among the more abundant sequences (Figure 2D). Consistently, the fraction of CDR3β sequences overlapping between animals was greater among the top 100 sequences compared with the entire cell population (Figure 3D). Although this finding may reflect the greater likelihood of detecting abundant sequences in general, it is also consistent with the possibility that the shared CDR3βs are preferentially selected.

Surprisingly, for the 5 different selected and nonselected cell populations, the proportion of shared CDR3βs was not different between mice with allogeneic versus autologous thymi (Figure 3E). Furthermore, we detected no dramatic increase in shared CDR3βs among mice within an experiment compared with those between experiments, despite the different genetic backgrounds of the HSCs and thymi used to generate the T cells in each experiment (Figure 3F). In addition, different cell subsets in allogeneic and autologous thymi in experiment 2 had similar divergences compared with the original autologous fetal thymus (Figure 2E). Supplemental Tables 7 and 8 show the numbers of unique and shared CDR3βs between the 3 mice that were generated by transplantation with the same thymus and HSCs in experiment 1 and the 6 mice transplanted with allogeneic and autologous thymi in experiment 2, respectively. Supplemental Table 9 shows the total number of shared and nonshared CDR3βs for each cell population in each experiment at both aa and nucleotide levels. Consistent with the results described above, the number of overlapping CDR3βs increased as the selection progressed, and we detected dramatically larger numbers of overlapping CDR3βs at the aa sequence level compared with numbers at the nucleotide sequence level.

In order to address variable template counts across samples, we validated the results of the repertoire divergence analysis by randomly subsampling each sample to the same template count and then repeating the analysis. With 3 subsamples of 1000 templates each, we observed the same trends as in whole-sample comparisons with regard to the shared CDR3β fraction (Supplemental Figure 3C). Specifically, we observed consistently increased sharing at the aa level compared with the nucleotide level and increased sharing in DP-CD69+ samples compared with DP-CD69– samples, and in SP compared with DP samples. Therefore, our results are stable across random subsamples of the data, regardless of the variable sample sizes (Supplemental Figure 3C).
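The subsampling check can be sketched as follows. This minimal example draws templates without replacement so that every subsample has the same template count; the without-replacement choice, the seeding scheme, and the function names are assumptions made for illustration.

```python
import random
from collections import Counter

def subsample_templates(counts, n_templates, seed=None):
    """Draw n_templates templates without replacement from {sequence: count}."""
    # Expand each clone into one entry per template, then sample uniformly.
    pool = [seq for seq, count in counts.items() for _ in range(count)]
    if n_templates > len(pool):
        raise ValueError("sample has fewer templates than requested")
    rng = random.Random(seed)
    return Counter(rng.sample(pool, n_templates))

def repeated_subsamples(counts, n_templates=1000, n_repeats=3, seed=0):
    """Return n_repeats independent subsamples of equal template count."""
    return [
        subsample_templates(counts, n_templates, seed=seed + i)
        for i in range(n_repeats)
    ]
```

Each subsample can then be passed to the same divergence and sharing calculations used for the whole samples, so that differences in sequencing depth cannot account for differences between cell populations.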

Shared CDR3βs have a shorter length due to fewer N insertions than do unique CDR3βs and often use different V genes. Further characterization of the CDR3βs that were shared between any 2 thymi versus those that were detected in only 1 thymus (unique sequences) revealed that the shared CDR3βs were significantly shorter than the unique CDR3βs. The shared CDR3βs had an average length of approximately 40 nucleotides, whereas the unique CDR3βs had an average CDR3β length of approximately 44 nucleotides (Figure 4A). The number of inserted nucleotides at V-D and D-J junctions was significantly lower for the shared CDR3βs compared with numbers for the unshared CDR3βs (Figure 4B). As the number of V and J nucleotide deletions was slightly higher in unshared CDR3βs (Figure 4, C and D), the shorter length of shared CDR3βs was thus attributable to the lower number of nucleotide insertions in these sequences. The shorter length of shared CDR3βs did not simply reflect the fact that they tended to be relatively abundant, as the average CDR3 length of the 1000 most abundant CDR3βs overall was significantly greater than that of the 1000 least abundant CDR3βs (Figure 4E and Supplemental Table 6). However, each animal showed CDR3β shortening as selection progressed from the CD69+ DP to the SP stage among the 1000 most abundant sequences, but not among the least abundant sequences (Figure 4E and Supplemental Table 10), indicating a selective preference for shorter shared CDR3βs. Shortening of CDR3βs continued further in the transition from thymic SP to peripheral CD4+ and CD8+ cells (Figure 4F).

Characteristics of shared versus unshared CDR3βs. (A) Nucleotide length distribution of shared versus unshared SP-CD4 CDR3βs in experiment 2. (B) Number of nontemplate nucleotide insertions at V-D plus D-J junctions for shared versus unshared SP-CD4 CDR3βs. (C and D) Number of nucleotides that are deleted from the 3′ end of V genes and the 5′ end of J genes at V-D and D-J junctions of SP-CD4 CDR3βs, respectively. (E) Distribution of combined (all 6 mice in experiment 2) CDR3β length for the 1000 most frequent CDR3βs and the 1000 CDR3βs with the lowest frequencies across different thymic cell populations. Supplemental Table 6 shows P values comparing different cell subsets (unpaired t test). (F) Nucleotide CDR3β length for all thymic and peripheral T cell subsets in experiment 3. (G) Proportion of shared CDR3βs (aa level) using the same Vβ gene, Jβ gene, and Vβ-Jβ pair for SP-CD4 and SP-CD8 T cell populations, comparing mice with the same (autologous) versus allogeneic thymus in experiment 2. ***P < 0.001, by paired t test with Bonferroni’s multiple testing correction. Box-and-whisker plots show the median, range, interquartile range, and outliers.

We characterized the V and J gene usage of shared CDR3βs among SP-CD4 cells. Only approximately 20%–25% of CDR3βs shared between different thymi used the same V gene, whereas almost all shared CDR3βs used the same J gene. The percentage of shared CDR3βs that used the same V-J pair, and hence the same TCRβ chain, was therefore also 20%–25% (Figure 4G). These percentages did not differ between allogeneic and autologous thymi.

TCRβ chain overlap between different cell subsets in individual human thymus grafts. To better understand the selection of shared CDR3β sequences, we compared CDR3β sequences of SP T cell populations within each mouse in experiment 2. As shown in Figure 5A, there was overlap in CDR3β sequences between SP thymocyte populations from each mouse, especially among the 100 most frequent sequences. Among the sequences with identical CDR3βs in different mature thymocyte subsets, approximately 60% used the same V gene within individual mice (Figure 5B), whereas approximately 40% used different V genes. Almost all of the shared CDR3βs were associated with the same J gene, so approximately 60% of TCRs with shared CDR3βs used the same V-J pair and shared the entire TCRβ chain (Figure 5B). Since the HLAs that SP-CD4 and SP-CD8 cells are selected on are different, these results suggest that cross-reactive TCR β chains can be selected on different MHCs. CDR3βs that were shared in both SP-CD4 and SP-CD8 cells of each mouse in experiment 2 had an average nucleotide-per-aa sequence ratio of approximately 2, whereas the CDR3βs that were not shared had an average ratio close to 1, pointing to preferential selection of the shared CDR3βs in both T cell subsets (Supplemental Table 11).

Overlap between different cell subsets and enrichment for cross-reactive/autoreactive CDR3βs among shared sequences. (A) Proportions of shared CDR3βs between paired cell populations in each thymus graft in experiment 2 (aa level) among all versus the 100 most frequent CDR3βs (n = 6). Potentially ambiguous sequences present in more than 1 cell population were not removed from this analysis. Supplemental Table 11 shows the average number of unique nucleotide sequences per aa sequence for shared versus unshared CDR3βs between SP-CD4 and SP-CD8 cells. (B) Proportion of shared CDR3βs with a shared Vβ gene, Jβ gene, and Vβ-Jβ pair, comparing each pair of SP cell populations in each mouse in experiment 2. (C and D) ORs of cross-reactivity in shared versus unshared sequences, sharing in cross-reactive versus allo–non–cross-reactive sequences, and T1D reactivity in shared versus unshared sequences for experiments 1, 2, and 3. P values are shown in Supplemental Table 12. (E) Clone fraction and cumulative frequency of T1D-reactive CDR3βs in different cell subsets in experiment 2. *P < 0.05 and ***P < 0.001, by unpaired t test with Bonferroni’s correction (A) and paired t test with Bonferroni’s correction (paired by mouse) (E). Box-and-whisker plots show the median, range, interquartile range, and outliers.

Shared CDR3βs are more likely to be cross-reactive than are unshared sequences. The data above suggested that shared CDR3 sequences might be highly cross-reactive against disparate specificities. To address this possibility, we compared the repertoires of shared and unshared CDR3βs from SP cell populations in these experiments with a list of cross-reactive CDR3βs defined by a greater than 2-fold frequency expansion in mixed lymphocyte reactions of a human peripheral blood sample against 2 different allogeneic donors sharing no HLA alleles. Among 100,112 and 29,033 alloreactive CDR3β sequences, 1,019 sequences expanded to both stimulators and were therefore identified as cross-reactive. Fisher’s exact test revealed a highly significant increase (Supplemental Table 12) in the rate at which shared versus unshared CDR3β sequences from experiments 1, 2, and 3 were cross-reactive against 2 different sets of alloantigens (Figure 5C). Conversely, we observed a highly significant increase in the odds of allo–cross-reactive sequences compared with alloreactive but non–cross-reactive sequences being shared between the mice in experiments 1 and 2 (Figure 5C). P values by Fisher’s exact test for the OR of cross-reactivity in shared versus unshared sequences as well as for the OR of sharing in cross-reactive sequences versus allo–non–cross-reactive sequences are listed in Supplemental Table 12. These data demonstrate that shared CDR3 sequences are more cross-reactive than are unshared sequences.
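The enrichment test can be sketched with a 2 × 2 contingency table. The example below computes the sample odds ratio and a one-sided Fisher's exact P value from the hypergeometric distribution using only the Python standard library. The text does not state whether a one-sided or two-sided test was used, so the one-sided form is an assumption chosen for brevity, and the function names and cell labels are hypothetical.

```python
from math import comb

def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table

                      cross-reactive   not cross-reactive
        shared              a                  b
        unshared            c                  d
    """
    if b * c == 0:
        return float("inf")  # undefined or infinite when an off-diagonal cell is empty
    return (a * d) / (b * c)

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact P value for enrichment in cell `a`.

    Probability, under the hypergeometric null with fixed margins, of
    observing a count of at least `a` in the top-left cell.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)
```

For real repertoires the four cells would be the counts of shared and unshared sequences that do or do not appear in the cross-reactive list; an odds ratio above 1 with a small P value indicates enrichment of cross-reactive sequences among shared CDR3βs.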

Selection of autoreactive TCRs. In view of the evidence for cross-reactivity of shared sequences selected between disparate thymi and subsets, we hypothesized that shared sequences might be enriched for autoreactivity. We interrogated a previously described list of 1655 T1D-associated autoreactive CDR3βs (20), along with some newer unique CDR3β aa sequences (total of 2208 sequences) associated with T1D, largely from peripheral blood but also found in pancreas, LNs, and spleen of T1D donors from the network for Pancreatic Organ donors with Diabetes (nPOD) program (21). These sequences were derived from a number of assays including sequencing of T cells following FACS proliferation of dye-labeled responding T cells harvested following culture with autoantigens (22), direct MHC tetramer isolation of autoreactive T cells (22–25), or following isolation and examination of peptide reactivities from islet-infiltrating T cells (26). T1D reactivity for these sequences was defined as reactivity to islet antigens such as GAD65 and insulin as described previously (21).

Comparison of these autoreactive TCRs with the TCR repertoires of grafted thymi in experiment 2 revealed a significant increase in both the cumulative frequency and clone fraction of T1D-associated sequences in SP-CD8 versus DP-CD69– cell populations (Figure 5E). Remarkably, the odds that a CDR3β shared between SP subsets in any 2 mice in experiments 1, 2, or 3 was T1D reactive were highly significantly greater than those for nonshared CDR3βs (Figure 5D), suggesting that shared CDR3s were enriched for autoreactivity. The P values for the odds of T1D reactivity in shared versus unshared sequences are listed in Supplemental Table 12.

CDR3α and TCR sharing from single-cell sequencing. To determine the extent to which CDR3β sharing was associated with sharing of the entire TCR, including the α chain, we performed single-cell TCR sequencing of thymic SP-CD4 cells from the same mice whose cells were bulk-sequenced in experiment 2 (except mouse 2autoA, due to a technical failure). Comparing each pair of mice, we found that the level of CDR3α sharing was significantly higher than that of CDR3β sharing (Figure 6A). However, the level of sharing for paired CDR3α-CDR3β was near zero and significantly lower than for either TCR chain on its own (Figure 6A), showing that the TCRs were almost always different among clones with a shared CDR3 α or β sequence. Consistent with the findings from bulk sequencing, the levels of shared CDR3s were not different between mice with allogeneic versus autologous thymi for TCRα, TCRβ, or paired TCRα-TCRβ (Figure 6B). The numbers of unique CDR3αs, CDR3βs, and paired CDR3α-CDR3βs, the fraction of cells with a β chain that have at least 1 paired α chain or 2 paired α chains, and the fraction of cells with an α chain that have a paired β chain are shown in Supplemental Table 13.

Fraction of shared CDR3αs, CDR3βs, and paired CDR3α-CDR3βs revealed by single-cell T cell sequencing. (A) Fraction of shared CDR3αs, CDR3βs, and paired CDR3α-CDR3βs for SP-CD4 cells between each pair of mice in experiment 2 (except mouse 2autoA) at the aa level (comparisons in both directions, n = 20 comparisons). ***P < 0.001, by unpaired t test. (B) Comparisons in both directions between each pair of mice, depending on whether the mice received the same (autologous thymus, n = 8 comparisons) or a different thymus (allogeneic thymus, n = 12 comparisons).

Sub-sequence features are conserved in shared CDR3βs. Methods from Greiff et al. (27), which successfully distinguished between public and private antibody repertoires, were applied to this data set to determine whether sub-sequence-level features can distinguish between shared and unshared sequences. This method uses a normalized gapped k-mer (2 sub-sequences of length k, separated by a gap of up to m aa) count as an input to a support vector machine (SVM) to determine whether a shared or unshared status can be predicted. Optimal parameters determined by Greiff et al. (k = 1, m = 1, and cost = 100) were used for SVM analysis, and 10-fold cross validation was performed to assess the performance of the classifier, using balanced accuracy (mean of sensitivity and specificity) as a performance metric. This was repeated on 100 length-matched shared and unshared sequence data sets generated as described above. As shown in Supplemental Figure 4A, these features can be used to predict a shared or unshared status of sequences with a median balanced accuracy of approximately 62% to 78% for all cell subsets, in which 50% would be equivalent to a random classifier. The frequency of gapped k-mers in shared sequences plotted against the frequency in unshared sequences further supported the hypothesis that there are sub-sequence features that are conserved in shared sequences (Supplemental Figure 4B). We also found a notable enrichment in the “CASSL” motif at the 5′ end of shared CDR3βs relative to unshared sequences, even in the unselected CD69– DP cell population (Supplemental Figure 5), though that motif was highly represented in both shared and unshared sequences (Supplemental Figure 5).
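As a rough illustration of the feature encoding and performance metric described above, the sketch below counts gapped k-mers in a CDR3β and computes balanced accuracy. It is not the Greiff et al. implementation: the treatment of the gap as ranging from 0 to m, the normalization by the total count, and all function names are assumptions, and the SVM training step itself is omitted.

```python
from collections import Counter

def gapped_kmer_features(seq, k=1, max_gap=1):
    """Normalized counts of (left k-mer, gap length, right k-mer) triples."""
    counts = Counter()
    for gap in range(max_gap + 1):            # assumption: gap of 0 is included
        span = 2 * k + gap
        for i in range(len(seq) - span + 1):
            left = seq[i:i + k]
            right = seq[i + k + gap:i + span]
            counts[(left, gap, right)] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}

def balanced_accuracy(y_true, y_pred, positive="shared"):
    """Mean of sensitivity and specificity for a two-class problem."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    pos = sum(t == positive for t in y_true)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)
```

For example, `gapped_kmer_features("CASS")` yields five features of equal weight, including `("C", 1, "S")`, the pair C and S separated by one residue. The resulting feature dictionaries would then be vectorized and passed to a classifier; a balanced accuracy of 0.5 corresponds to a random classifier.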

Evidence suggesting a role for self-peptides in human thymocyte selection. In preselection murine thymocytes, TCRβ CDR3 interfacial hydrophobicity at position 6 and position 7 (P6 and P7), the residues that interface with the peptide and MHC, correlated with the ability to be activated by self-peptide and MHC (28). Stadinski et al. developed a self-reactivity index based on the hydrophobicity of aa at CDR3β P6 and P7 and showed that this index correlates well with increased and decreased self-reactivity during positive and negative selection, respectively. We performed a similar analysis on human thymocyte and peripheral T cell subsets from experiments 2 and 3, focusing on CDR3β lengths of the greatest frequency in all thymocyte subsets (13–16 aa). For each mouse, we analyzed P6 and P7 aa frequencies in the thymic (DP-CD69–, DP-CD69+, SP-CD4+, SP-CD8+, and Treg) and peripheral (CD4+ and CD8+) cell populations, and normalized the frequencies within each cell population. For each animal, a fold-change in frequency of P6 and P7 aa residues between cell populations was recorded, and these values were averaged across the mice within each grafted thymus group (i.e., experiment 2, allogeneic versus autologous thymus, and experiment 3). For experiment 2 samples, we compared SP-CD4, SP-CD8, and Tregs against DP-CD69– thymocytes to evaluate the entire thymic selection process, DP-CD69+ thymocytes against DP-CD69– thymocytes, and SP-CD4, SP-CD8, and Tregs against DP-CD69+ thymocytes. For experiment 3 samples, we compared peripheral CD4+ cells and peripheral CD8+ cells against SP-CD4 and SP-CD8 cells, respectively. As shown for P6 in Figure 7A and Supplemental Figure 6A and for P7 in Figure 7B and Supplemental Figure 6B, we observed a trend toward enrichment of hydrophobic aa (as defined in Figure 8C in ref. 29) at both positions as thymic selection progressed. Results of the Spearman’s nonparametric rank test for the mice in experiments 2 and 3 are shown in Figure 7, A and B. 
We observed statistically significant correlations between the fold changes of the aa residue at P6 (Figure 7A) or P7 (Figure 7B) and its hydrophobicity during selection from the DP-CD69– stage to the mature SP-CD4, SP-CD8, and Treg populations. Both autologous and allogeneic thymi showed a similar trend toward increasing hydrophobicity at P6 and P7 as selection progressed (Supplemental Figure 6). Positive selection from the CD69– to the CD69+ DP stage was associated with significantly increased P6 hydrophobicity. Overall selection from CD69– DP to both CD4+ and CD8+ SP populations was associated with significantly increased hydrophobicity at P6 and P7, and the CD69– DP-to-Treg transition was associated with significantly increased hydrophobicity at P6. The CD69+ DP to SP transition was associated with significantly increased hydrophobicity only for SP-CD8 cells at P6 and for SP-CD8 and Tregs at P7. Overall, we found that increased hydrophobicity with the transition from DP-CD69– to SP cells was more pronounced than with the transition from DP-CD69+ to SP cells. This trend was stopped or reversed in the transition from SP-CD4 and SP-CD8 cells to peripheral CD4+ and CD8+ cells, both at P6 and P7 (Figure 7, A and B, and Supplemental Figure 6). In sum, our data demonstrate an increase in hydrophobic aa usage at P6 and P7 in association with selection of human thymocytes (more associated with positive selection) and arrest or reversal of this trend in the transition from SP thymocytes to peripheral T cells, possibly in association with completion of negative selection.

Interaction with self-peptides in the selection of shared and unshared sequences. Fold changes (mean ± SEM) in the relative aa frequencies versus hydrophobicity of the aa based on Gibbs free energy at P6 (A) or P7 (B) for transition from DP-CD69– to DP-CD69+ cells and from there to SP cell subsets in experiment 2, and also in transition from SP-CD8 and SP-CD4 to peripheral CD8+ and CD4+ cells for experiment 3. Spearman’s correlation coefficient R and P values from the nonparametric Spearman’s correlation test are shown. Negative R values imply that, as hydrophobicity increases, so does the fold change in the relative aa frequency across the 2 cell populations. *P < 0.05, **P < 0.01, and ***P < 0.001, by unpaired t test. (C) Differential abundance of each aa at each position in CDR3β, computed by random selection of a length-matched unshared sequence for each shared sequence. Shared sequences are those present in at least 2 mice, and unshared sequences are unique to a single mouse. Only results for aa producing a Benjamini-Hochberg–adjusted P value of less than 0.05 by Fisher’s exact test are shown. The aa plotted at a frequency of greater than 0 were preferentially used at that position in shared sequences, whereas those with a frequency of less than 0 were preferentially used in unshared sequences.

Shared sequences might escape negative selection. To analyze differential usage of aa at each position as defined by the international ImMunoGeneTics information system (IMGT), we performed a Fisher’s exact test for all sequences in each of the 100 length-matched data sets of shared and unshared sequences. Differentially used aa were plotted if the Benjamini-Hochberg–adjusted P value was less than 0.05 for the Fisher’s exact test to ensure that differences were significant (Figure 7C). Only aa showing up in at least 75 of the 100 downsamples were annotated. We noted a significant enrichment for the neutral aa G and hydrophilic aa (e.g., Q) and a significant decrease in hydrophobic aa (e.g., W) at P6 and P7 (equal to positions 109 and 110 on the plots, respectively) in shared sequences among CD69+ DP cells, most thymic SP cell populations, and peripheral cell populations (Figure 7C). We did not observe this pattern for shared sequences among CD69– DP cells. The reduced hydrophobicity at P6 and P7 in shared sequences among selected but not unselected cell populations suggests that selected shared sequences may have weaker interactions with self-peptides than unshared sequences and that this may allow them to escape negative selection.
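The two statistical steps named above, a Fisher’s exact test per aa and position followed by Benjamini-Hochberg adjustment, can be sketched with the standard library alone. This is a generic illustration of the two procedures, not the analysis code used in the study; the function names and the 2 × 2 table layout (aa present or absent at a position, in shared versus unshared sequences) are assumptions.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1))
                        if p <= observed * (1 + 1e-9)))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):              # walk from largest P to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Here `a` and `b` would be the numbers of shared sequences with and without a given aa at a given position, and `c` and `d` the corresponding numbers for the length-matched unshared sequences. The P values from all aa and positions would be adjusted together, and only those with an adjusted value below 0.05 retained.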

V and J gene usage. As shown in Supplemental Figure 7, the pattern of V and J gene usage for the SP-CD4 cell population was very similar between the mice that received autologous tissue versus those that received allogeneic thymus tissue in experiment 2, with no significant differences in V or J gene usage, arguing against a major role for selection in determining V and J gene usage. Overall, we detected a similar pattern of V and J gene usage for SP-CD4 cells when comparing mice in experiments 1, 2, and 3, which received different HSCs as well as different thymi (Supplemental Figure 7). We also observed a similar pattern of V and J gene usage between thymic SP-CD4 and peripheral CD4+ cells in the mice in experiment 3 (Supplemental Figure 7) and similar V and J gene usage patterns across all cell populations for the mice in experiment 2 (Supplemental Figure 8). The few statistically significant differences are shown in Supplemental Figure 7 and Supplemental Figure 8. Thus, we detected only small effects of genetic background and/or thymic selection on the overall pattern of V and J gene usage. These findings were confirmed by single-cell TCR-sequencing data, which showed a similar pattern of V and J gene usage for both α and β chains comparing SP-CD4 cells in mice with allogeneic versus autologous thymi in experiment 2 (Supplemental Figure 9).

Supplemental Figure 10A shows plots of VJ usage for CD4+ repertoires of 1autoA, 1autoB and 1autoC mice, which received the same thymic tissue and HSCs. These representative plots show no disproportionately favored V-J pairing in the repertoire. We detected a strong correlation between the observed VJ usage and the VJ usage expected from the stochastic combination of V genes with J genes according to the background frequency of each V and J (Supplemental Figure 10B). The observed and expected VJ distributions were compared by Mann-Whitney U test, which failed to reject the null hypothesis that VJ pairing is stochastic.
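The expectation under stochastic pairing can be written as the product of the marginal V and J frequencies. The sketch below computes observed and expected V-J pair frequencies from a table of pair counts and correlates them. It is illustrative only: the function names and gene labels are hypothetical, Pearson correlation stands in for the correlation analysis, and the Mann-Whitney U comparison is not reproduced here.

```python
from collections import Counter

def vj_observed_and_expected(vj_counts):
    """Observed V-J pair frequencies and those expected under independence."""
    total = sum(vj_counts.values())
    v_freq, j_freq = Counter(), Counter()
    for (v, j), n in vj_counts.items():
        v_freq[v] += n / total                # background frequency of each V
        j_freq[j] += n / total                # background frequency of each J
    pairs = [(v, j) for v in sorted(v_freq) for j in sorted(j_freq)]
    observed = [vj_counts.get(p, 0) / total for p in pairs]
    expected = [v_freq[v] * j_freq[j] for v, j in pairs]
    return pairs, observed, expected

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Toy counts in which V and J usage are independent by construction.
toy = {("V1", "J1"): 40, ("V1", "J2"): 10, ("V2", "J1"): 40, ("V2", "J2"): 10}
pairs, observed, expected = vj_observed_and_expected(toy)
```

In the toy table, V1 is used in 50% of sequences and J1 in 80%, so the expected V1-J1 frequency is 0.5 × 0.8 = 0.4, which equals the observed frequency. A preferred V-J combination would appear as an observed frequency well above this product.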

We have demonstrated the impact of positive and negative selection in a human thymus on the human TCR repertoire. We observed very high diversity among human thymocytes at all stages of development. This diversity was much greater than that for peripheral blood human T cells that include memory cell populations (18), consistent with studies demonstrating that TCR repertoires of human naive T cells are much more diverse than those of memory T cells (3). The observed lower diversity at the aa versus the nucleotide sequence level, in productive versus nonproductive sequences, and in selected SP cell populations (SP-CD4, SP-CD8, and Tregs) versus nonselected DP thymocytes demonstrates that thymic selection narrows the human TCR repertoire at both the DP and the SP stages. We demonstrated that diversity further decreases in peripheral CD4+ and CD8+ cells in humanized mice.

The diversity of the SP-CD4 and SP-CD8 TCR repertoires in autologous and allogeneic grafted thymi was greater than that of the original fetal thymus used for the generation of the humanized mice, reflecting the immaturity of the repertoire in the 17-gestational-week fetus used and confirming that the thymocytes that developed in our studies arose de novo from engrafted stem cells as a result of the success of our procedures for purging the graft of preexisting thymocytes prior to implantation. The TCR repertoire of neonatal mice is much narrower than that of adult mice until day 4 or 5 after birth, when the TdT enzyme is activated (19). Limited studies indicated that CD45+ cells enter the human fetal thymus at 8 weeks of gestation and that all steps of TCR development are detectable by week 16 (30), but TdT was reportedly undetectable in fetal human thymus (31), consistent with our observation of very low TdT expression in DP cells of such thymi. In contrast, we detected much higher TdT levels in postnatal human thymi and similar levels in grafted human thymi, thus explaining the greater TCR diversity in grafted human thymi compared with fetal thymus donors. While grafted and human postnatal thymi showed very similar structures and cell populations, we cannot rule out the possibility that undetected structural differences might influence selection.

A high divergence of TCRβ repertoires in thymocytes generated in identical thymi from the same HSCs at the same time is consistent with the stochastic nature of TCR repertoire formation and may contribute to the incomplete penetrance of genetically controlled autoimmune diseases in identical twins (32). Increased CDR3β sharing among productive compared with nonproductive TCR rearrangements and in selected versus unselected thymocyte populations and a high convergence of nucleotide sequences at the aa level demonstrate that thymic selection favors these shared TCRβ CDR3s. Although many of these shared CDR3βs used different V genes while being associated with identical J genes, 20%–25% also had identical V genes, indicating that the same entire β chain was used. Single-cell analysis revealed that TCRs with the same β chain are almost always paired with different α chains.

Our observation of shared sequences between different donors and thymi is consistent with studies in mice showing the existence of MHC-independent public, abundant CDR3 sequences with convergent recombination, also with fewer N insertions than average, among peripheral T cells (6). These authors performed numerical simulations of TCR rearrangements to demonstrate that biases in TCR recombination could only partially explain the observation. To our knowledge, our study is the first to specifically analyze the impact of selection on shared human thymocyte sequences and demonstrates that thymic selection is a significant factor driving their abundance among mature T cells. Previous next-generation sequencing studies of the peripheral blood TCR repertoire of monozygotic twins revealed increased shared CDR3βs among highly abundant clonotypes and a similar overall overlap between the repertoires of monozygotic twins and unrelated individuals (33), consistent with our observation of similar sharing between mice with the same HSCs but different thymi and between mice with different thymi and HSCs. The increased TCRβ sharing observed by Qi et al. in peripheral blood from twins compared with unrelated individuals was more pronounced in memory than in naive T cells (34) and may be explicable by post-thymic selection events rather than thymic selection.

Several possibilities could explain our finding of similar TCR sharing among thymocytes developing in autologous versus allogeneic thymi. First, the allogeneic and autologous thymi may share common peptide motifs that contribute to the selection of shared CDR3βs in the context of different HLAs, as previously suggested (5). According to a model based on sequencing of adult peripheral blood CD8+ T cells, CDR3β sharing was observed at a 500-fold or greater rate than expected from stochastic rearrangements and was independent of HLA sharing (5). Consistent with our observation of shared sequences among thymocyte subsets, different peripheral blood T cell subsets revealed significant CDR3β sharing (35). Although peripheral blood T cell studies may reflect post-thymic, antigen-driven expansion, this had not occurred in the thymic grafts in our studies.

We favor a second possible explanation for interindividual and inter–T cell subset CDR3β sharing, namely, that the shared CDR3βs are more highly cross-reactive than are nonshared CDR3βs and can therefore be positively selected by diverse HLA-peptide complexes. In all 3 experiments, we observed an increased overlap of more abundant CDR3βs. This may reflect the greater likelihood of detecting more abundant sequences in a given sample size. However, our observation of increased CDR3β sharing following positive selection of DP thymocytes suggests that they are preferentially selected. We also observed an increase in P6 and P7 hydrophobicity during positive selection, demonstrating the role of HLA-peptide complexes in this process. CDR1 and CDR2 interact prominently with MHC to influence CD4+ and CD8+ T cell development, whereas CDR3β predominantly interacts with peptides bound to MHC (36), which differ for recognition by SP-CD4 and SP-CD8 cells. Thus, the hypothesis that most of the shared CDR3βs are cross-reactive seems most probable. TCRs have flexible CDR3 loops (37) and can bind to different peptide-MHC complexes through different mechanisms, including binding with an altered angle or register (38). For example, a human, preproinsulin-reactive CD8+ T cell clone (1E6) is capable of recognizing more than 1 million peptides in the context of a single MHC I molecule (39), including microbial peptides as well as self-peptides (40). Such cross-reactivity is necessary for the limited number of TCRs in the body to recognize more than 10¹⁵ different peptide-MHC complexes (40). In mice, TCRs that survive thymic selection are enriched for cross-reactive TCRs compared with those that do not. Interestingly, cross-reactive TCRs were frequently reactive to both MHC I and II (41). 
Furthermore, in mice genetically engineered to express human TCR genes with either a single allele of human MHC II or a single mouse MHC II, a surprisingly high number of CD4+ TCRs was shared between the 2 groups (42). The finding that single MHCs from different species are able to select shared TCRs provides support for our hypothesis about selection of cross-reactive TCRs.

The shared sequences in our study had a shorter CDR3β length than did nonshared sequences, consistent with previous studies (5, 33), as a result of lower numbers of N insertions. Since more abundant CDR3βs were, on average, longer than the less frequent ones, even in the preselection DP-CD69– cell population, the increased sharing among shorter sequences does not simply indicate an inherent bias for β-selection of shorter CDR3βs. We found that the average length of more abundant CDR3βs decreased during thymic selection, consistent with previous reports in humans and mice (43–46). Shorter CDR3βs may be positively selected more easily, given the increased low-affinity cross-reactions with diverse MHC-peptide complexes. TCRs in mice deficient for TdT, which mediates N insertion in CDR3s, have a shorter average CDR3 length (8) and show increased inter-animal sharing of TCR sequences with increased cross-reactivity for different peptides compared with WT mice (47). We observed increased usage of neutral and hydrophilic aa (G and Q, specifically) among shared sequences at P6 and P7 (positions 109 and 110 in Figure 7C) compared with unshared sequences following the transition from DP-CD69– to DP-CD69+. This difference was maintained in the SP cell populations, despite the overall increase in hydrophobicity with selection at these positions in the total repertoire. In view of murine studies indicating that hydrophobicity at P6 and P7 of CDR3β correlates with reactivity to self-peptide–MHC complexes (28), these data suggest that shared sequences may be preferentially positively selected and survive negative selection because of their low affinity for self-peptide–MHC. Consistently, shared sequences in mice had reduced affinity for self-peptide–MHC (27). 
Thus, while the low affinity of shared sequences for self-peptide–HLA is sufficient to promote positive selection, the low level of hydrophobicity at the peptide binding site may allow these sequences to preferentially evade negative selection in the thymus. The observed average CDR3β shortening as negative selection progresses (CD69+ DP to SP transition) among the most abundant but not among the least abundant sequences is consistent with the avoidance of negative selection by these shorter, more commonly shared CDR3βs. While one SP thymocyte subset (CD8+ SP) showed an unexpected increase in P6 and P7 hydrophobicity during the transition from the CD69+ DP phase, we observed more consistent increases with overall selection between the CD69– DP and SP stages for each subset, suggesting that the major increase may reflect positive selection. The CD69+ DP cell population probably had already undergone considerable negative selection, diluting the increased hydrophobicity associated with positive selection, and negative selection probably continued into the SP phase, making comparisons between these populations too complex to interpret.

Our single-cell TCR-sequencing data showed even greater TCRα sharing than TCRβ sharing between animals and also demonstrated that almost all TCRs with shared α or β chains do not form the same αβ pairs. This result implies that the selective pressure for shared CDR3βs is applied separately to the β chain. Since we did not perform bulk analysis of CDR3α at various stages of thymocyte selection, we do not know whether the high level of CDR3α sharing reflects generational bias or selective pressure. Consistent with our results, greater TCRα than TCRβ sharing has been reported among healthy humans, while no difference was found in the diversity of CDR3αs compared with CDR3βs (48). Structural analyses have shown that both CDR3α and CDR3β interact with peptides and contribute to the recognition of peptide-MHC complexes (49). However, a predominance of charged and polar aa in CDR3α compared with CDR3β may affect peptide interactions (50).

The comparison of shared and unshared sequences in our study with known autoreactive and cross-reactive TCRs supports our hypothesis that shared CDR3s are cross-reactive. Cross-reactive CDR3βs identified through sequencing of healthy donor T cells responding to disparate alloantigens were strikingly enriched for shared compared with unshared sequences identified in thymi in both experiments. Moreover, the frequency of known T1D-associated CDR3βs was increased during positive selection in the grafted thymi, and these sequences were highly significantly increased among shared CDR3s (Supplemental Table 12). These data suggest that autoreactive T1D–associated sequences are enriched for cross-reactive TCRs that can be selected by different thymi with different MHCs.

Despite the very divergent CDR3β TCR repertoires, V and J gene usage was overall very similar among different mice generated with different HSCs and different thymi. There is controversy in the literature about the role of genetic background and thymic selection in determining V and J gene usage (33, 51–54). Our analysis of thymocyte subsets generated from different donors demonstrates only a minor role for thymic selection in determining overall V and J gene usage. Intrinsic genomic factors such as promoter strength or differential structural features are probably the main determinants of V and J gene usage in TCRs. However, the small number of HSC donors in our study did not permit detection of more subtle effects of genetic background and different HLA alleles on V and J gene usage, as has been reported (55, 56). A report of increased similarity in V and J gene usage of peripheral naive T cells between identical twins compared with non-twins but overall similar V and J usage frequencies for any 2 donors, regardless of relatedness (56), is consistent with a subtler effect of HLA alleles in selecting for certain V and J genes. In a mouse study (42), selection of T cells on a single murine or human MHC II molecule resulted in some differences in V and J gene usage. However, the proportion of SP thymocytes in these mice was significantly lower compared with that in normal B6 mice, suggesting inefficient selection on a single MHC allele, which could potentially skew the repertoire toward using certain V and J genes. Our study shows that the thymic MHC does not markedly influence human V and J gene usage when there is a full complement of HLA molecules.

Interestingly, sequences shared between mice were enriched across all cell populations for the 5′ “CASSL” motif within the V region of CDR3β. This motif is found in the 3′ tail of TRB5-01, which is consistently the most highly expressed V gene across all samples, at roughly 12%. Though it is enriched in shared sequences, the motif has high representation in both shared and unshared sequence sets. Published structural analysis of TCR binding to the MHC-peptide complex for a clone with this motif shows no direct interaction of the motif with peptide-MHC (57). Our observation of enrichment for this motif in shared sequences of preselection CD69– DP thymocytes suggests a more structural role, such as in increased binding to pre-TCRα, rather than a role in self-antigen–driven selection. After successful rearrangement of the TCR β chain, developing thymocytes undergo an estimated 6–7 rounds of cell division before independent rearrangements of TCRα (58, 59), which could enrich such sequences in preselection thymocytes. Increased stability of the TCR tertiary structure or increased binding to TCRα (60) could also explain this enrichment in shared sequences of preselection thymocytes. Our detection of fewer unique CDR3α sequences than CDR3β sequences in the single-cell analysis was surprising in light of the ability of early DP thymocytes to expand during β selection and the ability of individual T cell clones to rearrange only 1 β chain and 2 α chains. This result probably reflects reduced efficiency in detecting α chains as opposed to β chains, depending on the sequencing platform used, as less than 100% of cells with β chains sequenced had a sequenced α chain.

Besides V and J gene usage, some reports indicate that the frequency of particular TCR Vβ-TCR Jβ recombinations in human lymphocytes is controlled genetically (61). By comparing the observed VJ frequency distributions in productive and nonproductive repertoires with the VJ frequency distribution expected from a stochastic combination of V genes and J genes according to their background frequency in thymic TCR repertoires, we show that VJ pairing is determined only by the frequency with which each gene is used, with no evidence for preferential V-J recombinations.

In conclusion, our data indicate that human thymus in humanized mice selects a very diverse TCR repertoire. Formation of the human TCR repertoire is largely stochastic and can be almost totally divergent in thymi of animals with identical HSCs, thymus, genetic background, and environment. However, we show that thymic selection increases the overlap between human TCR repertoires and that recognition of self-peptide–HLA plays a role in human thymocyte selection. The overlap of CDR3βs in disparate thymi and different thymocyte subsets, the shorter CDR3β lengths of shared sequences, the direct evidence for allo–cross-reactivity and autoreactivity of shared sequences, and the analysis of aa usage in these CDR3 sequences are consistent with the interpretation that shared sequences are preferentially positively selected because of high cross-reactivity and evade negative selection as a result of low affinity for self-peptides. Thus, we have used the humanized mouse model in our studies to obtain insights into the factors determining human T cell repertoire formation.

FCM of different subsets of grafted thymus and peripheral cells. Mice from experiments 1, 2, and 3 were euthanized 14, 20, and 22 weeks, respectively, after thymus transplantation. Grafted thymi (for mice in all experiments) and spleens and LNs (only for mice in experiment 3) were harvested, and the thymocytes and pooled splenic and LN cells were isolated and subjected to FCM (Supplemental Figure 2; see Supplemental Methods for details on FCM). HLA typing of fetal tissues used to generate humanized mice in all 3 experiments is shown in Supplemental Table 2.

DNA isolation and high-throughput CDR3β TCR sequencing. Genomic DNA was isolated from sorted cell populations using the QIAGEN DNeasy Blood and Tissue Kit. DNA was frozen at –20°C and shipped on dry ice to Adaptive Biotechnologies for high-throughput TCRβ CDR3 sequencing. The TCR-sequencing data were retrieved using ImmunoSEQ software (Adaptive Biotechnologies). The Adaptive Biotechnologies raw sequencing data on all samples are available at https://github.com/aleksobrad/humanized-mouse-data (GitHub commit ID: 96d3c83).

Single-cell TCR sequencing. Single-cell TCR sequencing was performed using the 10x Genomics platform as detailed in the Supplemental Methods.

Computational and statistical analysis. Computational and statistical analysis methods are described in Supplemental Methods.

Cross-reactive TCR list. PBMCs from a human subject were stained with CFSE and cocultured separately for 6 days with 2 fully HLA-mismatched irradiated PBMCs as stimulator cells. Dividing CD4+ and CD8+ T cells (CFSElo) and unstimulated CD4+ and CD8+ T cells from the same donor were subjected to FACS, and after DNA isolation their TCRβ was sequenced as described above. Alloreactive sequences were defined as CDR3β aa sequences that were expanded in frequency by at least 2-fold from unstimulated to stimulated samples, above a minimum frequency in the stimulated samples of 1 × 10⁻⁵ templates, as described previously (18, 62). Alloreactivity was determined separately for CD4+ and CD8+ samples, and alloreactive CDR3βs responding to both stimulators were considered cross-reactive, since they responded to different stimulators with no MHC sharing. We compared the repertoires of T cell populations in grafted thymi of humanized mice with the list of cross-reactive TCRs to investigate the selection of cross-reactive TCRs in human thymus, calculating the OR of allo–cross-reactivity in shared versus unshared clones as well as the OR of shared clone status in allo–cross-reactive versus allo–non–cross-reactive sequences, with significance assessed by Fisher’s exact test. Lists of the CDR3 sequences determined to be cross-reactive and allo–non–cross-reactive are available as .csv files at https://github.com/aleksobrad/humanized-mouse-data (GitHub commit ID: 96d3c83), along with the raw Adaptive sequencing data from healthy control mixed lymphocyte reactions (MLRs) of the same responders against 2 fully HLA-mismatched stimulators.
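The definitions above can be expressed as a short sketch. It assumes that each sample is a dictionary mapping CDR3β aa sequences to template frequencies and that a clone undetected in the unstimulated sample counts as expanded; both are assumptions made for illustration, as are the function names. The significance test (Fisher’s exact test) is not included.

```python
def alloreactive(unstim, stim, min_fold=2.0, min_freq=1e-5):
    """CDR3s expanded at least min_fold on stimulation, at or above min_freq."""
    hits = set()
    for cdr3, f_stim in stim.items():
        if f_stim < min_freq:
            continue                          # too rare in the stimulated sample
        f_unstim = unstim.get(cdr3, 0.0)
        # Assumption: a clone undetected before stimulation counts as expanded.
        if f_unstim == 0.0 or f_stim / f_unstim >= min_fold:
            hits.add(cdr3)
    return hits

def cross_reactive(unstim, stim_1, stim_2):
    """CDR3s that are alloreactive to both HLA-mismatched stimulators."""
    return alloreactive(unstim, stim_1) & alloreactive(unstim, stim_2)

def odds_ratio(shared, unshared, reactive):
    """OR of reactivity in shared versus unshared sequences (2 x 2 table)."""
    a = len(shared & reactive)                # shared and reactive
    b = len(shared - reactive)                # shared, not reactive
    c = len(unshared & reactive)              # unshared and reactive
    d = len(unshared - reactive)              # unshared, not reactive
    return (a * d) / (b * c) if b and c else float("inf")
```

An OR above 1 would indicate that cross-reactive sequences are overrepresented among shared CDR3βs. The same `odds_ratio` layout applies to the T1D-reactive list, with the set of T1D-associated CDR3βs passed as `reactive`.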

T1D-reactive TCR list. Our T1D-reactive TCR data set contains 2208 unique CDR3β aa sequences associated with T1D derived from peripheral blood, pancreas, LNs, and spleen of T1D donors from the nPOD program (20). These sequences were derived from several assays including sequencing of T cells following FACS proliferation of dye-labeled responding T cells harvested in response to culture with autoantigens (21), direct MHC tetramer isolation of autoreactive T cells (21–24), and, in certain situations, an assay following the isolation and examination of peptide reactivities from islet-infiltrating T cells (25). T1D reactivity for these sequences was defined as reactivity to islet antigens such as GAD65 and insulin as described previously (26). A single, prominent T1D-reactive nucleotide sequence (encoding CASSSFWGSDTGELFF TCRBV11-02 TCRBJ02-02) present in bulk-sequencing data but missing from single-cell data was removed from experiment 2 because of suspected contamination from a vector with the same nucleotide sequence that was present in the laboratory. We compared the repertoires of T cell populations in grafted thymi in humanized mice with this list of T1D-reactive sequences to investigate the selection of T1D-reactive TCRs in human thymus, calculating the OR of T1D reactivity among shared versus unshared sequences, with significance assessed by Fisher’s exact test. The .csv files with the CDR3 sequences of T1D-reactive sequences are available at https://github.com/aleksobrad/humanized-mouse-data (GitHub branch name: Humanized-Mouse-Data/t1d_sequences.csv).

Statistics. For comparisons of clonality, JSD, and fraction of shared sequences between different cell populations, paired t tests with Bonferroni’s correction for multiple testing were performed. For analyses that involved calculation of the OR, we performed Fisher’s exact test. Where fold changes in the relative aa frequencies were plotted against aa hydrophobicity based on Gibbs free energy, the correlation coefficient (R) and P value from the nonparametric Spearman’s correlation test were reported. An unpaired t test with Bonferroni’s correction for multiple testing was performed for each gene to compare Vβ and Jβ gene usage between different groups. For comparison of the observed and expected frequencies of VJ gene pairs, Mann-Whitney U tests were performed, with the null hypothesis that there is no difference between the distribution of observed frequencies and the distribution of frequencies expected from random VJ combination. In all statistical tests, corrected P values of less than 0.05 were considered significant.
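For readers unfamiliar with Bonferroni’s correction, the short sketch below shows the adjustment and the 0.05 significance threshold applied to a family of raw P values. The function name and the example P values are hypothetical, and how the authors grouped tests into families is not specified here.

```python
from typing import List, Tuple


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[Tuple[float, bool]]:
    """Return (corrected P value, is_significant) for each raw P value.

    Each raw P value is multiplied by the number of tests in the family and
    capped at 1.0; a result is significant if the corrected value is < alpha.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]


# Hypothetical raw P values from three comparisons.
results = bonferroni([0.01, 0.02, 0.5])
```

With three tests, a raw P value of 0.01 is corrected to 0.03 and remains significant, whereas 0.02 is corrected to 0.06 and does not.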

Study approval. Protocols involving the use of human tissues and animals were approved by the IRB and IACUC of Columbia University, and all of the experiments were performed in accordance with the protocols.

This research was supported by the following NIH grants: P01 AI04589716 and R01DK103585 (to MS) and P01 AI42288 and DK106191 (to TMB). Funding was also provided by the NIH Human Islet Research Network (HIRN) Opportunity Pool Fund (RRID:SCR_014393; https://hirnetwork.org; U01 DK104162, to MS and TMB). This research was performed using resources and/or funding provided by the NIH HIRN, which is supported by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) (RRID: SCR_014393; https://hirnetwork.org; CMAI UC4 DK104207, to MS, and DK104194, to TMB). Research was performed at the Columbia Center for Translational Immunology (CCTI) Flow Cytometry Core facility, which is supported in part by the Office of the Director of the NIH (S10OD020056, S10RR027050, P30CA013696, 5P30DK063608, and R01DK106436). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. MKM was supported by a Friedman Award from the University of British Columbia (Canada) and an American Diabetes Association (ADA) Postdoctoral Fellowship. We thank Arup Chakraborty (Massachusetts Institute of Technology) and Peter Sims (Columbia University) for helpful comments on the manuscript and Nicole Casio (Columbia University) for assistance with the submission.