Conceived and designed the experiments: JJAC RJdB CK. Performed the experiments: JJAC. Analyzed the data: JJAC RJdB CK. Contributed reagents/materials/analysis tools: JJAC RJdB CK. Wrote the paper: JJAC CK.
The authors have declared that no competing interests exist.
The cellular immune system screens peptides presented by host cells on MHC molecules to assess if the cells are infected. In this study we examined whether the presented peptides contain enough information for a proper self/nonself assessment by comparing the presented human (self) and bacterial or viral (nonself) peptides on a large number of MHC molecules. For all MHC molecules tested, only a small fraction of the presented nonself peptides from 174 species of bacteria and 1000 viral proteomes (
Human cells sample short peptides from endogenous proteins, and present them to the immune system via HLA class I molecules on the cell surface. T-cells scan the presented peptides and need to discriminate foreign (nonself) peptides from human (self) peptides. We show that this is a difficult task, despite the exquisite specificity of T-cells. We estimate, using HLA-peptide binding predictions and T-cell recognition models, that almost a third of the nonself peptide-HLA complexes are so similar to a self peptide-HLA complex that a T-cell cannot tell them apart. Since T-cells have to ignore self peptides to prevent autoimmunity, we estimate that at least a third of the foreign peptides have to be ignored as well, and therefore fail to evoke an immune response. Foreign peptides that are never used in immune responses have been referred to as the “holes in the repertoire”. Since the sizes of the holes we predict agree with those that were previously found, our conjecture is that the holes are entirely due to similarity with self peptides. We test this conjecture with public data on HIV-1 and vaccinia responses, and confirm that self similarity is a major determinant of the immune response to nonself peptides.
The recognition of peptide-MHC-I complexes (pMHC) by the T-cell receptor (TCR) is required for effector T-cells to kill an infected cell. Although some MHC-I molecules have a preference to present pathogen-derived peptides
We have previously shown that on HLA-A2 molecules only a minute fraction (
Given these new insights, we here extend our previous investigations on self/nonself overlaps by including the T-cell recognition of pMHCs. In addition, we analyze the self/nonself overlap of peptides presented on several HLA-A and HLA-B molecules, to estimate the degree of variance among different MHC-I molecules. Using high-quality predictors of the MHC-I presentation pathway
MHC class I molecules shape CD8
The chance that a bacterial or viral peptide overlaps with a peptide in the human proteome is shown as open and closed circles for bacteria and viruses, respectively. Stars indicate the self/nonself overlaps with shuffled bacterial (open stars) or viral (closed stars) proteins. For all peptides of 5 amino acids or longer, the overlap of unshuffled viruses and bacteria is significantly smaller than the shuffled (representing the expected) overlap (Ranksums test: p
Surprisingly, the overlaps do not decrease much further for peptides longer than 9mers (see
Only peptides that are presented on an MHC-I molecule, i.e. about 1–3% of all 9mers
For a large set of common human MHC-I molecules (13 HLA-A molecules and 15 HLA-B molecules, see
Panel A shows the exact overlap of the complete peptide (positions 1–9). Panel B shows the exact overlap of the middle positions of the peptide (positions 3–8), which are assumed to be in contact with the TCR. Panel C shows the degenerate overlap of positions 3–8, i.e. a cross-reactive T-cell overlap. In all cases, the left and right figures show the self/nonself overlaps determined using a scaled or fixed MHC binding threshold, respectively (see
So far, we only considered identical self and nonself peptides as overlaps. However, non-identical MHC-I presented peptides can also be recognized by the same T-cell
To see if other TCR-pMHC contacts follow the same interaction “rules”, all non-redundant TCR-pMHC-I structures found in the PDB database (
TCR contacts for 9 pMHC-TCR structures that have a 9mer (see
Given these data, we studied how much of the presented nonself can be discriminated from presented self by T-cells. First, the self/nonself overlaps were determined on those positions recognized by T-cells, i.e. the middle positions (P3–8) of MHC-I presented peptides. The self/nonself overlap of these 6mer fragments is on average 18 times higher than the overlap based on all positions (i.e., 2.7% for scaled thresholds and 1.7% for fixed thresholds, see
Overlap type | Self percentage | Recognized peptide positions: P1–9 (complete) | P1 and P3–8 (non-anchor) | P3–8 (middle) |
Exact | 100 | 0.15%* | 0.41% | 2.7%* |
Exact | 50 | 0.09% | 0.25% | 1.6% |
Degenerate | 100 | 0.7% | 5.2% | 29%* |
Overlaps were determined using all positions of the peptide (P1–9), the non-anchor positions (P1 and P3–8) or the middle positions between the anchors (P3–8). Further, overlaps were determined as exact, i.e. every position should be identical, or as degenerate, i.e. with 1 or 2 substitutions being allowed to mimic T-cell recognition (see
Next, overall self/nonself overlaps were estimated with a novel model of degenerate T-cell binding. As above, T-cells were assumed to bind only to the middle positions (P3–8) of the MHC-I presented peptides. In addition, the degeneracy was modeled by considering two peptides as overlapping if they have mismatches in at most two regions. We allow one mismatch at the N-terminal side of the fifth position (P1–4) and one at the C-terminal side of that position (P6–9) (see
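The degenerate matching rule can be illustrated with a minimal Python sketch. It reflects one reading of the description: only P3–8 are compared, P5 must be identical, and at most one conservative mismatch is tolerated on each side of P5 (P3–4 and P6–8). The grouping of amino acids into "conservative" classes in `CONSERVATIVE_GROUPS` is purely hypothetical and stands in for whatever substitution criterion was actually used; the function names are likewise illustrative.

```python
# Hypothetical classes of mutually "conservative" amino acids (illustration only;
# the actual substitution criterion is not specified here).
CONSERVATIVE_GROUPS = ["ILVM", "FYW", "KR", "DE", "NQ", "ST"]


def is_conservative(a, b):
    """True if residues a and b fall in the same hypothetical class."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def degenerate_overlap(pep_a, pep_b):
    """True if two 9mers count as overlapping under the sketched model."""
    if len(pep_a) != 9 or len(pep_b) != 9:
        raise ValueError("both peptides must be 9mers")
    # P5 (0-based index 4) has to be identical.
    if pep_a[4] != pep_b[4]:
        return False
    # Regions on either side of P5, restricted to the TCR-exposed P3-8:
    # P3-4 (indices 2, 3) and P6-8 (indices 5, 6, 7).
    for region in ((2, 3), (5, 6, 7)):
        mismatches = [i for i in region if pep_a[i] != pep_b[i]]
        if len(mismatches) > 1:
            return False  # more than one mismatch in a region
        if mismatches:
            i = mismatches[0]
            if not is_conservative(pep_a[i], pep_b[i]):
                return False  # the single mismatch must be conservative
    # P1, P2 and P9 are never compared.
    return True
```

For example, `degenerate_overlap("GLIDGKSTV", "GLVDGKTTV")` returns `True` under these assumed classes (I/V at P3 and S/T at P7), whereas a change at P5 or two changes within P3–4 make the pair non-overlapping.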
Despite the high overlaps, our assumptions on the degenerate T-cell recognition can be considered conservative. For example, position 3 of the presented peptide tends to have few interactions with the TCR (see
Although these estimates of cross-reactive overlaps remain relatively crude, our results show that the degenerate recognition of MHC-I presented peptides by T-cells has a profound effect on self/nonself discrimination. This reconfirms that deletion of self-reactive T-cells is important, as many of them would otherwise be activated during an infection and induce an autoimmune response. As a consequence, we estimate that about a third (
Peptide set | Immunogenic: self overlapping | Immunogenic: not overlapping | Non-immunogenic: self overlapping | Non-immunogenic: not overlapping | Chi-square test (p) |
HIV-1 peptides on HLA-A*0201 | 4 | 29 | 18 | 36 | 0.027 |
Vaccinia peptides on HLA-A*0201 | 3 | 15 | 8 | 18 | 0.29 |
HIV-1 peptides on non-HLA-A*0201 molecules | 0 | 9 | 4 | 9 | 0.066 |
HLA-A*0201 pMHC from the IEDB | 54 | 143 | 230 | 362 | 0.0038 |
For immunogenic or non-immunogenic HIV-1 peptides presented on HLA-A*0201 determined by Frankild et al.
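As an illustration of how the counts in this table relate to the last column, the sketch below computes a Pearson chi-square test on a 2×2 table without continuity correction. That this is the exact variant used is an assumption, although the tabulated values appear consistent with it. The function name `chi_square_2x2` is illustrative, and the closed-form p-value for one degree of freedom is used so that only the standard library is needed.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) on the table
    [[a, b], [c, d]]; returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# HIV-1 peptides on HLA-A*0201: immunogenic (4 overlapping, 29 not)
# versus non-immunogenic (18 overlapping, 36 not).
statistic, p_value = chi_square_2x2(4, 29, 18, 36)
print(f"chi2 = {statistic:.2f}, p = {p_value:.3f}")
```

A low p-value here indicates that self-overlapping peptides are distributed differently between the immunogenic and non-immunogenic groups than expected under independence.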
Previously, we have shown that the few epitopes sampled from a pathogen's proteome are likely to be unique and are not expected to be present in the host (human) proteome
One might intuitively think that the high self/nonself overlap estimates are in disagreement with the exquisite specificity of T-cell recognition. However, in our “degenerate” model of the middle positions (P3–8) with at most 2 conservative mismatches, an individual T-cell recognizes only one in 2.7 million pMHCs. This level of specificity is much higher than experimental measurements of about one in 100,000
Could longer peptides be a solution for the high self/nonself overlaps caused by degenerate T-cell recognition? Given that T-cells cannot use all the information that is present in an MHC-I presented 9mer, we do not expect that the presentation of longer peptides would make much difference. Even though a longer peptide would contain more information, this would not improve self/nonself discrimination if the T-cells do not detect it. Alternatively, MHC binding could be more specific at, for instance, position 1, thus preserving self/nonself information as now happens at the anchor positions. The disadvantage of more specific binding motifs would be the reduced presentation of foreign peptides and more opportunities for a virus to escape MHC presentation.
Another consequence of a high self/nonself overlap could be a high risk of autoimmunity. The identification of self antigens targeted in autoimmune diseases remains an enormous challenge, and our method of identifying overlapping peptides could possibly help to narrow the search for these autoantigens. This requires a thorough understanding of the pathogens that might trigger a particular autoimmune disease and the corresponding HLA risk factors. Unfortunately, sufficient data to extract such associations are available for only a few autoimmune diseases. For instance, Epstein-Barr virus and HLA-B*4402 are associated with multiple sclerosis
The predicted self/nonself overlap varies between HLA molecules (see
Our estimates of self/nonself overlaps can explain why MHC-I restricted cellular immune responses to a pathogen are narrower than the (predicted) number of pMHCs for that organism
Human, murine, viral and bacterial proteomes were downloaded via
The peptides presented on a certain MHC-I molecule can be predicted by simulating three key processes of MHC-I presentation, i.e. proteasomal cleavage, TAP transport and peptide-MHC-I binding. The combination of proteasomal cleavage and TAP transport determines which peptides reach the ER to potentially bind MHC-I. This process was predicted using NetChop Cterm3.0
All results were checked for consistency with two other MHC-I binding prediction methods, NetMHCpan-2
Per MHC-I molecule, the set of presented 9mers derived from viral or bacterial (nonself) proteomes and that from the human (self) proteome were compared to see how much these sets overlap. In the self/nonself overlap determination for vaccinia-derived pMHC from Assarsson et al.
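The exact overlap between two sets of presented 9mers can be sketched as a simple set comparison. The snippet below is a minimal illustration, assuming the overlap is expressed as the fraction of presented nonself peptides that also occur among the presented self peptides; the function name `overlap_fraction` and its `positions` parameter are illustrative and not part of any published tool. Restricting `positions` to 3–8 gives the exact overlap on the middle positions only.

```python
def overlap_fraction(nonself_peptides, self_peptides, positions=range(1, 10)):
    """Fraction of nonself 9mers that match a self 9mer exactly on the
    given 1-based positions (default: the complete peptide, P1-9)."""
    indices = [p - 1 for p in positions]

    def key(peptide):
        # Keep only the residues at the compared positions.
        return "".join(peptide[i] for i in indices)

    self_keys = {key(p) for p in self_peptides}
    nonself = set(nonself_peptides)
    if not nonself:
        return 0.0
    hits = sum(1 for p in nonself if key(p) in self_keys)
    return hits / len(nonself)


# Toy peptides, invented for illustration only.
self_set = {"GLIDGKSTV", "AAAAAAAAA"}
nonself_set = {"GLIDGKSTV", "KMIDGKSTL", "CCCCCCCCC", "DDDDDDDDD"}
complete = overlap_fraction(nonself_set, self_set)               # P1-9
middle = overlap_fraction(nonself_set, self_set, range(3, 9))    # P3-8
```

In this toy example the overlap on P3–8 is larger than on P1–9, because `KMIDGKSTL` differs from `GLIDGKSTV` only at positions that are ignored when comparing the middle of the peptide.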
Additionally, self/nonself overlaps were estimated using the “peptide similarity score” method described in detail by Frankild et al.
The cross-reactivity in our degenerate overlap model of T-cell recognition (described above) was determined in order to compare it with experimentally determined levels. For every possible 9mer peptide, we determined the number of variants at the T-cell recognized middle positions (P3–8) that would be recognized by the same T-cell in our degenerate overlap model. In other words, for every combination of amino acids at P3–8 we performed an exhaustive search to determine how many other combinations would also be recognized. On average, 24 such combinations were found. Thus, given the number of possible variants at positions P3–8 (
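The arithmetic behind this specificity estimate can be written out as a short worked example. It assumes that the number of possible variants at the six positions P3–8 is 20^6 (20 amino acids at each position), which is an assumption made here; with the average of 24 recognized combinations it gives a figure consistent with the "one in 2.7 million" specificity quoted for the degenerate model.

```python
AMINO_ACIDS = 20          # assumed alphabet size
MIDDLE_POSITIONS = 6      # P3-8
RECOGNIZED_VARIANTS = 24  # average number found by the exhaustive search

# Number of distinct amino acid combinations at P3-8.
n_variants = AMINO_ACIDS ** MIDDLE_POSITIONS

# A T-cell that recognizes 24 of these combinations sees roughly
# one in (n_variants / 24) of all possible P3-8 fragments.
one_in = n_variants / RECOGNIZED_VARIANTS

print(f"{n_variants:,} variants; about one in {one_in / 1e6:.1f} million recognized")
```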
Four sets of pMHCs were obtained for which the immunogenicity had been determined previously. The first set of HIV-1 derived peptides presented on HLA-A02 was determined by Frankild et al.
For all HLA molecules, we predicted the binding of 1,000,000 random peptides with equal amino acid frequencies using NetMHC-3.2 and the thresholds described above. The Shannon entropy was determined per position on the predicted binders, per HLA molecule, and used as a measure of selectivity. Based on this selectivity, the six least specific positions were determined for each HLA molecule to use in the “allele specific” analysis of degenerate self/nonself overlaps (
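A per-position Shannon entropy of this kind can be sketched as follows. The snippet is a minimal illustration and makes two assumptions: entropy is measured in bits, and the "least specific" positions are taken to be those with the highest entropy among the predicted binders. The function names are illustrative, and the binding prediction step itself is not reproduced.

```python
import math
from collections import Counter


def positional_entropy(binders):
    """Shannon entropy (in bits) per position over equal-length peptides."""
    length = len(binders[0])
    entropies = []
    for pos in range(length):
        counts = Counter(peptide[pos] for peptide in binders)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(entropy)
    return entropies


def least_specific_positions(binders, k=6):
    """1-based positions with the highest entropy, i.e. the least selective ones."""
    entropies = positional_entropy(binders)
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return sorted(i + 1 for i in ranked[:k])
```

A position where every binder carries the same residue has an entropy of 0 bits (fully selective), while a position with 20 equally frequent residues reaches the maximum of log2(20), about 4.32 bits.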
Structures of HLA-I-9mer-TCR complexes were downloaded in August 2011 from the PDB database (
Statistical tests were performed using the stats package of the SciPy module in Python. A permutation test was also done in Python, using the shuffle function in the random package of the NumPy module, to identify human proteins that have more peptides overlapping with viruses or bacteria than expected. The permutation test was performed as follows: per human protein, we counted the number of viruses or bacteria that overlap with a 9mer peptide in this protein. These counts were normalized by the length of the protein, i.e. the number of overlapping viruses or bacteria was divided by the protein length. In each of 1000 permutations, a number of overlapping viruses or bacteria was drawn per human protein, based on the expected fraction of overlaps and the protein length. If the actual number of overlaps was higher than the number in all 1000 permutations, the human protein was selected as a protein with a significantly high number of viral or bacterial overlaps.
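One possible reading of this permutation procedure is sketched below. It pools the per-9mer overlap counts of all proteins, shuffles the pool, and redistributes it over the proteins according to their lengths, so that each permuted count reflects the overall overlap rate and the protein length. This is an assumption about the details rather than the exact published implementation, and the standard-library `random.Random.shuffle` is used here in place of NumPy's shuffle to keep the example dependency-free. The function name and input layout are illustrative.

```python
import random


def proteins_with_excess_overlaps(overlap_counts, n_permutations=1000, seed=0):
    """overlap_counts maps a protein name to a list with, for each 9mer in
    that protein, the number of viruses or bacteria it overlaps with.
    Returns the proteins whose observed total exceeds the permuted total
    in every permutation."""
    rng = random.Random(seed)
    names = list(overlap_counts)
    lengths = [len(overlap_counts[name]) for name in names]
    observed = [sum(overlap_counts[name]) for name in names]
    # Pool of per-9mer counts across all proteins.
    pool = [count for name in names for count in overlap_counts[name]]

    always_higher = [True] * len(names)
    for _ in range(n_permutations):
        rng.shuffle(pool)
        start = 0
        for i, length in enumerate(lengths):
            permuted = sum(pool[start:start + length])
            start += length
            # Comparing totals for the same protein is equivalent to
            # comparing length-normalized counts.
            if observed[i] <= permuted:
                always_higher[i] = False
    return {name for name, keep in zip(names, always_higher) if keep}
```

With 1000 permutations, a protein is selected only if its observed count exceeds all 1000 permuted counts, which corresponds to an empirical p-value below roughly 0.001 per protein, before any correction for testing many proteins.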
A similar analysis was performed to identify proteins with more HLA-B*5401 ligands than expected. First, the number of HLA-B*5401 binding peptides per protein was predicted as described above. Next, this prediction was compared with 1000 permutations in which a number of binding peptides was drawn based on the specificity of HLA-B*5401 (i.e. 2.3% as described above). If the actual number of binding peptides was higher than the number in all 1000 permutations, the protein was selected as a protein with a significantly high number of HLA-B*5401 ligands.
We thank Johannes Textor for valuable comments on the manuscript and discussion on this research project, and Hanneke van Deutekom, Xiangyu Rao and Ilka Hoof for discussion and technical support.