Conceived and designed the experiments: AM UK BA. Performed the experiments: AM BA. Analyzed the data: AM CRMB BA. Wrote the paper: AM BA.
The authors have declared that no competing interests exist.
Theoretical methods for predicting CD8+ T-cell epitopes are an important tool in vaccine design and for enhancing our understanding of the cellular immune system. The most popular methods currently available produce binding affinity predictions across a range of MHC molecules. In comparing results between these MHC molecules, it is common practice to apply a normalization procedure known as rescaling, to correct for possible discrepancies between the allelic predictors. Using two of the most popular prediction software packages, NetCTL and NetMHC, we tested the hypothesis that rescaling removes genuine biological variation from the predicted affinities when comparing predictions across a number of MHC molecules. We found that removing rescaling improved the prediction software's performance both qualitatively, in terms of ranking epitopes, and quantitatively, in the accuracy of their binding affinity predictions. We suggest that there is biologically significant variation among MHC class I molecules and find that retention of this variation leads to significantly more accurate epitope prediction.
The use of prediction software has become an important tool in increasing our knowledge of infectious disease. It allows us to predict the interaction of molecules involved in an immune response, thereby significantly shortening the lengthy process of experimental elucidation. A high proportion of this software has focused on the response of the immune system against pathogenic viruses. This approach has produced positive results towards vaccine design, results that would be delayed or unobtainable using a traditional experimental approach. The current challenge in immunological prediction software is to predict interacting molecules to a high degree of accuracy. To this end, we have analysed the best software currently available at predicting the interaction between a viral peptide and the MHC class I molecule, an interaction that is vital in the body's defence against viral infection. We have improved the accuracy of this software by challenging the assumption that different MHC class I molecules will bind to the same number of viral peptides. Our method shows a significant improvement in correctly predicting which viral peptides bind to MHC class I molecules.
Cytotoxic T lymphocytes (CTLs) discriminate between healthy and pathogen-infected cells by recognizing and responding to a molecular complex on the surface of the infected cell. This complex consists of a specific major histocompatibility complex (MHC) molecule and a peptide derived from the proteins contained in the cell. If the cell contains a pathogen, peptides from the pathogen proteome will be presented and, with the right MHC – peptide complex, a CTL response will be elicited.
Of the large number of peptides that can be derived from a pathogen only a small minority elicits a CTL response. This number has been estimated to be between 1 in 2,000 and 1 in 5,600
Once CTLs recognize the MHC-peptide complex, they are capable of destroying the infected cell by the release of lytic granules containing cytotoxic effector proteins. This results in the destruction of the target cell by apoptosis. An effective CTL response has been shown to confer protection against viral infection, such as HIV
More generally, epitope prediction algorithms are being increasingly used to understand the CTL response. For example, in the case of HIV-1 infection, algorithms have been used to confirm which MHC-associated epitope mutations are likely to confer escape from a CTL response
A range of computational algorithms have been developed to predict CTL epitopes in pathogen protein sequences. Since the most selective requirement for a peptide to be immunogenic is the ability of the peptide to bind to the MHC molecule, most prediction methods focus on this stage of the pathway. As a general rule, information gained from experimental binding assays is used to train the algorithm until it is efficient at predicting novel MHC–peptide complexes. The algorithms that are used vary in complexity and accuracy. Some can be trained to recognize peptide motifs that are required for binding to a particular MHC molecule
Artificial neural networks (ANNs) take into account, in addition to the identity of each amino acid residue, the interactions between adjacent amino acids in a potential epitope. In summary, an ANN for a particular MHC molecule is trained to recognize associated inputs (a peptide sequence) and outputs (the binding affinity for that sequence with the MHC molecule)
NetCTL
In order to make the prediction values comparable between each MHC molecule, it is recommended that the MHC-peptide binding affinity scores are rescaled
However, we will present evidence in this paper that in correcting for differences between the allelic predictors, information is being lost that reflects true biological variation between MHC molecules and, by extension, differences in their ability to bind to peptide sequences. We show that, for both qualitative and quantitative measures of binding, rescaling impairs rather than improves allelic predictor performance. This is of importance for vaccine design and to understand the nature of the CTL response. In particular, crucial between-allele variations in binding affinity and preference which may contribute to differences in the outcome of infection are likely to be obscured by rescaling.
In order to test the effect of rescaling on epitope prediction accuracy, we used two web-based prediction methods, NetCTL v1.2
NetMHC v3.0 simply predicts MHC-peptide binding, using ANNs to predict binding affinities for 43 MHC molecules. In order to test the effect of rescaling, it was necessary to produce rescale values for each of the 43 allelic predictors. This was performed as in NetCTL; 500,000 unique random nonamers were obtained from the proteome of
In summary, we tested two sets of rescaling values: those obtained from NetCTL v1.2 and those that we calculated using NetMHC v3.0.
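The rescaling step can be illustrated with a minimal sketch. It assumes, hypothetically, that the rescale value of an allelic predictor is the predicted score at a fixed upper percentile of the random-nonamer scores, and that a score is rescaled by dividing it by that value. The parameter `top_fraction`, the function names and the toy scores are illustrative placeholders, not values or interfaces taken from NetCTL or NetMHC.

```python
import random

def rescale_value(random_scores, top_fraction=0.01):
    """Score at the boundary of the top `top_fraction` of random-peptide scores.

    Assumption: the percentile used here is illustrative only.
    """
    ranked = sorted(random_scores, reverse=True)
    # Index of the last score that still falls inside the top fraction.
    cutoff_index = max(int(len(ranked) * top_fraction) - 1, 0)
    return ranked[cutoff_index]

def rescale(score, value):
    """Rescaled score: raw predicted score divided by the allele's rescale value."""
    return score / value

# Toy data: two hypothetical alleles whose predictors score random peptides
# on different scales.
rng = random.Random(0)
random_scores = {
    "allele_X": [rng.random() * 0.8 for _ in range(10_000)],
    "allele_Y": [rng.random() * 0.4 for _ in range(10_000)],
}
rescale_values = {a: rescale_value(s) for a, s in random_scores.items()}

# Count the random peptides whose rescaled score reaches 1.0 for each allele.
binders = {
    allele: sum(rescale(s, rescale_values[allele]) >= 1.0 for s in scores)
    for allele, scores in random_scores.items()
}
```

In this sketch the two alleles end up with the same number of random peptides at or above a rescaled score of 1.0, even though their raw scores differ twofold in scale. This is the assumption that the present study questions.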
Epitope datasets were constructed from sources detailed below. In each case, the prediction methods were tested by their ability to detect these epitopes amongst the full set of overlapping nonamers derived from the proteins that contained the epitopes. The full set of nonamers will contain a small number of known epitopes and the remainder will be ‘non-epitopes’. Of course, this set of non-epitopes could include epitopes that have not been experimentally verified. However, the majority (see
The SYF1 dataset is a supertype dataset derived from SYFPEITHI
Experimentally defined epitopes in HIV-1 were extracted from the HIV Molecular Immunology Database
In summary, it was possible to test 41 of the 43 allelic predictors for MHC molecules in NetMHC v3.0. The positive set consisted of 661 epitopes, defined in terms of start and end positions relative to the HIV reference strain HXB2 (supplementary
The Lanl661 dataset was modified for testing with NetCTL. Of these 661 epitopes, a total of 179 bound to the 12 alleles for which NetCTL has allelic predictors. The input sequence to NetCTL contained 3,000 overlapping nonamers. For this experiment, the positive set consisted of 179 nonamers and the negative set of (3,000 × 12) − 179 = 35,821 nonamers. The positive set of Lanl179 is available in the supplementary material (
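As a minimal sketch of how such test sets can be assembled, the following code enumerates the overlapping nonamers of a protein sequence and reproduces the negative-set arithmetic for Lanl179. The function name and the toy sequence are hypothetical; only the counts (3,000 nonamers, 12 alleles, 179 epitopes) come from the text.

```python
def overlapping_nonamers(sequence, length=9):
    """All overlapping peptides of the given length, in sequence order."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]

# Toy sequence (hypothetical): a protein of 11 residues yields 11 - 9 + 1 = 3 nonamers.
toy_nonamers = overlapping_nonamers("ACDEFGHIKLM")

# Size of the negative set when every nonamer is scored against every allele
# and the known epitope-allele pairs are removed.
n_nonamers = 3000
n_alleles = 12
n_positive = 179
n_negative = n_nonamers * n_alleles - n_positive
```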
ROC curves give a visual measure of the accuracy of a prediction method. The threshold at which the prediction method identifies a peptide as being an epitope varies along the length of the curve. Each point on the curve gives the fraction of true positive epitopes found as a function of the number of false positive ‘epitopes’ at that threshold. Hence, setting a strict threshold for epitope detection will result in high specificity (few false positives) but low sensitivity (a high proportion of true binders missed). The area under the ROC curve gives the AUC (area under the curve) measurement. In order to test for a significant difference between ROC curves, we conducted the bootstrapping analysis detailed in
Using the 2 epitope datasets, HIV216 and SYFPEITHI863, and the same methods from
The training data for NetMHC v3.0 is available at
ROC curves were used to analyse the effects of rescaling on epitope prediction. Both NetCTL v1.2 and NetMHC v3.0 were tested and 3 datasets were used (
Each graph shows the ROC curves using different combinations of datasets and prediction methods (see
ROC Curve | Colour | Method | Dataset | Rescaling | AUC | Bootstrap P-Value |
| Black solid | NetCTL v1.2 | SYF1 | No | 0.949 | <0.001 |
| Red dashed | NetCTL v1.2 | SYF1 | Yes | 0.937 | |
| Black solid | NetMHC v3.0 | SYF1 | No | 0.932 | <0.001 |
| Red dashed | NetMHC v3.0 | SYF1 | Yes | 0.905 | |
| Black solid | NetMHC v3.0 | Lanl661 | No | 0.944 | <0.001 |
| Red dashed | NetMHC v3.0 | Lanl661 | Yes | 0.937 | |
| Black solid | NetCTL v1.2 | Lanl179 | No | 0.933 | <0.001 |
| Red dashed | NetCTL v1.2 | Lanl179 | Yes | 0.918 | |
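The AUC values and bootstrap P-values in the table above could be computed along the following lines. This is a sketch under stated assumptions rather than the procedure used in the study: the AUC is obtained from the rank-sum identity, and the bootstrap is assumed to be a paired resampling of peptides with replacement, with the P-value taken as the fraction of resamples in which the first method does not achieve a higher AUC than the second. All function names and the toy scores are hypothetical.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Equals the probability that a randomly chosen epitope scores higher than
    a randomly chosen non-epitope, with ties counted as one half.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Group tied scores and give each the average 1-based rank of the group.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_p_value(scores_a, scores_b, labels, n_boot=1000, seed=0):
    """Fraction of paired resamples in which method A does not beat method B."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("labels must contain both epitopes and non-epitopes")
    rng = random.Random(seed)
    not_better = 0
    valid = 0
    while valid < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[i] for i in idx]
        if not 0 < sum(lab) < n:
            continue  # AUC is undefined without both classes; draw again.
        valid += 1
        auc_a = auc([scores_a[i] for i in idx], lab)
        auc_b = auc([scores_b[i] for i in idx], lab)
        if auc_a - auc_b <= 0:
            not_better += 1
    return not_better / n_boot

# Toy data (hypothetical): method A separates epitopes better than method B.
rng = random.Random(1)
labels = [1] * 20 + [0] * 180
scores_a = [y + rng.gauss(0, 0.5) for y in labels]
scores_b = [0.5 * y + rng.gauss(0, 0.5) for y in labels]
p_value = bootstrap_p_value(scores_a, scores_b, labels, n_boot=500)
```

Resampling the same peptides for both methods keeps the comparison paired, so that the variation between resamples reflects the difference between methods rather than the difference between datasets.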
In NetCTL v1.2, the TAP and cleavage scores are combined with the rescaled MHC binding score to produce a combined score for each submitted nonamer. In order to test how NetCTL performed without rescaling, it was still necessary to divide the MHC binding score by a rescaling value so the weightings of the TAP and cleavage score were still applicable and accurate. By averaging over all rescaling values and dividing the MHC binding value by this number, rescaling differences were “averaged out” and it was still possible to use the extra information from the TAP and cleavage predictions.
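The following sketch illustrates the difference between the two ways of forming the combined score. The weights, the rescale values and the allele names are hypothetical placeholders chosen for illustration, and the linear form of the combination is an assumption; none of them are taken from NetCTL itself.

```python
from statistics import mean

# Hypothetical rescale values for three placeholder alleles (not NetCTL values).
RESCALE_VALUES = {"allele_X": 0.40, "allele_Y": 0.25, "allele_Z": 0.55}

# A single divisor shared by all alleles: the average of the rescale values.
MEAN_RESCALE = mean(RESCALE_VALUES.values())

def combined_score(binding, cleavage, tap, divisor, w_cleavage=0.1, w_tap=0.05):
    """Weighted sum of the scaled binding score, cleavage score and TAP score.

    The default weights are illustrative placeholders.
    """
    return binding / divisor + w_cleavage * cleavage + w_tap * tap

def rescaled_score(allele, binding, cleavage, tap):
    """Combined score with the allele-specific rescale value as divisor."""
    return combined_score(binding, cleavage, tap, RESCALE_VALUES[allele])

def non_rescaled_score(binding, cleavage, tap):
    """Combined score with the same averaged divisor for every allele."""
    return combined_score(binding, cleavage, tap, MEAN_RESCALE)

# Two peptide-allele pairs with identical cleavage and TAP scores.
weak_on_y_rescaled = rescaled_score("allele_Y", 0.3, 0.5, 0.5)
strong_on_z_rescaled = rescaled_score("allele_Z", 0.5, 0.5, 0.5)
weak_on_y_raw = non_rescaled_score(0.3, 0.5, 0.5)
strong_on_z_raw = non_rescaled_score(0.5, 0.5, 0.5)
```

With the averaged divisor, the binding term stays on a scale comparable to the TAP and cleavage terms, and the ordering of peptides across alleles follows the raw predicted binding scores. With allele-specific divisors, the ordering of the two example pairs is reversed.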
One possible explanation for why rescaling has a detrimental impact on prediction is that there may be a positive correlation between rescale factor and allelic predictor accuracy. To check this hypothesis we calculated the AUCs for each NetMHC v3.0 predictor using the Lanl661 dataset and plotted this against the corresponding rescale factor, the results of which are shown in
There is no evidence for a correlation of AUC and rescale value for the whole set of allele predictors (R² = 0.0068, p = 0.606), nor for the subset of predictors with an AUC>0.9 (R² = 0.0007, p = 0.887). This analysis used the Lanl661 epitope dataset.
Consequently, it is unlikely that a correlation between rescale values and AUC values explains our findings. However, certain alleles, such as B0801, do have both a low rescale value and a low AUC. To check that these low-accuracy predictors were not causing the inaccuracies in rescaled predictions, we repeated our ROC curve analysis for Lanl661 without them (those with an AUC value below 0.9; namely A6801, A6802, B3501, B0702, B0801, B0802 and B4501). In the remaining, reduced subset of predictors there was even less evidence for a correlation between AUC and rescale factor (R² = 0.0007, p = 0.887). For this subset of predictors the accuracy was still significantly better if rescaling was not applied (
Therefore, we believe there is no evidence to support the hypothesis that the reason rescaling is detrimental is because there is a correlation between rescale factors and AUC.
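A check of this kind can be sketched as follows. The code computes the coefficient of determination between rescale factor and AUC, first for all predictors and then for the subset with an AUC above 0.9. The allele names and all numerical values are hypothetical placeholders, not the values obtained in this study.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy * s_xy / (s_xx * s_yy)

# Hypothetical (rescale factor, AUC) pairs for placeholder allelic predictors.
predictors = {
    "allele_1": (0.30, 0.95),
    "allele_2": (0.45, 0.93),
    "allele_3": (0.25, 0.85),
    "allele_4": (0.50, 0.96),
    "allele_5": (0.35, 0.91),
    "allele_6": (0.40, 0.88),
}

def r_squared_for(subset):
    """R^2 between rescale factor and AUC for a dict of predictors."""
    rescale_factors = [value[0] for value in subset.values()]
    aucs = [value[1] for value in subset.values()]
    return r_squared(rescale_factors, aucs)

# All predictors, then only those with an AUC above 0.9.
high_accuracy = {a: v for a, v in predictors.items() if v[1] > 0.9}
r2_all = r_squared_for(predictors)
r2_high = r_squared_for(high_accuracy)
```

The sketch reports only R². A P-value for the regression slope would additionally require a t-distribution, for which a statistics library routine such as `scipy.stats.linregress` could be used.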
We used 3 other metrics
The rank of known epitopes was compared with that of non-epitopes from the same protein for both rescaled and non-rescaled predictions. From
Predicted binding affinities without rescaling gave better results than rescaled affinities at given sensitivities using the epitope datasets from
Predicted binding affinities without rescaling also gave better results when comparing the total number of epitopes among the top 5% of predicted binding affinities (supplementary
Using 2 sets of experimentally-derived epitope-allele binding affinities, we also showed that the correlation between predicted and experimental affinities was weaker with rescaling than without (supplementary
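The sensitivity-based comparison and the top 5% comparison described above can be sketched as two small functions. The exact definitions used in the supplementary analyses are not reproduced here, so the choices below (threshold set at the score of the last epitope needed to reach the requested sensitivity; top fraction taken over all scored peptides) are assumptions, and the toy data are hypothetical.

```python
import math

def specificity_at_sensitivity(scores, labels, sensitivity):
    """Specificity at the strictest threshold that reaches the given sensitivity."""
    positives = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    negatives = [s for s, y in zip(scores, labels) if not y]
    # Number of epitopes that must be recovered to reach the requested sensitivity.
    needed = max(math.ceil(sensitivity * len(positives)), 1)
    threshold = positives[needed - 1]
    # Non-epitopes scoring below the threshold are correctly rejected.
    true_negatives = sum(s < threshold for s in negatives)
    return true_negatives / len(negatives)

def fraction_of_epitopes_in_top(scores, labels, top_fraction=0.05):
    """Fraction of all known epitopes found among the top-scoring peptides."""
    n_top = max(int(len(scores) * top_fraction), 1)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    found = sum(label for _, label in ranked[:n_top])
    return found / sum(labels)

# Toy data (hypothetical): 100 peptides, 4 of which are known epitopes.
toy_scores = [i / 100 for i in range(100)]
toy_labels = [1 if i in (99, 98, 50, 10) else 0 for i in range(100)]

spec_half = specificity_at_sensitivity(toy_scores, toy_labels, 0.5)
top5 = fraction_of_epitopes_in_top(toy_scores, toy_labels, 0.05)
```

Applying both functions to the rescaled and to the non-rescaled scores of the same peptide set would give the paired comparison reported in the supplementary tables.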
Rescaling is, in theory, a sound approach to improving epitope prediction and in particular comparability of predictions obtained using different allelic predictors. However, using a number of different measures of accuracy, in the context of two commonly used prediction methods, we have demonstrated that rescaling actually impairs rather than improves predictive performance and comparability. We suggest that rescaling predicted affinities results in a loss of information that outweighs any advantage gained in correcting for differences in training data.
The first approach used ROC curve analysis and showed clear differences between rescaling and non-rescaling. The ROC curve gives a graphical representation of how well the prediction method ranks true epitopes among a set of non-binding peptides or, to use an analogy, how efficient it is at finding the epitopic needle in a haystack of random peptides. From
Added to the significant results from the ROC curve analysis, the supplementary analysis demonstrated the positive effect of removing rescaling in terms of the correlation with experimental data (supplementary
There has been little research on the variation in ‘stickiness’ among MHC molecules, i.e. whether some MHC class I molecules are capable of binding to a greater number of epitopes than others. The binding motifs for MHC-peptide binding vary across the range of alleles, but the assumption made for rescaling is that each molecule would bind to the same number of peptides out of a large random selection. Estimates based upon mass spectrometry suggest that over 2,000 peptides are associated with HLA-A2.1 and −B7 and it is speculated that the actual total could be over 10,000 per MHC molecule
These data may also be informative regarding optimization of peptide cargo in the endoplasmic reticulum (ER). We would argue that peptide optimization is the biological interpretation of rescaling: alleles have similar numbers of epitopes because peptides with a lower binding affinity are replaced in the ER. We know that optimization cannot be complete because otherwise every allele would present just one epitope: the one with the highest affinity. However, it seems likely that there is a degree of optimization
In summary, we suggest that much of the observed variation between allelic predictors reflects genuine biological information which should not be discarded as experimental noise and that rescaling is based on an unjustified assumption: that all alleles bind the same number of peptides. Removing this assumption, we have demonstrated a significantly improved predictive performance.
These conclusions are important both for studies that use prediction methods to understand the CTL response and for T cell epitope discovery programs where avoiding rescaling could save a large amount of experimental effort, ultimately leading to improved vaccine implementation.
The HIV HXB2 proteome.
(0.06 MB DOC)
The SYF1 dataset.
(0.14 MB DOC)
The Lanl 661 dataset.
(0.54 MB DOC)
The Lanl 179 dataset.
(0.16 MB DOC)
The specificity of non-rescaled and rescaled results at specified sensitivity values.
(0.03 MB DOC)
The fraction of the total number of epitopes in the 2 epitope datasets among the top 5% of predicted binding affinities.
(0.03 MB DOC)
The result of the ROC curve analysis, using the Lanl 661 dataset and excluding any alleles (7 in total) that had an AUC<0.9 from
(0.03 MB DOC)
A comparison of ranks between rescaled and non-rescaled predicted binding affinities.
(0.03 MB DOC)
The relationship between rescaled/non-rescaled predicted binding affinities and experimental binding affinities.
(0.19 MB DOC)
A comparison of rescale values.
(0.04 MB DOC)
We are very grateful to Can Keşmir, Andrew George, Rob De Boer, and Morten Nielsen for some helpful discussion regarding this paper. Many thanks also to Claus Lundegaard, who provided us with technical assistance obtaining the rescale values for NetMHC v3.0, as well as extra data and helpful comments.