Conceived and designed the experiments: PW AS BP. Performed the experiments: PW JS CD BM. Analyzed the data: PW BP. Contributed reagents/materials/analysis tools: JS CD BM. Wrote the paper: PW JS BM AS BP.
The authors have declared that no competing interests exist.
The identification of MHC class II restricted peptide epitopes is an important goal in immunological research. A number of computational tools have been developed for this purpose, but there is a lack of large-scale systematic evaluation of their performance. Herein, we used a comprehensive dataset consisting of more than 10,000 previously unpublished MHC-peptide binding affinities, 29 peptide/MHC crystal structures, and 664 peptides experimentally tested for CD4+ T cell responses to systematically evaluate the performances of publicly available MHC class II binding prediction tools. While in selected instances the best tools were associated with AUC values up to 0.86, in general, class II predictions did not perform as well as historically noted for class I predictions. It appears that the ability of MHC class II molecules to bind variable length peptides, which requires the correct assignment of peptide binding cores, is a critical factor limiting the performance of existing prediction tools. To improve performance, we implemented a consensus prediction approach that combines methods with top performances. We show that this consensus approach achieved best overall performance. Finally, we make the large datasets used publicly available as a benchmark to facilitate further development of MHC class II binding peptide prediction methods.
A critical step in developing an immune response against pathogens is the recognition of antigenic peptides presented by MHC class II molecules. Since experiments for MHC class II binding peptide identification are expensive and time consuming, computational tools have been developed as fast alternatives, albeit with inferior performance. Here, we carried out a large-scale systematic evaluation of existing prediction tools with the aim of establishing a benchmark for performance comparison and identifying directions that can further improve prediction performance. We provide an unbiased ranking of the performance of publicly available MHC class II prediction tools and demonstrate that the MHC class II prediction tools did not perform as well as the MHC class I tools. In addition, we show that the size of training data and the correct identification of the binding core are the two factors limiting the performance of existing tools. Finally, we make available to the immunology community a large dataset to facilitate the evaluation and development of MHC class II binding prediction tools.
The activation of CD4+ helper T cells is essential for the development of adaptive immunity against pathogens
A hallmark of the MHC class II binding peptide groove is that there are four major pockets. These pockets accommodate side-chains of residues 1, 4, 6, and 9 of a 9-mer core region of the binding peptide. This core region interaction largely determines binding affinity and specificity
MHC class II molecules are highly polymorphic, and this polymorphism largely corresponds with differences along the peptide binding groove. However, the binding motifs derived for MHC class II molecules are highly degenerate, and many promiscuous peptides have been identified that can bind multiple MHC class II molecules
Computational prediction of MHC class II epitopes is of important theoretical and practical value, as experimental identification is costly and time consuming
The establishment of numerous MHC class II epitope databases has facilitated the development of a large number of algorithms aimed at predicting peptide binding to MHC molecules. Early works focused on finding peptide patterns and deriving motifs for MHC molecules
Despite the large number of available prediction methods, computational prediction of MHC class II epitopes remains a challenging problem. It has been suggested that the prediction performance of class II algorithms is systematically inferior to that of MHC class I epitope prediction methods
We assembled a dataset of peptide binding affinities for various MHC class II molecules experimentally measured in our group (see
Organism | MHC class II type | Number of MHC-peptide affinities (New) | Number of MHC-peptide affinities (Known) |
Human | HLA-DRB1*0101 | 3882 | 1390 |
Human | HLA-DRB1*0301 | 502 | 817 |
Human | HLA-DRB1*0401 | 512 | 675 |
Human | HLA-DRB1*0404 | 449 | 233 |
Human | HLA-DRB1*0405 | 457 | 175 |
Human | HLA-DRB1*0701 | 505 | 424 |
Human | HLA-DRB1*0802 | 245 | 213 |
Human | HLA-DRB1*0901 | 412 | 174 |
Human | HLA-DRB1*1101 | 520 | 522 |
Human | HLA-DRB1*1302 | 289 | 242 |
Human | HLA-DRB1*1501 | 520 | 491 |
Human | HLA-DRB3*0101 | 420 | 104 |
Human | HLA-DRB4*0101 | 245 | 203 |
Human | HLA-DRB5*0101 | 520 | 383 |
Mouse | H-2-IAb | 500 | 225 |
Mouse | H-2-IEd | 39 | 231 |
Number of records in IEDB as of 12-04-2006.
The MHC class II binding prediction tools evaluated in this study are listed in
Category | Method | MHC class II types | Training dataset | Algorithm |
Matrix based | ARB | 16 (16) | IEDB | Average relative binding (ARB) matrix |
Matrix based | PROPRED | 51 (11) | TEPITOPE | Pocket profile |
Matrix based | SVMHC | 51 (11) | TEPITOPE | Pocket profile |
Matrix based | SYFPEITHI | 6 (6) | SYFPEITHI | Position specific scoring matrices |
Matrix based | RANKPEP | 46 (16) | MHCPEP | Position specific scoring matrices |
Matrix based | SMM-align | 17 (16) | IEDB, SYFPEITHI | Stabilized matrix |
Machine learning based | SVRMHC | 6 (5) | AntiJen | Support vector machine regression |
Machine learning based | MHC2PRED | 21 (15) | MHCBN, JenPep | Support vector machine |
Multivariate regression | MHCPRED | 10 (6) | JenPep | Quantitative structure activity relationship (QSAR) regression |
Number of MHC class II types covered by a prediction method. The number in parentheses is the number of MHC class II types also in our dataset.
The binding predictions for peptides in our affinity dataset were extracted from the MHC class II binding prediction tools with custom scripts (see
Prediction results for eight methods for HLA DRB1*0101 are shown in the ROC curve. The curves were generated by plotting the true positive rate (
MHC class II type | Number of peptides | ARB | MHC2PRED | MHCPRED | PROPRED | RANKPEP | SMM-align | SVRMHC | SYFPEITHI | Consensus |
DRB1*0101 | 3882 | 0.76 | 0.67 | 0.62 | 0.74 | 0.70 | 0.77 | 0.69 | 0.71 | 0.79 |
DRB1*0301 | 502 | 0.66 | 0.53 | 0.65 | 0.67 | 0.69 | 0.65 | 0.72 | ||
DRB1*0401 | 512 | 0.67 | 0.52 | 0.60 | 0.69 | 0.63 | 0.68 | 0.66 | 0.65 | 0.69 |
DRB1*0404 | 449 | 0.72 | 0.64 | 0.79 | 0.66 | 0.75 | 0.80 | |||
DRB1*0405 | 457 | 0.67 | 0.51 | 0.75 | 0.62 | 0.69 | 0.62 | 0.72 | ||
DRB1*0701 | 505 | 0.69 | 0.63 | 0.78 | 0.58 | 0.78 | 0.68 | 0.83 | ||
DRB1*0802 | 245 | 0.74 | 0.70 | 0.77 | 0.75 | 0.82 | ||||
DRB1*0901 | 412 | 0.62 | 0.48 | 0.61 | 0.66 | 0.68 | ||||
DRB1*1101 | 520 | 0.73 | 0.60 | 0.80 | 0.70 | 0.81 | 0.73 | 0.80 | ||
DRB1*1302 | 289 | 0.79 | 0.54 | 0.58 | 0.52 | 0.69 | 0.73 | |||
DRB1*1501 | 520 | 0.70 | 0.63 | 0.72 | 0.62 | 0.74 | 0.64 | 0.67 | 0.72 | |
DRB3*0101 | 420 | 0.59 | 0.68 | |||||||
DRB4*0101 | 245 | 0.74 | 0.61 | 0.65 | 0.71 | 0.74 | ||||
DRB5*0101 | 520 | 0.70 | 0.59 | 0.79 | 0.73 | 0.75 | 0.63 | 0.79 | ||
IAb | 500 | 0.80 | 0.56 | 0.51 | 0.74 | 0.75 | 0.86 | |||
IEd | 39 | 0.53 | 0.83 | |||||||
Mean | 0.71 | 0.58 | 0.58 | 0.73 | 0.66 | 0.73 | 0.65 | 0.68 | 0.76 | |
Min | 0.59 | 0.48 | 0.51 | 0.58 | 0.52 | 0.66 | 0.62 | 0.65 | 0.68 | |
Max | 0.80 | 0.70 | 0.63 | 0.80 | 0.83 | 0.81 | 0.69 | 0.73 | 0.86 |
Performance is measured in terms of AUC as described in
Since we restrict our testing to publicly available tools, it is important to point out that the methods were trained on different datasets (
The cutoff of 1000 nM to classify peptides into binders and non-binders was chosen following an expert immunologist's recommendation for an immunologically relevant threshold, but it is still somewhat arbitrary. To further our analysis in a systematic fashion, we varied the cutoff from 50 nM to 5000 nM. The changes in cutoffs enable us to evaluate performances of binding prediction to identify peptides with different affinities. A cutoff of 50 nM focuses on identifying strong binders, while a cutoff of 5000 nM will identify all including very weak binders. The results of the evaluation using different cutoffs are shown in
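The effect of varying the binder cutoff can be illustrated with a minimal sketch. It assumes that a peptide counts as a binder when its measured IC50 is at or below the cutoff, and that a higher prediction score means stronger predicted binding; the intermediate cutoff of 500 nM and all data values are hypothetical and serve only to show the procedure. The AUC is computed here by direct pairwise comparison of binders and non-binders, which is equivalent to the area under the ROC curve but is not necessarily how the values in this study were calculated.

```python
from typing import Dict, Sequence


def auc(scores: Sequence[float], is_binder: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    binders = [s for s, b in zip(scores, is_binder) if b]
    non_binders = [s for s, b in zip(scores, is_binder) if not b]
    if not binders or not non_binders:
        raise ValueError("need at least one binder and one non-binder")
    wins = sum(
        1.0 if b > n else 0.5 if b == n else 0.0
        for b in binders
        for n in non_binders
    )
    return wins / (len(binders) * len(non_binders))


def auc_by_cutoff(
    ic50_nm: Sequence[float],
    scores: Sequence[float],
    cutoffs_nm: Sequence[float] = (50, 500, 1000, 5000),
) -> Dict[float, float]:
    """Re-label peptides at each cutoff (assumed: binder if IC50 <= cutoff)."""
    return {c: auc(scores, [v <= c for v in ic50_nm]) for c in cutoffs_nm}


# Hypothetical toy data: measured affinities (nM) and prediction scores.
measured_ic50 = [10, 200, 800, 3000, 20000]
predicted = [0.9, 0.8, 0.3, 0.5, 0.1]
results = auc_by_cutoff(measured_ic50, predicted)
for cutoff, value in results.items():
    print(f"cutoff {cutoff} nM: AUC = {value:.3f}")
```

Because the labels change with the cutoff while the prediction scores stay fixed, the same set of predictions can yield different AUC values at different thresholds.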
A key difference between MHC class I and class II molecules is that the binding groove of class II molecules is open at both ends
We next analyzed whether the various class II prediction tools can accurately identify the 9-mer cores of a binding peptide. We extracted MHC-peptide complex structures from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB). A total of 29 structures associated with 14 different MHC class II molecules were identified (
Core | Peptide | Chain | PDB ID | MHC class II type |
PFPQPELPY | LQPFPQPELPY | C | 1S9V | DQB1*0201 |
EALYLVCGE | LVEALYLVCGERGG | C | 1JK8 | DQB1*0302 |
LPSTKVSWA | EGRDSMNLPSTKVSWAAVGGGGSLVPRGSGGGG | C | 1UVQ | DQB1*0602 |
MRMATPLLM | PVSKMRMATPLLMQA | C | 1A6A | DRB1*0301 |
FKGEQGPKG | AGFKGEQGPKGEPG | E | 2FSE | DRB1*0101 |
IGILNAAKV | GELIGILNAAKVPAD | C | 1KLG | DRB1*0101 |
VIPMFSALS | PEVIPMFSALSEGATP | C | 1SJE | DRB1*0101 |
WRFLRGYHQ | GSDWRFLRGYHQYA | C | 1AQD | DRB1*0101 |
YSDQATPLL | AAYSDQATPLLLSPR | C | 1T5W | DRB1*0101 |
YVKQNTLKL | PKYVKQNTLKLAT | C | 2G9H | DRB1*0101 |
MRADAAAGG | AYMRADAAAGGA | E | 2SEB | DRB1*0401 |
YVKQNTLKL | PKYVKQNTLKLAT | C | 1J8H | DRB1*0401 |
VHFFKNIVT | ENPVVHFFKNIVTPR | C | 1BX2 | DRB1*1501 |
FKNIVTPRT | NPVVHFFKNIVTPRTPPPSQ | C | 1FV1 | DRB5*0101 |
YHFVKKHVH | GGVYHFVKKHVHES | C | 1H15 | DRB5*0101 |
AQKAKANKA | FEAQKAKANKAVDGGGG | B | 1LNU | IAb |
MRMATPLLM | GSHSRGLPKPPKPVSKMRMATPLLMQALPMGSGSGS | C | 1MUJ | IAb |
SQAVHAAHA | RGISQAVHAAHAEI | B | 1IAO | IAd |
TQGVTAASS | GHATQGVTAASSHE | B | 2IAD | IAd |
IAPVFVLLE | YEIAPVFVLLEYVT | B | 1ES0 | IAg7 |
RHGLDNYRG | AMKRHGLDNYRGYS | P | 1F3J | IAg7 |
DYGILQINS | STDYGILQINSRW | P | 1IAK | IAk |
HRGAIEWEG | GNSHRGAIEWEGIESG | P | 1D9K | IAk |
GGASQYRPS | HSRGGASQYRPSQRHGTGSGSGS | P | 1K2D | IAu |
IAYLKQASA | ADLIAYLKQASAKGG | B | 1KTD | IEk |
IAYLKQATK | ADLIAYLKQATKGGG | B | 1KT2 | IEk |
IAYPKAATK | ADLIAYPKAATKF | E | 1R5V | IEk |
ITAFNDGLK | KKVITAFNDGLKGGG | B | 1FNE | IEk |
ITAFNEGLK | KKVITAFNEGLKGGG | B | 1I3R | IEk |
Number of core regions identified correctly by each method:
MHC class II type | Known cores | PROPRED | SMM-align | RANKPEP | ARB | MHCPRED | MHC2PRED | SVRMHC | SYFPEITHI |
DQB1*0201 | 1 | NA | NA | 0 | NA | NA | 0 | NA | NA |
DQB1*0302 | 1 | NA | NA | 0 | NA | NA | 0 | NA | NA |
DQB1*0602 | 1 | NA | NA | 0 | NA | NA | NA | NA | NA |
DRB1*0101 | 6 | 6 | 5 | 5 | 4 | 1 | 2 | 3 | 6 |
DRB1*0301 | 1 | 1 | 1 | 1 | 0 | NA | 0 | NA | 1 |
DRB1*0401 | 2 | 2 | 1 | 1 | 0 | 0 | 2 | 0 | 1 |
DRB1*1501 | 1 | 1 | 1 | 1 | 0 | NA | 0 | 1 | 1 |
DRB5*0101 | 2 | 2 | 1 | 0 | 0 | NA | 0 | 2 | NA |
IAb | 2 | NA | 1 | 2 | 0 | 0 | 0 | NA | NA |
IAd | 2 | NA | 0 | 0 | 0 | 0 | 0 | NA | NA |
IAg7 | 2 | NA | NA | 0 | NA | NA | 1 | NA | NA |
IAk | 2 | NA | NA | 1 | NA | 0 | NA | NA | NA |
IAu | 1 | NA | NA | 0 | NA | NA | NA | NA | NA |
IEk | 5 | NA | NA | 5 | NA | 3 | NA | NA | NA |
Accuracy (Correct/Total) | 29 | 1.000 (12/12) | 0.625 (10/16) | 0.552 (16/29) | 0.250 (4/16) | 0.211 (4/19) | 0.250 (5/20) | 0.545 (6/11) | 0.900 (9/10) |
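The accuracy row of the table can be recomputed from the per-molecule counts. The sketch below does this for two columns (SMM-align and SYFPEITHI), with values copied from the table rows in order. It assumes that NA entries are excluded from a method's total, an interpretation inferred from the reported totals rather than stated explicitly; the function name is illustrative.

```python
from typing import List, Optional, Tuple

# Known cores per MHC class II type, in table order (DQB1*0201 ... IEk).
known_cores = [1, 1, 1, 6, 1, 2, 1, 2, 2, 2, 2, 2, 1, 5]

# Correctly identified cores per type; None stands for NA in the table.
smm_align: List[Optional[int]] = [
    None, None, None, 5, 1, 1, 1, 1, 1, 0, None, None, None, None,
]
syfpeithi: List[Optional[int]] = [
    None, None, None, 6, 1, 1, 1, None, None, None, None, None, None, None,
]


def core_accuracy(
    correct: List[Optional[int]], known: List[int]
) -> Tuple[int, int, float]:
    """Sum correct and known cores over the types a method covers (non-NA)."""
    covered = [(c, k) for c, k in zip(correct, known) if c is not None]
    n_correct = sum(c for c, _ in covered)
    n_total = sum(k for _, k in covered)
    return n_correct, n_total, n_correct / n_total


print(core_accuracy(smm_align, known_cores))   # SMM-align column
print(core_accuracy(syfpeithi, known_cores))   # SYFPEITHI column
```

Under this reading, each method's denominator reflects only the structures for molecules it covers, which is why the totals differ between columns.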
The ultimate goal of MHC binding peptide prediction is to identify epitopes that activate T cells. Recognition of a peptide bound to an MHC molecule by a T cell receptor is the critical step in this activation, and binding of peptide to the MHC molecule is obviously a necessary requirement
For each of the 664 peptides, we obtained H-2 IAb binding predictions from the five methods in our study that cover H-2 IAb following exactly the same procedures as predictions of simple binding. We then evaluated the methods' performance in predicting which peptides triggered an immune response. The ROC curves quantifying the performance of each method are shown in
ROC curves are generated from the predictions made by five MHC class II peptide binding prediction methods on the LCMV CD4+ T cell activation data. The AUC value for each method is shown in parentheses.
To further analyze the performance of the T cell activation prediction, we classified peptides into predicted binders and non-binders. Since different methods produce scores on different scales, we adopt a rank based classification in that we classify the top 10% highest scoring peptides as binders. We then calculated sensitivity and positive predictive value (PPV) for each method (
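The rank-based classification and the two derived measures can be sketched as follows. This is an illustration under stated assumptions: higher scores are taken to mean stronger predicted binding, and the number of predicted binders is obtained by truncating 10% of the peptide count. The table in this study reports 64 predicted binders per method, and how that number was derived is not described here, so the rounding below is an assumption. Peptide names and scores are hypothetical.

```python
from typing import Dict, Set, Tuple


def top_fraction_metrics(
    scores: Dict[str, float], responders: Set[str], fraction: float = 0.10
) -> Tuple[float, float]:
    """Call the top-scoring fraction binders; return (sensitivity, PPV)."""
    n_called = max(1, int(len(scores) * fraction))  # assumed rounding rule
    ranked = sorted(scores, key=scores.get, reverse=True)
    called = set(ranked[:n_called])
    true_positives = len(called & responders)
    sensitivity = true_positives / len(responders)  # TP / all true responders
    ppv = true_positives / n_called                 # TP / all predicted binders
    return sensitivity, ppv


# Hypothetical toy data: 20 peptides with strictly decreasing scores.
toy_scores = {f"p{i}": 1.0 - i * 0.05 for i in range(20)}
toy_responders = {"p0", "p7"}  # peptides that triggered a response
sensitivity, ppv = top_fraction_metrics(toy_scores, toy_responders)
print(f"sensitivity = {sensitivity:.2f}, PPV = {ppv:.2f}")
```

Ranking within each method sidesteps the fact that the tools report scores on different scales, at the cost of fixing the number of predicted binders in advance.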
Measure | ARB | MHC2PRED | MHCPRED | RANKPEP | SMM-align | Consensus |
Sensitivity | 4/9 (44.4%) | 2/9 (22.2%) | 1/9 (11.1%) | 3/9 (33.3%) | 2/9 (22.2%) | 6/9 (66.7%) |
Positive predictive value | 4/64 (6.2%) | 2/64 (3.1%) | 1/64 (1.6%) | 3/64 (4.7%) | 2/64 (3.1%) | 6/64 (9.4%) |
Our evaluation of prediction performance suggests that in all cases there is clearly room for improvement, and that no single method is dominantly better than all others. Motivated by the success of a consensus prediction approach to map MHC class I epitopes in vaccinia virus
The consensus prediction performance is shown in the last column of
The MHC-peptide affinity, MHC-peptide structure and T cell activation datasets are available as supplemental material at
In this study we have presented a comprehensive dataset for the systematic evaluation of MHC class II peptide binding prediction methods. This dataset consists of three components. The first component is a large set of 10,017 quantitative peptide-binding affinities for 16 MHC class II types that significantly expands the amount of publicly available data. These data were generated under identical experimental conditions and comprise affinities for binders as well as non-binders. The second component is a set of non-redundant structures of MHC class II molecules complexed with peptide ligands compiled from the PDB. This set of structures provided a “gold standard” for evaluating the ability of prediction methods to locate the 9-mer core of epitopes. The last component is a set of 664 peptides that have been tested experimentally to determine their ability to stimulate CD4+ T cells from the widely utilized C57BL/6 (H-2b) strain of laboratory mice. Together, these datasets serve as a benchmark set to facilitate the development and testing of algorithms for predicting peptide binding to MHC as well as T-cell responses.
Several previous studies have compared the performances of various MHC class II binding prediction methods
We have carried out a comprehensive unbiased evaluation of existing MHC class II epitope prediction algorithms using these datasets. With the exception of the ARB binding predictions, all MHC class II prediction algorithms were evaluated in a completely blinded fashion. In our analysis, the better performing methods proved to be those that are based on quantitative matrices extended by method specific features. For example, SMM-align is the only method tested that considers the contribution of residues outside of the binding groove, and TEPITOPE is the only method whose matrices are based on experiments aimed at determining the contribution of individual amino acids to binding. Using quantitative matrices alone is not sufficient to ensure good performance, since pure position specific scoring matrix based methods such as RANKPEP and SYFPEITHI do not perform as well.
One potential reason for the differential performance of the various methods is that they were likely trained on different numbers of data points. In this respect, we anticipate that the datasets described herein, and now made publicly available, could be utilized to retrain several of the methods and further increase their performance.
Despite the large number of existing MHC class II epitope prediction methods, the best performance is generally not as good as that for MHC class I methods. Indeed, it is notable that the majority of methods examined in the present study have also been employed to make predictions for MHC class I peptide binding, and almost invariably their performance is appreciably better in the context of class I
In an attempt to identify what limits the performance of MHC class II binding prediction, we tested the ability of prediction methods to identify the 9-mer peptide cores revealed in crystal structures of MHC-peptide complexes. Except for PROPRED and SYFPEITHI, the methods examined performed poorly, suggesting that difficulties in identifying the correct binding core contribute to the inferior performance of class II binding prediction. It is noteworthy that the two methods with the best core predictions do not take all positions of a peptide into account when making binding predictions, but rather focus on anchor positions in the peptide. This may explain why the ARB method in particular performs much worse in core identification than in binding prediction: it treats all positions in the peptide identically and relies on automated peptide alignments to derive an overall peptide profile. While this inclusion of weakly interacting positions can be an advantage when predicting overall peptide binding, it may lower the accuracy when picking the correct core.
In an attempt to improve upon the prediction performance realized by individual prediction tools, we implemented a consensus approach for class II binding predictions. The consensus approach was found to clearly outperform each individual prediction approach when measured over the entire dataset, and provided the best predictions for 10 out of 14 molecules. This shows that the consensus approach is just as useful for MHC class II peptide binding prediction as its recent successful application for MHC class I molecules
Other types of meta approaches have been successfully applied to MHC binding prediction. For example, Mallios
In any case, it is also likely that the remarkable increase in performance obtained by the use of the consensus approach hinges on the fact that it combines information derived from methods trained on large numbers of data points with methods incorporating structural considerations leading to effective core predictions. We are currently working on development of algorithms specifically combining these two different features.
We also tested the ability of MHC class II binding prediction methods to predict a peptide's ability to activate CD4+ T cells. Most of the methods were associated with good performance. This was somewhat surprising since T cell activation is a multi-step process where multiple signals are needed for successful activation
In conclusion, we have presented a set of benchmarks to facilitate the evaluation and development of MHC class II binding predictions. While several good methods are available, these do not reach the performance of those for MHC class I molecules. We have shown that a simple and robust consensus approach can improve the prediction performance for the great majority of the MHC class II molecules tested. Finally, we speculate that novel approaches that capture distinct features of MHC class II peptide interactions could lead to more successful predictions than the current approaches, which are commonly developed as extensions of MHC class I predictions.
Peptides utilized for the assessment of MHC binding, antigenicity and immunogenicity were purchased as crude material from Mimotopes (Minneapolis, MN and Clayton, Victoria, Australia), Pepscan Systems B.V. (Lelystad, Netherlands) or A and A Labs (San Diego, CA). Quality control analyses of crude syntheses were performed by mass spectrometry on randomly selected peptides. Peptides selected for additional deconvolution and HLA peptide binding assays were resynthesized by A and A as purified material. Peptides were purified to >95% by reversed-phase HPLC, and purity was assessed by amino acid sequence and/or composition analysis.
Quantitative assays to measure the binding affinities of peptides to purified soluble class II molecules are based on the inhibition of binding of a radiolabeled standard peptide. Binding assays were performed essentially as described previously
The assembled MHC class II peptide binding affinities are listed in
Structures of MHC class II molecules were retrieved from the Protein Data Bank with a keyword search (using the keyword “MHC class II”). The retrieved structures were then examined to select complexes whose epitopes have at least 9 amino acids. In addition, the structures were examined to identify entries with identical MHC and binding peptide sequences. For duplicated structures of the same MHC and epitope, we retained the structure with the highest resolution. The final dataset contains 29 non-redundant structures.
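The filtering and de-duplication steps can be sketched as below. The record type, field names, identifiers and resolution values are all hypothetical placeholders rather than entries from the dataset, and "highest resolution" is interpreted as the smallest value in ångström.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Structure(NamedTuple):
    pdb_id: str                 # placeholder identifiers, not real entries
    mhc: str
    peptide: str
    resolution_angstrom: float  # smaller value = higher resolution


def non_redundant(
    structures: Iterable[Structure], min_length: int = 9
) -> List[Structure]:
    """Keep one structure per (MHC, peptide) pair: the best-resolved one."""
    best: Dict[Tuple[str, str], Structure] = {}
    for s in structures:
        if len(s.peptide) < min_length:
            continue  # epitope too short to contain a 9-mer core
        key = (s.mhc, s.peptide)
        current = best.get(key)
        if current is None or s.resolution_angstrom < current.resolution_angstrom:
            best[key] = s
    return list(best.values())


# Hypothetical toy records.
toy = [
    Structure("XXX1", "MHC-A", "AAAAAAAAAAAA", 2.8),
    Structure("XXX2", "MHC-A", "AAAAAAAAAAAA", 2.1),  # duplicate, better resolved
    Structure("XXX3", "MHC-B", "CCCCCCCCCC", 3.0),
    Structure("XXX4", "MHC-B", "CCCC", 1.5),          # shorter than 9 residues
]
kept = non_redundant(toy)
print([s.pdb_id for s in kept])
```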
The eight MHC class II binding prediction tools evaluated in this study are listed in
In terms of the number of MHC class II types covered, the two TEPITOPE based methods (PROPRED and SVMHC) have the broadest coverage with 51 types, 11 of which also appear in our dataset. The next most comprehensive method is RANKPEP which covers 46 types, 16 of which overlap with our dataset. ARB, MHC2PRED and SMM-align make predictions for about 20 MHC class II types and the majority of the types (15 to 16) also appear in our dataset. The three remaining methods (MHCPRED, SVRMHC and SYFPEITHI) have less coverage, as they only predict peptide binding for 5 to 6 MHC class II types in our dataset.
We identified eight publicly available MHC class II prediction tools through literature search and the IMGT link list at
For the ARB evaluation, the 10-fold cross validation results stored at IEDB were used to estimate performance, since ARB was trained on datasets overlapping with the one used in this study. For the other seven tools in the evaluation, we wrote Python script wrappers to automate prediction retrieval. For the SYFPEITHI prediction, we padded each test peptide with three glycine residues at both ends before submitting it for prediction. This was recommended by the creators of the SYFPEITHI method to ensure that all potential binders are presented to the prediction algorithm. For all other methods, the original test peptides were submitted directly for prediction. Peptide sequences were sent to the web servers one at a time and predictions were extracted from the server's response. To assign a single prediction for peptides longer than nine amino acids in the context of tools predicting the affinity of 9-mer core binding regions, we took the highest affinity prediction of all possible 9-mers within the longer peptide as the prediction result.
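The rule for peptides longer than nine residues, together with the glycine padding used for SYFPEITHI, can be sketched as follows. The function names are illustrative, and `score_core` is a hypothetical stand-in for whatever score a given tool returns for a 9-mer; it is assumed here that a higher score means a higher predicted affinity. The toy scorer, which simply counts leucines, has no biological meaning and only demonstrates the selection logic.

```python
from typing import Callable, List, Tuple

CORE_LENGTH = 9


def nine_mers(peptide: str, core_length: int = CORE_LENGTH) -> List[str]:
    """Return every contiguous window of core_length residues."""
    return [
        peptide[i:i + core_length]
        for i in range(len(peptide) - core_length + 1)
    ]


def pad_with_glycine(peptide: str, n: int = 3) -> str:
    """Add n glycine residues to both ends of the peptide."""
    return "G" * n + peptide + "G" * n


def best_core(
    peptide: str, score_core: Callable[[str], float]
) -> Tuple[str, float]:
    """Score each 9-mer and keep the best one as the peptide's prediction."""
    windows = nine_mers(peptide)
    if not windows:
        raise ValueError("peptide is shorter than the core length")
    return max(((w, score_core(w)) for w in windows), key=lambda pair: pair[1])


def toy_scorer(core: str) -> float:
    """Hypothetical scorer for illustration only: counts leucine residues."""
    return float(core.count("L"))


windows = nine_mers("PKYVKQNTLKLAT")
core, score = best_core("PKYVKQNTLKLAT", toy_scorer)
padded = pad_with_glycine("PKYVKQNTLKLAT")
print(len(windows), core, score, padded)
```

A 13-residue peptide yields five candidate 9-mers; when several windows tie, this sketch keeps the first one encountered, which is an arbitrary choice.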
For each MHC class II molecule whose binding can be predicted by three or more algorithms, we employed the following approach to generate a consensus prediction. First, we selected the three methods that gave the best performance. For each method, the tested peptides were ranked by their scores, with higher ranks for better binders. For each tested peptide, the three ranks from the different methods were then taken and their median was calculated. This median rank was taken as the consensus score.
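The median-rank procedure can be sketched as follows. Several details are assumptions made for illustration: rank 1 is assigned to the strongest predicted binder, ties are broken by input order, and each method carries a flag indicating whether higher or lower raw scores mean better binding (for example, a predicted IC50 where lower is better). The peptide names and scores are hypothetical.

```python
from statistics import median
from typing import Dict, List, Tuple

Scores = Dict[str, float]


def rank_peptides(scores: Scores, higher_is_better: bool) -> Dict[str, int]:
    """Rank peptides so that rank 1 is the strongest predicted binder."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {peptide: rank for rank, peptide in enumerate(ordered, start=1)}


def consensus_scores(methods: List[Tuple[Scores, bool]]) -> Dict[str, float]:
    """Median rank per peptide across methods (lower = better consensus)."""
    ranks = [rank_peptides(scores, hib) for scores, hib in methods]
    shared = set(ranks[0]).intersection(*ranks[1:])
    return {p: median(r[p] for r in ranks) for p in shared}


# Hypothetical scores from three methods for three peptides.
method_1 = ({"a": 0.9, "b": 0.5, "c": 0.1}, True)    # higher score = better
method_2 = ({"a": 0.2, "b": 0.8, "c": 0.5}, True)
method_3 = ({"a": 50, "b": 400, "c": 9000}, False)   # IC50-like, lower = better
consensus = consensus_scores([method_1, method_2, method_3])
print(sorted(consensus.items(), key=lambda item: item[1]))
```

Taking the median of three ranks means that a single outlying method cannot on its own move a peptide to the top or bottom of the consensus ordering.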
Receiver operating characteristic (ROC) curves
C57BL/6 (H-2b) mice were purchased from The Jackson Laboratory (Bar Harbor, ME), and infected intraperitoneally (i.p.) with 2×10^5 PFU of LCMV Armstrong. Spleens were harvested eight days post infection, and IFN-γ ELISPOT assays were performed as previously described
AUC values for the tested MHC class II binding prediction methods using different cutoffs. The cutoffs for binders were varied from 50 nM to 5000 nM.