The authors have declared that no competing interests exist.
Conceived and designed the experiments: YK JWY BP. Performed the experiments: YK. Analyzed the data: YK JWY AS BP. Wrote the paper: YK JWY AS BP.
The immune system rapidly responds to intracellular infections by detecting MHC class I restricted T-cell epitopes presented on infected cells. It was originally thought that viral peptides are liberated during constitutive protein turnover, but this conflicts with the observation that viral epitopes are detected within minutes of their synthesis even when their source proteins exhibit half-lives of days. The DRiPs hypothesis proposes that epitopes derive from
To defend the host from an infection, the immune system continuously scans cell surfaces for foreign objects. Specifically, a virus inside a cell exploits the host to make copies of its proteins; viral proteins are broken up into peptide fragments; and the fragments are displayed on the infected cell's surface, thereby allowing detection and cell-killing. How these peptide fragments for cell-surface presentation are generated remains unknown. An understanding of this step will lead to rational design of vaccines and insights into tumor immunosurveillance and autoimmunity. One possible mechanism is that the peptide fragments come from defective proteins missing either the beginning or end regions, which may result in a positional bias. Here, we analyzed the locations of a large set of known viral epitopes (peptide fragments recognized by the immune system) within their proteins. We find that all regions of proteins are well represented among epitopes. However, there is a statistically significant bias toward the central regions of proteins, which correlates with a pattern of conservation spanning the length of viral proteins. Our results suggest a combined effect of conservation and enhancement of immune responses through repeated exposures in shaping the distribution of known viral epitopes.
The immune system rapidly detects virus-infected cells through cell-surface presentation of viral peptides to T-cells despite the fact that the half-lives of source proteins are typically orders of magnitude longer than the response time (i.e. days vs. minutes)
Although the DRiPs hypothesis is well into its teens, surprisingly little is known about the nature of the DRiPs
To overcome the limitation of the current experimental approaches, we have investigated if a data-driven approach can provide insights into the nature of DRiPs. Namely, the availability of a large repository of immune epitopes stored at the Immune Epitope Database (IEDB)
Because viruses exploit the translational machinery of the host to synthesize their proteins and are thus relevant in the context of the DRiPs hypothesis, we retrieved all MHC-I restricted T-cell epitopes of viruses for which reference proteomes are available. Top 20 viruses based on number of tested peptides are shown in
Organism Name | All | Positive | Negative |
Vaccinia virus | 8612 | 414 | 8198 |
Hepatitis C virus | 1637 | 622 | 1015 |
Lymphocytic choriomeningitis virus | 1046 | 100 | 946 |
Human herpesvirus 5 | 997 | 433 | 564 |
Human herpesvirus 4 | 839 | 255 | 584 |
Influenza A virus | 655 | 303 | 352 |
Murine coronavirus | 468 | 22 | 446 |
Dengue virus | 204 | 103 | 101 |
Human respiratory syncytial virus | 187 | 32 | 155 |
Yellow fever virus | 187 | 49 | 138 |
Murid herpesvirus 1 | 184 | 42 | 142 |
Hepatitis B virus | 179 | 88 | 91 |
Equine infectious anemia virus | 155 | 90 | 65 |
Primate T-lymphotropic virus 1 | 151 | 124 | 27 |
Human papillomavirus - 16 | 130 | 76 | 54 |
Hantaan virus | 122 | 12 | 110 |
Human herpesvirus 8 | 99 | 78 | 21 |
Human herpesvirus 1 | 98 | 80 | 18 |
Theilovirus | 97 | 16 | 81 |
West Nile virus | 82 | 59 | 23 |
For each organism, total number of tested peptides as well as numbers of those with positive and negative assay outcomes are shown. A total of 93 viruses were studied. For brevity, 20 viruses with the highest number of tested peptides are shown in the table.
After determining positions of viral peptides in their reference antigens and calculating normalized positions, we constructed distributions of normalized positions for positives (i.e. epitopes) and negatives as shown in
(A and B) For each category of peptides indicated (i.e. ‘Positive’ or ‘Negative’), peptides were mapped onto all proteins of the corresponding genome with a peptide similarity cutoff of 100%. For each peptide:antigen mapping, a normalized position was calculated. The set of normalized positions was plotted as a histogram. (C) To show positional bias of epitopes, the box plot shows, for each bin, the ratios of probabilities obtained from 1000 bootstrap samples of the Positive and Negative sets of normalized positions. Boxes cover the range from the 25th to the 75th percentile. Whiskers extend out from the boxes by 1.0 times the interquartile range.
To show positional bias of epitopes, a probability ratio plot (i.e. p(x|positive)/p(x|negative), see
The positional bias of epitopes observed is supported by results of statistical analyses on the corresponding contingency table (supplementary
A possible source of positional bias of epitopes is unequal distributions of amino acids spanning the length of antigens. For instance, the positional bias observed may have been due to hydrophobic residues being preferentially found in middle regions rather than at N- and C-termini. Such unevenness would mean that MHC alleles binding peptides with hydrophobic anchor residues would impose positional bias of epitopes. To rule out this possibility, we determined positional bias of
Our prediction strategy uses recently developed peptide:MHC-I binding algorithms, which have achieved high accuracies in benchmarks
Probability ratio plots derived from the distributions for 12 HLA supertypes are shown in
For each supertype, 9-mer peptide binding predictions were carried out and ratios of probability masses of predicted ‘binders’ and ‘non-binders’ were calculated. Peptide binding predictions were made for alleles belonging to each supertype, using SMMPMBEC method. All possible 9-mer peptides were generated from a set of viral proteins that contain at least one tested peptide from
The presented results largely reflect results reported in
A DRiP-independent factor that could explain the positional bias of viral epitopes is positional bias of protein conservation. Specifically, if ends of proteins are less conserved than the middle region and epitopes tend to be more conserved than non-epitopes, positional bias of epitopes may result. To test this possibility, we calculated conservation scores at the residue-level for proteins of the viruses (See
In
For each viral protein which also contained at least one tested peptide from
To determine if sequence conservation alone can explain positional bias of epitopes, we first had to determine the relationship between conservation and immune recognition. In
(A) Bootstrap sampling of conservation scores for positive and negative peptides is shown as two boxplots placed next to each other. The bins used are of variable lengths to ensure a sufficient count in each bin. Each bin contains ∼20% of data points. Middle positions of bins are indicated on the x-axis. The difference between the means of the two conservation score distributions is statistically significant (Welch's t-test; one-sided; p-value = 5.9×10⁻⁶). (B) Probability ratios estimated from the conservation score distributions, i.e. the ratio of positive to negative peptide probabilities as a function of conservation score. Confidence intervals are derived from bootstrap sampling. (C) Estimated probability ratios as a function of normalized position, using the mapping shown in the second panel. As input, distributions of means of conservation scores shown in
Next, we combined our estimates of conservation bias over the length of a protein with our estimate of correlation between conservation and immune recognition. Using conservation as a function of normalized position in
There are two possible explanations for the observed correlation between immune recognition and conservation of peptides. One explanation is that the immune recognition machinery has evolved to preferentially recognize epitopes that are conserved, as evidenced by an overlap of MHC binding motifs on conserved sequence regions found in
To determine whether there is an intrinsic enhanced immune recognition for peptides that are conserved in viral species, we retrieved viral epitopes identified in the context of
The difference between the two distributions is not statistically significant (Welch's t-test; one-sided; p-value = 0.62).
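The Welch's t-tests reported for the conservation score comparisons can be illustrated with a minimal sketch. It assumes two plain lists of per-peptide conservation scores; the function name `welch_t` is hypothetical and is not taken from the original analysis. Only the test statistic and the Welch-Satterthwaite degrees of freedom are computed here; turning them into a one-sided p-value requires the Student's t distribution, which a statistics library such as SciPy provides.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, degrees of freedom) for mean(a) - mean(b).

    Uses unequal-variance (Welch) standard errors and the
    Welch-Satterthwaite approximation for the degrees of freedom.
    """
    # Squared standard error of each sample mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

A positive t indicates that the first sample has the larger mean, which is the direction a one-sided test of "positives are more conserved than negatives" would examine.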
By leveraging a large set of experimentally determined epitopes from a wide range of viruses stored in the IEDB, we determined positional bias of epitopes in source antigens. The shape of positional bias curve (
The connection between positional epitope biases and protein conservation is reasonable in the context of boosting effects associated with repeated vaccine administrations. The principal idea behind boosting is that those epitopes already exposed to the immune system tend to dominate in the following exposure
In addition to explanations discussed above, there are a number of recent ones that may be relevant. First, Calis et al. have reported correlations between G+C content and potential MHC-I binders
Other investigators have also reported epitope conservation for HIV
Regarding the MHC-motif specific biases and conservation, it has been reported that predicted binding affinities of HLA molecules positively correlate with conserved regions of a wide range of viruses
In conclusion, to better understand mechanistic details of antigen processing steps involving DRiPs, positional bias of MHC-I restricted viral T-cell epitopes was measured. Our findings indicate that there is indeed such bias in antigens, where epitopes at N- and C-termini are under-represented. Although mechanisms associated with translational errors such as downstream initiation and premature termination may contribute to observed positional bias, our data indicate that differential conservation spanning protein length is an alternative explanation.
For each virus listed in
First, the query should retrieve epitopes derived from proteins newly expressed in a host cell, rather than epitopes recognized after peptide or protein immunization. Only for newly synthesized epitopes can defective ribosomal products skew the positional distribution of epitopes in antigens. To meet this requirement, we query the IEDB for epitopes identified using assays in which the ‘Immunogen Type’ is a whole ‘Organism’ (rather than an individual peptide or antigen). Second, we further limit the query to epitopes restricted by MHC class I molecules. Third, we limit the query to epitopes with ‘virus’ as the source organism.
We then grouped the epitopes retrieved with the query by viral species. As described below, we want to map all epitopes from one species to a single reference proteome. Therefore, we excluded all viruses for which we do not have reference proteomes available, resulting in a total of 93 different viruses.
To ensure consistent calculation of positional bias, we mapped all epitopes onto antigens from a single complete reference genome for each species based on sequence similarity, rather than using the source antigens listed in the IEDB. The latter are those specified by the authors who mapped the epitopes; they are derived from different strains and are of varying quality. For example, an author may have used truncated versions of an antigen, or epitopes may come from a polyprotein of Dengue virus, which later gets cleaved into individual products. Consequently, reported epitope positions may be relative either to the polyprotein or to the final cleavage products.
To carry out the mapping, we used NCBI's BLAST with default settings to search for the presence of epitopes in antigens and retained only those hits with exact matches in the reference genome. In addition, we required homology between the originally curated source antigen of the epitope and the antigen in the reference genome, using BLAST searches with an E-value cutoff of 0.001, thereby ensuring meaningful mapping. Lastly, we required that there be only a single best match of the epitope in the reference genome, so that the position of the epitope in the antigen can be uniquely determined. We did not consider ties because of the associated uncertainty in mapping.
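The exact-match and uniqueness requirements can be sketched as follows. This is an illustration only: a plain substring search stands in for BLAST, the homology check against the curated source antigen is omitted, and the names `find_all`, `map_epitope`, and the dictionary layout of the proteome are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def find_all(haystack: str, needle: str) -> List[int]:
    """Return every 0-based start index of needle in haystack (overlaps allowed)."""
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def map_epitope(epitope: str, proteome: Dict[str, str]) -> Optional[Tuple[str, int]]:
    """Map an epitope to (protein_id, 0-based start) in a reference proteome.

    Returns None when the epitope has no exact match, or when it matches
    more than once (a tie), since its position is then ambiguous.
    """
    matches = [
        (protein_id, pos)
        for protein_id, seq in proteome.items()
        for pos in find_all(seq, epitope)
    ]
    return matches[0] if len(matches) == 1 else None
```

Discarding ties rather than picking one hit arbitrarily mirrors the rule stated above: a peptide that occurs in two places cannot be assigned a single position.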
To derive a measure of epitope position that is independent of protein length, a normalized position,
After mapping peptides with positive (i.e. epitopes) and negative T-cell assay outcomes onto their corresponding antigens, we calculated their normalized positions, x, as described in
To indicate positional bias, we calculated ratios of probability masses for positive and negative PMFs: p(x|positive)/p(x|negative). Absence of positional bias corresponds to a probability ratio of 1.0 for all bins. A probability ratio less than 1 indicates under-representation of epitopes while greater than 1 indicates over-representation.
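The probability ratio and its bootstrap spread can be sketched in a few lines. This is a minimal illustration that assumes normalized positions are already available as lists of floats in [0, 1]; the function names, the default of 10 equal-width bins, and the fixed random seed are assumptions made for the example, while the 1000 bootstrap samples follow the figure legend.

```python
import random
from typing import List, Sequence


def pmf(positions: Sequence[float], n_bins: int) -> List[float]:
    """Bin normalized positions in [0, 1] into n_bins probability masses."""
    counts = [0] * n_bins
    for x in positions:
        # A position of exactly 1.0 is placed in the last bin.
        counts[min(int(x * n_bins), n_bins - 1)] += 1
    total = len(positions)
    return [c / total for c in counts]


def probability_ratios(positive: Sequence[float], negative: Sequence[float],
                       n_bins: int = 10) -> List[float]:
    """Per-bin p(x|positive) / p(x|negative); NaN where the negative bin is empty."""
    pos, neg = pmf(positive, n_bins), pmf(negative, n_bins)
    return [p / q if q > 0 else float("nan") for p, q in zip(pos, neg)]


def bootstrap_ratios(positive: Sequence[float], negative: Sequence[float],
                     n_bins: int = 10, n_boot: int = 1000,
                     seed: int = 0) -> List[List[float]]:
    """Resample both sets with replacement and recompute the ratios each time."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_boot):
        pos = rng.choices(positive, k=len(positive))
        neg = rng.choices(negative, k=len(negative))
        out.append(probability_ratios(pos, neg, n_bins))
    return out
```

The per-bin spread of the values returned by `bootstrap_ratios` is what a box plot of the ratios would summarize; bins whose ratios stay below 1 across resamples indicate under-representation of epitopes.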
To determine whether differences in conservation over the normalized position in a protein contribute to positional bias of epitopes, we estimated conservations at the residue level for proteins from the viruses using Rate4site algorithm