Conceived and designed the experiments: TJ. Performed the experiments: AW YP XD. Analyzed the data: AW YP YS. Wrote the paper: AW YP YS TJ.
The authors have declared that no competing interests exist.
The variants of human influenza virus have caused, and continue to cause, substantial morbidity and mortality. Timely and accurate assessment of their impact on human death is invaluable for influenza planning but presents a substantial challenge, as current approaches rely mostly on intensive and unbiased influenza surveillance. In this study, by proposing a novel host-virus interaction model, we have established a positive correlation between the excess mortalities caused by viral strains of distinct antigenicity and their antigenic distances to their previous strains for each (sub)type of seasonal influenza viruses. Based on this relationship, we further develop a method to rapidly assess the mortality burden of influenza A(H1N1) virus by accurately predicting the antigenic distance between A(H1N1) strains. Rapid estimation of influenza mortality burden for new seasonal strains should help formulate a cost-effective response for influenza control and prevention.
In epidemiology, investigators usually rely on surveillance data to assess the impact of an influenza virus on human health. However, accurate assessment of the influenza mortality burden at the early stage of influenza infection is rather challenging because the early influenza surveillance data are very limited and prone to bias as well. This speaks to an urgent need for the development of a more effective method for rapid and accurate estimation of influenza mortality burden. By proposing a novel host-virus interaction model, we have established a quantitative relationship between the antigenic variation of human influenza virus and its mortality burden. Based on this relationship, we further develop a method to rapidly assess the mortality burden of influenza A(H1N1) virus by accurately predicting the antigenic distance between A(H1N1) strains. We believe that our work will help develop a timely and sensible influenza preparedness programme that balances the gains of public health with the social and economic costs.
Seasonal influenza viruses have been and will continue to be a significant threat to public health
In epidemiology, investigators usually rely on the surveillance data to assess the impact of an influenza virus on human death by estimating its case fatality ratio (CFR), the ratio of the number of deaths caused by the virus to the number of the diagnosed cases of the virus infection
It is generally assumed that the extent to which an influenza virus alters its antigenicity and escapes pre-existing immunity in the human population determines its intensity of infection at the population level
To explore the relationship between antigenic variations of influenza viruses and the excess mortalities they cause, we proposed a simplified host-virus interaction model (
V1, V2, V3, … represent the first, second, third, … antigenic strains that circulated prior to a novel antigenic strain. To represent the ability of a novel strain to escape pre-existing human immunity, we introduce a metric, the integrated antigenic distance (D), defined as a linear combination of the antigenic distances (d) between the novel virus and its previous antigenic strains.
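As a minimal sketch of the integrated antigenic distance, the following Python function forms D as a weighted sum of the individual distances d. The weights are not given in this text, so equal weighting is used here purely as an assumption, and the function name and example values are hypothetical.

```python
from typing import Optional, Sequence


def integrated_antigenic_distance(
    distances: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Combine distances d_1..d_k to the previous k strains into one value D.

    distances[0] is the distance to the most recent previous strain (V1),
    distances[1] to the one before it (V2), and so on.
    The weights are an ASSUMPTION: equal weights are used when none are given.
    """
    if not distances:
        raise ValueError("at least one antigenic distance is required")
    if weights is None:
        weights = [1.0] * len(distances)  # assumption: equal weighting
    if len(weights) != len(distances):
        raise ValueError("weights and distances must have the same length")
    return sum(w * d for w, d in zip(weights, distances))


# Hypothetical distances from a novel strain to V1 and V2 (not study data).
print(integrated_antigenic_distance([2.0, 3.5]))              # equal weights
print(integrated_antigenic_distance([2.0, 3.5], [1.0, 0.5]))  # custom weights
```

Passing explicit weights allows earlier strains to contribute less than recent ones, should that be the intended form of the linear combination.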
On the basis of the simplified host-virus interaction model, to establish a correlation between the antigenic variation of an antigenic strain and the total excess mortality it may cause, we first examined how much a previous antigenic strain contributes to inducing pre-existing immunity and cross-protecting against infection by a challenging strain.
Virus (sub)type | Correlation method | 1st previous strain | 2nd previous strain | 3rd previous strain | 4th previous strain | 5th previous strain |
A(H1N1) | Spearman | 0.64(0.14) | −0.12(0.83) | - | - | |
A(H1N1) | Pearson | 0.65(0.12) | −0.25(0.63) | - | - | |
A(H3N2) | Spearman | 0.21(0.46) | −0.12(0.69) | 0.13(0.65) | 0.02(0.95) | |
A(H3N2) | Pearson | 0.31(0.29) | −0.08(0.77) | 0.19(0.53) | 0.17(0.56) | |
B | Spearman | 0.26(0.46) | 0.57(0.11) | 0.07(0.88) | −0.58(0.23) | |
B | Pearson | 0.36(0.31) | 0.53(0.14) | 0.14(0.77) | −0.5(0.31) | |
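The numbers in parentheses are the P-values of the corresponding coefficients. The largest coefficient for each (sub)type is highlighted in bold.
a: The previous i-th antigenic strain is the i-th antigenic strain prior to an antigenic strain that is considered as a challenging strain.
b: Not applicable due to the limited number of antigenic strains.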
Interestingly, for type B virus, although there is a positive correlation between the total excess mortality caused by the challenging strain and its antigenic distances to the previous three antigenic strains, the best correlation is with the third antigenic strain (PCC = 0.84 (P-value = 0.009); SCC = 0.74 (P-value = 0.045)).
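The correlation analysis can be illustrated with a short, dependency-free Python sketch. It computes Pearson and Spearman coefficients and, as an assumption made for this illustration only, derives a two-sided P-value from an exact permutation test, which is feasible for the small numbers of antigenic strains involved. The study itself used R, whose tests would generally give different P-values, and the data below are hypothetical.

```python
from itertools import permutations
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based average ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(x, y, stat=pearson) -> float:
    """Two-sided exact permutation P-value (only practical for small n)."""
    observed = abs(stat(x, y))
    total = hits = 0
    for perm in permutations(y):
        total += 1
        if abs(stat(x, perm)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical antigenic distances and excess mortalities (not study data).
distance = [1.0, 2.0, 2.5, 4.0, 5.0, 6.5]
mortality = [10.0, 30.0, 20.0, 60.0, 50.0, 90.0]
print("PCC", pearson(distance, mortality), permutation_p_value(distance, mortality))
print("SCC", spearman(distance, mortality),
      permutation_p_value(distance, mortality, stat=spearman))
```

With only a handful of strains per (sub)type, the set of attainable P-values is coarse, which is one reason small-sample coefficients of this kind should be read cautiously.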
To further investigate the effect of previous antigenic strains combined on the excess mortality of a challenging strain, we first integrated the antigenic distances between the challenging strain and the previous antigenic strains of different numbers (see
Virus (sub)type | Correlation method | 1 background strain | 2 background strains | 3 background strains | 4 background strains | 5 background strains |
A(H1N1) | Spearman | 0.64(0.14) | 0.75(0.066) | - | - | |
A(H1N1) | Pearson | 0.79(0.03) | 0.85(0.03) | - | - | |
A(H3N2) | Spearman | 0.53(0.05) | 0.57(0.03) | 0.28(0.33) | 0.27(0.35) | |
A(H3N2) | Pearson | 0.57(0.03) | 0.51(0.06) | 0.39(0.17) | 0.34(0.24) | |
B | Spearman | 0.26(0.46) | 0.55(0.13) | 0.74(0.045) | 0.71(0.14) | |
B | Pearson | 0.36(0.31) | 0.48(0.19) | 0.73(0.04) | 0.67(0.14) | |
The numbers in parentheses are the P-values of the corresponding coefficients. The largest coefficient for each (sub)type is highlighted in bold.
a: Not applicable due to the limited number of antigenic strains.
The genetic distance is another metric widely used to quantify the genetic variation between viruses. We further analyzed the correlation of influenza virus excess mortality and genetic distance for A(H1N1), A(H3N2) and B viruses (
The remarkable correlation between antigenic distance and excess mortality opens a new avenue for estimating the mortality burden of a novel antigenic variant that could potentially cause an influenza epidemic or pandemic. Here we sought to develop an approach to rapidly estimate the mortality burden of influenza A(H1N1) viruses, which are the most common cause of influenza (flu) in humans. To develop the approach, we first need to establish a quantitative relationship between the mortality burden of an A(H1N1) antigenic variant and its antigenic distances to previous antigenic strains, and then develop a computational model to predict antigenic distances between A(H1N1) viral strains based on their HA sequences.
To establish a quantitative relationship between the mortality burden and antigenic variation for A(H1N1) virus, we considered the integrated antigenic distance between a challenging strain and the previous two antigenic strains because the Pearson test gave the best correlation with statistical significance for the previous two antigenic strains.
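The linear relationship and the leave-one-out cross validation described in the supporting information can be sketched as below. This is an illustrative ordinary least squares implementation under our own assumptions, not the fitted model from the study; the function names and the numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def leave_one_out(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Predict each y[i] from a line fitted to all the other points."""
    predictions = []
    for i in range(len(x)):
        xs = [v for j, v in enumerate(x) if j != i]
        ys = [v for j, v in enumerate(y) if j != i]
        slope, intercept = fit_line(xs, ys)
        predictions.append(slope * x[i] + intercept)
    return predictions


# Hypothetical integrated antigenic distances and excess mortalities
# (per million); these are NOT the values reported in the study.
D = [3.0, 4.5, 6.0, 7.0, 9.0]
mortality = [40.0, 70.0, 85.0, 110.0, 150.0]

slope, intercept = fit_line(D, mortality)
print("fitted line:", slope, intercept)
for observed, predicted in zip(mortality, leave_one_out(D, mortality)):
    print(observed, round(predicted, 1))
```

Comparing each held-out prediction with the observed value gives a rough sense of how strongly any single strain drives the fitted line, which matters when only a few antigenic strains are available.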
(A) The nonparametric (the dashed line) and ordinary linear (the black line) regression between the excess all-cause mortalities caused by A(H1N1) antigenic strains and their integrated antigenic distances to the previous first and second antigenic strains. The nonparametric regression is done using the local polynomials method (called loess method
The recent seasonal A(H1N1) virus A/Brisbane/59/2007, which has circulated in humans since the 2007–2008 season, had caused an excess mortality of 50 per million as of the 2008–2009 season (
To develop a rapid tool to predict the excess mortality to be caused by a novel A(H1N1) antigenic variant, we need to determine its antigenic distances to its previous two antigenic strains. However, determining the antigenic distance between viruses using experiments such as hemagglutination inhibition (HI) assay is time-consuming and labor-intensive
(A) Six derived antigenic epitopes (Sa, Sb, Ca, Cb, Pa and Pb) on the human influenza A(H1N1) HA protein were considered as the structural basis underlying the interactions between HA and neutralizing antibodies. They are marked in different colors on the surface of the structural model of the HA of A/Puerto Rico/8/34 (H1N1) virus (PDB ID: 1RVZ). (B) A cartoon illustrating the physicochemical mechanisms underlying an epitope-mediated interaction between HA and antibody. A salt-bridge interaction is shown as a link between two charged atoms. Hydrogen bonding is represented as a link of “-OH—N-”. A hydrophobic microenvironment is depicted as a cluster of hydrophobic groups highlighted in orange. (C–D) The prediction performance on the training data (C) and testing data (D). Black lines show linear fits with a zero intercept. The linear fits to the training data and testing data yield correlation coefficients of 0.79 and 0.80, respectively. For details of the model description, see the
We further evaluated how well the predicted antigenic distance, also called EADpred distance, of A(H1N1) virus correlates with the observed excess mortality. We carried out the same correlation analyses as we did for the observed antigenic distance described above by substituting the observed antigenic distances with the EADpred distances (
(A–B) The correlation between excess all-cause mortality (black) and EADpred (A) or genetic (B) distance to previous single antigenic strains. The symbols Ο, □, Δ and × indicate the excess all-cause mortality and the antigenic/genetic distances between antigenic strains as challenging strains and the previous first, second, and third antigenic strains, respectively. For information about the antigenic strains, see
In this study, by proposing a simplified virus-host interaction model, we have discovered a direct and positive correlation between the extent of antigenic variation of an influenza virus and the total excess mortality it may cause. The impact of influenza on human death has been a long-standing focus of influenza studies. Many factors have been thought to contribute to influenza mortality burden including viral factors such as the pathogenicity-related molecular markers
A major challenge in correlation analysis of the antigenic variation of influenza virus and its mortality burden is to attribute the excess mortality to a specific influenza strain. Although the excess mortality attributed to all influenza viruses in a given period of time can be inferred with high confidence from the mortality data reported
Although the correlation between the antigenic variation of influenza virus and its mortality burden is impressive, a major limitation of our work is that most of the correlation coefficients have large confidence intervals (see
Human influenza A viruses will continue to have significantly negative impact on public health and cause substantial morbidity and mortality. Timely and accurate estimation of their impact on human death will help formulate more sensible and cost-effective influenza prevention and control policies. The discovery of a significantly positive correlation between antigenic variation of an influenza virus and its excess mortality has allowed us to further establish a quantitative relationship between them. In addition to A(H1N1), we also quantified the relationship between antigenic variation and excess mortality for influenza A(H3N2) and B virus (see
Since the current experimental methods for determining antigenic distances between viral isolates are time-consuming, we further proposed a sequence-based approach, EADpred, to predict the antigenic distance between A(H1N1) viral strains, which enables us to rapidly assess the mortality burden of an A(H1N1) antigenic variant. Our method relies only on the HA sequence data of the influenza viruses rather than on surveillance data, and thus offers a rapid and reliable tool to assess the potential impact of an influenza virus on human death even before infection occurs in humans. Since a rapid and accurate prediction of influenza mortality burden should greatly help develop a timely and sensible preparedness programme that balances the gains of public health against the social and economic costs, we believe that our method will be very useful for rapid assessment of the mortality burden of future A(H1N1) variants, and that it is also applicable, with proper modification, to the antigenic variants of human A(H3N2) and B viruses.
HI data, HA sequences, US mortality data, US population data and other US influenza surveillance data, including the number of total respiratory specimens tested for influenza and the number of positive isolates of the three human influenza viruses, A(H1N1), A(H3N2) and B, from season 1977–1978 through 2008–2009, were collected from published records, documents or databases. The antigenic strains for each (sub)type of human influenza virus were defined based on the vaccine strains recommended by the World Health Organization (WHO) or the reference strains used by the US Centers for Disease Control and Prevention (CDC) in influenza surveillance. These strains were selected from US CDC reports or related documents and were required to be dominant (comprising >50% of the total isolates of the same (sub)type) in at least one flu season according to US CDC influenza surveillance. Their actual circulation time was also based on US CDC influenza surveillance. See
Antigenic distance between two strains
The excess all-cause mortalities for seasons 1977–1978 through 2008–2009 were calculated by following Simonsen's method
For the correlation analysis, we only considered the antigenic strains that have completed the whole circulation (from the beginning of circulation to the end of circulation) from 1977 through 2009 (
The development of EADpred consists of four steps, described in brief as follows (for details, see
We have derived six antigenic epitopes in the HA of A(H1N1) virus, including four expanded known antigenic epitopes (Sa, Sb, Ca and Cb) and two novel antigenic epitopes (Pa and Pb) (
The amino acid changes in an antigenic epitope were transformed into a linear combination of physicochemical properties as follows:
To predict the antigenic distance between two viral strains (d), we considered a linear combination of the changes in physicochemical properties in all six derived antigenic epitopes:
Equations 4 and 5 were then combined into one equation, which is rewritten as follows:
The relative weights in Equation 6 were parameterized on the training dataset using stepwise multiple regression. After regression, we found a linear correlation between the antigenic distance and the number of terms with non-zero weight. Therefore, to achieve better prediction performance, we added another term to the model,
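The general idea of the steps above, a weighted linear combination of per-epitope changes in physicochemical properties, can be sketched in Python as follows. Everything specific in this sketch is a labeled assumption: the property values, the epitope positions, the use of summed absolute differences as the "change", and the weights are all hypothetical, and the additional term mentioned above is not modeled. It should not be read as the EADpred equations themselves.

```python
from typing import Dict, List, Tuple

# Toy property table for a few amino acids: hypothetical values,
# NOT the five properties or the values used in the study.
PROPERTIES: Dict[str, Dict[str, float]] = {
    "hydrophobicity": {"A": 0.6, "D": -0.8, "K": -1.0, "L": 1.0, "S": -0.2},
    "charge": {"A": 0.0, "D": -1.0, "K": 1.0, "L": 0.0, "S": 0.0},
}

# Hypothetical epitope definitions: 0-based positions in a toy aligned sequence.
EPITOPES: Dict[str, List[int]] = {"Sa": [0, 1], "Sb": [2, 3]}

Feature = Tuple[str, str]  # (epitope name, property name)


def epitope_features(seq_a: str, seq_b: str) -> Dict[Feature, float]:
    """Per-epitope, per-property change between two aligned sequences.

    Assumption: the change is the sum of absolute property differences
    over the positions belonging to the epitope.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    features: Dict[Feature, float] = {}
    for epitope, positions in EPITOPES.items():
        for prop, table in PROPERTIES.items():
            features[(epitope, prop)] = sum(
                abs(table[seq_a[i]] - table[seq_b[i]]) for i in positions
            )
    return features


def predict_distance(seq_a: str, seq_b: str, weights: Dict[Feature, float]) -> float:
    """Linear combination of the features; missing weights count as zero."""
    feats = epitope_features(seq_a, seq_b)
    return sum(weights.get(key, 0.0) * value for key, value in feats.items())


# Hypothetical weights; in the text they are fitted by stepwise regression.
WEIGHTS: Dict[Feature, float] = {("Sb", "hydrophobicity"): 0.5, ("Sb", "charge"): 2.0}
print(predict_distance("ADKL", "ADSL", WEIGHTS))
```

Features whose weight is zero drop out of the sum, which mirrors the way a stepwise regression retains only a subset of terms.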
In this study, all statistical analyses including the use of Spearman and Pearson correlation methods were carried out using the statistical package R
The leave-one-out cross validation of the linear regression for analyzing the relationship between the excess all-cause mortality caused by an antigenic strain and its integrated antigenic distance to its previous two strains. Each time, the excess all-cause mortality caused by an antigenic strain and its integrated antigenic distance to its previous two strains were removed and a linear equation was fitted to the remaining data. Using the fitted equation, we then predicted the total excess mortality caused by the antigenic strain based on its integrated antigenic distance to its previous two antigenic strains.
(0.34 MB TIF)
The scatterplot of the sum of (sub)type-attributed excess mortality that we calculated and the reported excess mortality in each season. The red, blue and green points represent the A(H1N1), A(H3N2) and B dominant seasons respectively. A (sub)type is defined to be dominant in the season when its ratio of virus isolates is the biggest in that season. The black line is the diagonal line of the plot. The boxed dots are those with large deviations.
(0.42 MB TIF)
The nonparametric (the red line) and robust logarithm (the black line) regression between the excess all-cause mortality and the antigenic distance to the previous first antigenic strain for influenza A(H3N2) virus. The nonparametric regression is done using the loess method with span 1.5. The equation and its R-squared shown on the plot are for the robust logarithm regression.
(0.34 MB TIF)
The nonparametric (the red line) and robust logarithm (the black line) regression between the excess all-cause mortality and the antigenic distance to the previous third antigenic strain for influenza B virus. The nonparametric regression is done using the loess method with span 1.5. The equation and its R-squared shown on the plot are for the robust logarithm regression.
(0.33 MB TIF)
The seasonal virus isolates, antigenic strains and excess all-cause mortalities of human influenza A(H1N1), A(H3N2) and B from 1977 through 2009.
(1.07 MB TIF)
The Spearman and Pearson correlation coefficients between the excess all-cause mortalities of antigenic strains and their genetic distances to the previous individual antigenic strains. The numbers in parentheses are the P-values of the corresponding coefficients. The largest coefficient for each (sub)type is highlighted in bold. a: The previous i-th antigenic strain is the i-th antigenic strain prior to an antigenic strain that is considered as a challenging strain. b: Not applicable due to the limited number of antigenic strains.
(0.03 MB DOC)
The Spearman and Pearson correlation coefficients between the excess all-cause mortalities and the integrated genetic distances relative to the previous 1–5 antigenic strains as background strains. The numbers in parentheses are the P-values of the corresponding coefficients. The largest coefficient for each (sub)type is highlighted in bold. a: Not applicable due to the limited number of antigenic strains.
(0.03 MB DOC)
The classical and robust regression analysis of the relationship between the antigenic distance and the excess mortality for human A(H1N1) using five different equations. The table lists the function, R-squared and P-value for each regression.
(0.04 MB DOC)
The performance comparison between the EADpred method and one of the best site-based methods in predicting antigenic variants (see
(0.03 MB DOC)
The confidence intervals of the Spearman and Pearson correlation coefficients between the excess all-cause mortalities and the antigenic distances to previous individual antigenic strains. The numbers in parentheses are the 95% confidence intervals of the corresponding coefficients. The numbers in red are the coefficients with a P-value smaller than 0.05. a: The previous i-th antigenic strain is the i-th antigenic strain prior to an antigenic strain that is considered as a challenging strain. b: Not applicable due to the limited number of antigenic strains.
(0.03 MB DOC)
The confidence intervals of the Spearman and Pearson correlation coefficients between the excess all-cause mortalities and the integrated antigenic distances relative to the previous 1–5 antigenic strains as background strains. The numbers in parentheses are the 95% confidence intervals of the corresponding coefficients. The numbers in red are the coefficients with a P-value smaller than 0.05. a: Not applicable due to the limited number of antigenic strains.
(0.03 MB DOC)
The classical and robust regression analysis of the relationship between the antigenic distance and the excess mortality for human A(H3N2) using five different equations. The table lists the function, R-squared and P-value for each regression.
(0.04 MB DOC)
The classical and robust regression analysis of the relationship between the antigenic distance and the excess mortality for human B virus using five different equations. The table lists the function, R-squared and P-value for each regression.
(0.04 MB DOC)
Antigenic distances between antigenic strains for human influenza A(H1N1), A(H3N2) and B, and antigenic distances between A(H1N1) viruses used for developing the EADpred method.
(0.54 MB DOC)
Six predicted epitopes of the A(H1N1) HA protein. a: The epitopes are extended from the known epitopes based on references 12–14. b: Two predicted novel antigenic epitopes supported by references 17 and 18.
(0.03 MB DOC)
Values of five selected physicochemical properties of the 20 amino acids. a: The hydrophobic values came from the BLAS910101 entry in AAindex database
(0.05 MB DOC)
Supporting methods, legends for supporting tables and figures.
(0.10 MB DOC)
We would like to thank Drs. Xiaoying Koh and Jianzhu Chen of MIT and Dr. Genhong Cheng of UCLA for critical review of the manuscript, Dr. Minghua Deng and his student Lin Hou of Beijing University for help with the statistical analysis, and members of the Jiang lab for help and discussions. We are also grateful to the three anonymous reviewers, whose insightful comments have helped us to improve our work.