The authors have declared that no competing interests exist.
Conceived and designed the experiments: TMKC. Performed the experiments: TMKC. Analyzed the data: TMKC. Contributed reagents/materials/analysis tools: TMKC LG LJ YEL JH BN PAB. Wrote the paper: TMKC JH PAB.
Gauging the systemic effects of non-synonymous single nucleotide polymorphisms (nsSNPs) is an important topic in the pursuit of personalized medicine. However, it is a non-trivial task to understand how a change at the protein structure level eventually affects a cell's behavior. This is because complex information at both the protein and pathway level has to be integrated. Given that the idea of integrating both protein and pathway dynamics to estimate the systemic impact of missense mutations in proteins remains predominantly unexplored, we investigate the practicality of such an approach by formulating mathematical models and comparing them with experimental data to study missense mutations. We present two case studies: (1) interpreting systemic perturbation for mutations within the cell cycle control mechanisms (G2 to mitosis transition) for yeast; (2) phenotypic classification of neuron-related human diseases associated with mutations within the mitogen-activated protein kinase (MAPK) pathway. We show that the application of simplified mathematical models is feasible for understanding the effects of small sequence changes on cellular behavior. Furthermore, we show that the systemic impact of missense mutations can be effectively quantified as a combination of protein stability change and pathway perturbation.
Small changes in protein sequences, such as missense mutations resulting from genetic variations in the genome, can have a large impact on cellular behavior. Consequently, numerous studies have been carried out to evaluate the disease susceptibility of missense mutations by directly analyzing their structural or functional impact on proteins. Such an approach has been shown to be useful for inferring the likelihood of a mutation to be disease-associated. However, there are still many unexplored avenues for improving disease-association studies, due to the fact that the dynamics of biological pathways are rarely considered. We therefore explore the practicality of a structural systems biology approach, combining pathway dynamics with protein structural information, for projecting the physiological outcomes of missense mutations. We show that stability changes of proteins due to missense mutations and the sensitivity of a protein in terms of regulating pathway dynamics are useful measures for this purpose. Furthermore, we demonstrate that complicated mathematical models are not a prerequisite for mapping protein stabilities to network perturbation. Thus it may be more feasible to study the systemic impact of missense mutations associated with complex pathways.
How one links genetic information to physiological outcomes is an important issue in the current ‘post-GWAS’ (genome-wide association studies) era
Interpreting the physiological effect on cells due to missense mutations in proteins is not a simple task. This is partly achievable through analyzing the increasing number of protein structures deposited in the Protein Data Bank (
The work of Kiel and Serrano suggests that integrating protein structural analysis with pathway modeling can be a useful method to facilitate the physiological annotation of missense mutations in proteins. However, the effectiveness of this approach at quantifying missense mutations located in different proteins remains unexplored. Also unexplored is the utility of this approach with simpler mathematical models, considering only the dynamics of key proteins while the remaining proteins in the pathway are omitted – this is potentially a more practical approach for achieving an improved inference of the parameter space, thereby increasing the reliability of the analysis (current ODE models describing biological pathways often contain tens or hundreds of parameters that can neither be easily measured nor calibrated experimentally). Furthermore, extensive investigation is required to determine how the approach performs when annotating missense mutations whose physiological outcomes can be clinically defined and examined.
These issues are discussed in this work by gauging the systemic impacts of missense mutations through integrating protein and pathway behavior via reduced ODE models. Here we present and discuss the measurement of a ‘systemic impact factor’ (SIF), defined as a function of free energy change (ΔΔG) and systemic control (CSpi, see Methods section ‘Control coefficient’), as a practical approach for evaluating the relative effects of missense mutations in a specific system. For mutations appearing in proteins whose complexed and uncomplexed states are both considered in the model, we calculate their maximum SIFs by taking the maximum ΔΔG between the two states. This is because the average score of the two protein states does not necessarily have a clear biophysical meaning in terms of describing the overall stability change of a mutation. Although summing the ΔΔGs calculated in the two protein states may have biophysical meaning, complications will be incurred when comparing the SIFs to other proteins that only have one conformational state analyzed in the model (either complexed or uncomplexed). Therefore, by using the maximum ΔΔGs we do not compromise the biophysical meaning of SIF and at the same time make the SIF scores more comparable across different proteins that may or may not have two states.
The benchmark includes two biological systems: (1) the fission yeast G2 to Mitosis (G2-M) transition and (2) the human MAPK signaling pathway. The first system is a well-defined system for studying the genotype-phenotype relationship as the systemic perturbation of missense mutations can be directly benchmarked to the length change of yeast cells. We use temperature-sensitive yeast strains as experimental models, each of them containing a single missense mutation in protein Cdk1 or Cyclin B (CycB), and we measure their cell lengths at septation (septation immediately follows mitosis). Finally, the practicality of the SIF score in quantifying the systemic effect of missense mutations is evaluated by the correlation between the calculated SIF scores and
The G2-M transition controls when a cell enters mitosis and determines the size of a cell at the point of division into two daughter cells. In fission yeast,
(A) Identifying the target system for study. In this case we show the scheme of the G2-M model that regulates the G2 to mitosis transition in the cell cycle. (B) Mapping mutations onto their 3D structures (Cdk1 and CycB in this example) and associating them with the ODE parameters. Mutations located at or close to the active site (colored in blue) are considered to perturb the ODE rate constants that describe interactions between MPF and their regulating kinases wee1 and cdc25 (shown with blue circles). Mutations that are not in the functional sites (colored in red) are considered to perturb the ODE rate constants describing the rate of protein degradation (shown with red circles). Also, for each mutation we evaluate its ΔΔG that is considered as the perturbation of ODE parameters. (C) Calculating the CSpi that reflects the sensitivity of perturbing ODE parameters in terms of regulating the downstream reporter protein (MPF in the G2-M model). Here we show the perturbation on the degradation rate of MPF as an example: The green arrows mark the effect of perturbation on CycB concentration when cells enter mitosis, which is a result of MPF curve shifts (the red line represents wild type whereas orange and purple lines are mutant types). (D) Inferring the systemic consequences of mutations based on ΔΔG and CSpi. Mutations that have smaller or larger SIF scores are likely to have smaller or larger sizes at septation, respectively. The scale bars shown in the microscopic photos represent the average length of wild-type yeasts.
The model we present here (
CycB = 0.01; MPF = 0.01; Wee1 = 1.0; Cdc25 = 0.01 |
kS = 0.2; kd = 0.008; |
k′25 = 0.008; k″25 = 0.89; k′wee = 0.03; k″wee = 0.18; |
kawee = 0.61; kiwee = 0.71; ka25 = 0.80; ki25 = 0.35 |
Jawee = 0.90; Jiwee = 0.21; Ja25 = 0.19; Ji25 = 0.93 |
The parameters of the ODEs are as follows: kS is the rate of CycB synthesis and is associated with the concentration of Cdk1; kd describes the degradation rates of both CycB and MPF. V25 and Vwee are the activation and inactivation rates of MPF, respectively. kawee and kiwee are the rates of Wee1 being activated by a phosphatase (which is not explicitly formulated in our model) and inactivated by MPF, respectively. ka25 and ki25 are the rates of Cdc25 being activated by MPF and inactivated by a phosphatase, respectively. Ja25 and Jiwee are the Michaelis constants of MPF for Cdc25 and Wee1, and Ji25 and Jawee are the Michaelis constants of a phosphatase for Cdc25 and Wee1, respectively.
Investigation of the parameter space through the replica exchange Monte Carlo algorithm (see Methods section ‘Replica exchange Monte Carlo method’) shows that the parameters in our
Here we consider each missense mutation as a perturbation to the wild-type status as described in the
The systemic perturbation of the G2-M transition (
Strain Number | Strain Name | Mutated Protein | Residue Change | Cell Length at 25°C (µm), Mean | Cell Length at 25°C (µm), Stdev | Cell Length at 30°C (µm), Mean | Cell Length at 30°C (µm), Stdev |
275 | M35 | Cdk1 | G43E | 16.3 | 1.5 | 23.4 | 6.0 |
368 | 3w | Cdk1 | C67Y | 11.1 | 1.1 | 9.8 | 1.4 |
8 | 33 | Cdk1 | A177T | 15.2 | 1.2 | 18.2 | 1.8 |
154 | 56/130 | Cdk1 | G183E | 10.4 | 1.0 | 12.2 | 1.9 |
274 | L7 | Cdk1 | P208S | 16.4 | 1.0 | 17.6 | 2.0 |
515 | M63 | Cdk1 | G227C | 16.2 | 1.3 | 19.9 | 2.1 |
6 | NA | CycB | C379Y | 14.5 | 1.4 | 19.3 | 2.2 |
4932 | NA | CycB | W395R | 18.2 | 1.0 | 18.9 | 1.0 |
972 | WT | NA | NA | 12.8 | 1.6 | 14.5 | 1.1 |
The modeled structure of MPF shows that mutation G43E in Cdk1 is located at the interface of MPF subunits and thus is likely to have a significant effect on the stability of the MPF complex (see Methods section ‘Homology modeling of Cdk1, CycB and MPF structures’ regarding structural modeling). Mutations A177T, G183E and P208S in Cdk1 are located at or close to the active site and hence are likely to cause functional effects; C67Y and G227C in Cdk1 and W395R in CycB are at the periphery of the proteins and thus are mainly structurally related. Mutation C379Y in CycB is within a hydrophobic core and is likely to have a considerable impact on the MPF complex by destabilizing the structure of CycB (
The link between SIF and systemic perturbation (SP) can be statistically established through regression:
Hence for the G2-M model the magnitude of each SIF value indicates the degree of impact a mutation can have on the quantity of CycB, which determines when a cell enters mitosis and therefore the length of the cells.
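As a rough illustration of the regression idea, the sketch below fits an ordinary least-squares line between the SIF scores and the mean cell lengths at 30°C, using the values tabulated in this paper. The straight-line form, the use of raw cell length as the readout of systemic perturbation, and the helper name `linear_fit` are assumptions made for illustration; they are not necessarily the regression used by the authors.

```python
import math

# (SIF score, mean cell length at 30°C in µm), transcribed from the tables in this paper.
DATA = {
    "G43E":  (0.27, 23.4),
    "C67Y":  (0.035, 9.8),
    "A177T": (0.066, 18.2),
    "G183E": (0.045, 12.2),
    "P208S": (0.039, 17.6),
    "G227C": (0.085, 19.9),
    "C379Y": (0.138, 19.3),
    "W395R": (0.026, 18.9),
}

def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

sif_scores = [sif for sif, _ in DATA.values()]
cell_lengths = [length for _, length in DATA.values()]
slope, intercept, r = linear_fit(sif_scores, cell_lengths)
print(f"slope = {slope:.1f} µm per SIF unit, intercept = {intercept:.1f} µm, r = {r:.2f}")
```

The coefficient obtained this way need not coincide with the correlation reported elsewhere in the paper, since the exact regression inputs are not reproduced here.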
The fundamental concept of our approach is to build a wild-type model that faithfully reflects
For the eight missense mutation studies presented here, their SIF values are calculated (
The experimentally measured cell lengths and the calculated SIF scores at 25°C and 30°C are shown in grey and black, respectively. The x-axis error bars show the standard error of cell lengths; the y-axis error bars show the standard error of SIF scores, resulting from the evaluation of ΔΔG.
Amino acid change | Target Protein | Impact Type | ΔΔG (Cdk1/CycB) | ΔΔG (MPF) | Maximum ΔΔG (kcal/mol) | CSpi: kd (CycB) | CSpi: kd (Cdk1) | CSpi: Jwee+J25 | SIF |
G43E | Cdk1 | S | 2.10 | 24.9 | 24.9 | - | 0.011 | - | 0.27 |
C67Y | Cdk1 | S | 3.17 | 1.31 | 3.17 | - | 0.011 | - | 0.035 |
A177T | Cdk1 | F | 5.97 | 3.65 | 5.97 | - | - | 0.011 | 0.066 |
G183E | Cdk1 | F | 3.72 | 4.13 | 4.13 | - | - | 0.011 | 0.045 |
P208S | Cdk1 | F | 3.56 | 2.35 | 3.56 | - | - | 0.011 | 0.039 |
G227C | Cdk1 | S | 7.69 | 7.23 | 7.69 | - | 0.011 | - | 0.085 |
C379Y | CycB | S | 31.92 | 34.56 | 34.56 | 0.004 | - | - | 0.138 |
W395R | CycB | S | 6.57 | 6.15 | 6.57 | 0.004 | - | - | 0.026 |
Each mutation is considered to have a mainly functional (F) or structural (S) impact according to its location in its target protein.
ΔΔG of the mutations in individual Cdk1 or CycB; each of them is an average value considering structures sampled from molecular dynamics simulations.
ΔΔG of the mutations in the Cdk1-CycB complex (MPF); each of them is an average value considering structures sampled from molecular dynamics simulations.
Maximum of ΔΔG considering both complexed and uncomplexed states of the target protein.
Perturbation on CycB degradation was weighted 0.3 for the degradation of monomeric CycB and weighted 0.7 for the degradation of complexed CycB (MPF).
Perturbation on Cdk1 degradation was estimated through the degradation of MPF only since the amount of total Cdk1 is constant.
Perturbation on the interaction between CycB and Cdk1 was estimated through Jwee and J25 with a weighting 0.9*Jwee+0.1*J25.
The high ΔΔG is a result of van der Waals clashes when the target residue is mutated to a larger side chain.
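The SIF column of the table can be reproduced from its other columns. The sketch below assumes that SIF is the product of the maximum ΔΔG and the relevant CSpi; this form is consistent with the tabulated values but is inferred from them, and the function name `sif` and the dictionary layout are illustrative only.

```python
# Per mutation: (ddG in the monomer, ddG in the MPF complex, CSpi, SIF reported in the table).
TABLE = {
    "G43E":  (2.10, 24.9, 0.011, 0.27),
    "C67Y":  (3.17, 1.31, 0.011, 0.035),
    "A177T": (5.97, 3.65, 0.011, 0.066),
    "G183E": (3.72, 4.13, 0.011, 0.045),
    "P208S": (3.56, 2.35, 0.011, 0.039),
    "G227C": (7.69, 7.23, 0.011, 0.085),
    "C379Y": (31.92, 34.56, 0.004, 0.138),
    "W395R": (6.57, 6.15, 0.004, 0.026),
}

def sif(ddg_monomer, ddg_complex, cspi):
    """Assumed form: SIF = max(ddG over the two protein states) * CSpi."""
    return max(ddg_monomer, ddg_complex) * cspi

for name, (ddg_m, ddg_c, cspi, reported) in TABLE.items():
    print(f"{name}: computed {sif(ddg_m, ddg_c, cspi):.3f}, reported {reported}")
```

Taking the maximum over the two states means that a mutation such as G43E, which is mild in monomeric Cdk1 but severe in the complex, is scored by its effect on MPF.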
To validate the function of our temperature-sensitive yeast strains, their lengths are also measured at the permissive temperature of 25°C: a condition that allows all the mutants and wild-type cells to grow normally, so the effect of mutation on cell length should be minimal. As shown in
The MAPK pathway plays an essential role in cell survival, proliferation, differentiation and development (
(A) A scheme of the MAPK pathway. (B) Mapping the mutations onto the three dimensional structures; mutations located at or close to the active site are colored in blue, otherwise colored in red.
Initial conditions (molecules cell⁻¹): |
ShcGS = 20,000; RasGDP = 20,000; RasGTP = 0; Raf = 10,000; |
Raf* = 0; Mek = 360,000; Mek* = 0; Erk = 750,000; Erk* = 0 |
Rate constants (molecules⁻¹ cell min⁻¹): |
c2 = 7.7⋅10⁻⁴; c6 = 8.3; c8 = 4⋅10⁵; c10 = 15; c12 = 4⋅10⁻⁶ |
Rate constants (molecules cell⁻¹): |
c7 = 9⋅10⁴; c9 = 6⋅10⁵; c11 = 1.53⋅10³ |
Rate constants (min⁻¹): |
c1 = 69; c3 = 14; c4 = 50; c5 = 0.78 |
c1 and c2 are the rate and Michaelis constant for RasGDP activation by the Shc-Grb-Sos (ShcGS) complex, respectively; c3 is the rate for RasGTP to be converted to RasGDP; c4 is the rate for RasGTP to convert Raf from an inactive to an active form (Raf*); c5 is the rate for RasGTP to convert Raf* to Raf; c6 is the rate for Raf* to convert Mek from an inactive to an active form (Mek*); c7 is the rate for Mek* to be converted to Mek; c8 and c9 are the rate and Michaelis constant, respectively, for Mek* to convert Erk from an inactive form to an active form (Erk*); c10 and c11 are the rate and Michaelis constants, respectively, for Erk* to be converted to Erk. Finally, c12 is the rate for the ShcGS complex to be inhibited by Erk* (See simulated curves in
To benchmark the behavior of both the reduced and original model, a sensitivity analysis is performed over three target quantities of the reporter protein Erk (Methods section ‘Quantifying the change of expression curves’): the amplitude (maximum activation), duration (time until the signal drops to 50% of its maximum activation) and peak time (time of maximum activation). For the test, the initial concentration of the key proteins in both models is varied and their effects on controlling the target quantities of Erk are compared (the key proteins include ShcGS (Shc: Src homology and collagen domain protein), GS, Grb2 (growth factor receptor binding protein 2), SOS (son of sevenless homologous protein), Ras, Raf, Mek and Erk). As a result, the control coefficients in both models demonstrate a similar pattern across all three target quantities (
(A) The reduced MAPK model. (B) The original non-reduced (Brightman and Fell) model.
Here 40 mutations associated with neuro-cardio-facial-cutaneous syndrome are collected and studied (
Unlike missense mutations in the yeast G2-M model, there are no quantitative measurements of the physiological outcomes for the mutations in the MAPK pathway that can be used to calculate the correlation with SIF scores. Hence, as an indirect way to evaluate the relationship between mutations and clinical symptoms, each mutation is represented by three SIF scores calculated according to the systemic impact on the wild-type Erk expression curve: measured as amplitude, duration and peak time differences. The trajectory of the SIFs corresponding to each mutation as a function of these three target quantities shows that mutations in Raf1, B-Raf and Mek tend to cluster in a similar region, whereas mutations in H-Ras follow a trajectory that points in a markedly different direction from that of the other mutations (
(A) The reduced model; (B) the reduced model with initial conditions from Fujioka et al; (C) the reduced model with initial conditions from Fujioka et al and parameters optimized by fitting to the time course data in Fujioka et al; (D) the original non-reduced model. (E) A scheme shows the relationship between the key proteins and their clinical syndromes.
It has been demonstrated that using ensembles of simulated protein structures, rather than a single conformation as represented by a crystal or modeled structure, can improve the estimation of free energy change
A closer examination of mutant SIF scores reveals that H-Ras mutations perturb the MAPK pathway in a distinctly different manner from that of the mutations in Raf-1, B-Raf and Mek (
In this work we presented the SIF function as an effective measure for the systemic impact of missense mutations. SIF values reflect in a simple manner the fact that proteins are functional units in the cell whose interaction networks regulate cellular behavior. It is of particular interest to see that SIF scores reflect the
A potential way to improve the current correlation between SIF and systemic outcome is to consider an additional parameter λ that describes the amount of parameter perturbation caused by free energy change. Now the SIF function becomes:
When we considered only structural mutations in the G2-M model, the correlation between SIF and cell length increases from 0.69 to 0.73 (p value = 0.026). This suggests that the current SIF formula may perform much better in annotating the systemic effect of mutations whose role is more structural than functional. This could be due to the way we approximate the functional impact of a missense mutation through Michaelis constants and link its perturbation to ΔΔG as an approximation of Kd (
Although the current SIF function correlates linearly with
A very intriguing result of this study is that systemic impacts can be reasonably gauged through simple or reduced ODEs. This indicates that it is possible to study the systemic perturbation of a pathway when there is incomplete information about its components – an important observation, given the fact that the majority of biological pathways have missing components waiting to be discovered or confirmed. Another important aspect of this work is that, for the purpose of studying systemic perturbation, it is feasible to study the missense mutations through “fuzzy” parameters – that is, the systemic impact of a mutation can be extrapolated through rate constants that account for general protein-protein interactions rather than detailed enzyme catalytic reactions. Finally, a simpler model also lowers the chance that a single perturbation has to be associated with multiple parameters, which makes the impact of a missense mutation easier to interpret.
The simplicity of the G2-M model lies in two aspects. First, it has only four major component proteins (Cdk1, CycB, Wee1 and Cdc25) used to simulate cell growth, and the model can be considered to be linear, terminating when MPF reaches a certain critical concentration. The second aspect is that, rather than capturing their time-course data, the model reflects the relationship between the component proteins. Normally this raises the difficulty of parameter optimization, as it increases the chance of converging to multiple parameter sets that all give simulation curves satisfying a particular phenotypic outcome. Fortunately, parameter inference is not a concern in this case, since the general trend of the CSpi relation between parameters is conserved, regardless of parameter variations (
In preserving the overall dynamics of the original model (
It is generally non-trivial to infer cellular phenotypes from studying pathway dynamics since many cellular functions have complex underlying mechanisms. However, the medium-to-strong correlation between the SIF values and
The SIF values simulated from the MAPK model, on the other hand, reflect a more complex relationship with phenotype. We expected that most of the mutations studied here should be projected into similar regions, as they are associated with overlapping symptoms under a broad term ‘neuro-cardio-facial-cutaneous syndrome’. However, H-Ras mutations are projected into distinctly different trajectories from the other mutations with respect to their effects on the ERK expression profile. This suggests that H-Ras mutations are likely to have different characters in terms of the disease prognosis and risk of complications depending more upon the genotype than on the phenotype. Given the clinical symptoms of patients from which the missense mutations studied here were identified (as shown in
The two systems in our study show that SIF can reflect phenotype or the underlying mechanism of missense mutations in proteins. In general, we may confidently interpret systemic impacts as an indicator for phenotype only if a reporter protein is strongly and non-redundantly linked to a target phenotype; otherwise a more reserved view would be appropriate.
One confounding factor associated with the performance of SIF is the relationship between ΔΔG of a mutation and its actual phenotypic effect. This is because different proteins may have different stability states and hence they may respond differently to the same amount of ΔΔG caused by missense mutations. The issue of benchmarking the effect of ΔΔG on different proteins has been an active topic in annotating nsSNPs. Previous studies show that proteins belonging to different structural families can respond differently to the same amount of ΔΔG, but in general a small margin of ΔΔG (1–3 kcal/mol) can be approximately used to define missense mutations that may not cause an immediate effect on protein fitness
For the proteins studied in this work, the concern of comparing the effect of ΔΔG across different proteins is likely to be alleviated due to the above reasoning. In the G2-M model, CycB and Cdk1 form a complex and hence the uncertainty of comparing ΔΔG in two different proteins is reduced. In the MAPK model, all the key proteins are kinases that share the same well-structured fold.
Another factor that may affect the performance of SIF is the complication of assigning the role of a mutation as mainly functional or structural. This issue is especially hard to deal with if a missense mutation is likely to cause long-range structural effects on its host proteins - for example, a mutation can exist far away from a functional site (and thus is considered as a structural mutation) but still affect the function of its host protein by inducing long-range conformational changes. Hence additional attention should be paid to calculating SIF for mutations located in proteins that are not well studied or have versatile conformations. For the cases studied in this work, the problem of assigning functional and structural mutations is not significant because most of the key proteins are kinases that have well-defined functional sites (see
One other factor that is associated with SIF performance is the accuracy of calculating ΔΔG. So far most of the methods for predicting ΔΔG do not show a good correlation with the experimental ΔΔG; however, they do perform well when used to estimate the average effect of mutations on protein stability
Finally, it is worth mentioning that the performance of SIF can be considerably compromised by mutations with large ΔΔG values. These mutations can be too extreme to be considered a perturbation to a target system, and hence the ODE model describing the wild-type condition is not applicable. On the other hand, large ΔΔG values can also be the result of van der Waals clashes, which are often heavily penalized in ΔΔG calculations (as is likely the case for mutations G43E and C379Y in the yeast G2-M model). All in all, in the cases where ΔΔG is large, caution should be taken when applying the SIF function.
Our study as a whole suggests that it is beneficial to combine multi-level knowledge to investigate the effects of missense mutations on cellular behavior. The advance in protein structure prediction techniques will particularly make the calculation of SIF more feasible, since it requires the structural information of proteins that host the target missense mutations. Overall, there is sufficient reason for us to be confident that future studies on integrating protein and pathway dynamics will become increasingly viable, as there are constant efforts across the scientific community in solving protein structures and identifying new components in biological pathways.
Simulating pathway dynamics through ODEs, as demonstrated here, provides a convenient platform for utilizing the information on protein structures. However, the application of ODEs implies two major limitations. One is in the availability of time-course data of protein expression in public resources; at the moment this is relatively low and sparse compared to that of gene expression data. This will be alleviated as more high-throughput time course data becomes available. The other limit is in our knowledge of the biological pathways – a majority of them have only been partially uncovered. A feasible way to circumvent the problem is to develop a simpler model by considering only key proteins that are essential for preserving pathway behaviors, as we have demonstrated in the case of MAPK pathway and G2-M transition.
The SIF function in its current form gives a good approximation of systemic perturbation resulting from the missense mutations in the G2-M and MAPK models. With further development on a larger dataset, especially with the inclusion of more parameters to further characterize protein function and structure, we are likely to obtain better correlations with quantitative phenotypes. The process of refining the SIF equation will tell us more about the relationships between protein function and structure, and pathway dynamics, which is one of the most important questions considered by structural biologists.
The advance of high throughput technology has enabled us to identify mutations in a large number of inter-connected pathways. It is becoming apparent that performing experiments to check the impact of individual mutations on the pathway level will be extremely time-consuming and costly, let alone monitoring all the possible cross-interactions and combinatorial effect of multiple mutations. From this perspective, multi-level mathematical modeling, such as that described here, will provide an efficient mechanism for pre-screening systemic impact in a cost-effective way. This is particularly useful for studying the etiology of complex diseases that are usually the result of accumulating multiple mutations.
Yeast strains used in this study are listed in
All the strains except strain 4932 were generated following our protocol previously published by Nurse et al.
Cells were grown to mid exponential growth (∼5×10⁶ cells/ml) in rich media at 25°C and 30°C
Here we applied the replica exchange Monte Carlo method (REM) – also known as parallel tempering (PT) – to implement parameter inference. For a non-linear system, as represented by the G2-M and MAPK models, the energy surface is normally rugged and it is hard to ensure unbiased sampling along the uneven energy space. Nevertheless REM has been shown to be very useful for this purpose, especially at low temperatures, and has been used extensively for finite-temperature simulation of biomolecules
Practically, the exchange of replicas with different temperatures effectively generates repeated heating and annealing cycles, which avoids the parameter search from becoming trapped in a local energy minimum.
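The exchange step can be sketched as follows. The acceptance rule shown is the standard parallel-tempering criterion, which is assumed here rather than taken from this paper; the function names, the energies and the temperature ladder are hypothetical.

```python
import math
import random

def swap_probability(e_i, e_j, t_i, t_j):
    """Standard parallel-tempering probability of exchanging two replicas
    with energies e_i, e_j currently held at temperatures t_i, t_j."""
    delta = (1.0 / t_i - 1.0 / t_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def attempt_swaps(energies, temperatures, rng):
    """Try one exchange between each pair of neighbouring temperature slots.

    energies[r] is the current energy of replica r; temperatures[k] is the
    temperature of slot k. Returns which replica occupies each slot afterwards.
    """
    order = list(range(len(energies)))
    for k in range(len(order) - 1):
        i, j = order[k], order[k + 1]
        p = swap_probability(energies[i], energies[j],
                             temperatures[k], temperatures[k + 1])
        if rng.random() < p:
            order[k], order[k + 1] = j, i
    return order

rng = random.Random(0)
# A cold replica stuck at high energy always trades places with a hotter,
# lower-energy neighbour, which is the heating/annealing effect described above.
print(attempt_swaps([5.0, 1.0], [1.0, 2.0], rng))
```

Restricting exchanges to neighbouring temperatures is a common design choice because the acceptance probability decays quickly as the temperature gap grows.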
For sampling the trajectories, PEPP used the Metropolis algorithm
The iteration of the Metropolis algorithm in our model is as follows:
1) Introduce a perturbation (Δx), whose scale is determined according to the last function shown above, to the initial parameter (X) in a target system.
2) Run the simulation with the initial X and the perturbed X′, which generates the respective energies E and E′.
3) Draw a uniform random number R
4) Return to step 1)
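The loop above can be sketched in a few lines. The acceptance test used here (accept when E′ ≤ E, otherwise accept when R < exp(−(E′ − E)/T)) is the textbook Metropolis criterion and is an assumption, as are the fixed perturbation scale, the toy quadratic energy and the name `metropolis`.

```python
import math
import random

def metropolis(energy, x0, temperature=1.0, step=0.1, n_iter=5000, seed=0):
    """Minimal one-parameter Metropolis search; returns the best (x, energy) seen."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_iter):
        x_new = x + rng.uniform(-step, step)  # step 1: perturb the parameter
        e_new = energy(x_new)                 # step 2: energy of the perturbed parameter
        r = rng.random()                      # step 3: uniform random number R in [0, 1)
        # Assumed acceptance rule: always accept downhill moves,
        # accept uphill moves with Boltzmann probability.
        if e_new <= e or r < math.exp(-(e_new - e) / temperature):
            x, e = x_new, e_new
        if e < best_e:
            best_x, best_e = x, e
        # step 4: continue with the next iteration
    return best_x, best_e

# Toy energy with its minimum at 0.2, standing in for the error between
# simulated and observed curves.
best_x, best_e = metropolis(lambda k: (k - 0.2) ** 2, x0=1.0, temperature=0.01)
print(best_x, best_e)
```

In a real fit the energy function would run the ODE simulation and return the deviation from the reference curves, and the step size would follow the scaling rule described in the Methods.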
In both the G2-M and MAPK models, CSpi was calculated given ∂pi = 0.1. For mutations that can be associated with two rate constants, e.g. X and Y, ∂pi is defined as ∂piX+∂piY = 0.1.
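A finite-difference estimate of such a coefficient might look like the sketch below. It assumes the usual scaled definition, (ΔS/S)/(Δp/p), and treats 0.1 as a relative perturbation of the parameter; both are assumptions, and the response function is a toy steady state (ks/kd for dX/dt = ks − kd·X), not the G2-M or MAPK model.

```python
def control_coefficient(response, params, name, rel_step=0.1):
    """Scaled sensitivity of `response` to parameter `name`:
    (relative change in response) / (relative change in parameter)."""
    base = response(params)
    perturbed = dict(params)
    perturbed[name] = params[name] * (1.0 + rel_step)
    return ((response(perturbed) - base) / base) / rel_step

def steady_state(p):
    """Toy reporter: steady state of dX/dt = ks - kd * X."""
    return p["ks"] / p["kd"]

params = {"ks": 0.2, "kd": 0.008}
c_ks = control_coefficient(steady_state, params, "ks")
c_kd = control_coefficient(steady_state, params, "kd")
print(f"C(ks) = {c_ks:.3f}, C(kd) = {c_kd:.3f}")
```

For a mutation linked to two rate constants, the same function could be called with the 0.1 budget split between them, in line with the constraint stated above.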
As mentioned above, CSpi is calculated based on ∂S, the deviation between the wild-type and mutant curves of a reporter protein. In the G2-M model, the deviation of the CycB curve is measured as the concentration change of CycB when MPF reaches a dimensionless concentration of 2.0. In the MAPK model, the deviation of the Erk curve is measured in three dimensions that are commonly investigated in studying pathway behavior: (1) peak difference, i.e. the difference in the maximum activation; (2) duration difference, i.e. the difference in the time until the signal drops to 50% of its maximum activation; (3) peak time difference, i.e. the difference in the time at which the curves reach their maximum activation (
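The three MAPK measurements can be extracted from sampled curves as in the sketch below. The function names, the fallback used when a signal never decays to half maximum, and the pulse-shaped toy curves are assumptions for illustration; they are not simulated Erk profiles.

```python
import math

def curve_metrics(times, values):
    """Return (amplitude, duration, peak_time) of a sampled activation curve.

    amplitude: maximum activation
    duration:  first time after the peak at which the signal is <= 50% of the maximum
               (falls back to the last time point if that never happens)
    peak_time: time of maximum activation
    """
    peak_idx = max(range(len(values)), key=values.__getitem__)
    amplitude = values[peak_idx]
    peak_time = times[peak_idx]
    duration = times[-1]
    for t, v in zip(times[peak_idx:], values[peak_idx:]):
        if v <= 0.5 * amplitude:
            duration = t
            break
    return amplitude, duration, peak_time

def curve_difference(times, wild_type, mutant):
    """Mutant minus wild-type difference for each of the three measurements."""
    wt = curve_metrics(times, wild_type)
    mt = curve_metrics(times, mutant)
    return tuple(m - w for m, w in zip(mt, wt))

# Toy transient pulses a * t * exp(-t / tau), which peak at t = tau.
times = [i / 10 for i in range(601)]
wild_type = [1.0 * t * math.exp(-t / 5.0) for t in times]
mutant = [1.5 * t * math.exp(-t / 6.0) for t in times]
d_peak, d_duration, d_peak_time = curve_difference(times, wild_type, mutant)
print(d_peak, d_duration, d_peak_time)
```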
We modeled Cdk1 using human cyclin-dependent kinase 2 (CDK2, PDB code: 1FIN, chain A, sequence identity shared with Cdk1: 64%) as a template. Here we employed MODELLER (version 9v2; set deviation = 4.0; number of models = 50; call routine = ‘model’)
AMBER10 is employed with the ff99SB force field
For each mutation, 100 simulated structures are sampled across the total 100 nanoseconds simulation time. The average ΔΔG is then calculated based on the Boltzmann-Gibbs distribution as discussed in
FoldX (version 3.0) is employed to calculate the ΔΔG for each mutation. Prior to the calculation of ΔΔG, the RepairPDB command in FoldX is used on each sampled structure to fix non-standard angles, distances and side-chain conformations. The default setting of FoldX is used to calculate the ΔΔG of each mutation: Temperature = 298 K, pH = 7, IonStrength = 0.050, VdWDesign = 2.
Checking the robustness of parameters in the G2-M model. (A) Error distribution of the parameter sets sampled 1,000 times with random starting points shows two major clusters of parameters. The first cluster of parameters has chi-squared errors <5 and the other has chi-squared errors between 5 and 15 (the error estimates the difference between the simulated curves and the experimentally observed ones). The cluster with smaller errors produces curves similar to those from the original Novak and Tyson (1993) model, whereas the cluster with larger errors results in flat curves of CycB and MPF. Therefore, in this case only the parameters with error <5 are considered. (B) Control coefficients (CSpi) of parameter sets that are close to the local minimum of parameter inference, i.e. within the cluster that has smaller errors.
(PDF)
Asymmetric control of Wee1 and Cdc25 on the G2-M model. The absolute values of the control coefficients for Cdc25-associated reactions are larger than those for Wee1.
(PDF)
Simulated curves for the MAPK model. (A) The reduced model (solid lines) and the original Brightman and Fell model (dashed lines). (B) The reduced model with initial concentrations measured by Fujioka
(PDF)
The SIF scores of the mutations in the MAPK model considering conformational ensembles. (A) The reduced model; (B) the reduced model with initial conditions from Fujioka
(PDF)
An overall structure of the original and reduced model. (A) The original non-reduced model and (B) the reduced model.
(PDF)
The three measurements used to quantify the difference between two protein expression curves.
(PDF)
Structural analysis of the Cdk1 model. (A) The alignment of the Cdk1 sequence and the template structure PDB: 1FIN. The structural features of the template are shown in the JOY
(PDF)
Structural analysis of CycB model. (A) The alignment of CycB sequence and the template structures PDB: 2JGZ and 3DOG. The structural features of the template are shown in the JOY
(PDF)
Simulated curves of the reduced and original models. (A) Comparison of the relative activation of Mek, for the Brightman and Fell model (solid lines), and the situation where the Mek activation is replaced by Eqn. 21 (dashed line). Here note [Mek*] = [MekP]+[MekPP]. (B) Comparison of the relative activation (concentration of active form, divided by initial concentration of protein) of ErkPP between the original Brightman and Fell model (solid line) and the simplified version in which the Erk activation is replaced by Eqn. 24 (dashed line). (C) Comparison of the relative activation between the Brightman and Fell (2000) model (heavy lines) and the equivalent simplified model (light lines).
(PDF)
Three-dimensional structure of a kinase. The G-rich loop is colored in green; the C-alpha helix is colored in magenta; the catalytic loop is colored in orange; the activation loop is colored in cyan. The N-lobe region is colored in black while the C-lobe is colored in grey.
(PDF)
Mutations and model parameters associated with neuro-cardio-facial-cutaneous syndrome.
(DOC)
Additional information of the
(DOC)
Characterizing structural and functional mutations. This file discusses the separation of structural and functional mutations studied in this work.
(DOC)
Formulating the reduced ODEs based on the Brightman and Fell model. This file describes the steps of producing the reduced model.
(DOC)
We would like to thank Prof. Jane A. Endicott and Dr. Edward D. Lowe in the Department of Biochemistry at University of Oxford for giving valuable advice on modeling the structure of Cdk1 and Cyclin B. We thank Prof. David A. Fell and Dr. Frances A. Brightman at Oxford Brookes University for providing the parameters of their MAPK model and for related discussions. We would also like to thank Dr. Adam Thorn in the Department of Chemistry at University of Cambridge for proofreading the manuscript.