The authors have declared that no competing interests exist.
Conceived and designed the experiments: TS EBB HSC. Performed the experiments: TS. Analyzed the data: TS. Wrote the paper: TS EBB HSC.
Experimental studies have shown that some proteins exist in two alternative native-state conformations. It has been proposed that such bi-stable proteins can potentially function as evolutionary bridges at the interface between two neutral networks of protein sequences that fold uniquely into the two different native conformations. Under adaptive conflict scenarios, bi-stable proteins may be of particular advantage if they simultaneously provide two beneficial biological functions. However, computational models that simulate protein structure evolution do not yet recognize the importance of bi-stability. Here we use a biophysical model to analyze sequence space to identify bi-stable or multi-stable proteins with two or more equally stable native-state structures. The inclusion of such proteins enhances phenotype connectivity between neutral networks in sequence space. Consideration of the sequence space neighborhood of bridge proteins revealed that bi-stability decreases gradually with each mutation that takes the sequence further away from an exactly bi-stable protein. With relaxed selection pressures, we found that bi-stable proteins in our model are highly successful under simulated adaptive conflict. Inspired by these model predictions, we developed a method to identify real proteins in the PDB with bridge-like properties, and have verified a clear bi-stability gradient for a series of mutants studied by Alexander et al. (Proc Nat Acad Sci USA 2009, 106:21149–21154) that connect two sequences that fold uniquely into two different native structures via a bridge-like intermediate mutant sequence. Based on these findings, new testable predictions for future studies on protein bi-stability and evolution are discussed.
Proteins are essential molecules for performing a majority of functions in all biological systems. These functions often depend on the three-dimensional structures of proteins. Here, we investigate a fundamental question in molecular evolution: how can proteins acquire new advantageous structures via mutations while not sacrificing their existing structures that are still needed? Some authors have suggested that the same protein may adopt two or more alternative structures, switch between them and thus perform different functions with each of the alternative structures. Intuitively, such a protein could provide an evolutionary compromise between conflicting demands for existing and new protein structures. Yet no theoretical study has systematically tackled the biophysical basis of such compromises during evolutionary processes. Here we devise a model of evolution that specifically recognizes protein molecules that can exist in several different stable structures. Our model demonstrates that proteins can indeed utilize multiple structures to satisfy conflicting evolutionary requirements. In light of these results, we identify data from known protein structures that are consistent with our predictions and suggest novel directions for future investigation.
New functional proteins are likely to evolve from existing proteins. Most existing proteins, however, are under selection to conserve their existing native structure in order to maintain functionality (and also to avoid aggregation and proteolysis). Without such selective constraints, the accumulation of random mutations would soon render a protein nonfunctional. When the same gene (protein) is under two selection pressures, i.e. to evolve a new functional structure while conserving its existing structure, an adaptive conflict arises. This adaptive conflict scenario is at the heart of most contemporary theories of molecular evolution, such as the popular Neofunctionalization and Subfunctionalization models (as reviewed in
Indeed, there is increasing evidence that proteins have a significant capacity for multi-functionality. Not only are many enzymes known to exhibit promiscuity for nonnative reactions and substrates
Because actual protein sequence space is too vast for computational — let alone experimental — exploration using current resources, we rely on a well-established explicit-chain biophysical model with exhaustive sequence-to-structure mapping
In this context, our main aims here are to investigate: (i) where bridge proteins are preferentially located in sequence space, (ii) how bi-stability is distributed in the sequence-space neighborhood around bridge proteins, and (iii) what role opposing selection pressures play in evolutionary dynamics that may take advantage of bi-stability. Toward these goals, we first describe the characteristics of the sequence space in our simple biophysical protein chain model. We show that bridge proteins, and bi-stable proteins in general, have a high potential for facilitating evolution under adaptive conflicts. We further demonstrate that this potential originates from a nonrandom distribution of bi-stability in sequence space. Subsequently, we apply the concepts and insights gained from our simple model to real protein structures. In particular, we describe bi-stability in a well-documented experimental case and also in a larger set of putative bi-stable proteins in the Protein Data Bank (PDB).
In our terminology,
For a protein with native-state degeneracy
(
native-state degeneracy | 2 | 3 | 4 | 5 | 6 | all 2,…,6 |
| 11018 | 6212 | 8541 | 5193 | 6690 | 37654 |
| 1421 | 1088 | 967 | 852 | 721 | 5049 |
| 12.897% | 17.514% | 11.321% | 16.406% | 10.777% | 13.408% |
| 0.542% | 0.415% | 0.368% | 0.325% | 0.275% | 1.926% |
| 1 | 3 | 6 | 10 | 15 | n/a |
| 1 | 1.449 | 1.774 | 1.959 | 2.008 | 1.551 |
| 100% | 22.426% | 5.791% | 0.469% | 0.693% | 34.264% |
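The numeric rows of the table can be cross-checked against one another. The short sketch below assumes that the first numeric row counts sequences at each degeneracy, that the second counts the bridges among them, that the third is their ratio truncated to three decimals, and that the row reading 1, 3, 6, 10, 15 is the number of distinct pairs among the native structures. These readings are inferences from the numbers themselves, and the variable names are illustrative.

```python
from math import comb

degeneracy = [2, 3, 4, 5, 6]
sequences = [11018, 6212, 8541, 5193, 6690]       # first numeric row (assumed: sequence counts)
bridges = [1421, 1088, 967, 852, 721]             # second numeric row (assumed: bridge counts)
listed_percent = [12.897, 17.514, 11.321, 16.406, 10.777]  # third numeric row
listed_pairs = [1, 3, 6, 10, 15]                  # row assumed to be "g choose 2"

# The last column should be the sum over degeneracies 2..6.
totals_ok = sum(sequences) == 37654 and sum(bridges) == 5049

# Ratio of bridges to sequences, in percent; the table appears to truncate
# (not round) to three decimals, so the computed value may exceed the listed
# one by less than 0.001.
percent = [100 * b / s for b, s in zip(bridges, sequences)]
percent_ok = all(0 <= p - q < 0.001 for p, q in zip(percent, listed_percent))

# Number of distinct structure pairs available to a sequence of degeneracy g.
pairs_ok = [comb(g, 2) for g in degeneracy] == listed_pairs

print(totals_ok, percent_ok, pairs_ok)
```

Under these assumptions the three checks agree with the table, which supports reading the third row as the share of bridges among degenerate sequences at each degeneracy.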
A bi- or multi-stable protein (with a degenerate native state) is a bridge sequence if it has at least two 1-error mutants that fold non-degenerately into at least two different structures among the sequence's multiple native-state structures, i.e., each mutant folds uniquely to a different structure. In other words, there is at least one connection to the core of each of the two or more neutral networks. In total, for sequences with chain length
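The bridge criterion described above can be illustrated on a toy case. The following is a minimal sketch, assuming a two-letter (H/P) chain on a square lattice whose native state is the set of conformations with the largest number of HH contacts. The function names, the way symmetric copies are removed, and the short chain lengths are choices made for illustration and are not the authors' implementation.

```python
def conformations(n):
    """Self-avoiding walks of n >= 2 beads on the square lattice, one per symmetry class."""
    found = []

    def grow(path, turned):
        if len(path) == n:
            found.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in ((1, 0), (0, 1), (-1, 0), (0, -1)):
            nxt = (x + dx, y + dy)
            if nxt in path:
                continue                     # self-avoidance
            if not turned and dy == -1:
                continue                     # drop mirror images: first turn goes up
            grow(path + [nxt], turned or dy != 0)

    grow([(0, 0), (1, 0)], False)            # fixing the first bond drops rotations
    return found


def hh_contacts(seq, conf):
    """Count non-bonded nearest-neighbour H-H pairs."""
    index = {p: i for i, p in enumerate(conf)}
    count = 0
    for i, (x, y) in enumerate(conf):
        if seq[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):      # look right and up so each pair counts once
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and seq[j] == "H":
                count += 1
    return count


def native_set(seq, confs):
    """Indices of all conformations tied for the most HH contacts."""
    scores = [hh_contacts(seq, c) for c in confs]
    best = max(scores)
    return frozenset(k for k, s in enumerate(scores) if s == best)


def one_error_mutants(seq):
    for i, ch in enumerate(seq):
        yield seq[:i] + ("P" if ch == "H" else "H") + seq[i + 1:]


def is_bridge(seq, confs):
    """Degenerate sequence with uniquely folding 1-error mutants in >= 2 of its native structures."""
    natives = native_set(seq, confs)
    if len(natives) < 2:
        return False
    reached = set()
    for mutant in one_error_mutants(seq):
        m = native_set(mutant, confs)
        if len(m) == 1 and m <= natives:
            reached |= m
    return len(reached) >= 2
```

For a chain of length `n`, one could scan every sequence with `itertools.product("HP", repeat=n)` and keep those for which `is_bridge` returns `True`. The cost grows quickly with `n`, so this sketch is only practical for short chains.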
Bridges constitute almost
Results in
Having identified bridge proteins among model proteins, we now study the potential advantage of bridges for protein evolution due to the bridges' ability to provide additional viable pathways through sequence space. When comparing all 17205 pairwise combinations of the 186 extended neutral networks (with five or more sequences per core network),
Another factor that makes bridge proteins form viable connections between neutral networks is their significantly higher median native state stability (measured by the fractional population
In contrast to the complete account of bridge proteins in a simple model that we obtained by exact enumeration, it is currently not feasible to achieve the same for actual proteins. Nonetheless, we may scan the available data on proteins to identify candidates that have a high likelihood of bi-stability by using the broadest criteria for a bridge protein, viz., the existence of two distinct structures with similar stabilities that are encoded by the same amino acid sequence. A potential source of such candidates that has recently become available is the Protein Conformational Database (PCDB)
A selection of these putative bridge proteins is highlighted in
accession | protein names | organism | domain pair | domain length | RMSD (Å) | sequence identity (%) | average stability | stability difference |
Q8WTS6 | Histone-lysine N-methyltransferase SETD7 | | 1h3iA01 1mt6A01 | 134 | 3.53 | 98.53 | −3.16797 | 0.00021 |
Q9BMI9 | Purine-nucleoside phosphorylase | | 1tcvA00 1td1B00 | 271 | 2.16 | 98.18 | −3.26465 | 0.00046 |
P01854 | Ig epsilon chain C region | | 1fp5A02 1o0vB03 | 108 | 2.06 | 98.18 | −2.93154 | 0.00063 |
P54939 | Talin-1 | | 1mixA02 1mk7B02 | 93 | 2.12 | 100.00 | −2.96602 | 0.00131 |
P00183 | Camphor 5-monooxygenase (Cytochrome P450-cam) | | 1k2oA00 1gjmA00 | 429 | 2.13 | 99.75 | −3.13376 | 0.00176 |
Q94734 | Nitrophorin-4 (NP4) | | 3c78X00 2ofmX00 | 184 | 13.09 | 100.00 | −3.22443 | 0.00204 |
P00489 | Glycogen phosphorylase (Myophosphorylase) | | 2pyiA01 2gpaA01 | 476 | 4.08 | 99.38 | −3.39796 | 0.00251 |
Q63537 | Synapsin-2 | | 1i7nA03 1i7lA03 | 119 | 2.15 | 100.00 | −3.08808 | 0.00266 |
Q08012 | Protein enhancer of sevenless 2B | | 2a36A00 2azsA00 | 59 | 2.34 | 100.00 | −2.51793 | 0.00345 |
P0A0N4 | HTH-type transcriptional regulator qacR | | 2g0eA02 1jumA02 | 137 | 2.67 | 100.00 | −3.36183 | 0.00366 |
P56210 | 50S ribosomal protein L11 (Fragment) | | 1foxA00 2fowA00 | 76 | 6.71 | 100.00 | −2.67946 | 0.00875 |
P98170 | Baculoviral IAP repeat-containing protein 4 | | 1f9xA00 1tfqA00 | 117 | 10.59 | 100.00 | −2.36664 | 0.00994 |
P03051 | Regulatory protein rop | | 1b6qA00 1gmgA00 | 56 | 2.24 | 100.00 | −3.34805 | 0.01144 |
P69441 | Adenylate kinase | | 4akeA00 2eckA00 | 214 | 7.2 | 100.00 | −3.39316 | 0.01445 |
Q7SIG1 | Hydrolase | | 1u4nA00 1qz3A00 | 308 | 8.64 | 99.68 | −3.36802 | 0.01768 |
P01008 | Antithrombin-III (Serpin C1) | | 1jvqL01 2gd4C02 | 140 | 17.16 | 100.00 | −2.58988 | 0.02646 |
P47992 | Lymphotactin (C motif chemokine 1) | | 1j8iA00 1j9oA00 | 93 | 11.77 | 100.00 | −2.09003 | 0.02818 |
P69541 | Capsid protein G8P (Gene 8 protein) | Enterobacteria phage M13 | 2cpbA00 2cpsA00 | 50 | 8.53 | 100.00 | −2.90853 | 0.03448 |
Q55080 | Cytochrome P450 119 (Peroxidase) | | 1io9A00 1f4uA00 | 366 | 2.16 | 100.00 | −3.36023 | 0.03841 |
P0C0Y1 | Light-harvesting protein B-875 beta chain | | 1dx7A00 1jo5A00 | 48 | 10.11 | 100.00 | −2.69183 | 0.04678 |
P83917 | Chromobox protein homolog 1 (Heterochromatin protein 1 homolog beta) | | 1ap0A00 1guwA00 | 73 | 7.42 | 100.00 | −2.30409 | 0.05826 |
Proteins are ordered by increasing stability difference (decreasing bi-stability) and they may be classified into the following functional categories: metabolism (Purine-nucleoside phosphorylase, Camphor 5-monooxygenase, Glycogen phosphorylase, Adenylate kinase, Hydrolase, Cytochrome P450 119, Light-harvesting protein B-875 beta chain), transcriptional regulation (HTH-type transcriptional regulator qacR, Regulatory protein rop), epigenetic regulation (Histone-lysine N-methyltransferase, Chromobox protein homolog 1), signalling (Synapsin-2, Protein enhancer of sevenless 2B, Baculoviral IAP repeat-containing protein 4, Antithrombin-III, Lymphotactin), translation (50S ribosomal protein L11), cytoskeleton (Talin-1, Synapsin-2), and host-pathogen interaction/immune system (Ig epsilon chain C region, Nitrophorin-4, Baculoviral IAP repeat-containing protein 4, Lymphotactin, Capsid protein G8P).
CATH
Root Mean Square Deviation of backbone atoms; in units of Å.
Rosetta standard free energy score/domain length.
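The ordering used in the table can be sketched as follows. This is a minimal illustration, assuming that each structure's stability is its total energy score divided by the domain length (as in the footnote above), that "average stability" is the mean of the two per-residue values, and that "stability difference" is their absolute difference. The `Candidate` class, the function name and the input scores are hypothetical; they are not values from the study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    length: int       # domain length in residues
    score_a: float    # total energy score of structure A (made-up value)
    score_b: float    # total energy score of structure B (made-up value)

    @property
    def per_residue(self):
        # Normalise by length so domains of different size are comparable.
        return self.score_a / self.length, self.score_b / self.length

    @property
    def average_stability(self):
        a, b = self.per_residue
        return (a + b) / 2

    @property
    def stability_difference(self):
        a, b = self.per_residue
        return abs(a - b)


def rank_by_bistability(candidates):
    """Smallest stability difference first, i.e. most bi-stable first."""
    return sorted(candidates, key=lambda c: c.stability_difference)


# Hypothetical inputs, for illustration only.
candidates = [
    Candidate("candidate_1", 100, -300.0, -310.0),
    Candidate("candidate_2", 50, -150.0, -150.5),
    Candidate("candidate_3", 200, -640.0, -600.0),
]
ranked = rank_by_bistability(candidates)
for c in ranked:
    print(c.name, round(c.average_stability, 5), round(c.stability_difference, 5))
```

Dividing by domain length before comparing is what allows a 50-residue domain and a 400-residue domain to appear in the same ranking.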
The proteins in
As discussed above, our model suggests that bridge proteins make up only a small fraction of sequence space (
To elucidate this expected trend, we used our model to measure the stability difference
(
We have quantified the gradual bi-stability change around bridge proteins in our model by considering all non-redundant pairs of extended neutral networks, each with at least five core sequences, wherein the two networks in each pair are connected by at least one bridge sequence (
Inspired by the finding of the gradual sequence-space distribution of bi-stability in our simple biophysical protein chain model, we applied the methodology developed above to study the bi-stability of experimental protein structures. As a first step, we conducted an analysis of a set of sequences discovered by Alexander et al.
To this end, we used Rosetta
In this study, the PDB structures of
The FastRelax
In contrast to the Rosetta/FastRelax method, FoldX fixes the main chain and only optimizes the side chains. A possible consequence of this limitation is that the absolute free energy scores (
Based on the results in
A very similar trend was observed in
As in the computational results in
To better understand the potentially important role of bi-stability in evolution under conflicting selection pressures (adaptive conflicts), we have also performed evolutionary simulations of sequence populations under two selection pressures. Each sequence of the combined neutral networks in
The selection pressure
The negative logarithmic steady-state populations,
Under strong selection (
In contrast, a very different steady-state distribution emerged under weak selection (
To assess the generality of the trends revealed in
We have also examined how evolutionary steady states are achieved in our model. Under weak selection, the model evolutionary dynamics prior to achieving steady state indicates that the population of the initial prototype sequence
In relating our model results to real proteins, we note that two selection pressures acting on the same protein are unlikely to be exactly equal. Indeed, the selection pressures may even change over time. To address the impact of such effects on our conclusions, we considered a model in which the selection pressures for the two structures are not identical. Now, instead of using a single selection pressure
In our biophysical protein chain model,
Our analysis of an entire model protein sequence space demonstrates that the viability of proteins with degenerate native states can confer an advantage under adaptive conflicts. In such situations, extensive overlaps exist between the stability funnels of neutral networks, with bi-stable bridge proteins situated at the interface between networks. Although the detailed characteristics of real protein sequence space remain to be elucidated, our model results suggest that the relationship between bi-stability and evolvability is a promising area for future research. Bi-stability, however, cannot be the only evolutionary response to adaptive conflicts, because the two alternative conformations are mutually exclusive and thus the function of the protein can never be fully optimized. In this regard, the role of gene duplications would also be crucial. We leave this topic for another study
Here we have employed a simple biophysical protein chain model to infer general properties of bi-stable proteins and their distribution in sequence space. The model used here is a simple exact model with an explicit representation of the protein conformations on a two-dimensional lattice. Despite their simplicity, such models capture essential features of the sequence-to-structure mapping of real proteins (see
While the fraction of actual bridge proteins is unknown, one may speculate how the HP model relates to real proteins. For example, consider the following argument: Our simple model only allows for 10 different energy states (
The consequence of bi-stability landscapes (
The simple fitness function in the present study rewards increased protein stability. This fitness function has provided significant insights; but it does not fully capture the subtle relationship between conformational stability and biological function in real proteins
Future work should also improve the computational methods for determining bi-stability changes of
Our evolutionary simulations (
The increasing knowledge of promiscuous enzymes and the high evolvability of new enzyme functions
The inclusion of degenerate native-state structures affects the theory of neutral networks by moderating the notion of “neutrality”. While the strictest definition of neutrality (no change in protein activity/stability whatsoever) is usually not realistically applicable, a weaker definition (neutral, if the overall native structure is conserved, but a small loss of stability is tolerated) can be reconciled with experimental data. One can also go one step further and define neutral networks as
The intrinsic mutational robustness of neutral networks has been proposed to promote evolvability, i.e. the capacity to evolve towards new phenotypes
Draghi et al.
The true robustness and evolvability parameters of proteins remain largely unknown. It appears plausible, however, that proteins may have become the dominant type of biopolymer (as opposed to RNA, or other unknown biopolymers that might have existed during early stages of evolution), in part because they produce the right balance between robustness and evolvability that allows for fast adaptation.
The case for bi-stability as a factor in protein evolution (as opposed to conformational changes that are part of the same protein function) currently rests on a few, mostly artificial, examples, and bi-stability has not been widely observed in natural settings. This may be caused, in part, by experimental limitations in protein structure determination, and possibly also by a lack of research focus. Conformational diversity, as a more general case of bi-stability, has only recently gained broader attention
Our model folds polymers of length
If the number of HH contacts in
In our model, the fitness of an HP sequence
Let
Population dynamics were calculated from an initial state (
The above master-equation approach presupposes an effectively infinite population. To assess the effect of finite population on steady-state distributions, we have also conducted Monte Carlo simulations under the same general conditions with respect to selection pressure and mutation rate (
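An infinite-population calculation of the kind referred to above can be sketched as a repeated selection-then-mutation update on population fractions. The following is a minimal illustration under stated assumptions: the discrete-time update rule, the equal split of mutants among 1-error neighbours, the convergence test and all names are hypothetical, and the fitness values are supplied by the caller rather than taken from the model's fitness function.

```python
import math


def steady_state(neighbors, fitness, mu, tol=1e-12, max_steps=100000):
    """Iterate selection followed by mutation until the fractions stop changing.

    neighbors[i] lists the 1-error neighbours of sequence i,
    fitness[i] is its (caller-supplied) fitness, mu is the mutation rate.
    """
    n = len(fitness)
    p = [0.0] * n
    p[0] = 1.0                                   # start from a single prototype sequence
    for _ in range(max_steps):
        w = [p[i] * fitness[i] for i in range(n)]        # selection
        q = [(1.0 - mu) * w[i] for i in range(n)]        # unmutated offspring
        for i in range(n):
            for j in neighbors[i]:                       # mutants, split equally
                q[j] += mu * w[i] / len(neighbors[i])
        total = sum(q)
        q = [v / total for v in q]                       # renormalise to fractions
        if max(abs(a - b) for a, b in zip(p, q)) < tol:
            return q
        p = q
    return p


# Toy example: two sequences (0 and 2) connected through an intermediate (1).
# The fitness values are made up for illustration.
toy_neighbors = [[1], [0, 2], [1]]
toy_fitness = [1.0, 0.8, 1.0]
p = steady_state(toy_neighbors, toy_fitness, mu=0.1)
neg_log_p = [-math.log(x) for x in p]            # negative logarithmic populations
print(p, neg_log_p)
```

Because the toy graph and fitness values are symmetric, the two end sequences converge to equal fractions even though the population starts entirely on one of them. A finite-population Monte Carlo run would replace the deterministic update with sampling of individuals.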
For illustrative purposes, the sequences belonging to the two adjacent neutral networks in
The NMR model 1 of
In the Rosetta approach (PyRosetta v2.0 implementation
In the FoldX approach, the mutagenesis engine (“BuildModel”) and the standard energy function of FoldX were used to generate and evaluate the sequence variants. For each sequence, the “Repair” function of FoldX was used to optimize the side-chain orientations. In contrast to the Rosetta approach that allows for movement of all atoms to achieve local optimization of the structure, FoldX (version 3.0)
To determine the hydrophobic contact density
All 7989 redundant protein structure clusters were obtained from the protein conformational database PCDB (version 2, August 2011)
Excel table containing accession numbers of potential bridge proteins in the PCDB database.
(XLS)
We thank Günter Wagner for pointing us toward the concept of fuzzy sets during a discussion.