Conceived and designed the experiments: VBG MC RK. Performed the experiments: VBG JA. Analyzed the data: VBG MC RK. Contributed reagents/materials/analysis tools: VBG MC RK. Wrote the paper: VBG MC RK.
MC and RK both served as co-mentors for VBG.
The authors have declared that no competing interests exist.
Understanding how novel functions evolve (genetic adaptation) is a critical goal of evolutionary biology. Among asexual organisms, genetic adaptation involves multiple mutations that frequently interact in a non-linear fashion (epistasis). Non-linear interactions pose a formidable challenge for the computational prediction of mutation effects. Here we use the recent evolution of β-lactamase under antibiotic selection as a model for genetic adaptation. We build a network of coevolving residues (possible functional interactions), in which nodes are mutant residue positions and links represent two positions found mutated together in the same sequence. Most often these pairs occur in the setting of more complex mutants. Focusing on extended-spectrum resistant sequences, we use network-theoretical tools to identify triple mutant trajectories of likely special significance for adaptation. We extrapolate evolutionary paths (n = 3) that increase resistance and that are longer than the units used to build the network (n = 2). These paths consist of a limited number of residue positions and are enriched for known triple mutant combinations that increase cefotaxime resistance. We find that the pairs of residues used to build the network frequently decrease resistance compared to their corresponding singlets. This is a surprising result, given that their coevolution suggests a selective advantage. Thus, β-lactamase adaptation is highly epistatic. Our method can identify triplets that increase resistance despite the underlying rugged fitness landscape and has the unique ability to make predictions by placing each mutant residue position in its functional context. Our approach requires only sequence information, sufficient genetic diversity, and discrete selective pressures. Thus, it can be used to analyze recent evolutionary events, where coevolution analysis methods that use phylogeny or statistical coupling are not possible. 
Improving our ability to assess evolutionary trajectories will help predict the evolution of clinically relevant genes and aid in protein design.
Understanding how new biological activities evolve on the molecular level has critical implications for biotechnology and for human health. Here we collect a database of mutations that contribute to the evolution of β-lactamase resistance to inhibitors and to new β-lactam antibiotics in bacterial pathogens, such as
Evolutionary biology seeks to understand how proteins rapidly evolve novel functions and adapt to new environments, while retaining their functional specificity
It has been noted that for a given protein target under selective pressure, the contribution of individual amino acid substitutions to adaptation is highly variable
Bacterial β-lactamases, enzymes that hydrolyze the functional β-lactam ring of β-lactam antibiotics, are a good model system for the study of genetic adaptation. The reason is that acquisition of resistance to inhibitors and newer β-lactam antibiotics
Since the discovery of a β-lactamase known as TEM-1 in 1963, over 170 mutants have been identified in clinical environments, in addition to dozens more described in laboratory evolution experiments (reviewed in
We then focused on a network model of mutant positions involved in extended-spectrum resistance, which is the best-represented resistance phenotype class in our TEM mutant sequence database. We reasoned that generating adaptive evolutionary trajectories involves assembling combinations of mutations that fulfill the specific functional milestones required for genetic adaptation. If we assume that every mutant position represents a potential functional milestone, adaptation involves information transfer across the network. We focused on the most experimentally tractable evolutionary trajectories (those involving three mutations) and identified mutation paths that facilitate the transfer of information across the network as paths of likely special significance for adaptation. The significance of the trajectories identified by our analysis is supported by the finding that they frequently increase resistance over their constituent double mutants. Although most of these trajectories had been previously described, our ability to identify them indicates that the analysis has predictive value, because the network contained no information about the sequence context in which the co-occurring pairs of mutations were originally found.
Our network approach attempts to maximize the amount of genetic information that can be derived from sequences, in the setting of rapid evolution under defined selective pressures, such as drug resistance, virulence, or immune evasion. Detailed phylogenetic or structural information is not required for our method in its current form, but our approach is amenable to the incorporation of biophysical, tertiary structure, and phylogeny variables.
In order to study how new biochemical activities arise during evolution, we compiled a database of TEM mutant sequences that have evolved under antibiotic selective pressure. Our database includes clinical (n = 144
Mutations within TEM β-lactamase sequences can be grouped into the following three phenotypic classes, corresponding to specific functional selections: mutations associated with resistance to penicillins and some earlier generation cephalosporins (broad-spectrum resistance or Class 2b), mutations conferring resistance to later generations of cephalosporins and monobactams (extended-spectrum resistance or Class 2be), and mutations that make β-lactamases resistant to inhibitors (inhibitor resistance or Class 2br)
Our first assumption was that a majority of mutations present in our database would have undergone a degree of positive selection. This assumption was based on the fact that the rapid evolution of β-lactamases in recent years has been linked with the widespread use of antibiotics
Our second hypothesis was that co-occurrence of pairs of mutated residue positions within the same sequence is indicative of a functional relationship between these positions. We constructed an undirected, weighted network representation of co-occurring residue pairs in order to map out potential functional interactions underlying the evolution of β-lactamase under antibiotic selective pressure. In this network model (shown in
The network was constructed based on frequencies of co-occurring mutated residue positions in 363 mutant TEM β-lactamase sequences. Node size is proportional to how well connected a node is to its neighbors and how many neighbors it has (weighted degree centrality,
The network was constructed in the same way as
The weighted degree distribution of the network,
The TEM coevolution network also has a modular structure, with a modularity score
On a narrower level, within the two adaptive community networks (the extended-spectrum and inhibitor-resistant community networks), we found subcommunities,
We reasoned that by analyzing the connectivity of the TEM β-lactamase coevolution network, we could extract functional information about amino acid residue positions in this enzyme. We focused our analysis on the extended-spectrum community, which is the adaptive community network based on the largest number of available mutant sequences.
We used the occurrence count of mutations at a given position as an indication of functional importance for extended-spectrum β-lactamase resistance (
Residue Number* | Count within Database | Node Degree Rank | Node Betweenness Rank | Described Function | References |
104 | 48 | 1 | 1 | The long lysine side chain of E104K mutants interacts directly with the carboxylic acid group of the substrate. | |
164 | 48 | 2 | 2 | Forms two salt bridges, to E171 and D179, that are critical for correct positioning of E166. The smaller mutant side chain collapses the Ω-loop, resulting in an active site with greater accessibility. | |
238 | 38 | 3 | 3 | Expands the active site either by repositioning the B3 β-strand or by tilting the Ω-loop. | |
240 | 31 | 4 | 4 | Interacts with substrate; possibly stabilizing. | |
182 | 27 | 5 | 5 | Increases the thermodynamic stability of the protein; could suppress misfolding and aggregation caused by other mutations. Acts as a global suppressor. | |
265 | 20 | 7 | 9 | Unknown mechanism. Possibly important for enzyme stability. | |
237 | 9 | 6 | 8 | Introduces another H-bond with the carbonyl group of the β-lactam ring. | |
173 | 5 | 9 | 6 | Increased resistance, specific for a subset of cephalosporins. | |
120 | 3 | 17 | 8 | Unknown mechanism. Possibly important for enzyme stability. | |
254 | 3 | 8 | N/A | Unknown mechanism. Possibly stabilizing. | |
51 | 2 | 15 | 7 | Unknown mechanism. Possibly important for both enzyme activity and stability. | |
268 | 2 | 10 | 8 | Unknown mechanism. Possibly stabilizing. | |
Degree centrality rank is based on how well connected a node is to its neighbors and how many neighbors it has (
*Based on Ambler TEM β-lactamase numbering scheme
Each link in the TEM coevolution network represents a potential step within an adaptive evolutionary trajectory. Although, by construction, all two-node paths have been seen in natural or laboratory evolution, by defining longer paths within the network, we should be able to derive evolutionary trajectories consisting of more than two mutations. We chose to analyze two-edge (three-node) shortest paths, each of which represents an evolutionary trajectory that produces a triple mutant sequence, because they are the most tractable to enumerate and explore.
Our hypothesis was that adaptive evolution often involves discrete steps in the form of functional modifications: improved active site fit to a new substrate, suitable chemical environment in the active site, increased thermodynamic stability,
We identified evolutionary trajectories of special significance for adaptive evolution based on shortest path betweenness centrality, a metric that can be interpreted as measuring the efficiency of information transfer through the network. We found that only a subset of all possible three-node paths in the network (48 out of 214) had a shortest path betweenness centrality greater than zero. These triple mutant trajectories are listed in
Evolutionary Trajectory | Betweenness Centrality | Count | Previously Reported in Clinical and/or Laboratory-evolved Isolates |
238_104_164 | 96 | 48,48,38 | TEM-008 |
173_164_104 | 92 | 48,48,5 | |
182_104_164 | 66 | 27,48,48 | TEM-043 |
240_164_104 | 62 | 31,48,48 | TEM-046 |
268_240_164 | 41 | 2,31,48 | TEM-136 |
120_238_104 | 39 | 3,38,48 | |
39_240_164 | 32 | 1,31,48 | |
237_164_104 | 28 | 9,48,48 | TEM-130 |
104_238_153 | 23 | 48,38,9 | TEM-021 |
240_164_173 | 22 | 31,48,5 | TEM-132 |
104_164_40 | 18 | 48,48,1 | |
238_104_51 | 16 | 38,48,2 | |
215_104_164 | 15 | 48,38,20 | TEML-136 |
104_238_265 | 15 | 2,48,48 | |
39_240_238 | 12 | 1,31,38 | |
182_104_51 | 11 | 27,48,2 | |
173_164_51 | 9 | 5,48,2 | |
215_104_238 | 8 | 2,48,38 | |
182_238_120 | 7 | 27,38,3 | |
240_164_51 | 6 | 31,48,2 | |
224_164_173 | 6 | 3,48,5 | |
173_164_237 | 6 | 5,48,9 | |
224_164_240 | 5 | 3,48,31 | |
173_164_40 | 4 | 27,38,20 | |
182_104_215 | 4 | 27,48,2 | |
182_238_153 | 4 | 5,48,1 | |
240_238_153 | 4 | 31,38,9 | |
182_238_265 | 4 | 27,38,9 | |
51_164_40 | 3 | 20,38,31 | |
40_164_240 | 3 | 2,31,9 | |
224_164_251 | 3 | 3,48,2 | |
51_164_237 | 3 | 1,48,31 | |
268_240_237 | 3 | 2,48,1 | TEM-136 |
265_238_240 | 3 | 2,48,9 | |
39_240_237 | 2 | 3,38,9 | |
39_240_268 | 2 | 2,38,3 | |
120_238_153 | 2 | 31,38,3 | |
240_238_120 | 2 | 3,48,9 | |
120_238_265 | 2 | 1,31,2 | |
268_238_120 | 2 | 2,38,9 | |
268_238_153 | 2 | 3,38,20 | |
224_164_237 | 2 | 1,31,9 | |
224_164_40 | 1 | 2,38,20 | |
237_164_40 | 1 | 9,48,1 | |
51_104_215 | 1 | 48,48,3 | |
104_164_224 | 1 | 20,38,9 | |
265_238_153 | 1 | 3,48,1 | |
268_238_265 | 1 | 2,48,2 | |
Triple mutant trajectories are shown as an ordered list of three residue positions, where an ordered pair represents a link in the network. The shortest path betweenness centrality is listed for each triple mutant trajectory, in descending order. We interpret the betweenness centrality of a trajectory as a representation of information flow through this path for the entire community network: Trajectories with high betweenness centrality have the highest information flow (
We investigated the significance of betweenness centrality as an indicator of potential adaptive evolution. Below we show that: 1) the triple mutant trajectories listed in
We next addressed the functional significance of links present in our network. To that end, we compared the level of resistance of pairs of mutations present in nonzero betweenness centrality trajectories to their constituent mutations (
Triplet | Betweenness Centrality | Reported? | Resistance Outcome [cm] | Doublet 1 | Resistance Outcome [cm] | Triplet Improvement over Doublet 1 [cm] | Doublet 2 | Resistance Outcome [cm] | Triplet Improvement over Doublet 2 [cm] |
104_164_173 | 92 | Y | 16.49 | 104_164 | 8.42 | 8.07 | 164_173 | 6.95 | 9.54 |
182_104_164 | 66 | Y | 16.86 | 182_104 | 2.82 | 14.04 | 104_164 | 8.42 | 8.44 |
39_240_164 | 32 | Y | 9.10 | 39_240 | 2.16 | 6.94 | 240_164 | 9.48 | -0.38 |
104_238_153 | 23 | Y | 17.65 | 104_238 | 16.83 | 0.82 | 238_153 | 11.5 | 6.15 |
240_164_173 | 22 | Y | 17.48 | 240_164 | 9.48 | 8.00 | 164_173 | 6.95 | 10.53 |
104_164_40 | 18 | N | 5.06 | 104_164 | 8.42 | -3.36 | 164_40 | 2.13 | 2.93 |
238_104_51 | 16 | N | 1.88 | 238_104 | 16.83 | -14.95 | 104_51 | 1.65 | 0.23 |
104_238_265 | 15 | Y | 19.40 | 104_238 | 16.83 | 2.57 | 238_265 | 10.84 | 8.56 |
39_240_238 | 12 | N | 9.59 | 39_240 | 2.16 | 7.43 | 240_238 | 12.04 | -2.45 |
182_104_51 | 11 | N | 2.54 | 182_104 | 2.82 | -0.28 | 104_51 | 1.65 | 0.89 |
173_164_51 | 9 | N | 1.79 | 173_164 | 6.95 | -5.16 | 164_51 | 1.9 | -0.11 |
215_104_238 | 8 | N | 11.26 | 215_104 | 2.39 | 8.87 | 104_238 | 16.83 | -5.57 |
182_238_153 | 4 | Y | 17.95 | 182_238 | 16.17 | 1.78 | 238_153 | 11.5 | 6.45 |
120_238_153 | 2 | Y | 14.36 | 120_238 | 7.22 | 7.14 | 238_153 | 11.5 | 2.86 |
104_164_224 | 1 | Y | 9.31 | 104_164 | 8.42 | 0.89 | 164_224 | 3.9 | 5.41 |
Each mutant trajectory is shown as an ordered list of three mutated residue positions (column 1). Each ordered pair of mutated residue positions represents a link in the extended-spectrum community network. The shortest path betweenness centrality is listed for each trajectory (column 2); this metric is unitless and measures the path's importance in the network. Nine of the 15 tested trajectories had been reported in clinical or directed evolution isolates (column 3). The level of cefotaxime resistance (an indicator of extended-spectrum antibiotic resistance) is shown in centimeters of linear growth on a 0.04 µg/ml cefotaxime gradient. The level of resistance is shown for each triple mutant trajectory (columns 1 and 4) and for its two ordered constituent double mutants (columns 5 and 6, and columns 8 and 9). The differences, representing the improvement in resistance conferred by the triple mutant trajectory relative to each double mutant, are shown in columns 7 and 10. Trajectories marked with * had not been reported when this work was done and were not included in the input to the network. They were subsequently reported in a recent publication
M1 | M1 Growth [cm] | M2 | M2 Growth [cm] | M1_M2 Growth [cm] | M1_M2 - (M1+M2) [cm] | Significant Epistatic Effect* |
Q39R | 2.09 | G238S | 9.61 | 7.76 | -2.35 | |
Q39R | 2.09 | E240K | 1.58 | 2.16 | 0.08 | |
L40W | 2.08 | R164H | 3.43 | 2.13 | -1.79 | negative |
L51P | 1.93 | E104K | 2.20 | 1.65 | -0.89 | negative |
L51P | 1.93 | R164H | 3.43 | 1.90 | -1.87 | negative |
E104K | 2.20 | H153R | 2.17 | 2.73 | -0.05 | |
E104K | 2.20 | I173V | 2.10 | 10.84 | 8.13 | positive |
E104K | 2.20 | K215E | 1.90 | 2.39 | -0.12 | |
E104K | 2.20 | A224V | 1.86 | 1.90 | -0.57 | |
R120S | 1.94 | G238S | 9.61 | 7.22 | -2.74 | negative |
H153R | 2.17 | G238S | 9.61 | 11.50 | 1.31 | |
R164H | 3.43 | A224V | 1.86 | 3.90 | 0.20 | |
I173V | 2.10 | E240K | 1.58 | 3.62 | 1.53 | positive |
K215E | 1.90 | G238S | 9.61 | 6.34 | -3.58 | negative |
G238S | 9.61 | E240K | 1.58 | 12.04 | 2.44 | |
G238S | 9.61 | T265M | N/A | 10.84 | N/A | |
Mutated residues (columns 1 and 3) and their individual cefotaxime resistance levels (columns 2 and 4) are compared to resistance levels when they occur together in the same sequence (column 5). The level of cefotaxime resistance (an indicator of extended-spectrum antibiotic resistance) is shown in centimeters of linear growth on a 0.04 µg/ml cefotaxime gradient. The difference between the combined effect (column 5) and the sum of the individual effects (column 2 + column 4), which represents epistasis, is shown in column 6.
*Significant epistatic effect = differences that exceed the margin of standard error (for the number of replicates (n), refer to
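The epistasis column can be checked by hand. Read literally, "combined effect minus the sum of individual effects" gives 7.76 - (2.09 + 9.61) = -3.94 cm for Q39R/G238S, not the tabulated -2.35 cm. The tabulated values are reproduced, to within rounding, when every growth value is first expressed relative to a baseline of about 1.59 cm, that is, epsilon = (G12 - G0) - (G1 - G0) - (G2 - G0). The same formula, with M2 replaced by a residue pair, also reproduces column 6 of the single-versus-pair table that follows. The 1.59 cm baseline is our back-calculation from the table and is assumed, not stated in the text, to be the growth of the unmutated control; the Python sketch below is illustrative only, and the names `BASELINE_CM` and `epistasis` are hypothetical.

```python
import math

# ASSUMPTION: baseline growth (cm) back-calculated from the pairwise table;
# presumed to be the unmutated control, but the text does not state it.
BASELINE_CM = 1.59


def epistasis(g1, g2, g12, g0=BASELINE_CM):
    """Deviation from additivity, with each growth value taken relative to g0.

    g1, g2: growth (cm) of the two single mutants (or a single mutant and a pair);
    g12: growth (cm) of the combined mutant.
    """
    effect_1 = g1 - g0    # effect of the first mutation alone
    effect_2 = g2 - g0    # effect of the second mutation (or pair) alone
    effect_12 = g12 - g0  # effect of the combination
    return effect_12 - (effect_1 + effect_2)


# Rows from the pairwise table: (M1 growth, M2 growth, M1_M2 growth)
q39r_g238s = epistasis(2.09, 9.61, 7.76)
e104k_i173v = epistasis(2.20, 2.10, 10.84)
print(round(q39r_g238s, 2), round(e104k_i173v, 2), math.isclose(q39r_g238s, -2.35, abs_tol=0.006))
```

Significance is a separate step: per the footnote, a value counts as significant only if it exceeds the margin of standard error, which this sketch does not model.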
M1 | M1 Growth [cm] | M2 | M2 Growth [cm] | M1_M2 Growth [cm] | M1_M2 - (M1+M2) [cm] | Significant Epistatic Effect* |
Q39R | 2.09 | E240K R164H | 9.48 | 9.10 | -0.88 | |
Q39R | 2.09 | E240K G238S | 12.04 | 9.59 | -2.95 | negative |
L40W | 2.08 | E104K R164H | 8.42 | 5.06 | -3.85 | negative |
L51P | 1.93 | M182T E104K | 2.82 | 2.54 | -0.62 | |
L51P | 1.93 | I173V R164H | 6.95 | 1.79 | -5.50 | negative |
L51P | 1.93 | G238S E104K | 16.83 | 1.88 | -15.29 | negative |
E104K | 2.20 | R164H L40W | 2.13 | 5.06 | 2.32 | positive |
E104K | 2.20 | R164H A224V | 3.90 | 9.31 | 4.80 | positive |
E104K | 2.20 | K215E G238S | 6.34 | 11.26 | 4.31 | positive |
E104K | 2.20 | I173V R164H | 6.95 | 16.49 | 8.93 | positive |
E104K | 2.20 | G238S T265M | 10.84 | 19.40 | 7.95 | positive |
E104K | 2.20 | G238S H153R | 11.50 | 17.65 | 5.54 | positive |
R120S | 1.94 | G238S H153R | 11.50 | 14.36 | 2.51 | |
R120S | 1.94 | E240K G238S | 12.04 | 12.92 | 0.53 | |
H153R | 2.17 | R120S G238S | 7.22 | 14.36 | 6.56 | positive |
H153R | 2.17 | E104K R164H | 8.42 | 11.50 | 2.50 | positive |
H153R | 2.17 | E104K I173V | 10.84 | 2.65 | -8.77 | negative |
H153R | 2.17 | M182T G238S | 16.17 | 17.95 | 1.20 | |
H153R | 2.17 | E104K G238S | 16.83 | 17.65 | 0.24 | |
R164H | 3.43 | Q39R E240K | 2.16 | 9.10 | 5.10 | positive |
R164H | 3.43 | E104K H153R | 2.73 | 11.50 | 6.93 | positive |
R164H | 3.43 | M182T E104K | 2.82 | 16.86 | 12.20 | positive |
R164H | 3.43 | E104K I173V | 10.84 | 16.49 | 3.81 | positive |
R164H | 3.43 | H153R G238S | 11.50 | 6.00 | -7.34 | negative |
I173V | 2.10 | R164H L51P | 1.90 | 1.79 | -0.62 | |
I173V | 2.10 | E104K H153R | 2.73 | 2.65 | -0.59 | |
I173V | 2.10 | E104K R164H | 8.42 | 16.49 | 7.56 | positive |
I173V | 2.10 | E240K R164H | 9.48 | 17.48 | 7.49 | positive |
M182T | 2.15 | E104K L51P | 1.65 | 2.54 | 0.33 | |
M182T | 2.15 | E104K R164H | 8.42 | 16.86 | 7.88 | positive |
M182T | 2.15 | G238S H153R | 11.50 | 17.95 | 5.89 | positive |
K215R | 1.90 | E104K G238S | 16.83 | 11.26 | -5.88 | negative |
A224V | 1.86 | E104K R164H | 8.42 | 9.31 | 0.62 | |
G238S | 9.61 | E104K L51P | 1.65 | 1.88 | -7.79 | negative |
G238S | 9.61 | Q39R E240K | 2.16 | 9.59 | -0.59 | |
G238S | 9.61 | K215R E104K | 2.39 | 11.26 | 0.85 | |
E240K | 1.58 | Q39R R164H | 3.70 | 9.10 | 5.41 | positive |
E240K | 1.58 | R164H I173V | 6.95 | 17.48 | 10.54 | positive |
E240K | 1.58 | R120S G238S | 7.22 | 12.92 | 5.71 | positive |
T265M | N/A | E104K G238S | 16.83 | 19.40 | N/A |
Mutated residues (column 1) and residue pairs (column 3) and their corresponding cefotaxime resistance levels (columns 2 and 4, respectively) are compared to resistance levels when they occur together in the same sequence (column 5). The level of cefotaxime resistance (an indicator of extended-spectrum antibiotic resistance) is shown in centimeters of linear growth on a 0.04 µg/ml cefotaxime gradient. The difference between the combined effect (column 5) and the sum of the individual effects (column 2 + column 4), which represents epistasis, is shown in column 6.
*Significant epistatic effect = differences that exceed the margin of standard error (for the number of replicates (n), refer to
The observed disconnect between co-occurrence and the cefotaxime resistance phenotype of pairs of mutations included in our network suggests that the adaptive value of a given mutation or mutation pair is highly dependent on sequence context. Thus, an accurate assessment of the contribution of a given mutation to adaptation involves testing the effect of the mutation in the presence of different additional mutations,
Mutant position | Count within database | Mutation tested | Number of different sequence contexts tested | Average effect [cm] | Interval (min, max) [cm] |
164 | 48 | R164H | 13 | 4.18 | (-5.5, 14.04) |
104 | 48 | E104K | 15 | 4.04 | (-0.28, 9.54) |
238 | 38 | G238S | 11 | 8.03 | (0.23, 14.63) |
240 | 31 | E240K | 8 | 3.96 | (0.07, 10.53) |
182 | 27 | M182T | 6 | 3.92 | (0.62, 8.44) |
265 | 20 | T265M | 2 | 1.90 | (1.23, 2.57) |
153 | 9 | H153R | 8 | 0.95 | (-8.19, 7.14) |
173 | 5 | I173V | 8 | 3.82 | (-0.11, 8.64) |
237 | 9 | N/A | N/A | N/A | N/A |
224 | 3 | A224V | 4 | 0.33 | (-0.3, 0.89) |
120 | 3 | R120S | 4 | 0.43 | (-2.39, 2.86) |
215 | 2 | K215E | 4 | -2.89 | (-5.57, 0.19) |
51 | 2 | L51P | 6 | -3.69 | (-14.95, 0.34) |
268 | 2 | N/A | N/A | N/A | N/A |
40 | 1 | L40W | 3 | -1.39 | (-3.36, 0.49) |
39 | 1 | Q39R | 6 | -0.56 | (-2.45, 0.58) |
Critical triple mutant trajectories (
K215E has equal frequency to K215R and K215Q in the extended-spectrum phenotype sequence database;
L40W and L40V have equal frequencies (column 3). We tested the level of cefotaxime resistance of each mutation (centimeters of linear growth on a 0.04 µg/ml cefotaxime gradient) in a variety of sequence contexts. Each context consists of the relevant mutation plus different additional mutations, all of which are found in the critical triple mutant evolutionary trajectories. The number of sequence contexts tested is shown in column 4 and the different mutant combinations comprising each sequence context are shown in
Note that, in agreement with the epistatic analysis presented in
Cultures of cells expressing the β-lactamase mutants listed at the top of the gradients were stamped on LB plates containing a cefotaxime gradient. The direction of the gradient is from top (minimal concentration) to bottom (maximal concentration). The maximal concentration of the gradient is listed at the bottom. Note that in part B more than one concentration is shown to cover the wide range of resistance phenotypes of the panel of mutants being tested. (A) Two mutant triplets that our analysis predicted to be of special significance, which were not present in the sequence database used to build the network but were subsequently reported in
Here we assembled a large database (n = 361) of mutants of the enzyme TEM-1 β-lactamase to study the genetic basis for adaptive evolution.
In the construction of this database we made the following two assumptions:
1) That most mutated positions would have undergone a degree of positive selection, which was supported by a PAML (codeml) analysis of clinical mutants (
2) That experimental evolution is comparable to clinical evolution, given that both scenarios share a selective pressure (antibiotic resistance selection) and yield similar mutations
We then used co-occurrence,
We distinguished two levels of modular organization within our network of genetically defined interactions in TEM β-lactamase:
Large “communities”: These correspond to three distinct phenotypic categories: broad-spectrum, extended-spectrum and inhibitor resistance (
“Subcommunities” within these communities: We hypothesize that subcommunities are likely to represent parallel strategies of adaptation within the community's phenotype class. This appears to be the case in the two adaptive phenotypic classes included in this study:
Inhibitor-resistance community: this network contains two subcommunities, corresponding to two distinct mechanisms disrupting inhibitor binding at the active site
Extended-spectrum resistance community: this network contains two large subcommunities (
In sum, we find that both distinctive selective pressures and peaks within the enzyme's fitness landscape leave recognizable footprints on the network's connectivity. Furthermore, the amino acid positions within network modules are not necessarily physically close in the protein's tertiary structure, as interactions are defined genetically (functionally) rather than physically. To illustrate this point,
Link weights in our network are proportional to the number of sequence co-occurrence events for the corresponding mutated residue pairs. We implicitly incorporated epistatic information into this metric by using a normalization factor: We compared the number of mutated position pair occurrences with the mutation count at each of the corresponding individual residues (
By connecting individual nodes (representing mutated residue positions), paths through our network define potential evolutionary trajectories. Network metrics allowed us to extend the trajectories beyond the pairs of co-occurring nodes used to build the network. We focused on combinations of three mutations, which are the most experimentally tractable ones. Our basic hypothesis was that genetic adaptation necessitates a specific combination of functional milestones, where each amino acid mutation represents a potential milestone. According to this hypothesis, combinations of mutations that facilitate information flow through the network should contribute prominently to genetic adaptation. We used shortest path betweenness centrality (a metric that can be interpreted as measuring a path's importance for information flow within the network) to identify trajectories of potential special significance for extended-spectrum β-lactamase resistance (
They occur frequently in natural or experimental extended-spectrum β-lactamase evolution experiments (
The higher the betweenness centrality, the more likely they are to have been previously seen (
Presence of these mutations in reported (previously seen) sequences is associated with increased cefotaxime resistance, an indicator of extended-spectrum activity (
All predicted triple mutant combinations that were experimentally tested and that significantly improved resistance over constituent mutant pairs (a total of 8) have been previously described. Of these, only two (M182T G238S H153R and E104K R164H A224V) were absent from our original database and have been reported only recently
By construction, the network only contains information about mutation pair occurrence counts (regardless of whether the pairs are components of more complex mutant sequences). Therefore, all mutation triplets with increased resistance constitute predictive successes, regardless of whether or not sequences containing these mutations were part of the original database. Using previous observation of a TEM mutant as a proxy for increased resistance (the two are strongly associated), we estimated our success rate at 23 out of 48. As a control, we ran a computational simulation to find the success rate we would have obtained by random sampling from positions involved in extended-spectrum resistance, weighted by residue mutation frequency in our database. The average result of 10,000 random samplings was 12.8 ± 3.08 out of 48, indicating that our method extrapolates triple mutant trajectories from pairs of coevolving mutations more accurately than simply combining mutations of high frequency.
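The random-sampling control can be sketched as follows. The exact sampling scheme is not specified in the text, so this Python version assumes that each random trajectory is three distinct positions drawn with probability proportional to their mutation counts, that 48 such triplets are drawn per trial, and that a draw counts as a success if it appears in a set of successful triplets supplied by the caller; `sample_triplet`, `random_baseline`, the success set and the seed are all hypothetical. As a rough reading of the reported numbers, 23 successes lie (23 - 12.8) / 3.08 ≈ 3.3 standard deviations above the random mean, if the null distribution is approximately normal.

```python
import random
import statistics


def sample_triplet(rng, freqs):
    """Draw three distinct positions, each draw weighted by mutation count.

    ASSUMPTION: sampling without replacement within a triplet; the paper's
    exact scheme is not given here.
    """
    pool = dict(freqs)
    picked = []
    for _ in range(3):
        positions = list(pool)
        choice = rng.choices(positions, weights=[pool[p] for p in positions])[0]
        picked.append(choice)
        del pool[choice]  # a position cannot appear twice in one triplet
    return frozenset(picked)  # unordered, so it matches either path direction


def random_baseline(freqs, successes, n_draws=48, n_trials=10_000, seed=0):
    """Mean and standard deviation of the number of successful random triplets.

    freqs: position -> mutation count; successes: set of frozensets of positions.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    hits = [
        sum(sample_triplet(rng, freqs) in successes for _ in range(n_draws))
        for _ in range(n_trials)
    ]
    return statistics.mean(hits), statistics.stdev(hits)
```

The frequencies could be taken from the position counts in the residue table (for example `{104: 48, 164: 48, 238: 38, 240: 31, 182: 27}`), while the success set would have to be assembled from the triplets judged successful.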
At this time, the predictive value of our method can only be rigorously supported with respect to known TEM mutant combinations.
Our method for identification of paths of special significance for adaptation has limitations, because it assumes that each mutant position has a discrete effect on adaptation and that this effect is sufficiently unique that adaptation requires a composite solution. Therefore, global suppressors (such as mutations at position 182) or mutations with a large impact on their own (S130G, associated with inhibitor resistance, and G238S conferring extended-spectrum resistance) will not be adequately accounted for by our “information flow” metric.
Another example of this method's limitations is illustrated by the absence of the high fitness extended-spectrum triple mutant 104-238-182 in our list of nonzero betweenness centrality triplets (
Next, we investigated whether links connecting co-occurring pairs in our network represent positive functional interactions. We tested the individual vs. combined effects of the mutations in the mutant triplets from
Network is represented as in
The 48 triple mutant paths we identified as of special significance (
We evaluated the relevance of 14 out of the 16 positions identified by our betweenness centrality analysis by experimentally determining their cefotaxime resistance phenotype. In order to factor in the prevalent role of epistasis in extended-spectrum TEM evolution, we determined the impact of a given mutation on cefotaxime resistance as the average phenotype in a variety of sequence contexts (in the presence of a variety of additional mutations). The results are summarized in
In sum, positions present in triple mutant paths with nonzero betweenness centrality identified all but two of the positions with known phenotypic effect on extended-spectrum resistance. We experimentally demonstrated the impact of the additional mutations identified by our analysis, either directly by showing increased cefotaxime resistance (120, 153, 265) or indirectly, by showing large negative effects on resistance (215 and 40). These results suggest that our method is able to accurately identify positions that play an important role in genetic adaptation. It is able to do so because it evaluates mutations in the context of their genetically defined functional interactions.
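The context-averaged effect of a mutation can be sketched as below. We assume that the effect in one context is the growth of the context plus the mutation minus the growth of the context alone; this is consistent with the triplet table, where adding R164H to M182T E104K raises growth by 16.86 - 2.82 = 14.04 cm, the maximum listed for R164H in the table of mutant positions. The function name `context_effect` is hypothetical, and the two I173V contexts are taken from the triplet table for illustration only.

```python
from statistics import mean


def context_effect(with_mutation, without_mutation):
    """Average effect and (min, max) interval of a mutation across contexts.

    Both arguments map a context label to cefotaxime growth (cm). The effect in
    each context is growth with the mutation minus growth without it.
    """
    effects = [with_mutation[c] - without_mutation[c] for c in without_mutation]
    return mean(effects), (min(effects), max(effects))


# Two contexts for I173V, using growth values from the triplet table (cm)
without_i173v = {"E104K R164H": 8.42, "E240K R164H": 9.48}
with_i173v = {"E104K R164H": 16.49, "E240K R164H": 17.48}
avg, (lo, hi) = context_effect(with_i173v, without_i173v)
print(round(avg, 3), (round(lo, 2), round(hi, 2)))
```

Averaging over contexts, rather than testing each mutation once on the TEM-1 background, is what lets a strongly epistatic mutation show a wide interval around a modest mean.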
Many current state-of-the-art bioinformatics methods for predicting mutation effects consider only evolutionary history and/or biophysical properties of single residue positions
In contrast to these methods, our approach uses network analysis to infer higher-order evolutionary interactions between groups of coevolving residues that may not be co-localized in a protein structure. Our focus is not on finding functionally important residues. Rather, we identify communities of residue positions associated with different antibiotic resistance phenotypes and subcommunities representing distinct strategies to acquire a given resistance phenotype. We are also able to extrapolate adaptive evolutionary trajectories (triple mutant combinations that increase cefotaxime resistance) based only on the initial knowledge of the co-occurrence of mutated residues in resistant mutant sequences. Our method can be applied to protein subfamilies in which there is low sequence diversity, and it does not require a reliable phylogeny or tertiary protein structure on which to base inference.
While we use TEM β-lactamase as a model system in this paper, we believe that our network analysis is generalizable to other genes evolving under defined selective pressures. This model presents a desirable alternative to phylogeny in many situations,
We compiled a set of 363 TEM mutant protein sequences from existing databases and literature: the Lahey Clinic β-lactamase database
Using TEM-1 as the reference sequence
We constructed an undirected, weighted network in which two nodes (two mutated amino acid residue positions) are linked if mutations at both residues exist in at least one TEM sequence in the alignment. The weight
We included a correction term to ensure that mutated pairs that occur together in a single sequence, and never by themselves, are not overweighted. Without this term, such pairs would always have the maximum link weight of 1.0.
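Building the network starts from simple counts: how often each position is mutated and how often each pair of positions is mutated in the same sequence. The published weight formula is truncated in this section, so the `link_weight` function below is a hypothetical stand-in, not the authors' equation: a Jaccard index (pair count divided by the number of sequences mutated at either position) with a pseudocount added to the denominator to play the role of the correction term. The names `cooccurrence_counts`, `link_weight` and `pseudocount` are illustrative.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(mutant_sequences):
    """Count mutated positions and unordered pairs of co-mutated positions.

    mutant_sequences: iterable of sets of mutated residue positions (Ambler
    numbering), one set per TEM mutant sequence.
    """
    singles, pairs = Counter(), Counter()
    for positions in mutant_sequences:
        singles.update(positions)
        pairs.update(combinations(sorted(positions), 2))  # each pair once, sorted
    return singles, pairs


def link_weight(i, j, singles, pairs, pseudocount=1.0):
    """HYPOTHETICAL weight: Jaccard index with a pseudocount in the denominator.

    With pseudocount=0, a pair seen once together and never apart scores 1.0;
    the pseudocount pulls such sparsely supported pairs below the maximum.
    """
    n_ij = pairs[tuple(sorted((i, j)))]
    union = singles[i] + singles[j] - n_ij  # sequences mutated at i or j
    return n_ij / (union + pseudocount)
```

For example, in the four toy sequences {104, 164}, {104, 164, 238}, {238} and {104}, the pair 104-164 co-occurs twice among three sequences mutated at either position, giving 2/3 without the pseudocount and 0.5 with it.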
We were able to associate 380 out of 405 TEM naturally occurring or TEM laboratory-evolved mutant sequences in our database with a single major β-lactamase phenotype class (113 broad-spectrum 2b sequences, 201 extended-spectrum 2be sequences, 49 inhibitor-resistant, 2br, sequences). There were also 17 sequences with a combined extended-spectrum antibiotics and inhibitor resistant phenotype class, 2ber, that were not used in our network. The phenotype class of naturally occurring TEMs is determined experimentally, and TEM sequence-to-phenotype-class associations can be found in the Lahey Clinic β-lactamase online database
To explore the subcommunity structure of the 2be phenotype class, we constructed an undirected, weighted coevolution network (as above), using only 201 (naturally occurring and laboratory-evolved) extended-spectrum sequences.
We observed a few differences between the wiring of the extended-spectrum phenotype network and its corresponding community in the TEM coevolution network. In the first case, all residue positions that can be associated with extended-spectrum resistance were included in the sequences used to build the network. In the second case, some mutated residue positions (as opposed to mutant sequences) can be associated with more than one phenotype class (pleiotropy). However, by construction, the community-finding algorithm associates the corresponding nodes with only one community. These differences were minor and did not have an impact on the conclusions of our analysis.
To identify highly connected subnetworks (communities) of mutated residue positions, we used the Community-Structure-Partition algorithm
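The modularity score mentioned for the TEM coevolution network is not defined in this section. A common choice for undirected weighted networks is Newman's modularity, Q = sum over communities c of [W_in(c)/m - (S(c)/2m)^2], where m is the total link weight, W_in(c) the weight of links inside c, and S(c) the summed weighted degree of the nodes in c. The Python sketch below computes that standard quantity for a given partition; whether the Community-Structure-Partition algorithm reports exactly this score is an assumption, and `weighted_modularity` is a hypothetical name.

```python
from collections import defaultdict


def weighted_modularity(edges, community):
    """Newman modularity Q of a partition of an undirected weighted graph.

    edges: list of (u, v, weight) links; community: dict node -> community label.
    """
    m = sum(w for _, _, w in edges)  # total link weight
    inside = defaultdict(float)      # weight of links within each community
    strength = defaultdict(float)    # summed weighted degree per community
    for u, v, w in edges:
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)
```

Two disjoint triangles split into their natural communities give Q = 2 × (3/6 - (6/12)^2) = 0.5, whereas lumping all six nodes into one community gives Q = 0.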
We used three standard graph-theoretical node centrality metrics to identify important residue positions in our undirected, weighted network: degree centrality, closeness centrality and betweenness centrality. To calculate the closeness and betweenness centrality metrics, we transformed link weights into link costs by taking the inverse of each pair association weight. A detailed description of all node centrality metrics used in our study can be found in
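The weight-to-cost step can be illustrated with two of the node metrics. The exact definitions used in the paper are in its supplementary material, so this Python sketch uses standard textbook forms as an assumption: weighted degree is the sum of a node's link weights, and closeness is the number of other reachable nodes divided by the summed shortest-path cost to them, with cost = 1/weight as described above. The function names are hypothetical; Dijkstra's algorithm is used for the shortest paths.

```python
import heapq
from itertools import count


def build_costs(weighted_edges):
    """Undirected adjacency dict with link cost = 1 / link weight."""
    adj = {}
    for u, v, w in weighted_edges:
        adj.setdefault(u, {})[v] = 1.0 / w
        adj.setdefault(v, {})[u] = 1.0 / w
    return adj


def dijkstra(adj, source):
    """Shortest-path cost from source to every reachable node."""
    dist = {source: 0.0}
    tie = count()  # tie-breaker so the heap never compares node labels
    heap = [(0.0, next(tie), source)]
    while heap:
        d, _, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in adj[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, next(tie), v))
    return dist


def weighted_degree(weighted_edges):
    """Sum of link weights at each node (uses weights, not costs)."""
    strength = {}
    for u, v, w in weighted_edges:
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w
    return strength


def closeness(adj):
    """(reachable nodes - 1) / (sum of shortest-path costs), per node."""
    scores = {}
    for s in adj:
        dist = dijkstra(adj, s)
        total = sum(dist.values())
        scores[s] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores
```

Taking the inverse means that strongly co-occurring pairs become short, cheap steps, so shortest paths preferentially follow well-supported links.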
A path is a set of adjacent links in a network, which connects a pair of nodes
We adapted the equation for weighted node betweenness centrality (Equation S3) to multiple-node path betweenness centrality
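One way to carry node betweenness over to three-node paths is to score each path P by the sum, over endpoint pairs (s, t), of sigma_st(P) / sigma_st, where sigma_st is the number of shortest s-t paths and sigma_st(P) the number containing P as a contiguous segment. The authors' exact adaptation is not reproduced in this section, so the Python sketch below is a hypothetical reading: it excludes the trivial case in which P is the whole s-t path (mirroring the exclusion of endpoints in node betweenness), treats a-b-c and c-b-a as the same undirected path, and uses cost = 1/weight. All function names are illustrative.

```python
import heapq
from collections import defaultdict
from itertools import count


def build_adj(weighted_edges):
    """Undirected adjacency dict with link cost = 1 / link weight."""
    adj = {}
    for u, v, w in weighted_edges:
        adj.setdefault(u, {})[v] = 1.0 / w
        adj.setdefault(v, {})[u] = 1.0 / w
    return adj


def shortest_path_dag(adj, source, tol=1e-12):
    """Dijkstra that keeps every predecessor lying on a shortest path."""
    dist, preds = {source: 0.0}, defaultdict(list)
    tie, done = count(), set()
    heap = [(0.0, next(tie), source)]
    while heap:
        d, _, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, cost in adj[u].items():
            nd = d + cost
            if v not in dist or nd < dist[v] - tol:
                dist[v], preds[v] = nd, [u]  # strictly shorter route found
                heapq.heappush(heap, (nd, next(tie), v))
            elif abs(nd - dist[v]) <= tol and v not in done:
                preds[v].append(u)  # tie: an additional shortest route
    return dist, preds


def shortest_paths(preds, source, target):
    """All shortest source->target paths, rebuilt from predecessor lists."""
    if target == source:
        return [[source]]
    return [p + [target] for u in preds[target] for p in shortest_paths(preds, source, u)]


def canonical(window):
    w = tuple(window)
    return min(w, w[::-1])  # a-b-c and c-b-a are the same undirected path


def triplet_betweenness(adj):
    """Betweenness of every three-node shortest path (hypothetical adaptation)."""
    scores, nodes = {}, list(adj)
    for i, s in enumerate(nodes):
        dist, preds = shortest_path_dag(adj, s)
        for t in nodes[i + 1:]:
            if t not in dist:
                continue
            paths = shortest_paths(preds, s, t)
            hits = defaultdict(int)
            for path in paths:
                if len(path) == 3:  # the triplet itself: register with score 0
                    scores.setdefault(canonical(path), 0.0)
                elif len(path) > 3:  # proper three-node segments of a longer path
                    for key in {canonical(path[k:k + 3]) for k in range(len(path) - 2)}:
                        hits[key] += 1
            for key, n in hits.items():
                scores[key] = scores.get(key, 0.0) + n / len(paths)
    return scores
```

On a chain 1-2-3-4, both triplets score 1.0 because each lies on the single shortest 1-4 path, while every triplet in a star scores 0; this shows how a three-node shortest path can still have zero betweenness when no longer shortest path passes through it.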
We performed a PAML (codeml) analysis
48 out of a possible 214 three-node shortest paths in the 2be community network had nonzero betweenness centrality, and we focused our experiments on the corresponding 48 triple mutants. Because all triplets represent a mutational trajectory and are therefore ordered, we compared the activity of each triplet to each possible trajectory (
Our target TEM-1 β-lactamase sequence is in pGPS-ori, a pGPS3-derivative with a β-lactamase gene moved close to the origin of replication and with kanamycin as selectable marker
We found that JS200 (a K strain of E. coli) was more sensitive to cefotaxime than BL21, which is a standard B strain (not shown). Therefore we used JS200 (SC-18 recA718 polA12ts uvrA155 trpE65 lon-11 sulA1) cells complemented with pHSG-Pol I plasmid as hosts
Our gels produced reproducible measurements, with an average standard error of 19% for all the 58 clones tested at the optimized concentration of cefotaxime (
We thank Dr. Merijn Salverda for sharing his unpublished manuscript and for input on our manuscript.
All data is provided and all algorithms and metrics (computed with custom Python code) are available from the authors on request.