The authors have declared that no competing interests exist.
Current address: Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
Conceived and designed the experiments: DAK IG EBT BRD DCR JSR. Performed the experiments: DAK IG EBT. Contributed reagents/materials/analysis tools: DAK JSR. Wrote the paper: DAK IG BRD DCR JSR.
Amino acid substitutions in protein structures often require subtle backbone adjustments that are difficult to model in atomic detail. An improved ability to predict realistic backbone changes in response to engineered mutations would be of great utility for the blossoming field of rational protein design. One model that has recently grown in acceptance is the backrub motion, a low-energy dipeptide rotation with single-peptide counter-rotations, which is coupled to dynamic two-state sidechain rotamer jumps, as evidenced by alternate conformations in very high-resolution crystal structures. It has been speculated that backrubs may facilitate sequence changes as readily as they facilitate rotamer changes. However, backrub-induced shifts and experimental uncertainty are of similar magnitude for backbone atoms even in high-resolution structures, so comparing wildtype-vs.-mutant crystal structure pairs is not sufficient to directly link backrubs to mutations. In this study, we use two alternative approaches that bypass this limitation. First, we use a quality-filtered structure database to aggregate many examples for precisely defined motifs with single amino acid differences, and find that the effectively amplified backbone differences closely resemble backrubs. Second, we directly apply a provably accurate, backrub-enabled protein design algorithm to idealized versions of these motifs, and discover that the lowest-energy computed models match the average-coordinate experimental structures. These results support the hypothesis that backrubs participate in natural protein evolution and validate their continued use for design of synthetic proteins.
Protein design has the potential to generate useful molecules for medicine and chemistry, including sensors, drugs, and catalysts for arbitrary reactions. When protein design is carried out starting from an experimentally determined structure, as is often the case, one important aspect to consider is backbone flexibility, because in response to a mutation the backbone often must shift slightly to reconcile the new sidechain with its environment. In principle, one may model the backbone in many ways, but not all are physically realistic or experimentally validated. Here we study the "backrub" motion, which has been previously documented in atomic detail, but only for sidechain movements within single structures. Using a two-pronged approach involving both structural bioinformatics and computation with a principled design algorithm, we demonstrate that backrubs are sufficient to explain the backbone differences between mutation-related sets of very precisely defined motifs from the protein structure database. Our findings illustrate that backrubs are useful for describing evolutionary sequence change and, by extension, suggest that they are also appropriate for rational protein design calculations.
Proteins routinely incorporate amino acid changes over evolutionary time by adapting their conformation to the new sidechain. However, it remains a difficult task to predict such a conformational response, especially when subtle backbone adjustments are involved. This issue is of central importance to the burgeoning field of computational protein design, which has recently enjoyed a string of exciting developments
A number of descriptions of backbone motion have been implemented for the purposes of protein design in the past, each with its own set of advantages and disadvantages. Anticorrelated “crankshaft” adjustments of the ψ(i−1) and φ(i) torsions
One such model is the backrub (
Several studies have successfully used the backrub approach to expand the search space of protein design efforts and improve agreement between computed sidechain dynamics and nuclear magnetic resonance (NMR) measurements
The N-cap or C-cap position of a helix is defined as the residue half-in and half-out of the helix: the peptide on one side of the cap makes standard helical backbone interactions, while the peptide on the other side has quite non-helical position and interactions
Notably, Asn/Asp sidechains are longer than Ser/Thr sidechains by one covalent bond, yet their H-bond distances (N-cap sidechain O to
With this motivation, we wished to confirm the appropriateness of the backrub model for this case of mutational rather than rotamer change. However, backbone coordinate shifts due to backrubs are very small – on the order of the coordinate differences between crystal structures of the same protein
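To make the scale of a backrub concrete, the following sketch rigidly rotates atoms about the axis through the flanking Cα(i−1) and Cα(i+1) atoms using Rodrigues' rotation formula. It is a simplification under stated assumptions: it applies only the primary dipeptide rotation, omits the single-peptide counter-rotations, and uses hypothetical coordinates. An atom at distance r from the axis moves by 2r·sin(θ/2), so an 11° rotation shifts an atom 3.7 Å off-axis by only about 0.7 Å.

```python
import math

def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def _add(a, b):
    return [a[k] + b[k] for k in range(3)]

def _scale(a, s):
    return [x * s for x in a]

def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def rotate_about_axis(p, axis_start, axis_end, angle_deg):
    """Rotate point p by angle_deg about the line axis_start -> axis_end (Rodrigues)."""
    k = _sub(axis_end, axis_start)
    k = _scale(k, 1.0 / math.sqrt(_dot(k, k)))  # unit axis vector
    v = _sub(p, axis_start)
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    v_rot = _add(_add(_scale(v, c), _scale(_cross(k, v), s)),
                 _scale(k, _dot(k, v) * (1.0 - c)))
    return _add(axis_start, v_rot)

def backrub(atoms, ca_prev, ca_next, angle_deg):
    """Primary backrub only: rigidly rotate atoms about the Ca(i-1)-Ca(i+1) axis."""
    return [rotate_about_axis(p, ca_prev, ca_next, angle_deg) for p in atoms]

# Hypothetical coordinates (Angstroms): the axis lies along x, Cb is 3.7 A off-axis.
ca_prev, ca_next = [0.0, 0.0, 0.0], [5.5, 0.0, 0.0]
cb = [2.75, 3.7, 0.0]
(moved,) = backrub([cb], ca_prev, ca_next, -11.0)
print(round(math.dist(cb, moved), 2))  # 0.71
```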
First, we needed to determine which subset of helix N-cap conformations would be likely to undergo backrubs and thus merited further examination. To do so, we compared amino acid preferences for α-helix N-caps and 3₁₀-helix N-caps relative to general-case protein structure, using a non-redundant, quality-filtered set of structures, the Top5200 database (see Methods). Asn/Asp/Ser/Thr were indeed found to be strongly preferred (by factors of 2.5–3) at α-helix N-caps relative to protein structure in general.
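The preference factors quoted above are ratios of frequencies: how often an amino acid occurs at N-caps divided by how often it occurs among all residues. A minimal sketch of that ratio follows, using hypothetical toy counts rather than Top5200 data.

```python
from collections import Counter

def propensities(ncap_residues, all_residues):
    """Frequency of each amino acid at N-caps divided by its frequency among all residues."""
    ncap, overall = Counter(ncap_residues), Counter(all_residues)
    n_ncap, n_all = len(ncap_residues), len(all_residues)
    return {aa: (ncap[aa] / n_ncap) / (overall[aa] / n_all)
            for aa in ncap if overall[aa] > 0}

# Hypothetical toy data, not Top5200 counts: 20 residues in total, 4 of them N-caps.
all_residues = ["S", "S", "D"] + ["A"] * 17
ncap_residues = ["S", "S", "D", "A"]
print({aa: round(p, 2) for aa, p in propensities(ncap_residues, all_residues).items()})
# {'S': 5.0, 'D': 5.0, 'A': 0.29}
```

A ratio above 1 means the amino acid is enriched at N-caps relative to its background frequency.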
With this specific motif in mind, we sought to obtain a preponderance of evidence for a backrub relationship in the form of numerous examples. Therefore, we performed a stringent structural motif search of the Top5200, which identified 429 Asn/Asp N-caps; for comparison, we used a similarly sized sample of 500 Ser/Thr N-caps (out of 3208; see Methods).
The backbone conformations differ consistently: the longer Asn/Asp sidechains rotate the first turn's backbone away from residue
Crystal structure ensembles for Asn/Asp (light green) vs. Ser/Thr (light blue) at the N-cap position are related by a backrub. Lowest-energy BRDEE conformations for the N-terminus of an ideal α-helix (see Methods) with Asn (dark green) vs. Ser (dark blue) have a closely similar relationship. Cα and Cβ displacements between Asn/Asp and Ser/Thr for both average crystal structures (lighter, in parentheses) and low-energy BRDEE conformations (darker) evoke a hinge-like backrub operation. Ensemble
Structures | Average Crystal | BRDEE Ideal N-cap |
ΔCα (Å) | 0.03 | — |
ΔCα (Å) | 0.34 | 0.36 |
ΔCα (Å) | 0.03 | — |
ΔCα (Å) | 0.03 | — |
ΔCα (Å) | 0.04 | — |
ΔCβ (Å) | 0.72 | 0.65 |
Backrub (°) | −11 | −12 |
Δτ (°) | −0.4 | −4.9 |
Δτ (°) | +2.8 | +6.9 |
S/T HB (Å) | 2.18±0.15 | 2.35 |
N/D HB (Å) | 1.92±0.12 | 2.00 |
Distances are after superposition into the same reference frame using 4 Cαs (N-cap
Distances for BRDEE for atoms at or beyond Cα
The signs of the backrub rotation angles and Δτ values are in terms of Ser/Thr→Asn/Asp.
For average crystal structures, average sidechains (based on average Cβ positions and χ dihedral angles) were added in KiNG. The τ value used for each Δτ is an average across the crystal structure ensemble; this was preferable to measuring τ values directly from the average structures because the average coordinates before the
For input to BRDEE, ideal sidechains were added in KiNG to ideal helices. The τ value used for each Δτ is taken directly from the lowest-energy computed structure.
S/T HB and N/D HB are Ser/Thr and Asn/Asp H-bond lengths from the
The median Asn N-cap sidechain built onto the average Ser N-cap backbone results in a steric clash in which the van der Waals radii of the N-cap sidechain Oδ1 and the
We also examined two control cases with similar backbone geometry but different sidechain-mainchain interactions. First, we identified 538 α-helix N-caps with any amino acid type except Asn/Asp/Ser/Thr, in which case the
We next wished to test whether a simple energy function based on molecular-mechanics terms from Amber
As input to the algorithm, we prepared two versions of an idealized helix N-cap motif (see Methods), one with a short sidechain (Ser) and another with a long sidechain (Asn). We then used BRDEE to compute the lowest-energy model for each template, allowing backrubs and rotamer changes at the N-cap as well as small
The lowest-energy Ser N-cap shifted “forward” whereas the lowest-energy Asn N-cap shifted “backward” in order to establish comparable hydrogen bonds (
The changes in flanking N-Cα-C angles (τ) for the BRDEE lowest-energy conformations recapitulate the changes for the crystal structures in terms of directionality: Δτ<0 for
Additional comparison and contrast of the computed models and experimental structures can be found in
The BRDEE results recapitulate the average crystal structures, confirming the hypothesis that Ser/Thr→Asn/Asp mutations at N-caps are well modeled by a backrub relationship. More generally, this implies that the backrub may reasonably accompany mutations during natural evolution or
Aromatic residues often pair with glycine in antiparallel β-sheet by adopting rotamers with χ1≈+60°, which places the aromatic ring directly over a Gly on the adjacent strand across a narrow pair of backbone H-bonds
A stringent structural motif search, similar to that described for N-caps above, identified 321 Phe/Tyr residues with “plus” χ1 rotamers in antiparallel β-sheet (see Methods). Aromatics are about three times as common in antiparallel as in parallel β-sheet, and when they do occur, they are about twice as likely to adopt a plus χ1 rotamer in antiparallel as in parallel β-sheet (data from Top5200), so we focused on antiparallel β-sheet in this study.
In 72 examples the amino acid on the opposite strand is a Gly, in which case the aromatic sidechain moves downward to contact the Gly Cα H. In the other 249 examples the Cβ H atoms of the amino acid on the opposite strand push the aromatic ring upward (
Crystal structure ensembles for Phe/Tyr across from Gly (light blue) vs. anything else (light green) are related by a backrub. Lowest-energy BRDEE conformations for 1z84 Phe171 across from Gln188 (visually truncated at Cβ for clarity) (dark green) vs. Gln188→Gly (dark blue) have a similar relationship. Aromatic Cα and Cβ displacements for both average crystal structures (lighter, in parentheses) and low-energy BRDEE conformations (darker) evoke a hinge-like backrub operation. Ensemble mainchain-mainchain H-bonds are illustrated with “pillows” of green all-atom contact dots
Structures | Average Crystal | BRDEE 1gyh A | BRDEE 1khb A | BRDEE 1z84 A |
Aromatic | F | Y109 | F144 | F171 |
Opposite | G→A | G122→[A] | G157→[A] | Q188→[G] |
ΔCα (Å) | 0.01 | 0.02 | 0.01 | 0.01 |
ΔCα (Å) | 0.28 | 0.25 | 0.24 | 0.20 |
ΔCα (Å) | 0.09 | 0.03 | 0.01 | 0.02 |
ΔCβ (Å) | 0.64 | 0.50 | 0.51 | 0.47 |
ΔCζ (Å) | 1.34 | 0.96 | 1.05 | 1.01 |
Backrub (°) | −11 | −10 | −11 | −11 |
Δτ (°) | −0.2 | −0.5 | −0.4 | +0.3 |
Δτ (°) | +1.0 | +2.8 | +1.5 | +1.2 |
Distances are after superposition into the same reference frame using 5 Cαs (aromatic
Distances for BRDEE for atoms at or beyond Cα
The signs of the backrub rotation angles and Δτ values are in terms of across-from-Gly→across-from-other.
For average crystal structures, average sidechains (based on average Cβ positions and χ dihedral angles) were added in KiNG. The τ value used for each Δτ is an average across the crystal structure ensemble, to be consistent with the methodology for N-caps.
For input to BRDEE, each example was used twice: first with its original deposited sidechain on the opposite strand, and then with a fully ideal sidechain of the opposite type (Gly if originally not Gly, Ala if originally Gly) added in KiNG (residue names in [brackets]). The τ value used for each Δτ is taken directly from the lowest-energy computed structure.
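The table notes state that τ (the N–Cα–C angle) is averaged across each crystal-structure ensemble and that average sidechains are built from average Cβ positions and χ angles. A minimal sketch of those averaging steps follows. It assumes the motifs have already been superposed, and the circular mean for χ is our assumption, since the text does not say how the dihedrals were averaged.

```python
import math

def bond_angle(a, b, c):
    """Angle a-b-c in degrees; for tau, pass the N, CA and C coordinates."""
    u = [a[k] - b[k] for k in range(3)]
    v = [c[k] - b[k] for k in range(3)]
    cos_t = sum(u[k] * v[k] for k in range(3)) / (math.dist(a, b) * math.dist(c, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding error

def mean_tau(ensemble):
    """Average tau over an ensemble of (N, CA, C) coordinate triples."""
    taus = [bond_angle(n, ca, c) for n, ca, c in ensemble]
    return sum(taus) / len(taus)

def mean_position(points):
    """Per-axis mean of coordinates that have already been superposed."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(3)]

def circular_mean_deg(angles):
    """Mean of dihedral angles (degrees) that respects the wrap-around at +/-180."""
    s = sum(math.sin(math.radians(x)) for x in angles)
    c = sum(math.cos(math.radians(x)) for x in angles)
    return math.degrees(math.atan2(s, c))

# A naive arithmetic mean of chi = 170 and -170 would give 0; the circular mean gives ~180.
print(round(abs(circular_mean_deg([170.0, -170.0]))))  # 180
```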
Because there are fewer examples, and more conformational variation in β-sheet than in α-helix, the ideal-start calculation used for the N-cap case was not possible here. Instead, BRDEE computed low-energy conformations for several examples judged to be representative of their respective type (across from Gly or across from another amino acid; see Methods). In all cases, the lowest-energy conformation appears to match the average crystal structure very well, whether the aromatic is across from Gly or from some other amino acid with a Cβ atom.
These results confirm both that BRDEE in conjunction with a simple force field reproduces natural conformations well and also that backrubs model the relationship well.
It is known that backrubs relate conformations that interchange dynamically
The two specific motifs analyzed here represent only about 0.5% of the protein residues in our Top5200 data set, and thus in one sense the scope of this study is relatively narrow. However, a tight focus was necessary to substantiate the idea of mutation-coupled backrubs with sufficient certainty, due to the coordinate error problem in the alternative approach of comparing individual wildtype and mutant crystal structures directly. These two cases were chosen as common, well-defined motifs where the primary interaction environment of the changing sidechain is provided by local secondary structure and is therefore consistent across hundreds of examples. For the general case of an individual mutation, the potential interaction environment is also the same before and after; however, it is seldom simple enough to be closely repeated in numerous proteins. Furthermore, as shown in previous work at ultra-high resolution
The backbone shift considered on its own is continuous and low-energy, without a barrier, while the two-state behavior is contributed by the sidechain switch between rotamers, between H-bond partners, or between amino acids. In the dynamics case of jumps between distinct sidechain rotamers or H-bonds
Mutation-coupled backrubs are small local changes, which presumably mediate neutral drift much more often than they aid large-scale structural rearrangements or changes in function. However, the accumulation of changes via neutral drift over time may in fact enable future large-scale changes by subtly altering the native state energy landscape such that eventually a tipping point is reached. Recent analysis of the evolution of an ancient protein confirms that some function-altering mutations required structural pre-stabilization by earlier “permissive” mutations
Note that we do not directly address true evolutionary relationships between proteins in this study. Rather, we substantiate the idea that backrubs enable single amino acid changes at specific motifs, which could aid actual evolution within a protein family
It is only natural to segue from the role of backrubs in protein evolution to their utility for protein design – essentially a computational analog of molecular evolution. Our results indicate that, despite the relative simplicity of their functional form, molecular-mechanics-based force fields like Amber plus EEF1 that are commonly used for protein design can in fact accurately recapitulate empirically observed backbone conformations for multiple specific structural motifs, given the chance to access them via a backrub. (Note that the cases presented here were dominated by single interactions such as H-bonds or steric packing; a higher-cost energy function might be needed to maintain similar accuracy if different interactions are competing and need to be compared quantitatively.)
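As an illustration of that simple functional form, the sketch below evaluates a 12-6 van der Waals term written in the Amber convention, ε[(Rmin/r)¹² − 2(Rmin/r)⁶], whose minimum of −ε lies at r = Rmin. It is only one term: electrostatics and the EEF1 solvation term are omitted, and the ε and Rmin values are hypothetical rather than taken from any Amber parameter file.

```python
def amber_vdw(r, epsilon, r_min):
    """Amber-style 12-6 term: epsilon * ((r_min/r)**12 - 2*(r_min/r)**6)."""
    x = (r_min / r) ** 6
    return epsilon * (x * x - 2.0 * x)

# Hypothetical pair parameters (kcal/mol, Angstroms).
EPS, RMIN = 0.1, 3.0
for r in (2.5, 3.0, 4.0):
    print(r, round(amber_vdw(r, EPS, RMIN), 3))
```

The steep r⁻¹² wall is the kind of term that penalizes steric overlap, while the shallow r⁻⁶ well rewards close packing.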
Thus, predicting the conformational consequences of a sequence change in computational protein design is in large part a search problem: if the appropriate regions of protein conformational space are searched efficiently, in many cases low-cost energy functions can do the rest. Unfortunately, that space is vast indeed even for a single sequence, as we know from Levinthal's famous thought experiment
However, flexible-backbone design algorithms like BRDEE are excellent candidates for this task in many cases because (1) they are based on empirically demonstrated types of flexibility and (2) they come with mathematical guarantees of their accuracy with respect to the input parameters. Other algorithms that search over amino acid and rotamer identities, then minimize over backrub degrees of freedom
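The mathematical guarantee in point (2) comes from dead-end elimination (DEE), which prunes a rotamer only when it provably cannot belong to the global minimum-energy conformation. The sketch below implements the original rigid-rotamer DEE criterion: rotamer r at position i is pruned if some alternative t at the same position has a worst-case energy lower than r's best-case energy. BRDEE extends this idea to backrub flexibility; the sketch does not model backrubs, and the energy dictionaries are hypothetical inputs.

```python
def dee_prune(positions, self_e, pair_e):
    """Classic DEE. positions: {i: [rotamers]}, self_e: {(i, r): E},
    pair_e: {(i, r, j, s): E}, looked up in either order (missing pairs = 0)."""
    alive = {i: list(rots) for i, rots in positions.items()}

    def pair(i, r, j, s):
        return pair_e.get((i, r, j, s), pair_e.get((j, s, i, r), 0.0))

    changed = True
    while changed:
        changed = False
        for i in alive:
            others = [j for j in alive if j != i]
            for r in list(alive[i]):
                # Best case for r: most favorable partner at every other position.
                low_r = self_e[(i, r)] + sum(
                    min(pair(i, r, j, s) for s in alive[j]) for j in others)
                for t in alive[i]:
                    if t == r:
                        continue
                    # Worst case for the competitor t.
                    high_t = self_e[(i, t)] + sum(
                        max(pair(i, t, j, s) for s in alive[j]) for j in others)
                    if low_r > high_t:
                        alive[i].remove(r)
                        changed = True
                        break
    return alive

# Hypothetical two-position toy problem.
positions = {0: ["a", "b"], 1: ["x", "y"]}
self_e = {(0, "a"): 0.0, (0, "b"): 10.0, (1, "x"): 0.0, (1, "y"): 0.0}
pair_e = {(0, "a", 1, "x"): -1.0, (0, "a", 1, "y"): 0.0,
          (0, "b", 1, "x"): 0.0, (0, "b", 1, "y"): 0.0}
print(dee_prune(positions, self_e, pair_e))  # {0: ['a'], 1: ['x']}
```

Because the comparison is strict, two rotamers can never eliminate each other, so every position keeps at least one rotamer.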
Overall, we have demonstrated that the backrub, a model of local backbone motion previously only documented for dynamic rotamer changes, also applies to local sequence changes. This finding is an important direct validation for the application of the backrub to the study of natural protein evolution and to continuing efforts in computational protein design.
To identify numerous examples of the desired motifs, we used a “Top5200” database of high-quality protein structures. The rapid growth of the Protein Data Bank (PDB)
We included at most one protein chain per PDB 70% sequence-similarity cluster as of April 5, 2007. We chose the representative for each such cluster as the chain with the best average of resolution and MolProbity score
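Choosing one representative per cluster can be sketched as follows. This assumes the two quantities are averaged with equal weight and that lower is better for both, which holds because the MolProbity score is designed to be on a resolution-like scale; the tuple layout, cluster IDs, and chain labels are hypothetical.

```python
def pick_representatives(chains):
    """chains: iterable of (cluster_id, chain_id, resolution, molprobity_score).

    Lower is better for both quantities; ties keep the first chain seen.
    """
    best = {}
    for cluster, chain, resolution, mp_score in chains:
        score = (resolution + mp_score) / 2.0
        if cluster not in best or score < best[cluster][0]:
            best[cluster] = (score, chain)
    return {cluster: chain for cluster, (_, chain) in best.items()}

# Hypothetical clusters and chain labels.
chains = [("c1", "chain_1", 1.8, 1.6), ("c1", "chain_2", 1.5, 2.5), ("c2", "chain_3", 2.0, 1.9)]
print(pick_representatives(chains))  # {'c1': 'chain_1', 'c2': 'chain_3'}
```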
To calculate the MolProbity score for each chain, first hydrogens were added with the program Reduce
Two “post-processing” steps were required. First, we removed four chains whose PDB structures had been obsoleted and replaced them with updated structures where possible (1sheA→2pk8A, 1wt4A→2v1tA, 2eubA→2pl1A, 2f4dA→no replacement). Second, we removed two chains with incomplete or unclear PDB files (1c53A had only Cα atoms, 3ctsA had only “UNK” unknown residue types). The resulting 5199 protein chains make up about a million residues.
We first noticed that backrubs may explain backbone adjustments upon mutation between short and long N-cap sidechains while examining N-caps in T4 lysozyme. Visual analysis using the BACKRUB tool in KiNG
N-caps were identified using a helix recognition algorithm based loosely on DSSP
We also took precautions to help offset any leniency in helix extension introduced in the steps described above. To ensure that a tightly defined subset of N-caps was being examined, H-bond pseudo-energies were computed as in DSSP with the standard −0.5 kcal/mol cutoff. N-caps with an
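The DSSP hydrogen-bond pseudo-energy referred to above is an electrostatic model with fixed partial charges on the backbone C=O (0.42e) and N-H (0.20e) groups, scaled by a factor of 332 to give kcal/mol for distances in Å. A minimal sketch follows; the coordinates are hypothetical, and the amide H is assumed to have been placed already.

```python
import math

# DSSP model: q1 = 0.42e on C and O, q2 = 0.20e on N and H, f = 332 (kcal/mol with Angstroms).
Q1, Q2, F = 0.42, 0.20, 332.0
HBOND_CUTOFF = -0.5  # kcal/mol

def dssp_hbond_energy(o, c, n, h):
    """Pseudo-energy between the C=O of one residue and the N-H of another."""
    d = math.dist
    return Q1 * Q2 * F * (1 / d(o, n) + 1 / d(c, h) - 1 / d(o, h) - 1 / d(c, n))

def is_hbond(o, c, n, h):
    return dssp_hbond_energy(o, c, n, h) < HBOND_CUTOFF

# Hypothetical, idealized linear C=O...H-N geometry along x (Angstroms).
c, o, h, n = [0.0, 0.0, 0.0], [1.23, 0.0, 0.0], [3.13, 0.0, 0.0], [4.14, 0.0, 0.0]
print(round(dssp_hbond_energy(o, c, n, h), 2), is_hbond(o, c, n, h))
```

Moving the N-H group 10 Å farther away drives the energy close to zero, well above the −0.5 kcal/mol cutoff.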
For N-cap sequence propensity analysis, all α and 3₁₀ N-caps of each amino acid type were compared to the general case of all residues in the Top5200 regardless of backbone conformation. For N-cap backrub analysis, on the other hand, only α N-caps with
As described in Results, we also created two control categories. For the “other N-caps” category, the criteria were the same as for Asn/Asp or Ser/Thr N-caps, except we required some other amino acid identity at the N-cap position. For the mid-helix category, we required strictly α-helical φ,ψ values between (−65°,−45°) and (−55°,−35°) for the central residue of interest, at least four residues labeled “H” for α-helix by DSSP
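The strictly α-helical φ,ψ box used for the mid-helix control can be expressed as a simple check. This sketch assumes inclusive bounds and covers only the φ,ψ criterion; the DSSP run-length requirement and any further criteria are not modeled, and the residue values are hypothetical.

```python
# Strictly alpha-helical box for the mid-helix control (degrees); inclusive bounds assumed.
PHI_MIN, PHI_MAX = -65.0, -55.0
PSI_MIN, PSI_MAX = -45.0, -35.0

def in_mid_helix_box(phi, psi):
    return PHI_MIN <= phi <= PHI_MAX and PSI_MIN <= psi <= PSI_MAX

# Hypothetical (phi, psi) values for three residues.
residues = [(-60.0, -40.0), (-70.0, -40.0), (-57.0, -47.0)]
print([in_mid_helix_box(phi, psi) for phi, psi in residues])  # [True, False, False]
```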
Examples of the N-cap motif were superposed onto one another using the N-cap
Average motifs were then generated by taking the mean position of all backbone atoms (including hydrogens) and Cβ atoms from N-cap
Using the same protocol, we also analyzed 388 Asn/Asp and 976 Ser/Thr examples from the second Ramachandran peak, near (−150°,170°) instead of (−80°,170°) as mentioned above. The
For β aromatics, we identified Phe/Tyr residues with “plus” χ1 (0–120°) in antiparallel β-sheet. The “opposite” residue is in the direction of the aromatic sidechain, between the narrow pair of mainchain H-bonds. Both residues were required to have maximum mainchain or sidechain atom B-factor less than 40, and to have at least one additional β residue (according to the custom DSSP-based secondary structure identification algorithm described above) in each direction along their respective strands.
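The selection criteria above can be sketched as a per-residue filter. This is an illustrative sketch rather than the authors' code. The dihedral uses the standard atan2 formulation (IUPAC sign convention), with χ1 computed from the N, Cα, Cβ, and Cγ atoms. The plus-χ1 window is taken as 0–120° inclusive. Secondary structure is assumed to be a per-residue string in which "E" marks β residues, a hypothetical stand-in for the custom DSSP-based labels.

```python
import math

def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [x / norm for x in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = [b0[k] - _dot(b0, b1) * b1[k] for k in range(3)]
    w = [b2[k] - _dot(b2, b1) * b1[k] for k in range(3)]
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def is_plus_chi1(chi1):
    return 0.0 <= chi1 <= 120.0

def passes_residue_filters(max_b_factor, ss, i):
    """B-factor cutoff plus at least one flanking beta residue on each side of residue i."""
    return (max_b_factor < 40.0
            and 0 < i < len(ss) - 1
            and ss[i - 1] == "E" and ss[i + 1] == "E")

# Hypothetical atoms: looking down the central bond, the far atom sits 90 degrees clockwise.
print(round(dihedral([0, 1, 0], [0, 0, 0], [1, 0, 0], [1, 0, 1])))  # 90
```

In this sketch the filter would be applied to both the aromatic and the opposite residue, as the text requires.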
To avoid irregularities from strand ends, we defined the “fray” parameter f:
where tN (N-ward twist) is the dihedral between Cα
Examples of the β aromatic motif were superposed onto one another using the aromatic
Average motifs were then generated by taking the mean position of all backbone atoms (including hydrogens) and Cβ atoms from residues
For all BRDEE calculations, we used the same input parameters as described previously
For N-cap calculations, we started from an ideal helix (φ,ψ −60°,−40°) with poly-Pro conformation (φ,ψ −80°,170°) for the N-cap and its preceding residue. All residues were Ala except for the N-cap, which was either Asn or Ser. At the N-cap position, primary backrub rotation angles from −15° to +15° in increments of 0.5° were allowed, and the default ε factor of 0.7 was used for back-rotation of the single peptides. At the N3 position, backrubs were not allowed as part of the main search. However, using KiNG
For β aromatic calculations, the strand twist and curl vary too much to construct averaged cases. Instead, we repeated the computational experiment several times with individual examples of the motif from different experimental structures as input coordinates. Representative examples such as 1khbA Phe144-Gly157 and 1z84A Phe171-Gln188 were chosen because they were visually relatively “middle of the pack” and were not obvious outliers in terms of geometric parameters like fray. Primary backrub rotation angles from −15° to +15° in increments of 0.5° were allowed at both residue positions, with the default ε factor of 0.7. (Backrubs at the opposite position were of less interest for this study, and indeed turned out to be much smaller than at the aromatic position.) If the original opposite residue was Gly, BRDEE was run for both the original coordinates and a Gly→Ala mutant. Likewise, if the original opposite residue was anything but Gly, BRDEE was run for both the original coordinates and an Xaa→Gly mutant.
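The primary backrub grid (−15° to +15° in 0.5° steps, as also used at the N-cap position) and the paired original/mutant runs described above can be sketched as follows. The ε back-rotation of the single peptides is not modeled, since its formula is not given here, and three-letter residue codes are an assumed input format.

```python
def backrub_angles(lo=-15.0, hi=15.0, step=0.5):
    """Evenly spaced primary backrub rotation angles, endpoints included."""
    n_steps = int(round((hi - lo) / step))
    return [lo + k * step for k in range(n_steps + 1)]

def brdee_runs(opposite_residue):
    """Residue types to place at the opposite-strand position for the paired runs."""
    if opposite_residue == "GLY":
        return ["GLY", "ALA"]          # original Gly, then Gly->Ala mutant
    return [opposite_residue, "GLY"]   # original Xaa, then Xaa->Gly mutant

angles = backrub_angles()
print(len(angles), angles[0], angles[-1])  # 61 -15.0 15.0
```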
The KiNG graphics program
This is a plain text kinemage graphics file allowing interactive, three-dimensional exploration of the N-cap conformations from
(TXT)
This is a plain text kinemage graphics file allowing interactive, three-dimensional exploration of the aromatic conformations from
(TXT)
This is a compressed tar archive containing lists of all individual examples and PDB coordinates for average and calculated structures (a README file explains the naming conventions) for both the crystal-structure ensembles and the BRDEE calculated structures.
(GZ)
This file provides a discussion of the differences between α-helix N-caps and 3₁₀-helix N-caps, and compares and contrasts in more detail the average crystal structures and low-energy BRDEE models for both the N-cap and β aromatic cases.
(DOC)
We thank Tanja Kortemme for suggesting the application of BRDEE to N-cap mutations, Bryan Arendall for database construction, Kyle Roberts for technical assistance with running BRDEE, Mark Hallen and Stuart Endo-Streeter for useful comments on the manuscript, and Andreas Pfenning for help with N-cap sequence statistics.