The author has declared that no competing interests exist.
Conceived and designed the experiments: BYC. Performed the experiments: BYC. Analyzed the data: BYC. Contributed reagents/materials/analysis tools: BYC. Wrote the paper: BYC.
Algorithms for comparing protein structure are frequently used for function annotation. By searching for subtle similarities among very different proteins, these algorithms can identify remote homologs with similar biological functions. In contrast, few comparison algorithms focus on specificity annotation, where the identification of subtle differences among very similar proteins can assist in finding small structural variations that create differences in binding specificity. Few specificity annotation methods consider electrostatic fields, which play a critical role in molecular recognition. To fill this gap, this paper describes VASP-E (Volumetric Analysis of Surface Properties with Electrostatics), a novel volumetric comparison tool based on the electrostatic comparison of protein-ligand and protein-protein binding sites. VASP-E exploits the central observation that three dimensional solids can be used to fully represent and compare both electrostatic isopotentials and molecular surfaces. With this integrated representation, VASP-E is able to dissect the electrostatic environments of protein-ligand and protein-protein binding interfaces, identifying individual amino acids that have an electrostatic influence on binding specificity. VASP-E was used to examine a nonredundant subset of the serine and cysteine proteases as well as the barnase-barstar and Rap1a-raf complexes. Based on amino acids established by various experimental studies to have an electrostatic influence on binding specificity, VASP-E identified electrostatically influential amino acids with 100% precision and 83.3% recall. We also show that VASP-E can accurately classify closely related ligand binding cavities into groups with different binding preferences. These results suggest that VASP-E should prove a useful tool for the characterization of specific binding and the engineering of binding preferences in proteins.
Proteins, the ubiquitous worker molecules of the cell, are a diverse class of molecules that perform very specific tasks. Understanding how proteins achieve specificity is a critical step towards understanding biological systems and a key prerequisite for rationally engineering new proteins. To examine electrostatic influences on specificity in proteins, this paper presents VASP-E, a software tool that generates solid representations of the electrostatic potential fields that surround proteins. VASP-E compares solids with constructive solid geometry, a class of techniques developed first for modeling complex machine parts. We observed that solid representations could quantify the degree of charge complementarity in protein-protein interactions and identify key residues that strengthen or weaken them. VASP-E correctly identified amino acids with established experimental influences on protein-protein binding specificity. We also observed that solid representations of electrostatic fields could identify electrostatic conservations and variations that relate to similarities and differences in binding specificity between proteins and small molecules.
Software for comparing protein structures is widely used to make inferences about protein function. These methods assist in function annotation by revealing proteins that perform similar biological functions despite vast evolutionary differences. Many methods focus on the discovery of subtle structural similarities among very different molecules
An emerging second type of comparison algorithm is designed to find subtle differences among very similar proteins. These methods seek to annotate protein specificity by proposing structural causes for different binding preferences among proteins that perform the same function
The problem we are specifically addressing is the case where several closely related proteins have already been structurally aligned and we seek to identify spatially conserved and varying regions in their potential fields that might cause differences in binding specificity. Conserved regions, where the fields have similar potentials, might stabilize a molecular fragment attracted by all proteins (
a) A demonstration of CSG operations, illustrating the borders of input (dotted) and output (solid) regions in grey (grey everywhere). b,c) Shapes representing the regions occupied by protein
The solid representations employed by VASP-E differ in kind from existing electrostatic analyses. While VASP-E deconstructs the electrostatic field to identify conserved and varying electrostatic phenomena, existing methods summarize and quantify the field with comparison scores
This paper explores two applications of VASP-E as it might be applied in support of research in structural biology. One objective in many investigations is to discover electrostatic influences on protein-ligand or protein-protein binding specificity. Given the long range nature of electrostatic interactions, many amino acids could potentially be influential, and it could be impractical to create all possible mutants and determine their binding preferences. Here, a first application of VASP-E is to suggest amino acids that create differences between the electrostatic fields of two ligand binding cavities or to suggest amino acids that enhance or diminish electrostatic complementarity between two interacting proteins. Because amino acids are suggested in tandem with a hypothetical electrostatic influence on binding, VASP-E provides reasons to produce and test certain mutants first, where no reason might have existed before. The second application of VASP-E examined in this paper is the classification of protein-ligand binding cavities based on their electrostatic fields. This application can support efforts to discover patterns of electrostatic similarities or differences among related binding sites. In studies seeking to identify a possible ligand, electrostatic classification can reveal similarities to other proteins that may have known binding partners. Together, these applications of VASP-E represent two of many capabilities that become possible by combining CSG and volumetric representations of electrostatic isopotentials. We validate these capabilities in the results section against established experimental observations.
The underlying observation exploited by VASP-E is that geometric comparisons of electrostatic potential fields can focus on biologically relevant regions and specific potential ranges by using CSG. Constraining the comparison of potential fields in this manner ensures that comparisons reflect aspects of electrostatic fields that influence binding, rather than spurious variations that occur by random chance or outside of binding sites. To achieve this kind of focus, comparisons always begin with a multiple structure alignment of whole proteins
Structures aligned in this manner are then used to generate solid representations of electrostatic isopotentials and protein structure. To represent electrostatic isopotentials, we first solve the potential field of a given structure using DelPhi
The resulting solids, regardless of their origin, are basic inputs for CSG operations, which we described earlier
Cavity fields and interface fields are the constrained representations used by VASP-E to focus the comparison of electrostatic fields on biologically significant regions. To quantify similarities, we compute the CSG intersection of two regions and then evaluate the volume of the resulting intersection region. To quantify differences, we measure the volume of the CSG difference. Large volumes of intersection imply similar fields while large differences are characteristic of fields that vary. To estimate the volume
Further CSG operations permit deconstructive comparisons of cavity and interface fields that identify similarities in some regions and differences in other regions within the fields they describe. While many applications of this kind are possible with VASP-E, we describe two below. First, we can use VASP-E to trace differences in electrostatic fields to the individual amino acids that contribute to them, thereby predicting residues that influence binding specificity. Second, we can integrate multiple electrostatic similarity measurements among a family of cavity fields to reveal patterns of ligand binding specificity.
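The volumes of CSG intersections and differences described above can be illustrated with a minimal sketch. It deliberately swaps in a simpler technique than the one VASP-E uses: solids are stored here as sets of lattice voxels rather than as triangulated boundary surfaces, so CSG operations reduce to set operations. The helper names `volume` and `box` and the toy solids are hypothetical; only the 0.5 Å lattice spacing comes from the text.

```python
# Voxel-set sketch of CSG volume comparison (an illustration, not VASP-E's implementation).
RESOLUTION = 0.5  # lattice spacing in angstroms, the resolution reported for the experiments


def volume(voxels, resolution=RESOLUTION):
    """Approximate volume (cubic angstroms) of a solid stored as a set of voxel indices."""
    return len(voxels) * resolution ** 3


def box(lo, hi):
    """Hypothetical helper: every voxel (i, j, k) with lo <= i, j, k < hi."""
    return {(i, j, k)
            for i in range(lo, hi)
            for j in range(lo, hi)
            for k in range(lo, hi)}


field_a = box(0, 4)  # toy solid A, 64 voxels
field_b = box(2, 6)  # toy solid B, 64 voxels, sharing a 2x2x2 corner with A

shared = field_a & field_b  # CSG intersection: region inside both solids
only_a = field_a - field_b  # CSG difference: region inside A but not B

similarity_volume = volume(shared)
difference_volume = volume(only_a)
```

A large `similarity_volume` relative to `difference_volume` would indicate similar fields under this toy representation; the voxel approximation converges to the true volume only as the lattice spacing shrinks.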
As input, marching cubes begins with a molecular structure from the Protein Data Bank (PDB)
a) The input electrostatic field, illustrated as a gradient of red (negative potential) and blue (positive potential) regions. The solid region to be approximated is within the heavy black line. b) Axis aligned cubic lattice surrounding solid isopotential (black grid). c) Lattice points (circles) evaluated as being inside (red) or outside (green) the isopotential. d) Selected edges, found between interior and exterior lattice points (short black lines), intersect the electrostatic isopotential (grey curved line). e) Intersection points along each selected edge (small white circles). f1) A two dimensional illustration of the solid isopotential passing through a lattice square (red, left), with interior lattice points shown with red circles, and exterior lattice points shown with green circles. An approximation of the solid isopotential using a straight line is shown on the right. f2) A three dimensional illustration of the surface of a solid isopotential (red gradient, left) inside a lattice cube. Lattice points inside the solid isopotential are shown as red circles, lattice points outside are shown in green. An approximation of the solid isopotential triangles connecting intersection points (white circles) is shown on the right. g) Together, the triangles in all cubes (black lines) form the boundary surface approximating the solid isopotential (h).
First, we protonate the PDB structure using the
Marching Cubes begins by establishing a regular lattice of cubes around the protein, whose borders fall within the bounding box (
Once the lattice is initialized, we evaluate the potential
Next, we select every lattice edge that connects an inside lattice point to one outside. Since isopotentials are topologically closed surfaces, the selected edge must intersect the desired isopotential (
Finally, we consider every lattice cube joined to at least one lattice edge with an intersection point. On the cube, the intersection points collectively approximate the places where the isopotential passes through the cube. In two dimensions, this can be drawn as a shape passing through a square (
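The edge selection step of marching cubes can be sketched as follows. Two details are assumptions not stated in the text: a lattice point is treated as inside when its potential is at least as extreme as the threshold (with the same sign), and the intersection point is placed by linear interpolation along the edge, as in standard marching cubes. For brevity the sketch scans only edges parallel to the x axis; the function names are hypothetical.

```python
def is_inside(potential, threshold):
    """Assumed convention: inside means at least as extreme as the threshold."""
    if threshold < 0:
        return potential <= threshold
    return potential >= threshold


def edge_crossings(grid, threshold, spacing=0.5):
    """Find isopotential intersection points on x-direction lattice edges.

    grid[i][j][k] is the potential at lattice point (i, j, k); spacing is the
    lattice resolution in angstroms.
    """
    points = []
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    for i in range(nx - 1):
        for j in range(ny):
            for k in range(nz):
                p0, p1 = grid[i][j][k], grid[i + 1][j][k]
                # Select edges that join an inside point to an outside point.
                if is_inside(p0, threshold) != is_inside(p1, threshold):
                    # Linear interpolation (assumed); p0 != p1 on a selected edge.
                    t = (threshold - p0) / (p1 - p0)
                    points.append(((i + t) * spacing, j * spacing, k * spacing))
    return points


# Three lattice points along x with potentials in kT/e; threshold at -10 kT/e.
toy_grid = [[[-12.0]], [[-8.0]], [[-2.0]]]
crossings = edge_crossings(toy_grid, -10.0)
```

In a full implementation the same test would be applied to edges along y and z, and the intersection points on each cube would then be connected into triangles.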
A cavity field is a solid representation of the region inside a ligand binding cavity that is also inside a solid isopotential. To generate a cavity field, we require the solid isopotential and a solid representation of the ligand binding cavity (
We compare cavity fields to detect local electrostatic differences that might affect specificity. Our approach follows the assumption that the user has selected solid isopotentials at a threshold that is relevant for ligand binding. For example, if a negative potential is influential for the selection of positively charged substrates, comparing regions of negative potential in several cavities could reveal electrostatic causes for different binding preferences. We discuss the selection of these potentials in Supplemental
Our comparison begins by structurally aligning two proteins,
Computing
We quantify differences by measuring the volume of
An interface field is a solid representation of a region of electrostatic complementarity between two proteins
To generate the interface region, we first identify amino acids at the interface (
a) Two proteins in complex (rounded rectangles), and amino acids at the protein-protein interface (yellow, green). b) Spheres around every atom in the interfacial amino acids (orange). c) CSG union of interfacial spheres. d,e) Red and blue gradients representing the electrostatic potential field in the interfacial regions of protein
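Panels b and c can be sketched as a union of spheres placed on a voxel lattice. This is a simplified illustration under stated assumptions: the sphere radius is not given in the surviving text, so `radius` is a hypothetical parameter, and solids are stored as voxel sets rather than as the boundary surfaces VASP-E uses.

```python
import math


def sphere_voxels(center, radius, spacing=0.5):
    """Voxel indices whose lattice points lie within `radius` of `center`."""
    cx, cy, cz = center
    reach = int(math.ceil(radius / spacing))
    ix, iy, iz = (int(round(c / spacing)) for c in center)
    voxels = set()
    for i in range(ix - reach, ix + reach + 1):
        for j in range(iy - reach, iy + reach + 1):
            for k in range(iz - reach, iz + reach + 1):
                dx, dy, dz = i * spacing - cx, j * spacing - cy, k * spacing - cz
                if dx * dx + dy * dy + dz * dz <= radius * radius:
                    voxels.add((i, j, k))
    return voxels


def union_of_spheres(centers, radius, spacing=0.5):
    """CSG union of one sphere per atom center."""
    region = set()
    for center in centers:
        region |= sphere_voxels(center, radius, spacing)
    return region


# Two toy atom positions 0.5 angstroms apart; the radius is a placeholder value.
toy_atoms = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
interface_region = union_of_spheres(toy_atoms, radius=0.5)
```

Restricting an isopotential to this region would then be a CSG intersection of the two voxel sets.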
Electrostatically significant isopotentials
Since the interface field represents electrostatic complementarity in a given complex, we can use interface fields to compare electrostatic complementarity in two complexes. For two complexes,
We evaluate the difference
Where
DelPhi
Amino acids that create electrostatic differences between two ligand binding cavities can cause different binding preferences. To identify amino acids like these, we begin with a
Each nullified cavity field
Throughout this process we may observe that two amino acids
Finally, we define a conservative prediction threshold for identifying amino acids that influence specificity. First, we compute
For protein-protein interfaces, we can perform a similar analysis to identify amino acids that affect electrostatic complementarity. Here, we begin with the structure of an input complex
Next, we compare the interface fields of each variant complex
We may observe that nullification of two amino acids
Finally, we define two conservative prediction thresholds to predict electrostatically influential amino acids in protein-protein interactions. Given a complex
Cavity fields based on a given family of proteins were clustered based on the Jaccard distance
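The Jaccard distance between two cavity fields is one minus the ratio of the volume of their intersection to the volume of their union. A minimal sketch follows, assuming solids stored as voxel sets on a shared lattice (so the per-voxel volume cancels out of the ratio); the function names and the toy fields are hypothetical, and treating two empty fields as identical is an assumption.

```python
def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| for two solids stored as voxel sets."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty fields are treated as identical
    return 1.0 - len(a & b) / len(union)


def distance_matrix(fields):
    """Pairwise Jaccard distances for a dict mapping protein name -> voxel set."""
    names = sorted(fields)
    matrix = [[jaccard_distance(fields[p], fields[q]) for q in names] for p in names]
    return names, matrix


# Toy cavity fields; names and voxels are placeholders, not data from the paper.
toy_fields = {
    "protein_a": {(0, 0, 0), (1, 0, 0)},
    "protein_b": {(1, 0, 0), (2, 0, 0)},
    "protein_c": {(9, 9, 9)},
}
names, matrix = distance_matrix(toy_fields)
```

A matrix of this form is the kind of input that a hierarchical clustering method such as UPGMA would consume.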
Members of a given family of proteins were also clustered based on amino acid sequence alignments and backbone structure alignments. ClustalW 2.0.7 was used to compute multiple sequence alignments. The resulting alignments were passed to the protpars tool from Phylip
Because VASP-E is designed to identify electrostatic influences on specificity, we validate it using families of proteins for which the mechanisms that achieve specificity are well understood and fundamentally electrostatic. The serine protease and cysteine protease superfamilies were selected for validating that VASP-E finds amino acids that influence protein-ligand binding specificity because many mutational studies confirm the role of specific residues in achieving specificity. The same studies permit the validation of VASP-E as a method for clustering proteins based on ligand binding preferences.
The Protein Data Bank (PDB)
We used ska
We demonstrate the comparison of interface fields on two protein complexes: barnase-barstar (pdb: 1brs) and rap1A-RAF (pdb: 1c1y). We selected these complexes because electrostatic potential is known to affect their binding preferences and because detailed experimental studies have established how binding preferences are affected by mutations on both sides of the interface. These studies create a well-defined gold standard for evaluating how accurately VASP-E can predict amino acids that alter binding preferences. The data set is summarized in
VASP-E was developed in ANSI C/C++ using gcc (the GNU Compiler Collection) version 4.4.7 on 64-bit Linux-based computing platforms. Experimentation was performed on Corona, a cluster at Lehigh University with 1056 Opteron cores (model 6128) running at 2.0 GHz. Each compute node on Corona had 16 cores with access to either 2 or 4 GB of random access memory (RAM) per core. VASP-E is a single-threaded process that runs on one core and uses approximately 1 GB of RAM. All experimentation was conducted at 0.5 Å resolution, which permitted accurate results and practical runtimes.
Visualization for some figures was performed with SURFview, a tool written using the OpenGL library and running on Intel Core i7 and NVIDIA GeForce GTX 660 chipsets, in Microsoft Windows 7. Trees representing clusterings were visualized using Newick Utilities
The performance of VASP-E depends on the volume and resolution of the molecular surfaces or electrostatic isopotentials analyzed. On our dataset, generating solid isopotentials for entire proteins required approximately 9.5 seconds on average, to process an average of 1,337,083 lattice cubes. Comparing cavity fields required 1.06 seconds on average, to process an average of 41,984 cubes via CSG, while interface fields from two complexes required 23.4 seconds on average, to process an average of 729,321 cubes.
The website
Serine proteases exhibit affinity for amino acids at specificity subsites called S4, S3, …, S1, S1′, …, S3′, S4′
Using VASP-E, we identified amino acids that create electrostatic differences between trypsins and chymotrypsins at S1.
The red arrow indicates a trypsin residue associated with increased electrostatic similarity (downward spikes) when it is nullified. The dashed line represents the average prediction threshold between chymotrypsin and trypsin cavity fields.
One notable exception stands out. Nullifying aspartate 189 in all trypsins results in a large reduction in the average electrostatic difference with chymotrypsin at all potential thresholds, suggesting that the presence of aspartate 189 makes their S1 pockets electrostatically different.
a) S1 cavity of Atlantic salmon trypsin (pdb: 1a0j) shown in teal. b) Intersection region (teal) of S1 cavities from trypsin and chymotrypsin (transparent yellow). c) S1 cavity of bovine chymotrypsin (pdb: 8gch) shown in teal. Inset figs. d-g illustrate cavity fields, all with potential less than −10 kT/e (teal), inside the intersection region (transparent yellow). d) The wild type trypsin cavity field occupies 152 Å³. e) The trypsin cavity field with D189 nullified (32 Å³). f) The wild type chymotrypsin cavity field (9 Å³). g) The chymotrypsin cavity field with D189 nullified (2 Å³).
The color coding, which is independent of tree topology, indicates the types of P1 residue preferred by each protein. Trypsins (blue) prefer basic amino acids and chymotrypsins (red) prefer large hydrophobic amino acids. The topology of the tree reflects patterns of similarity measured with the Jaccard distance. Proteins on adjacent branches have greater similarity than proteins on different subtrees. The topological separation of the chymotrypsins from the trypsins indicates that similarities and differences in the electrostatic character of S1 subsites, which create the differences in their binding preferences, were detected and correctly classified by VASP-E, using the Jaccard distance.
Clusterings based on cavity fields generated at −2.5, −5.0, −7.5, or −10.0 kT/e (
Cathepsin B is involved in the onset of pancreatitis
We used VASP-E to identify amino acids that create electrostatic differences between cathepsin B and cathepsin L.
The red arrows indicate amino acids in cathepsin B associated with increased electrostatic similarity (downward spikes) to cathepsin L, when they are nullified.
The nullification of either of two amino acids, glutamic acid 171 or glutamic acid 245, reduced electrostatic differences between cathepsin B and cathepsin L beyond the prediction threshold. This observation suggests that both amino acids create electrostatic differences between the S2 subsites of cathepsin B and L. Indeed, glutamic acid 245 has been shown to cause cathepsin B to bind arginine residues at the S2 cavity
The color coding, which is independent of tree topology, indicates the types of P2 residue preferred by each protein. Cathepsin B's (red) prefer basic amino acids and cathepsin L and papain (blue) prefer large hydrophobic amino acids. The topology of the tree reflects patterns of similarity measured with different comparison algorithms. Proteins on adjacent branches have greater similarity than proteins on different subtrees. The topological separation of the cathepsin B's from cathepsin L and papain indicates that similarities and differences in the electrostatic character of S2 subsites, which create the differences in their binding preferences, were detected and correctly classified by VASP-E, using the Jaccard distance.
Barnase is a guanine-preferring endo-ribonuclease expressed by Bacillus amyloliquefaciens
The red arrows indicate amino acids in barnase that are associated with decreased electrostatic complementarity with barstar, when they are nullified. Blue arrows indicate amino acids associated with increased electrostatic complementarity, when they are nullified. Green arrows indicate amino acids below the prediction threshold that are known to influence specificity.
For four barnase residues, K27, R59, R83, and R87, nullification significantly reduced electrostatic complementarity, predicting correctly that mutations abrogating net charge at these positions could reduce affinity. These predictions are consistent with experimental observations established earlier: K27A decreases association rates by a factor of 7 to 10 times
Nullification of barnase residues 54 and 73 significantly increased electrostatic complementarity, correctly predicting that substituting these amino acids with alanine should increase affinity. Predictions for D54 and E73 reproduced established observations: Substituted individually, D54A and E73A increase association rates by 2 to 4 fold
Three known influences on affinity fell below our prediction threshold. Nullifying residues 39 and 102 reduced electrostatic complementarity, but not significantly enough to achieve our prediction threshold. The mutation K39A is known to reduce affinity
Nullifications of influential amino acids identified by VASP-E create changes in electrostatic complementarity that can be localized to specific regions. For example,
a) The molecular surface of Bacillus amyloliquefaciens barnase (teal). b) The molecular surface of Bacillus amyloliquefaciens barstar (transparent yellow) and barnase (teal). c) The interface region (transparent yellow) between barnase and barstar (teal). d) Electrostatic isopotential at +3 kT/e (blue) near barnase (teal). e) The same isopotential shown in transparent yellow. f) The electrostatic isopotential at −3 kT/e near barstar (red) and its overlap with the electrostatic isopotential at +3 kT/e near barnase (transparent yellow). g) Electrostatic isopotential at +3 kT/e (blue) near barnase (teal), where lysine 27 is nullified.
The red arrows indicate amino acids in barstar that are associated with decreased electrostatic complementarity with barnase, when they are nullified. Numbers in yellow ovals indicate inclusive intervals of amino acids where electrostatic focusing enhances the volume of the electrostatic potential inside the barnase/barstar interface. The green arrow indicates an amino acid below the prediction threshold that is known to influence specificity.
Nullifying three barstar residues, 35, 39, and 80, reduced electrostatic complementarity. These results agree with experimental observations that these amino acids are crucial for affinity between barnase and barstar, and that diminishing their electrostatic contribution interferes with binding: Charge reversal mutations individually converting aspartate 35 and 39 to lysine were shown to halt the inhibition of barnase by barstar
The nullification of glutamic acid 76 did not reduce electrostatic complementarity enough to be associated with a prediction. Nonetheless, the mutation of E76 to alanine was shown to reduce the binding energy by 1.6 kcal/mol and increase the dissociation constant by 10 fold relative to the wildtype complex
Nullifying the uncharged interfacial amino acids 29–31, 36–38, and 40–46 generated increases in electrostatic complementarity via electrostatic focusing. This enhancement creates isopotentials with larger volume, especially when the isopotentials are generated at low absolute thresholds (e.g., ±1 kT/e). Since these amino acids are uncharged, their nullification enlarges the isopotentials of nearby charged amino acids D35 and D39.
Ras is a master regulator that transmits a wide range of signals via protein-protein interactions. Downstream, its effectors are involved in many crucial systems, including cell cycle progression, cell division, apoptosis, lipid metabolism, DNA synthesis, and cytoskeletal organization
The red arrows indicate amino acids in rap1a that are associated with decreased electrostatic complementarity with raf when they are nullified. Blue arrows indicate amino acids associated with increased electrostatic complementarity, when they are nullified. The white arrow indicates an open prediction.
Nullification of six rap1a residues, 33, 37, 38, 54, 57, and 62, reduced electrostatic complementarity beyond the lower prediction threshold, suggesting that loss of charge mutations would reduce complex affinity. These predictions were consistent with established experimental observations: Substituting aspartate 33 with alanine in rap1a results in a binding energy reduction of 1.2 kcal/mol
Nullification of two rap1a residues, 31 and 41, increased electrostatic complementarity beyond the upper prediction threshold, suggesting that mutations removing their net charge should also increase affinity. Established results confirm these observations: Charge reversal of lysine 31 to glutamic acid is known to create a 30 fold increase in affinity
Finally, VASP-E predicted that the nullification of lysine 5 could result in an increase in binding affinity. This observation suggests that lysine 5 may normally reduce affinity. However, to our knowledge, no current experimental results establish this claim, and hence we leave it as an open prediction.
The red arrows indicate amino acids in raf that are associated with decreased electrostatic complementarity with rap1a when they are nullified. The green arrow indicates an amino acid below the prediction threshold that is known to influence specificity. The white arrow indicates an open prediction.
Nullification of four residues in raf, 59, 67, 84, and 89, reduced electrostatic complementarity below the lower prediction threshold and correctly predicted experimentally established substitutions that correspond to reductions in affinity. Substituting arginine 59 with alanine is known to reduce the rate of association by 25 fold
Nullification of lysine 65 did not reduce electrostatic complementarity below the lower threshold. While lysine 65 was therefore not predicted to have a significant electrostatic influence on specificity, the mutation K65A is known to reduce the rate of association by 4.5 fold
By collecting the predictions made on our dataset, we can measure the prediction performance of VASP-E. We begin by counting true positives (TPs), false positives (FPs), true negatives (TNs), and false negatives (FNs). TPs are defined as amino acids that are both predicted by VASP-E to have an influence on specificity and published in experimental findings to have such an influence. The predictions detailed earlier in this section cite these findings as specific validation for the predictions made with VASP-E. FPs are amino acids that are predicted by VASP-E to have an influence on specificity but are documented in the literature to have no effect on specificity. TNs are amino acids that are predicted to have no influence on specificity and are also documented in the literature to have no effect on specificity. FNs are amino acids that are predicted to have no influence on specificity but are established in the literature as having a role in specificity. Finally, VASP-E made two predictions that were neither confirmed nor denied in the literature. We leave these two observations as open predictions and do not include them in our evaluation of prediction performance.
Of these statistics, TNs cannot be fully counted because no studies categorically classify the role of every amino acid in specificity, including those that are distant from the binding site. For this reason, we describe the number of true negatives as unknown. Nonetheless, we do not require TNs in order to compute
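Precision and recall, the two measures reported in the abstract, depend only on TPs, FPs, and FNs, which is why an unknown TN count does not prevent their calculation. The sketch below uses the standard definitions. The counts are illustrative placeholders chosen so that the ratios match the reported 100% precision and 83.3% recall; they are not the tallies from the paper.

```python
def precision(tp, fp):
    """Fraction of predicted residues that are experimentally confirmed."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of experimentally confirmed residues that were predicted."""
    return tp / (tp + fn)


# Placeholder counts (assumptions for illustration only).
tp, fp, fn = 10, 0, 2

p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision = {p:.1%}, recall = {r:.1%}")
```

Neither formula contains a TN term, so both remain well defined when true negatives cannot be enumerated.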
We have presented VASP-E, a new program for the comparison of electrostatic isopotentials. To our knowledge, VASP-E is the first program capable of comparing isopotentials using CSG, enabling a new unified approach to the characterization of protein-ligand and protein-protein binding specificity. In an application to the serine and cysteine proteases, we demonstrate that VASP-E is capable of reproducing known ligand binding preferences and of detecting differences in electrostatic potential among proteins that, based on global sequence and structure similarity, might have been expected to be similar. Subtle differences like these, which can arise from variations in single amino acids, can still be detected by VASP-E because they are reflected in differently shaped isopotentials.
Central to our approach is a novel solid representation of electrostatic isopotentials that can also represent regions within molecular surfaces. This seamless integration of two nearly orthogonal aspects of protein structure enables analytical capabilities that were not possible before. One capability is the identification of amino acids that create differences in electrostatic isopotentials at binding cavities. Using the molecular surface to exclude electrostatic variations outside the binding cavity, we identified three amino acids in trypsin and cathepsin B that create electrostatic differences in binding specificity. These predictions correctly reflected experimentally established observations regarding their electrostatic influence. VASP-E also finds amino acids that change electrostatic complementarity in protein-protein interfaces. In an analysis of the barnase-barstar and rap1a/raf complexes, VASP-E predicted 22 amino acids that either increase or decrease affinity upon mutation, all in agreement with established experimental results. Solid representations enable a deconstructive analysis of electrostatic fields that permits the discovery of individual residues that influence binding preferences in protein-ligand and protein-protein binding sites.
As the first approach to the comparison of electrostatic isopotentials with CSG, VASP-E exhibits novel potential for useful experimental applications. In experimental settings, identifying mutants that may alter binding specificity can be a time consuming and expensive effort with many possible mutants to consider. VASP-E identifies amino acids that might play a role in specificity, and, in addition, it suggests a biophysical mechanism for each amino acid: It may increase or decrease electrostatic complementarity. This additional information provides utility beyond the identification of important amino acids because it suggests how each amino acid might be tested, such as by mutation to an uncharged or oppositely charged residue. When comparing protein-ligand binding cavities, pointing out amino acids that create electrostatic differences can inform experimental design.
VASP-E has the potential to serve broad applications. For example, identifying groups of amino acids that work together to achieve specificity can be an especially difficult problem, because of the combinatorial space of variants that must be considered. Nullification, as applied to individual amino acids in this paper, could be exhaustively applied to many combinations of residues to assist in experimental design. Given the rapid performance of VASP-E and the expanding availability of parallel computing, examining combinations of influential amino acids would also be very practical. Furthermore, the analysis of influential amino acids at protein-protein interfaces is not limited to dimers; the approach described here could be logically extended to higher order interactions. For such applications, interfaces between specific chains could be considered individually or in groups, to reflect the order in which the complex associates. Finally, while VASP-E is designed to identify subtle variations among highly similar proteins, it could in principle be used to analyze electrostatic similarities and differences among binding sites from very different proteins, as long as structural alignments can be correctly generated and binding cavities can be properly defined. These diverse applications suggest that the integrated representation and comparison of structure and electrostatics may offer an important new tool in the study of drug resistance and algorithms for specificity annotation.
Patterns of electrostatic similarity in the S1 specificity pockets of trypsins and chymotrypsins, relative to P1 binding preference. The color coding in all trees, which is independent of tree topology, indicates the types of P1 residue preferred by each protein. Trypsins (blue) prefer basic amino acids and chymotrypsins (red) prefer large hydrophobic amino acids. The topology of each tree reflects patterns of similarity measured with the Jaccard distance on cavity fields generated at different isopotential thresholds. In each tree, proteins on adjacent branches have greater similarity than proteins on different subtrees. The topology of the trees reflects UPGMA clustering of serine protease cavity fields generated at (a) 2.5 kT/e, (b) 5.0 kT/e, (c) 7.5 kT/e, and (d) 10.0 kT/e.
(EPS)
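The tree construction described in this caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the VASP-E implementation: cavity fields are approximated here as sets of voxel indices, the four field names and their contents are invented, and UPGMA is written out as plain average-linkage clustering.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| for two sets of voxel indices."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def upgma(fields):
    """Average-linkage (UPGMA) clustering of named voxel sets.

    fields maps a name to a set of voxel indices; the result is a
    nested tuple whose nesting mirrors the tree topology.
    """
    size = {name: 1 for name in fields}
    dist = {
        frozenset(p): jaccard_distance(fields[p[0]], fields[p[1]])
        for p in combinations(fields, 2)
    }
    while len(size) > 1:
        pair = min(dist, key=dist.get)        # closest pair of clusters
        a, b = sorted(pair, key=str)          # deterministic ordering
        merged = (a, b)
        for c in size:
            if c in pair:
                continue
            dac = dist.pop(frozenset((a, c)))
            dbc = dist.pop(frozenset((b, c)))
            # Size-weighted mean keeps the distance an average over all members.
            dist[frozenset((merged, c))] = (
                size[a] * dac + size[b] * dbc
            ) / (size[a] + size[b])
        del dist[pair]
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


# Hypothetical cavity fields (voxel indices are invented for illustration).
toy_fields = {
    "T1": {1, 2, 3, 4},
    "T2": {1, 2, 3, 5},
    "C1": {10, 11, 12},
    "C2": {10, 11, 13},
}
tree = upgma(toy_fields)
print(tree)
```

In the toy data, the two "T" fields share most voxels and the two "C" fields share most voxels, so they pair off on adjacent branches before the two pairs are joined at the root.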
Patterns of similarity and variation in the sequence, backbone structure, and cavity fields of trypsins and chymotrypsins, relative to P1 binding preference. The color coding in all trees, which is independent of tree topology, indicates the types of P1 residue preferred by each protein. Trypsins (blue) prefer basic amino acids and chymotrypsins (red) prefer large hydrophobic amino acids. The topology of each tree reflects patterns of similarity measured with different comparison algorithms. In each tree, proteins on adjacent branches have greater similarity than proteins on different subtrees. The topology of tree (a) reflects sequence similarity measured with ClustalW 2.0.7, the topology of (b) reflects backbone structure similarity measured with ska, the topology of (c) reflects cavity field similarity measured with the Jaccard distance, and the topology of (d) reflects sequence similarity measured with Clustal Omega. Jaccard similarity positions serine proteases with similar P1 binding preferences more closely than the other similarity measures do.
(EPS)
Patterns of electrostatic similarity in the S2 specificity pockets of cathepsin B, cathepsin L, and papain, relative to P2 binding preference. The color coding in all trees, which is independent of tree topology, indicates the types of P2 residue preferred by each protein. Cathepsin B proteins (red) prefer basic amino acids, and cathepsin L and papain (blue) prefer large hydrophobic amino acids. The topology of each tree reflects patterns of similarity measured with the Jaccard distance on cavity fields generated at different isopotential thresholds. In each tree, proteins on adjacent branches have greater similarity than proteins on different subtrees. The topology of the trees reflects UPGMA clustering of cysteine protease cavity fields generated at (a) 2.5 kT/e, (b) 5.0 kT/e, (c) 7.5 kT/e, and (d) 10.0 kT/e.
(EPS)
Patterns of similarity and variation in the sequence, backbone structure, and cavity fields of cysteine proteases, relative to P2 binding preference. The color coding in all trees, which is independent of tree topology, indicates the types of P2 residue preferred by each protein. Cathepsin B proteins (red) prefer basic amino acids, and cathepsin L and papain (blue) prefer large hydrophobic amino acids. The topology of each tree reflects patterns of similarity measured with different comparison algorithms. In each tree, proteins on adjacent branches have greater similarity than proteins on different subtrees. The topology of tree (a) reflects sequence similarity measured with ClustalW 2.0.7, the topology of (b) reflects backbone structure similarity measured with ska, the topology of (c) reflects cavity field similarity measured with the Jaccard distance, and the topology of (d) reflects sequence similarity measured with Clustal Omega. Jaccard similarity positions cysteine proteases with similar P2 binding preferences in a manner similar to the other similarity measures.
(EPS)
(PDF)
The author sincerely thanks Barry Honig for his thoughtful advice on this study and Remo Rohs for insightful discussions.