Conceived and designed the experiments: HMM GR. Performed the experiments: HMM CMP NP GR. Analyzed the data: HMM CMP NP GR. Contributed reagents/materials/analysis tools: NP GR. Wrote the paper: HMM CMP NP GR.
The authors have declared that no competing interests exist.
A long-standing goal in biology is to establish the link between function, structure, and dynamics of proteins. Considering that protein function at the molecular level is understood by the ability of proteins to bind to other molecules, the limited structural data of proteins in association with other bio-molecules represents a major hurdle to understanding protein function at the structural level. Recent reports show that protein function can be linked to protein structure and dynamics through network centrality analysis, suggesting that the structures of proteins bound to natural ligands may be inferred computationally. In the present work, a new method is described to discriminate protein conformations relevant to the specific recognition of a ligand. The method relies on a scoring system that matches critical residues with central residues in different structures of a given protein. Central residues are the most traversed residues with the same frequency in networks derived from protein structures. We tested our method on a set of 24 different proteins and more than 260,000 of their structures, either in the absence of a ligand or bound to it. To illustrate the usefulness of our method in the study of the structure/dynamics/function relationship of proteins, we analyzed mutants of the yeast TATA-binding protein with impaired DNA binding. Our results indicate that critical residues for an interaction are preferentially found as central residues of protein structures in complex with a ligand. Thus, our scoring system effectively distinguishes protein conformations relevant to the function of interest.
Proteins participate in most cellular processes through a variety of interactions. There is an intimate relationship between the function of a protein and its three-dimensional structure, but understanding this relationship remains an unsolved problem, in part due to the limited information on protein structures bound to other biological molecules. On the other hand, thousands of protein structures in the unbound or free form are made public every year, and these differ from the bound structures. Predicting the protein structure in the bound form may assist researchers in understanding the structure/function relationship. Here we report that protein structures bound to other molecules tend to present, as central amino acids, those that are critical for binding other molecules. This feature allowed us to identify the protein structures known to be involved in protein interactions from a screening of thousands of structures derived from the free form.
Proteins are dynamic molecules that adopt multiple structures in vitro and in vivo
To effectively link protein dynamics to protein structure and function using computational modeling techniques, one needs to know the structure of a protein bound to a natural ligand, considering that protein function at the molecular level is understood by the ability of proteins to bind to other molecules (e.g., biological macromolecules and/or small molecules). However, public databases of protein structures contain little of this information: for instance, on September 4, 2007, the PDB release contained 45,632 entries including 1,856 protein-DNA complexes (data obtained from the Protein Data Bank
In this work, we introduce a computational approach aimed at identifying functional conformers of proteins. To explain the basis of our approach, we have established some definitions and axioms.
We refer to a A A
Furthermore, to model protein function in terms of protein dynamics, we will assume as axioms: Proteins accomplish their function through a set of conformations. Critical residues for protein function play their roles in that set of conformations.
Note that experimental evidence supports axiom
In order to relate different conformations with different critical residues we need to estimate a property of the residues that varies with the conformation of proteins; the property used in this study is centrality. One of the reasons to choose centrality comes from the observed alteration in the centrality values of critical residues involved in binding in the dihydrofolate reductase enzyme upon ligand binding
It is important to note that many possible conformations could be involved in binding a ligand, provided that the ligand also presents several conformations accessible to the protein. In this regard, our method does not attempt to identify all of them or a specific one. Instead, we show here that our method can determine, from a population of protein conformations, which ones are related to the binding of a ligand.
In summary, the goal of our work is to identify the functional conformers of proteins. To that end, we describe a method that accounts for the presence of critical residues important for ligand binding in different protein conformations. We tested our method on 24 different proteins and more than 260,000 conformations of these proteins, either in the absence of a ligand or bound to a ligand. Our results indicate that functional conformers preferentially harbor the critical residues for ligand binding as central residues, thus providing a procedure to effectively identify the functional conformers of proteins.
Our group
A first step in our approach is to build a network representation of a
To this end, we have reported that using multiple protein conformations derived from the normal modes of vibration improves the sensitivity of predictions based on the transitivity
The sensitivity and specificity for predicting critical residues are plotted for 2 well-characterized proteins: HIV-protease (squares) and the T4 lysozyme (circles). The empty symbols correspond to the values obtained with a single protein conformer and the shadowed symbols correspond to those obtained with multiple conformers. For comparison, the filled symbols correspond to the values obtained with conserved residues predicted as critical residues (see
For every structure of the SCOP structural family 51351 (Triose Phosphate Isomerase family, including: 1TIM, 1AMK, 1CI1, 1HG3, 1M6J, 1B9B, 1TCD, 1TRE, 1YYA, 1HTI, 1R2R, 1MO0, 1YDV, 1YPI, 1WYI, 8TIM), we calculated the central residues. Using a multiple sequence alignment, we mapped each central residue onto the 1TIM structure. Then, we counted how often each position of 1TIM was found as a central residue across the whole family (centrality score). Here, we show the relationship of this frequency with a conservation score for each position of 1TIM derived using the Bayesian ConSeq procedure
Thus, including multiple protein conformers does improve the relationship between central residues and critical residues providing support to axiom
Our results suggest that different sets of protein conformers harbor different sets of central and critical residues. That is, each protein conformer presents several and different central residues. If this were correct, then it would be possible to find the set of protein conformers harboring the critical residues for ligand binding: the functional conformers. That is the contention of axiom
In
The fraction of identical central residues shared by every pair of conformers (y-axis) is plotted against every pair of conformers analyzed (x-axis). The results are shown for every pair between the 23 T4 Lysozyme structures analyzed (filled circles) and the 31 complexed HIV-1 protease structures compared against all the 42 non-complexed HIV-1 protease structures (empty triangles). Please refer to
Combined Sensitivity (CS) is plotted against the Root Mean Square Deviation (RMSD) values observed for every pair of structures compared. 31 HIV-1 protease structures in complex with a substrate were compared against 42 HIV-1 protease structures without a substrate. Please refer to
Thus, we have shown that different protein conformers have different central residues despite the small geometrical differences observed between the proteins and, consequently, that there is no relationship between the overall geometrical differences observed between protein conformers and the occurrence of central residues in these conformers. These results provide the basis to assess axiom
We propose that if a protein conformer participates in a given protein function, it must harbor as central residues those that are critical for that function (axiom
To evaluate axiom
The overall and average sensitivity for predicting critical residues of the HIV-protease was significantly higher when we used crystallographic structures of the HIV-protease associated with a substrate (black dots) than when the crystallographic structures did not include the substrate (white dots). To facilitate visual analysis, the points of each group were sorted in ascending order according to their sensitivity value.
We also analyzed multiple computationally generated protein conformers. In these studies, we used the yeast TATA binding protein (TBP), which has been solved both in the presence
The overall and average sensitivity for predicting critical residues for the binding of the TBP to the TATA sequence was significantly higher when we used structures derived from a molecular dynamics simulation of the TBP associated with the TATA sequence (labeled TBP+WtDNA, black dots) than when the simulated structures were without DNA (labeled TBP, red dots). To facilitate visual analysis, the points of each group (63,000 structures each) were sorted in ascending order according to their sensitivity value. See
TBP conformers with the highest and lowest values of both sensitivity and specificity in the four molecular dynamics simulations of TBP were used to show the relationship between the sensitivity value and the RMSD of the conformer with respect to the 1YTB structure.
WT Residue | Mutants | References |
Pro65 | Ser | |
Leu67 | Lys | |
Asn69 | Ser, Arg, deletion | |
Val71 | Ala, Met, Arg, Glu | |
Leu76 | Lys | |
Leu80 | Lys | |
Leu82 | Lys | |
Lys97 | Glu | |
Arg98 | Glu | |
Phe99 | Lys, Leu | |
Ala100 | Pro | |
Ile103 | Lys | |
Arg105 | Leu, Cys | |
Pro109 | Ala, Gln | |
Lys110 | Leu | |
Thr111 | Ile | |
Thr112 | Lys | |
Ala113 | Lys, Leu | |
Leu114 | Lys, Phe | |
Ile115 | Lys | |
Phe116 | Tyr, Lys, Leu | |
Ser118 | Leu | |
Lys120 | Leu | |
Met121 | Lys | |
Val122 | Arg, Lys | |
Thr124 | Asn, Arg | |
Gly125 | Deletion | |
Lys127 | Leu | |
Ser136 | Asn | |
Arg141 | Ala | |
Ile143 | Asn | |
Phe148 | Leu | |
Lys156 | Ala | |
Asn159 | Asp, Leu, Arg | |
Val161 | Ala, Glu, Arg | |
Leu172 | Lys | |
Leu175 | Lys | |
Leu189 | Pro, Ser | |
Phe190 | Arg, Gln, Thr | |
Pro191 | Ala | |
Leu193 | Lys | |
Ile194 | Arg, Phe | |
Arg196 | Glu, Cys | |
Lys201 | Glu, Leu | |
Val203 | Glu, Lys, Thr | |
Leu204 | Lys | |
Leu205 | Arg, Val, Lys, Phe | |
Phe207 | Leu, Tyr | |
Lys211 | Leu | |
Val213 | Arg | |
Leu214 | Lys | |
Thr215 | Arg | |
Lys218 | Leu |
WT residue column describes the residue (3-letter code amino acid and its position) in the wild-type TBP that once mutated to any of the amino acids described in the Mutants column, abolished the ability of TBP to bind DNA. The reference numberings reporting such mutants are indicated.
Group | N | Mean | SD |
TBP+WtDNA | 63000 | 0.254 | 0.122 |
TBP-WtDNA | 63000 | 0.249 | 0.116 |
TBP-GCDNA | 63000 | 0.234 | 0.117 |
TBP | 63000 | 0.23 | 0.109 |
Compared Groups | Model DF | Model MS | Error DF | Error MS | F | α |
TBP+WtDNA–TBP-WtDNA | 1 | 0.671 | 125998 | 0.014 | 46.709 | 0.05 |
TBP-WtDNA–TBP-GCDNA | 1 | 7.319 | 125998 | 0.013 | 533.58 | 0.05 |
TBP-GCDNA–TBP | 1 | 0.564 | 125998 | 0.012 | 43.873 | 0.05 |
(Upper Part) Each row shows the statistical parameters for each group of TBP conformers derived from molecular dynamics simulations. TBP+WtDNA: TBP with the TATA sequence. TBP-WtDNA: TBP originally resolved with the TATA sequence, which was removed for the simulation. TBP-GCDNA: TBP with a GCGC sequence. TBP: TBP originally resolved without DNA and simulated without DNA. (N: number of conformers, SD: standard deviation).
(Lower Part) Each row summarizes the results of a one-way ANOVA (null hypothesis: mean(1st group) = mean(2nd group)) for the pair of groups indicated in the first column. In each case the null hypothesis is rejected at the 0.05 level of significance (DF: degrees of freedom, MS: mean square, F: calculated F-value, α: level of significance).
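As a rough illustration of how the quantities in the lower part of the table relate to those in the upper part, the following sketch computes a one-way ANOVA from group summaries (N, mean, SD). The function name and the dictionary keys are hypothetical and are not taken from the original analysis. Because the means and standard deviations in the table are rounded to three decimals, the resulting F-value should not be expected to reproduce the reported one exactly.

```python
def one_way_anova_from_summary(groups):
    """One-way ANOVA from per-group summaries.

    groups: list of (n, mean, sd) tuples, one tuple per group.
    Returns the degrees of freedom, mean squares and F-value.
    """
    k = len(groups)
    total_n = sum(n for n, _, _ in groups)
    # Grand mean weighted by group size
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-group (model) and within-group (error) sums of squares
    ss_model = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_error = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    model_df = k - 1
    error_df = total_n - k
    ms_model = ss_model / model_df
    ms_error = ss_error / error_df
    return {
        "model_df": model_df,
        "error_df": error_df,
        "ms_model": ms_model,
        "ms_error": ms_error,
        "f": ms_model / ms_error,
    }


# Rounded summary values for TBP+WtDNA and TBP-WtDNA from the upper part
result = one_way_anova_from_summary([(63000, 0.254, 0.122),
                                     (63000, 0.249, 0.116)])
print(result)
```

With two groups of 63,000 conformers each, the error degrees of freedom are 126,000 − 2 = 125,998, as listed in the table.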
In order to analyze the veracity of axiom
The sensitivity value for predicting critical residues in the MolMov set (see
PDB Code | Protein Name | Ligand Name | SCOP Classification |
1BJY | Tetracycline repressor | Tetracycline | All alpha |
1DQY | Antigen85c (mycolyltransferase) | Diethyl phosphate inhibitor | Alpha/beta |
1CRX | CRE recombinase | DNA | All alpha |
1EX7 | Guanylate kinase | Guanosine 5-monophosphate | All beta |
1QUK | Phosphate-binding protein | Phosphate | Alpha/beta |
1GTR | Glutaminyl-tRNA synthetase | ATP | Alpha/beta |
2DRI | Ribose-binding protein | Ribose | Alpha/beta |
1SSP | Uracil-DNA glycosylase | Uracil-DNA | Alpha/beta |
1CIP | Guanine nucleotide-binding protein | Phosphoaminophosphonic acid-guanylate ester | Alpha/beta |
3PJR | Helicase | DNA | Alpha/beta |
1B0O | Beta-Lactoglobulin | Palmitate | All beta |
6TIM | Triosephosphate isomerase | 3-Phosphoglycerol | Alpha/beta |
1F8A | Peptidyl-prolyl cis-trans isomerase | Phosphoserine-proline peptide | Alpha + beta |
1DVJ | Orotidine monophosphate decarboxylase | 6-Aza uridine monophosphate | Alpha/beta |
1FTM | Glutamate receptor | AMPA | Alpha/beta |
3MBP | Maltose-binding protein | Maltotriose | Alpha/beta |
1QAI | Reverse transcriptase | Nucleic acid | Multidomain protein (alpha and beta) |
2RKM | Oligopeptide binding protein | Lys-Lys peptide | Alpha/beta |
1I7D | DNA topoisomerase II | 8-bases single-stranded DNA | Multidomain protein (alpha and beta) |
1PFK | Phosphofructokinase | Fructose diphosphate | Alpha/beta |
The proteins solved in complex with a ligand in the MolMov set are listed with their ligands. The first ten rows correspond to the proteins whose predicted critical residues were close to the ligand; the last ten rows are the proteins whose predicted critical residues were not close to the ligand. The last column indicates the structural classification as indicated in the SCOP database.
The 53 mutants listed in
All 53 critical residues in TBP involved in DNA binding qualified as central residues in the structures generated during the simulations (see
WT Residue | TBP+WtDNA | TBP-WtDNA | TBP | TBP-GCDNA |
Pro65 | 0.579 | 0.277 | 0.247 | 0.429 |
Leu67 | 0.786 | 0.621 | 0.552 | 0.57 |
Asn69 | 0.237 | 0.051 | 0.045 | 0.026 |
Val71 | 0.268 | 0.016 | 0.015 | 0.015 |
Leu76 | 0.842 | 0.911 | 0.81 | 0.689 |
Leu80 | 0.897 | 0.969 | 0.861 | 0.728 |
Leu82 | 0.435 | 0.286 | 0.255 | 0.161 |
Lys97 | 0.001 | 0 | 0 | 0 |
Arg98 | 0.026 | 0.034 | 0.03 | 0 |
Phe99 | 0.233 | 0.294 | 0.261 | 0.098 |
Ala100 | 0 | 0.044 | 0.039 | 0.003 |
Ile103 | 0.045 | 0.022 | 0.019 | 0.037 |
Arg105 | 0.102 | 0.013 | 0.011 | 0.02 |
Pro109 | 0.075 | 0.044 | 0.04 | 0.057 |
Lys110 | 0.029 | 0.017 | 0.015 | 0.034 |
Thr111 | 0.279 | 0.124 | 0.11 | 0.208 |
Thr112 | 0.433 | 0.551 | 0.49 | 0.287 |
Ala113 | 0.049 | 0.029 | 0.025 | 0.098 |
Leu114 | 0.578 | 0.652 | 0.579 | 0.21 |
Ile115 | 0.35 | 0.551 | 0.49 | 0.422 |
Phe116 | 0.184 | 0.244 | 0.217 | 0.389 |
Ser118 | 0.001 | 0 | 0 | 0 |
Lys120 | 0.344 | 0.496 | 0.441 | 0.421 |
Met121 | 0.326 | 0.089 | 0.079 | 0.128 |
Val122 | 0.665 | 0.961 | 0.854 | 0.569 |
Thr124 | 0.11 | 0.042 | 0.037 | 0 |
Gly125 | 0.025 | 0.032 | 0.029 | 0.01 |
Lys127 | 0.338 | 0.273 | 0.243 | 0.274 |
Ser136 | 0.221 | 0.122 | 0.109 | 0.103 |
Arg141 | 0.014 | 0.001 | 0.001 | 0 |
Ile143 | 0.168 | 0.132 | 0.117 | 0.075 |
Phe148 | 0.164 | 0.071 | 0.063 | 0.078 |
Lys156 | 0.213 | 0.156 | 0.139 | 0.223 |
Asn159 | 0.323 | 0.343 | 0.305 | 0.287 |
Val161 | 0.247 | 0.221 | 0.197 | 0.141 |
Leu172 | 0.897 | 0.998 | 0.887 | 0.758 |
Leu175 | 0.958 | 1.007 | 0.895 | 0.757 |
Leu189 | 0 | 0.033 | 0.029 | 0 |
Phe190 | 0.046 | 0.03 | 0.027 | 0 |
Pro191 | 0.001 | 0 | 0 | 0 |
Leu193 | 0.479 | 0.497 | 0.442 | 0.336 |
Ile194 | 0.016 | 0.016 | 0.015 | 0.002 |
Arg196 | 0.045 | 0.035 | 0.031 | 0.031 |
Lys201 | 0.003 | 0.003 | 0.002 | 0.082 |
Val203 | 0.063 | 0.023 | 0.02 | 0.048 |
Leu204 | 0.087 | 0.053 | 0.048 | 0.061 |
Leu205 | 0.071 | 0.033 | 0.03 | 0.023 |
Phe207 | 0.054 | 0.001 | 0.001 | 0.002 |
Lys211 | 0.049 | 0 | 0 | 0 |
Val213 | 0.093 | 0 | 0 | 0 |
Leu214 | 0.453 | 0.336 | 0.299 | 0.245 |
Thr215 | 0.013 | 0 | 0 | 0 |
Lys218 | 0.572 | 0.644 | 0.572 | 0.477 |
The observed frequencies of DNA-binding null mutant positions (WT residue) for each of the 4 molecular dynamics simulations: a) TBP+WtDNA, b) TBP-WtDNA, c) TBP and d) TBP-GCDNA. The frequencies were obtained by normalizing the number of times each residue in this table was detected as central across the 63,000 conformations analyzed in each simulation.
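The frequencies in this table can be pictured with a minimal sketch like the one below. It assumes that the normalization is a simple division by the number of conformations analyzed, which the caption does not state explicitly. The function name `central_frequency` and the toy input are hypothetical.

```python
from collections import Counter


def central_frequency(central_sets, residues_of_interest):
    """Fraction of conformers in which each residue of interest is central.

    central_sets: list with one set of central residues per conformer.
    residues_of_interest: residues to report (e.g., the mutated positions).
    Assumption: normalization divides by the number of conformers.
    """
    if not central_sets:
        raise ValueError("at least one conformer is required")
    wanted = set(residues_of_interest)
    counts = Counter()
    for central in central_sets:
        # Count each residue at most once per conformer
        counts.update(set(central) & wanted)
    n = len(central_sets)
    return {res: counts[res] / n for res in residues_of_interest}


# Toy input: central residues of four hypothetical conformers
conformers = [
    {"Leu175", "Leu76"},
    {"Leu175"},
    {"Met121"},
    {"Leu175", "Lys97"},
]
print(central_frequency(conformers, ["Leu175", "Lys97", "Val71"]))
```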
Under the current view that proteins accomplish their function through a set of conformations
Note that simply including many protein conformers in the analysis may not identify more critical residues. In this case, it is important to take into account the diversity of conformations being analyzed and the mechanism used by the protein to recognize the ligand (e.g., induced-fit versus selected-fit mechanisms; see below).
Additionally, our results are in agreement with the notion that conserved residues are not always functionally important, yet some conserved residues have functional roles (e.g., catalytic residues). Also, our results indicate that different protein conformers may harbor different central residues and, presumably, different functions (axiom
Indeed, we show that different protein conformers harbor different sets of central residues (see
Understanding this correspondence between centrality and protein structure may make it possible to generate protein structures hosting specific sets of critical and central residues. This will require a more in-depth characterization of the topological features of protein structures represented as networks. Given our current inability to generate protein conformers harboring a specific set of central residues, our best approximation to identify functional conformers of proteins is through the screening of collections of protein structures.
We determined the central residues for 73 experimentally determined conformers of the HIV protease and for 252,000 computationally generated conformers of TBP. For these two proteins, the critical residues for binding the substrate or other ligand have been identified
Thus, according to axiom
We noticed that some conformers derived from the protein structure in the absence of a ligand actually present large sensitivity values (see
To illustrate the usefulness of our method in the study of the structure/dynamics/function relationship of proteins, we examined previously reported mutants of the yeast TBP that have been identified as critical for DNA binding. Since binding to DNA is a dynamic process, it is important to keep in mind that a single structure of TBP in complex with DNA may not be sufficient to determine which of the residues have a role in binding and which in maintaining the structure. We explored the use of our method for distinguishing these residues. Our results show that residues Lys97, Ser118, Pro191, Lys211, Val213 and Thr215 are more likely involved in binding, while residues Leu67, Leu76, Leu80, Val122, Leu172 and Leu175 appear to be involved in the preservation of the structure of yeast TBP. It is important to note that our method does not use a criterion based on the distance of the protein to the ligand; nonetheless, our results are consistent with the distance and orientation of the critical residues observed in the structure of yeast TBP in complex with the TATA-box DNA. Likewise, mutants of the residues predicted to be involved in maintaining TBP structure (Leu67, Leu76, Leu80, Leu172 and Leu175) do not support transcription in either an activated (in the presence of transcription activators) or a basal fashion, supporting the idea of a structural role for these residues
Our results support the notion that protein function is achieved through an ensemble of protein conformations
To study the relationship between conserved residues and central residues in multiple protein structures, two proteins were used: HIV protease and the T4 lysozyme. For the HIV protease, 73 experimentally determined crystal structures were used: 1a30, 1a8g, 1a9m, 1aaq, 1ajv, 1ajx, 1axa, 1bdr, 1bv7, 1bv9, 1bwa, 1bwb, 1cpi, 1dif, 1dmp, 1gnm, 1gnn, 1gno, 1hbv, 1hih, 1hiv, 1hos, 1hps, 1hpv, 1hpx, 1hsg, 1hte, 1htf, 1htg, 1hvc, 1hvi, 1hvj, 1hvk, 1hvl, 1hvr, 1hvs, 1hwr, 1hxb, 1hxw, 1mer, 1mes, 1met, 1meu, 1mtr, 1odw, 1odx, 1ody, 1ohr, 1pro, 1qbr, 1qbs, 1qbt, 1qbu, 1sbg, 1tcx, 1vij, 1vik, 1ytg, 1yth, 2aid, 2bpv, 2bpw, 2bpx, 2bpy, 2bpz, 2upj, 3aid, 4hvp, 4phv, 5hvp, 7hvp, 8hvp, 9hvp. For the T4 lysozyme 23 experimentally determined crystal structures were used: 1ctw, 1cu0, 1cu2, 1cu3, 1cu5, 1cu6, 1cup, 1cuq, 1cv0, 1cv1, 1cv3, 1cv4, 1cv5, 1cv6, 1cvk, 1cx7, 1d2w, 1d2y, 1d3f, 1d3j, 1d3m, 1d3n, 1qsq.
To identify functional conformers, three sets of protein structures were used: HIV protease, the yeast TATA-Binding Protein (TBP) and the MolMov set of proteins. For the HIV protease, the same protein structures described above were used. The PDB codes of the structures in complex with a substrate analogue are: 1aaq, 1cpi, 1dmp, 1hbv, 1hih, 1hiv, 1hos, 1hps, 1hpv, 1hte, 1htf, 1htg, 1hvi, 1hvj, 1hvk, 1hvl, 1hvr, 1hvs, 1ohr, 1sbg, 2bpv, 2bpw, 2bpx, 2bpy, 2bpz, 4hvp, 4phv, 5hvp, 7hvp, 8hvp, 9hvp. For TBP, the crystal structures used had the PDB codes: 1tbp for TBP without DNA, and 1ytb for the TBP complex with a TATA box (TATATAAA).
In the case of the MolMov set, we used the proteins reported at the database of macromolecular movements
The initial structure for the simulation of free TBP was 1TBP
Networks were derived from protein structures by a distance criterion. That is, two residues were considered neighbors, and consequently to interact, if at least 1 atom of each residue lies within 5 Angstrom (Å) of an atom of the other. The atoms within that distance may belong to either the main chain or the side chain of the amino acid. Therefore, the networks that were built had amino acid residues as nodes and their interactions as links. Links were labeled with identical weights. We previously reported that, among 21 different ways to build networks from protein structures (e.g., distance between centers of mass, charge, different distance cut-off values), this one gives better results for the prediction of critical residues from central ones
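The distance criterion described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the input format (a dictionary from residue identifier to a list of atom coordinates), the function name `build_contact_network` and the toy coordinates are all assumptions made for the example.

```python
import math
from itertools import combinations


def build_contact_network(residues, cutoff=5.0):
    """Build an unweighted residue network from atom coordinates.

    residues: dict mapping a residue id to a list of (x, y, z) atom
              coordinates in Angstrom (main-chain and side-chain atoms).
    cutoff:   two residues are linked if any pair of their atoms is
              at most this distance apart.
    Returns a dict mapping each residue id to the set of its neighbors.
    """
    network = {res_id: set() for res_id in residues}
    for (a, atoms_a), (b, atoms_b) in combinations(residues.items(), 2):
        # One atom pair within the cutoff is enough to create a link
        if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
            network[a].add(b)
            network[b].add(a)
    return network


# Toy coordinates (hypothetical): A1 and A2 are in contact, A3 is isolated
toy = {
    "A1": [(0.0, 0.0, 0.0)],
    "A2": [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    "A3": [(10.0, 0.0, 0.0)],
}
print(build_contact_network(toy))
```

The all-against-all atom comparison is quadratic in the number of atoms. For large structures or many conformers, a spatial index would be a natural replacement, but the brute-force version keeps the criterion easy to read.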
Two measurements were used to account for this: sensitivity and specificity. Sensitivity, Se, is defined as Se = TP/AP = (AP−FN)/AP, where TP: true positives, FN: false negatives and AP: all positives (AP = TP+FN). In our case, AP are all the critical residues determined experimentally, TP are the critical residues correctly predicted and FN are the critical residues not predicted as critical. Specificity, Sp, is defined as Sp = (AN−FP)/AN, where AN: all negatives and FP: false positives. In our case, AN are the non-critical residues determined experimentally and FP are the residues predicted as critical that are not critical. Additionally, in order to compare the sensitivity of the predictions in paired comparisons (see
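A minimal sketch of these two measurements is given below. It takes sensitivity as the fraction of experimentally determined critical residues that are predicted, and specificity as the fraction of non-critical residues that are not predicted. The function name and the toy residue numbers are hypothetical.

```python
def sensitivity_specificity(predicted, critical, all_residues):
    """Sensitivity and specificity of a critical-residue prediction.

    predicted:    residues predicted as critical (e.g., central residues).
    critical:     residues determined experimentally to be critical (AP).
    all_residues: every residue of the protein under study.
    """
    predicted = set(predicted)
    critical = set(critical)
    non_critical = set(all_residues) - critical   # AN
    tp = len(predicted & critical)                # true positives
    fp = len(predicted & non_critical)            # false positives
    ap = len(critical)
    an = len(non_critical)
    se = tp / ap                                  # Se = TP/AP
    sp = (an - fp) / an                           # Sp = (AN - FP)/AN
    return se, sp


# Toy protein with 10 residues, 4 of them critical (hypothetical numbers)
se, sp = sensitivity_specificity(
    predicted={1, 2, 5},
    critical={1, 2, 3, 4},
    all_residues=range(1, 11),
)
print(se, sp)
```

In this toy case two of the four critical residues are recovered, and one of the six non-critical residues is wrongly predicted.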
The
The transitivity values (Y-axis) obtained for each residue (X-axis) in the yeast TATA-Binding Protein (1TBP, chain B) are shown as rhombs. The values are ordered by transitivity value to facilitate the visual analysis of the data. The central residues are the most traversed residues that present the same frequency, and are presented as filled rhombs in the top right corner. That is, there are 6 residues with the largest transitivity value of 17 (Tyr139, Met121, Phe227, Ile212, Ile160, Leu175); the next lower transitivity value is 16 and presents the same frequency (6 residues: Ile143, Val123, Ile70, Leu76, Ile223, Leu214) as the value of 17; similarly, there are 6 residues with a transitivity value of 15 (Ile115, Ser136, Met104, Ile170, Leu234, Ile206). Note that residues with a transitivity value of 14 have a frequency different from 6 and thus were not considered central. Only the 18 residues with transitivity values of 17, 16, and 15 are considered central to the 1TBP structure.
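The selection rule illustrated in this figure can be sketched as follows, under one reading of the caption: starting from the largest transitivity value, levels are accepted as long as the number of residues at that level equals the number at the top level. The function name is hypothetical, and the four residues placed at value 14 (`X1` to `X4`) are placeholders, since the caption only states that their count differs from 6.

```python
from collections import Counter


def select_central_residues(transitivity):
    """Pick central residues from per-residue transitivity values.

    transitivity: dict mapping a residue id to its transitivity value.
    Levels are visited from the highest value downwards and kept while
    they contain as many residues as the highest level does.
    """
    if not transitivity:
        return set()
    counts = Counter(transitivity.values())
    levels = sorted(counts, reverse=True)
    top_frequency = counts[levels[0]]
    central_levels = set()
    for value in levels:
        if counts[value] != top_frequency:
            break  # first level with a different frequency ends the run
        central_levels.add(value)
    return {res for res, v in transitivity.items() if v in central_levels}


# Values for 1TBP (chain B) as listed in the caption
values = {}
for res in ["Tyr139", "Met121", "Phe227", "Ile212", "Ile160", "Leu175"]:
    values[res] = 17
for res in ["Ile143", "Val123", "Ile70", "Leu76", "Ile223", "Leu214"]:
    values[res] = 16
for res in ["Ile115", "Ser136", "Met104", "Ile170", "Leu234", "Ile206"]:
    values[res] = 15
# Placeholder residues at value 14; their real identity and count are
# not given in the caption, only that the count is not 6
for res in ["X1", "X2", "X3", "X4"]:
    values[res] = 14

central = select_central_residues(values)
print(len(central))
```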
We acknowledge the technical assistance received from the Information Technology core of the Instituto de Fisiologia Celular-UNAM and Alondra Solares for the compilation of experimental reports of TBP mutants.