The authors have declared that no competing interests exist.
Conceived and designed the experiments: JEF SvG RGH MAM HGW KRL. Performed the experiments: JEF. Analyzed the data: JEF SvG RGH MAM GMS HGW KRL. Contributed reagents/materials/analysis tools: JEF SvG HGW KRL. Wrote the paper: JEF SvG RGH MAM GMS HGW KRL.
A purely information theory-guided approach to quantitatively characterize protease specificity is established. We calculate an entropy value for each protease subpocket based on sequences of cleaved substrates extracted from the MEROPS database. We compare our results with known subpocket specificity profiles for individual proteases and protease groups (e.g. serine proteases, metallo proteases) and find that they reflect these profiles quantitatively. Summation of subpocket-wise cleavage entropy contributions yields a measure for overall protease substrate specificity. This total cleavage entropy allows ranking of different proteases with respect to their specificity, separating unspecific digestive enzymes showing high total cleavage entropy from specific proteases involved in signaling cascades. The development of a quantitative cleavage entropy score allows an unbiased comparison of subpocket-wise and overall protease specificity. Thus, it enables assessment of the relative importance of physicochemical and structural descriptors in protease recognition. We present an exemplary application of cleavage entropy in tracing substrate specificity in protease evolution. This highlights the wide range of substrate promiscuity within homologous proteases and hence the heavy impact of a limited number of mutations on individual substrate specificity.
Proteases show a broad range of cleavage specificities. Promiscuous proteases such as digestive enzymes degrade peptides unspecifically, whereas highly specific proteases are involved in signaling cascades. As a quantitative index of substrate specificity was lacking, we introduce cleavage entropy as a measure of substrate specificity of proteases. This quantitative score allows for straightforward rationalization of substrate recognition by a subpocket-wise assessment of substrate readout, leading to specificity profiles of individual proteases as well as an estimate of overall substrate promiscuity. We present an exemplary application of the descriptor ‘cleavage entropy’ to trace substrate specificity through the evolution of different protease folds. Our score highlights the diversity of substrate specificity within evolutionarily related proteases and hence the complex relationship between sequence, structure and substrate recognition. By taking into account the whole distribution of known substrates rather than simple substrate counting, cleavage entropy provides the unique opportunity to dissect the molecular origins of protease substrate specificity.
Proteases catalyze cleavage of peptide bonds and are involved in virtually all fundamental cellular processes
Cleavage specificity generally originates from distinct molecular interactions between substrate and enzyme. Simple cleavage rules for serine proteases rely only on the prominent P1-S1 interactions. For instance, the hydrophobic S1 pocket of chymotrypsin causes specificity for substrates providing hydrophobic residues at their P1 position. In contrast, an Asp residue in the S1 site of the homologous trypsin determines specificity for Arg and Lys at P1
Interactions between enzyme and substrate span several subpockets in the protease binding site. Experimental data show that S2–S3 sites hardly affect substrate specificity in chymotrypsin
A plethora of experimental cleavage data for proteases is available in several databases. Cleavage information is generated experimentally by several methods reviewed by Diamond
Although cleavage information for known proteases is easily accessible, no attempt has so far been made to develop a quantitative measure for subpocket-wise and total protease specificity, in contrast to pure feature extraction techniques such as cascade detection
To generate subpocket-wise specificity entropies, cleavage data were extracted from the MEROPS database
Protease-wise cleavage sequence matrices were normalized according to the natural abundance of individual amino acids
Subpocket-wise substrate specificity information is of high interest to compare individual subpockets of a single protease and individual specificity profiles between proteases. To facilitate analysis of different proteases as a whole, a summation of individual subpocket cleavage entropies yields a quantitative overall cleavage entropy per protease (see Formula 2). This total cleavage entropy over eight substrate positions in the central binding site region (P4 to P4′) allows for ranking of proteases with respect to their overall substrate specificities. Entropy values range from 0 for a single conserved substrate to 8 for a random distribution of amino acids in cleaved substrates.
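The subpocket-wise and total cleavage entropies described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Shannon entropy with logarithm base 20 (so that each position ranges from 0 to 1, consistent with the stated total range of 0 to 8), and it assumes that abundance normalization divides each count by a background frequency before renormalizing. The names `subpocket_entropy`, `total_cleavage_entropy` and `abundance` are hypothetical, and a uniform background is used as a placeholder because no natural abundance values are given in the text.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def subpocket_entropy(residues, abundance=None):
    """Entropy (log base 20) of the residues observed at one substrate position."""
    if abundance is None:
        # Placeholder background: uniform frequencies (assumption, not real data).
        abundance = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    counts = Counter(r for r in residues if r in abundance)
    # Assumed normalization: weight counts by inverse background frequency,
    # then renormalize so the weights sum to 1.
    weighted = {a: counts[a] / abundance[a] for a in counts}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no standard amino acids observed at this position")
    probs = (w / total for w in weighted.values())
    return -sum(p * math.log(p, len(AMINO_ACIDS)) for p in probs)


def total_cleavage_entropy(substrates, abundance=None):
    """Sum of subpocket entropies over the eight positions P4..P4'."""
    return sum(
        subpocket_entropy([s[i] for s in substrates], abundance)
        for i in range(8)
    )


# Toy inputs (invented for illustration): one fully conserved substrate set
# and one set in which every residue occurs equally often at every position.
conserved = ["AAAARAAA"] * 50
uniform = [a * 8 for a in AMINO_ACIDS]
print(total_cleavage_entropy(conserved))  # close to the lower bound
print(total_cleavage_entropy(uniform))    # close to the upper bound
```

Each substrate is represented here as an eight-character string covering P4 to P4′; real MEROPS records would first need to be parsed into this form.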
Although cooperativity effects between subpockets were described for subtilisins
As only trypsin, with more than 14000 substrates listed in MEROPS, provides a sufficient data basis to study subpocket correlation effects, we performed an inter-subpocket correlation analysis for this protease only. The one-dimensional subpocket-wise cleavage entropy calculations presented above can be extended directly to the multidimensional case, yielding for two dimensions a pairwise cleavage entropy score that depends on amino acids a and b at positions i and j and their respective probabilities pa,i and pb,j.
This measure for inter-subpocket correlation effects yields, as in the independent analysis (cleavage entropy), a score of 0 for a single conserved amino acid pair and a value of 1 for a distribution of amino acid pairs as expected by random chance from natural abundance
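One way to picture a pairwise score with these limits is a joint entropy over residue pairs with logarithm base 400 (20 × 20 possible pairs). The sketch below is an assumption-laden illustration, not the formula used in the study: the function names are hypothetical, and the natural-abundance normalization is omitted for brevity. It also shows a simple comparison that follows from standard information theory: for independent positions the base-400 joint entropy equals the mean of the two single-position base-20 entropies, so a positive difference hints at coupling.

```python
import math
from collections import Counter


def pairwise_entropy(substrates, i, j):
    """Joint entropy (log base 400) of residue pairs at positions i and j."""
    pairs = Counter((s[i], s[j]) for s in substrates)
    n = sum(pairs.values())
    return -sum((c / n) * math.log(c / n, 400) for c in pairs.values())


def single_entropy(substrates, i):
    """Single-position entropy (log base 20), without abundance correction."""
    counts = Counter(s[i] for s in substrates)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n, 20) for c in counts.values())


def coupling_deficit(substrates, i, j):
    """Expected joint entropy under independence minus the observed one.

    Zero for independent positions, positive when the positions are coupled.
    """
    independent = 0.5 * (single_entropy(substrates, i) + single_entropy(substrates, j))
    return independent - pairwise_entropy(substrates, i, j)


# Toy data (invented): every pair equally likely versus perfectly coupled pairs.
aa = "ACDEFGHIKLMNPQRSTVWY"
all_pairs = [a + b for a in aa for b in aa]
coupled = [a + a for a in aa]
print(pairwise_entropy(all_pairs, 0, 1), coupling_deficit(all_pairs, 0, 1))
print(pairwise_entropy(coupled, 0, 1), coupling_deficit(coupled, 0, 1))
```

A two-dimensional count table has 400 cells, which is why such an analysis needs far more substrates than the one-dimensional case.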
As part of the discussion, protease specificity is compared to evolutionary distance. Sequences downloaded from Uniprot
Protein structure visualizations were created with PyMOL
Entries with more than 100 annotated substrates in the MEROPS database represent 47 proteases and comprise all major protease catalytic types. The three major protease catalytic types, serine, metallo and cysteine proteinases, covering more than 90% of known proteases
Serine proteases show pronounced specificity at the P1 substrate site occupying the characteristic deep S1 pocket with an averaged cleavage entropy as low as SP1 = 0.256 (see
Serine proteases and associated MEROPS clans sorted according to the number of known substrates n with their respective subsite-wise cleavage entropies Si. Specific subpockets showing a cleavage entropy equal or less than an arbitrary cutoff of 0.85 are highlighted in yellow, values lower than 0.5 indicating stringent specificity in red.
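The highlighting rule used in this table can be written as a small helper. This is only a sketch of the stated cutoffs (0.85 and 0.5); the function name and the returned labels are hypothetical, and the cutoffs themselves are described in the caption as arbitrary.

```python
def classify_subpocket(s_i, specific_cutoff=0.85, stringent_cutoff=0.5):
    """Map a subpocket cleavage entropy to the highlight color of the tables."""
    if s_i < stringent_cutoff:
        return "red"      # stringent specificity
    if s_i <= specific_cutoff:
        return "yellow"   # specific subpocket
    return "none"         # unspecific, not highlighted


# Values quoted in the text: average SP1 of serine proteases, average SP1'
# of metallo proteases, and the average P1'-P4' entropy of aspartic proteases.
for value in (0.256, 0.703, 0.909):
    print(value, classify_subpocket(value))
```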
Subpocket-wise cleavage entropies mapped to the binding site region of trypsin (top) in a color spectrum of red (low, specific) over yellow to green (high, unspecific) highlight the central S1 pocket as only determinant of substrate specificity within the binding region S4-S4′ (left to right). By contrast, thrombin (bottom) binding the same small molecule inhibitor BIBR1109
All serine proteases in the test set show pronounced specificity in the P1-region, including even so-called unspecific proteases such as trypsin, which binds highly conserved arginine and lysine residues at the P1 site. An extension of this specific reading frame in both directions of the substrate is observed, for example, for thrombin and furin, where the latter protease shows extraordinary specificity at the P4 site independent of other specific residues. These lowered entropy values reflect the proposed Arg-Xaa-Lys/Arg-Arg consensus in the P4-P1-region for furin substrates
Metallo proteases in general show less intense subpocket-wise specificity patterns than serine proteases. Their substrate readout is most pronounced in the P1′ position with an average cleavage entropy of 0.703 (see
Subpocket-wise cleavage entropies Si of metallo proteases and associated MEROPS clans sorted by decreasing number of known substrates n. Specific pockets are highlighted in yellow and red according to their respective substrate promiscuity (yellow: 0.5<Si<0.85, red: Si<0.5).
We find matrix metallo proteases (MMPs) to differ in their substrate specificity from other members of the metallo proteases. Cleavage entropy calculation highlights the P1′ position as major determinant of specificity in MMP-2, hence named “specificity pocket”
Cysteine proteases are characterized by cleavage entropies comparable to serine proteases rather than metallo proteases. P1 interactions dominate substrate specificity with a cleavage entropy of SP1 = 0.630 similar to serine proteases (see
Cysteine proteases and associated MEROPS clans sorted according to the number of known substrates in MEROPS n. Subpocket-wise cleavage entropies Si are color-coded to highlight specific pockets in yellow (0.5<Si<0.85). Highly specific subpockets are shown in red (Si<0.5).
Caspases are shown to read conserved aspartate residues in P1 position with an extraordinarily high specificity (SP1<0.05), a characteristic not shared by the other cysteine proteases. Subsite specificity of apoptosis signaling caspases
Besides the three main classes of proteases, six further proteases with more than 100 cleavage patterns were found within MEROPS (see
Further proteases in the test set and associated MEROPS clans not belonging to the catalytic types cysteine, serine or metallo proteases, sorted according to decreasing number of known substrates n. Specific subpockets (subpocket cleavage entropy 0.5<Si<0.85) are shown in yellow, highly specific pockets (Si<0.5) in red. Five aspartic proteases are marked with ‘*’.
The signal peptidase complex is a membrane-bound protease involved in membrane translocation signaling
All five aspartic proteases are found to depend mostly on P1 interactions with an average SP1 = 0.768. Other subpockets in P- and P′-region tend to exhibit likewise unspecific substrate binding (SP4-P1 = 0.892, SP1′-P4′ = 0.909). HIV retropepsin, a prominent target in drug design, shows distinct specificity at P2′ position with SP2′ = 0.768 supporting findings of Schilling et al.
Aspergilloglutamic and scytalidoglutamic peptidase are added to the data set despite sparse cleavage data in order to cover the group of glutamic peptidases, represented by the members with the highest number of annotated substrates (68 and 37 respectively). Aspergilloglutamic and scytalidoglutamic peptidase provide two examples of variable cleavage profiles within the same protease class: whereas the P1 position shows nearly identically lowered cleavage entropies, scytalidoglutamic peptidase reads substrate residues over the whole range of eight covered subpockets, in contrast to aspergilloglutamic peptidase, which shows no pronounced substrate preferences at subpockets other than P1.
Summing up previous findings, average subpocket cleavage entropy profiles were calculated for protease catalytic types (see
Subpocket-wise cleavage entropy profiles for protease catalytic classes reveal distinct substrate readout patterns for each of the protease groups. Serine proteases show most prominent subpocket specificity at the S1 site, whereas metallo proteases show specific binding behavior over a larger part of the binding pocket S4-S4′.
Summation of subpocket-wise cleavage entropies yields a total estimate of protease specificity (see
Ranking of 49 proteases with their respective MEROPS clans (including the 2 added glutamic proteases) with respect to their total cleavage entropy SCleavage. Specific proteases (SCleavage<6.8, corresponding to an average subpocket cleavage entropy Si of 0.85 over eight investigated subpockets) are highlighted in yellow. No protease in the core test set of 47 proteases is found to be highly specific (SCleavage<4.0, reflecting an average Si of lower than 0.5 over the whole binding site region of S4-S4′). Scytalidoglutamic peptidase present in the extended test set exhibits such strict substrate cleavage with a total cleavage entropy SCleavage of 2.932 owing to substrate recognition spreading over 7 highly specific subpockets (compare
Proteases span a wide range of substrate specificities directly related to their biological roles. Ranking of the protease test set with respect to overall cleavage entropy SCleavage thus yields a clear separation between unspecific digestive proteases and specific proteases involved in signaling pathways. The protease with the highest observed cleavage entropy SCleavage = 7.528, thermolysin, is involved in bacterial nutrition by unspecifically degrading exogenous peptides
An exemplary analysis of inter-subpocket correlation was carried out based on over 14000 trypsin substrates listed in MEROPS (see
We have shown cleavage entropy calculation to be an intuitive approach to assess protease specificity quantitatively. In a first application of the presented score metric, we divide the protease test set into groups sharing a common cleavage machinery to elucidate potential descriptors of protease substrate specificity. This split yields four separate groups indicating distinct catalytic function: serine, metallo, cysteine and aspartic proteases (see
Protease cleavage entropies indicate specific as well as unspecific members for each of the investigated protease catalytic machineries. As cleavage entropies (indicated by averages, maxima, minima and standard deviations) overlap between each of the types, the catalytic mechanism is found not to determine substrate specificity.
Strikingly, both extrema on the presented quantitative protease specificity scale for the core set of 47 proteases represent members of the metallo proteases (thermolysin and neurolysin respectively). This indicates that the catalytic cleavage machinery cannot be the major determinant of substrate specificity. Similarly, serine proteases including the prominent digestive enzymes trypsin, chymotrypsin, elastase as well as signaling peptidases kexin and furin show diverse substrate specificity. Solely the smaller sample of five aspartic proteases shows predominantly unspecific cleavage behavior with an average total cleavage entropy of SCleavage = 7.205 compared to an average of SCleavage = 6.608 for the other catalytic types. Other protease classes do not show significant differences in their substrate specificity (serine proteases: average SCleavage = 6.433, metallo proteases: average SCleavage = 6.652, cysteine proteases: average SCleavage = 6.820). All protease types except for aspartic proteases therefore include specific as well as unspecific members. Thus, our study underlines the broadly accepted finding that protease substrate specificity is determined by subpocket interactions of the protease rather than directly at the catalytic site.
As apparent from
Splitting of protease catalytic types into homologous protease clans makes it possible to separate specific from unspecific members although they share a catalytic mechanism. Clan-wise total cleavage entropies are shown for MEROPS clans PA, SF, SB (all serine proteases), MA (metallo proteases), CA, CD (both cysteine proteases) and AA (aspartic proteases) with indicated averages, maxima, minima and standard deviations.
Surprisingly, subdivision into homologous clans makes it possible to subdivide proteases sharing the same catalytic mechanism into specific and unspecific subgroups. Cysteine proteases are divided into a more specific clan CD (average SCleavage = 6.020) and a relatively unspecific clan CA (average SCleavage = 7.163). Only caspases, known to be highly specific signaling proteases
The same subdivision into specific and unspecific folds works for serine proteases that comprise clans of high specificity (clan SB: average SCleavage = 5.429), intermediate specificity (clan SF: average SCleavage = 6.370) as well as less specific proteases (clan PA: average SCleavage = 6.779). Standard deviations of cleavage entropies calculated within clan members are low (see
Thus, the whole structure of protease clans has to be considered to shed light on the molecular origins of general protease cleavage spectra. Consistently, single mutations within specificity pockets of proteases are known to shift substrate spectra to other preferred substrates rather than to interchange specific and non-specific cleavage behavior. Nevertheless, a smooth interchange between specific and unspecific behavior including specialization and despecialization steps has been shown in case of granzymes
Tracing the evolutionary development of protease specificity further into particular protease clans raises the question of whether evolutionary distance at sequence level is related to substrate specificity in these groups with conserved three-dimensional fold. Therefore, we performed a phylogenetic analysis for individual protease clans with more than five members contained in the test set (see
Scattering of specific and unspecific behavior over respective evolutionary distances is apparent from the color-coded total cleavage entropy in all protease clans. Red fields indicate specific substrate recognition, whereas green fields mark unspecific proteases.
Divergent evolution towards specific as well as unspecific members can be identified within all protease clans. Whereas a phylogenetic tree of metallo proteases of clan MA groups the highly specific members neurolysin and thimet oligopeptidase in a separate branch, indicating a close interplay between evolutionary distance and substrate specificity, this observation cannot be extended to the whole set of proteases. The opposite even holds true for the M10 family within the MA clan, where specific and unspecific members are grouped almost randomly compared to their evolutionary distance. The same complex behavior is found for cathepsins in clan CA: this branch includes the most specific member cathepsin L1 as well as the least specific member cathepsin K. Nevertheless, these members are grouped in closely related taxa indicating evolutionary proximity. Evolutionarily closely related proteases exhibit diverse substrate promiscuity in this protease group. Hence, protease evolution is capable of rapidly interchanging specific and non-specific substrate binding, implying a complicated relationship between protease sequence and substrate specificity.
The largest group of serine proteases of clan PA also groups specific and unspecific members in related taxa. For example, cathepsin G and granzymes B of human and rodent origin, which exhibit markedly different cleavage behavior, are found in a subbranch of closest evolutionary relation. Similarly, the rather specific signaling protease plasmin and the unspecific digestive enzymes trypsin 1 and chymotrypsin A, the most promiscuous members of this family, are grouped in one branch in close evolutionary proximity.
We therefore surmise that a detailed understanding of protease specificity is only within reach for an even smaller subset of homologous proteases, where changes in substrate specificity can be attributed to a limited set of amino acid mutations, and hence atom exchanges, in the binding region. We propose to join forces between computational and experimental groups to elucidate structural hot-spots crucial for binding specificity in particular protease folds. According to the observed small fluctuations in specificity within respective clans, a smaller set of homologous proteases should be suitable to allow such in-depth investigations.
The presented specificity metric “cleavage entropy” for proteases can be applied to map subpocket-wise specificity contributions based on experimental data to individual subpockets of proteases as well as to calculate an estimate of overall substrate specificity. Furthermore, the extension of subpocket-wise cleavage entropies to pairwise cleavage entropies facilitates the detection of subpocket cooperativities in proteases provided that a sufficient number of substrates for this two-dimensional analysis is known. Thereby, drug design targeting proteases will profit from a thorough understanding of specific interactions to achieve desired protease selectivity
A straightforward, interpretable specificity score generally applicable to all families of proteases was presented that confirms widely accepted rules of thumb for protease cleavage in a quantitative way. Calculated cleavage entropies purely based on amino acid frequencies in known substrates allow a straightforward assessment of subpocket-wise substrate specificities. According to our specificity metric, the catalytic cleavage machinery, and thus the protease class, does not discriminate between specific and unspecific proteases. In contrast, homologous protease clans share intrinsic specific and non-specific properties, suggesting that protease specificity is encoded directly in the shared three-dimensional protein fold. Within particular protease clans and folds, a small number of mutations can cause drastic alterations of substrate specificity. These changes, subtle at the level of sequence, structure and flexibility but with a heavy impact on substrate promiscuity, are thus of high interest for structural biology but challenging to predict.
Unlike classical rules-of-thumb for protease specificity, the quantification of subpocket-wise and overall substrate specificity provides a continuous metric for specificity rather than a ‘yes’-or-‘no’ decision. The provided quantitative measure thus facilitates the comparison of the macromolecular descriptor “substrate specificity” with physicochemical, evolutionary and structural descriptors in protease recognition. Mapping of specificity to subpockets allows for intuitive visualization of structure-selectivity relationships in proteases and will thereby support the establishment of rules linking local protein structure and specificity.