ACC and AL were paid employees of Amgen, Inc., and KAL was a paid employee of Schrodinger, LLC. This does not alter our adherence to the PLOS policies on sharing data and materials.
Conceived and designed the experiments: ACC KAL. Performed the experiments: ACC KAL AL. Analyzed the data: ACC KAL AL. Contributed reagents/materials/analysis tools: ACC KAL AL. Wrote the paper: ACC KAL. Reviewed the manuscript: ACC KAL AL.
Current address: Stem CentRx, South San Francisco, California, United States of America
Advances reported over the last few years and the increasing availability of protein crystal structure data have greatly improved structure-based druggability approaches. However, in practice, nearly all druggability estimation methods are applied to protein crystal structures as rigid proteins, with protein flexibility often not directly addressed. The inclusion of protein flexibility is important in correctly identifying the druggability of pockets that would be missed by methods based solely on the rigid crystal structure. These include cryptic pockets and flexible pockets often found at protein-protein interaction interfaces. Here, we apply an approach that uses protein modeling in concert with druggability estimation to account for light protein backbone movement and protein side-chain flexibility in protein binding sites. We assess the advantages and limitations of this approach on widely-used protein druggability sets. Applying the approach to all mammalian protein crystal structures in the PDB results in identification of 69 proteins with potential druggable cryptic pockets.
Advances reported over the last few years and the increasing availability of protein crystal structure data have greatly improved structure-based druggability approaches. These algorithms predict our ability to discover small molecule drugs for protein targets and can help in identifying promising new biological targets for small molecule drug discovery. However, in practice, nearly all druggability estimation methods are applied to protein crystal structures as rigid proteins, with protein flexibility often not directly addressed. The increasing interest in finding small molecule drugs to protein-protein interfaces makes this issue particularly acute since these interfaces tend to have substantial flexibility compared to traditional enzyme targets. Here, we apply an approach that accounts for light protein backbone movement and protein side-chain flexibility in protein binding sites. We present the results of applying this method to all publicly available mammalian protein crystal structures.
The majority of small molecule drug discovery efforts towards new, unprecedented biological targets do not progress past high-throughput screening or hit-to-lead optimization due to lack of pursuable chemical matter
In a drug discovery setting, small molecule druggability is commonly defined as whether a small molecule can bind a desired biological site with good, nanomolar range potency, and, at the same time, also have good, drug-like properties conducive to oral bioavailability and clinical progression
Druggability estimation has historically been based on precedence, that is, whether there are known drugs targeting the protein or one of its homologs
Many current structure-based methods for druggability estimation are remarkably accurate if the potential small molecule binding site is largely rigid
To begin to address these sites, we apply an approach to modeling conservative movements in pockets using comparative protein modeling approaches coupled closely with structure-based druggability analysis. The approach models relatively light protein motions, involving side-chain flexibility and local protein backbone movements, and maintains reasonable prediction accuracy in retrospective validation studies. It allows us to take pockets that a rigid-protein druggability analysis would deem to have some drug-like properties, but not have sufficient drug-like size, and assess whether local protein motion can result in the pocket having all the drug-like properties, including drug-like size. The approach is computationally efficient enough to enable mining of the structural proteome while taking into account light protein flexibility. Applying the method to roughly 18,000 mammalian protein crystal structures in the PDB results in prediction of one percent of proteins as containing likely druggable cryptic pockets.
We combine a druggability scoring model with protein modeling and docking methods to first identify candidate pockets that have drug-like physicochemical properties but may lack sufficient drug-like size, and then seek out energetically accessible side-chain and backbone motions near these potential pockets using protein modeling approaches.
To determine whether a target pocket has drug-like physicochemical properties, we use an adaptation of a validated druggability score
The method consists of the three steps depicted in
Multiple pockets per protein are considered, but one is shown here for simplicity.
We chose naphthalene and a larger tetra-substituted naphthalene because they are hydrophobic and aromatic—known features of drug-like molecules. Additionally, naphthalene is rigid so docking is fast. The tetra-substituted naphthalene molecule we use is a natural progression from naphthalene, and includes four substituents (ethyl, propyl, and cyclohexyl at two positions) that we thought could help in opening pockets. These simple-minded choices performed reasonably well in validation studies, and limited experimentation with a few other molecules gave similar or worse results. In particular, use of benzene in place of naphthalene resulted in a large false positive rate because benzene is small and much more promiscuous, fitting into many small sites. For the larger molecule, we tried five molecules similar to tetra-substituted naphthalene and the results were not substantially different. It is certainly plausible that more systematic experimentation with a larger number of ligands could result in improved performance.
Other approaches that address the issue of protein flexibility for druggability assessment use computational solvent mapping or molecular dynamics simulations, or both.
Our approach to flexibly treating potentially druggable binding sites is substantially less compute-intensive, which is important for our goal of analyzing the structural proteome. For a single binding site where flexibility is modeled, our approach requires between one and two hours for most individual protein structures on a current scientific workstation with a four-core CPU. In contrast, a 30 ns molecular dynamics simulation on a single protein would require about a week, and the computational solvent mapping approach using FTMap requires about half a day for each protein binding site, since FTMap must be run for each discrete side-chain configuration and each configuration requires about two hours on a single CPU core
While accurate modeling of protein motion continues to be difficult and the subject of substantial research, we found the approach we present here to be sufficiently accurate and efficient for the purposes of mining the structural proteome. We note that previous efforts we are aware of to identify the “druggable genome” rely on sequence-similarity to known druggable proteins
We applied the method to two widely-used druggability validation sets to check its performance and measure any increase in false positive rate due to allowance of protein flexibility.
The first validation set is a published set that covers a variety of targets, and consists of 27 targets: 17 druggable targets and 10 difficult targets
The top panel depicts results with the original crystal structures used rigidly, with a red line indicating the Dscore+>1.3 cutoff used in this work. The top Dscore+ value is shown for each of the 27 protein targets (17 druggable and 10 difficult, where the prodrug targets are considered difficult). The bottom panel depicts results after modeling of protein flexibility, with difficult targets in ghost outline because flexibility modeling is not usually applied to sites that score below the Dscore+>1.3 cutoff. Difficult targets are indicated by the lighter bars, while druggable targets are indicated by the darker bars. See text for further discussion.
We also investigated the effect of flexibility modeling on targets with scores of Dscore+≤1.3. For these additional targets, we again find an increase in scores by an average of 0.4 (σ = 0.3). Thus, difficult and druggable targets in the validation set can still be distinguished after flexibility modeling, although the distinction is less crisp than it was when scoring rigid structures. Comparing the score distributions for druggable and difficult targets using the two-sample Kolmogorov-Smirnov (K-S) statistic finds that the scores are significantly different from each other, both with and without flexibility modeling (
Taken together, the results suggest that a Dscore+ threshold of 1.7 (i.e., 1.3+0.4) should be applied to sites resulting from flexibility modeling, and this threshold is depicted in
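The two-sample K-S comparison of druggable and difficult score distributions can be sketched with the standard library alone. This is a minimal illustration, not the analysis code used in the study: the function name `ks_statistic` and the two score lists are hypothetical placeholders rather than values from the validation set, and only the test statistic is computed, not its significance level.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical cumulative distribution functions of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d_max = 0.0
    # The largest gap is attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max


# Hypothetical Dscore+ values, for illustration only.
druggable_scores = [1.9, 2.1, 2.4, 1.6, 2.0]
difficult_scores = [0.8, 1.1, 1.4, 0.9]
d = ks_statistic(druggable_scores, difficult_scores)
```

A statistic near 1 means the two score distributions barely overlap, while a value near 0 means they are nearly indistinguishable.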
Turning to pocket volumes, the method should not lead to all pockets increasing significantly in volume, consistent with the belief that some sites are inherently flexible while others are less so. In this first validation set, which is composed largely of enzyme active sites, the average volumes before and after flexibility modeling are about the same (420 Å3 and 360 Å3, respectively, with standard deviations of 190 Å3 and 130 Å3), and both are within the drug-like range, as discussed later. In the second validation set of protein-protein interfaces, we will see that the binding site volumes change more significantly. Resulting volumes tend towards around 300–400 Å3, if the flexibility of the protein allows, and this appears to be related to the size of the second ligand (tetra-substituted naphthalene) used in the induced-fit docking step of the flexibility modeling. In the mammalian proteome analysis, we find that only one percent of proteins analyzed have cryptic pockets that change from a volume substantially below the drug-like range (≤100 Å3) to a drug-like volume (160–800 Å3). In developing our approach, the drug-like volume range was initially set roughly to 150–600 Å3 based on our judgment, and later refined to 160–800 Å3 based on quantitative analysis of the mammalian proteome results.
The second validation set addresses protein-protein interaction (PPI) targets, and includes six targets from the 2P2I database and Wells et al. (2007): Bcl-xL, HDM2, IL-2R, HPV E2, ZipA, TNFα
Target name | PDB ID | Ligand type | Assigned druggability | Crystal pocket Dscore+ | Crystal pocket volume (Å3) | Flexible model Dscore+ | Flexible model volume (Å3) |
Bcl-xL | 2bzw | protein | druggable | 112 | |||
Bcl-2 | 2xa0 | protein | druggable | ||||
HDM2 | 1ycr | protein | druggable | ||||
TNFα | 1tnf | protein | difficult | 126 | |||
IL-2Rα | 1z92 | protein | difficult | 0.9 | 49 | * | * |
HPV E2 | 1tue | protein | difficult | 0.8 | 57 | * | * |
ZipA | 1f46 | protein | difficult | 0.7 | 105 | * | * |
ZipA | 1f47 | protein | difficult | 0.9 | 141 | * | * |
Bcl-xL | 2yxj | cmpd | druggable | 141 | |||
Bcl-xL | 3qkd | cmpd | druggable | 132 | |||
Bcl-xL | 4ehr | cmpd | druggable | 100 | |||
Bcl-2 | 4aq3 | cmpd | druggable | 113 | |||
HDM2 | 1rv1 | cmpd | druggable | 147 | |||
HDM2 | 1t4e | cmpd | druggable | ||||
HDM2 | 3jzk | cmpd | druggable | ||||
HDM2 | 3lbk | cmpd | druggable | 150 | |||
HDM2 | 3lbl | cmpd | druggable | ||||
HDM2 | 3tu1 | cmpd | druggable | ||||
HDM2 | 4dij | cmpd | druggable | ||||
HDM2 | 4ere | cmpd | druggable | ||||
TNFα | 2az5 | cmpd | difficult | ||||
TNFR1 | 1ft4 | cmpd | difficult | 1.2 | * | * | |
IL-2Rα | 1py2 | cmpd | difficult | 1.2 | 66 | * | * |
IL-2Rα | 1pw6 | cmpd | difficult | 73 | 72 | ||
HPV E2 | 1r6n | cmpd | difficult | 1.2 | 95 | * | * |
ZipA | 1y2f | cmpd | difficult | 0.9 | 91 | * | * |
ZipA | 1y2g | cmpd | difficult | 0.9 | 96 | * | * |
These targets have
We also analyzed all targets listed in 2P2I where crystal structures are provided, but some targets have either unclear experimental druggability because efforts on the targets are more recent, or known inhibitors involve metal chelation. The results for these additional targets are included in
Comparing Dscore+ and pocket volume calculation results with and without protein flexibility modeling finds that Bcl-xL (PDB IDs: 2bzw, 2yxj, 3qkd, 4ehr) and a minority of HDM2 structures (PDB IDs: 1rv1, 3lbk) would have been missed without the additional flexibility modeling to open up pockets to a drug-like volume. Interestingly, one IL-2Rα structure (PDB ID: 1pw6) has a Dscore+ that places the target in the low end of the druggable score range, but the pocket volume does not satisfy the drug-like criteria, and this remains the case after protein flexibility modeling. Protein flexibility modeling does not always open pockets significantly.
With TNFα, the known pocket at the trimer interface was identified as the top pocket in the apo-structure, and flexibility modeling resulted in a binding site with good druggability score and good drug-like volume. This result is consistent with the scores obtained using the co-complex structure with SPD-304
With Bcl-xL, comparison of a BAD peptide-bound structure (PDB ID: 2bzw) with a small molecule-bound structure (PDB ID: 2yxj) shows that two residues, Phe105 and Leu130, adopt alternate conformations, and the helix around Leu108 becomes disordered to create the ligand binding pocket
Phe105, Leu108, and Leu130 are shown as sticks in all structures. (a) Crystal structure of Bcl-xL protein bound to a BAD peptide (red and gray, respectively, PDB ID: 2bzw), (b) two naphthalene induced-fit docked models (orange), (c) one TSN induced-fit docked model (green), and (d) ABT-737-bound crystal structure (blue, PDB ID: 2yxj) with TSN induced-fit model (green).
To investigate the behavior of the flexibility modeling approach on targets where protein flexibility is known based on crystal structures, we applied the method to a set of protein crystal structures with binding site flexibility from
Target | PDB ID | RMSDave (Å) | [A] Docking-based druggability (dock hit rate) | [B] Protein flexibility (Dscore+) | Variation [A] | Variation [B] |
CDK2 | 1aq1 | | 1.32 | 1.7 | 21% | 11% |
CDK2 | 1buh | 1.8 | 1.44 | 1.7 | | |
CDK2 | 1dm2 | 1.8 | 1.62 | 1.9 | | |
ER | 1l2i | | 1.69 | 2.9 | 9% | 7% |
ER | 3ert | 2.6 | 1.55 | 2.7 | | |
ER | 1err | 2.0 | 1.61 | 2.8 | | |
HIV RT | 1vrt | | 1.66 | 2.5 | 8% | 13% |
HIV RT | 1rt1 | 1.5 | 1.75 | 2.3 | | |
HIV RT | 1c1c | 1.9 | 1.61 | 2.2 | | |
HIV RT | 1rth | 1.6 | 1.61 | 2.3 | | |
p38α kinase | 1a9u | | 1.00 | 1.8 | 49% | 15% |
p38α kinase | 1kv1 | 3.8 | 1.16 | 2.1 | | |
p38α kinase | 1kv2 | 3.5 | 1.61 | 2.1 | | |
PPARγ | 1fm6 | | 1.46 | 2.9 | 13% | 34% |
PPARγ | 1fm9 | 1.5 | 1.62 | 3.0 | | |
PPARγ | 2prg | 0.7 | 1.43 | 2.1 | | |
TK | 1kim | | 1.58 | 2.7 | 12% | 4% |
TK | 1ki4 | 1.8 | 1.40 | 2.6 | | |
IL-2 | 1z92 | | 0.13 | * | 107% | |
IL-2 | 1py2 | 2.6 | 0.62 | * | | |
IL-2 | 1m48 | 2.5 | 0.62 | * | | |
Bcl-XL | 2bzw | | 1.04 | 2.4 | 21% | 4% |
Bcl-XL | 2yxj | 2.5 | 0.84 | 2.5 | | |
TNF | 1tnf | | 0.95 | 2.4 | 1% | 18% |
TNF | 2az5 | 2.9 | 0.96 | 2.0 | | |
MDM2 | 1ycr | | 0.45 | 2.5 | 69% | 18% |
MDM2 | 1rv1 | 1.8 | 0.92 | 2.2 | | |
MDM2 | 1t4e | 1.6 | 0.66 | 2.1 | | |
HPV E2 | 1tue | | -0.24 | * | 323% | |
HPV E2 | 1r6n | 2.8 | 1.02 | * | | |
Targets are from
Examining the variation in scores between different crystal structures of the same target finds that while both the static protein and flexible protein methods yield similar score variation for non-PPI targets, they have substantially different variation with PPI targets. In particular, the docking hit-rate method shows large variation in score among structures of IL-2 (107%), MDM2 (69%), and HPV E2 (323%) compared with a median variation of 21% in all 11 targets. The flexibility modeling method, on the other hand, results in score variation on PPI targets that is consistent with that found with non-PPI targets.
Overall, the docking method has a median score variation of 21% with a standard deviation of 94% in the dataset, while the flexibility modeling approach has a median score variation of 13% with a standard deviation of 10%. Yet, when the PPI targets are removed, the two methods have comparable score variation. Taken together, the flexibility modeling method appears to provide more reliable, consistent predictions at PPI interfaces, and this makes sense because PPI interfaces are much more likely to involve substantial protein flexibility
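The summary statistics quoted for the two methods can be recomputed from the per-target variation values listed in the table. The sketch below also includes one definition of "variation" that appears consistent with the tabulated numbers, namely the score range divided by the mean score. That definition is inferred from the table values rather than stated in the text, and the function and variable names are illustrative only.

```python
from statistics import mean, median, stdev


def score_variation(scores):
    """Spread of one target's scores as (max - min) / mean, in percent.
    Assumption: this definition is inferred from the table values."""
    return 100.0 * (max(scores) - min(scores)) / mean(scores)


# Per-target variation (%) as listed in the table.
variation_docking = [21, 9, 8, 49, 13, 12, 107, 21, 1, 69, 323]  # [A], all 11 targets
variation_flexible = [11, 7, 13, 15, 34, 4, 4, 18, 18]           # [B], targets with Dscore+ values

# Example: CDK2 docking hit-rate scores for 1aq1, 1buh, and 1dm2.
cdk2_variation = score_variation([1.32, 1.44, 1.62])

# (median, sample standard deviation) for each method.
summary = {
    "docking": (median(variation_docking), stdev(variation_docking)),
    "flexible": (median(variation_flexible), stdev(variation_flexible)),
}
```

With these inputs the medians come out at 21% and 13%, and the sample standard deviation for the docking method at about 94%. The flexible-method standard deviation computed from the rounded table entries is a little above 9%, close to the quoted 10%.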
We next applied the flexibility modeling approach to all publicly-available crystal structures containing mammalian proteins to estimate the number of druggable targets and identify potential druggable cryptic pockets. Analyzing the over 18,000 structures in the Protein Data Bank (PDB)
The results are summarized in
Category | Structures | Proteins | Proteins (%) |
mammalian crystal structures | 18,879 | 5,807 | 105% |
structures analyzed | 17,834 | 5,551 | 100% |
flexibility modeling applied | 7,427 | 2,875 | 52% |
potentially druggable | 5,739 | 1,134 | 20% |
involves intermolecular interface | 2,095 | 730 | 17% |
cryptic pocket | 105 | 69 | 1% |
The “Structures” column provides the number of unique PDB entries represented, and “Proteins” represents the number of unique Swiss-Prot entries.
Dscore+>1.3 for original rigid structure.
Dscore+≥1.7, drug-like volume (160–800 Å3) after flexibility modeling, protein at least 100 amino acids in size (equivalent to about 10 kDa molecular weight).
Intermolecular interfaces are further defined to include only protein-protein interaction dimer interfaces and protein-ligand pockets.
Cryptic pockets are further defined as pockets that are less than 100 Å3 in volume in the crystal structure, but fall into the drug-like volume range after modeling of protein flexibility. An additional criterion of enclosure <96% was applied to eliminate small buried sites.
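The selection criteria in these table notes can be read as a sequence of filters. The sketch below encodes them as a single function. The `Pocket` record, its field names, and the category labels are hypothetical, and treating each category as a subset of the previous one is an assumption based on the word "further" in the notes; only the numeric thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Pocket:
    # Hypothetical record; field names are illustrative.
    rigid_dscore: float    # Dscore+ of the original rigid structure
    rigid_volume: float    # crystal pocket volume in cubic angstroms
    flex_dscore: float     # Dscore+ after flexibility modeling
    flex_volume: float     # pocket volume after flexibility modeling
    enclosure: float       # enclosure of the crystal pocket, 0 to 1
    protein_length: int    # number of amino acids
    at_interface: bool     # protein-protein dimer interface or protein-ligand pocket


def classify(p: Pocket) -> str:
    """Return the deepest category of the filter cascade that the pocket reaches."""
    if p.rigid_dscore <= 1.3:
        return "not modeled"  # flexibility modeling requires Dscore+ > 1.3
    druggable = (
        p.flex_dscore >= 1.7
        and 160 <= p.flex_volume <= 800
        and p.protein_length >= 100
    )
    if not druggable:
        return "flexibility modeled"
    if not p.at_interface:
        return "potentially druggable"
    # Cryptic: small and not fully buried in the crystal structure.
    if p.rigid_volume < 100 and p.enclosure < 0.96:
        return "cryptic pocket"
    return "intermolecular interface"
```

For example, a pocket of 80 Å3 in the crystal structure that opens to 300 Å3 with a Dscore+ of 2.0 at a known interface would be labeled a cryptic pocket under these rules.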
To identify druggable pockets with the greatest likelihood of biological relevance, we winnowed the list to protein sites in 2095 PDB structures where a small molecule could potentially disrupt a known intermolecular interaction. The interacting partner should be transiently-bound (as opposed to obligately-bound) and can be a protein, natural co-factor, natural ligand, or synthetic ligand. These sites are either at protein-protein interfaces or contain a small molecule of molecular weight less than 1000 Da. Including these criteria gives us higher-confidence druggability predictions and may remove many false-positives, but could result in removing sites that are functionally relevant but perhaps not well-characterized. For sites at protein-protein binding interfaces, we assessed whether the relevant protein-protein interaction is an obligate or transient interaction based on a published database, Interevol
Overall, we identified predicted druggable pockets in 2,095 PDB structures representing 730 unique proteins. In
A) Volumes of MDDR protein structure pockets. B) For pockets with volumes between 160 and 800 Å3, Dscore+ distribution of MDDR protein structures (top) and all protein structures (bottom). C) Range of Dscore+ after flexibility modeling of MDDR protein structure pockets supports a modified 1.7 Dscore+ cut-off after flexibility is applied.
To identify cryptic pockets, we looked at potential druggable pockets that were small (volume ≤100 Å3) in the static structure, as long as the initial cavity was not fully buried (enclosure ≤96%). Fewer than 20% of these, representing 105 structures, met the flexible druggability criteria, opening up to at least 160 Å3 with flexibility modeling. These targets, representing 69 unique proteins, are provided in
To compare the mammalian PDB results to a positive control set, we mapped known oral drugs from MDDR (2008 release) to ligands in known crystal structures. Of the 421 oral drugs administered in tablet form, 109 could be mapped to PDB co-crystal structures that had crystallographic resolution ≤2.5 Å. The 102 pockets with ligand overlap to known co-crystallized ligands (ligand overlap >0) are plotted by volume in
A 2D histogram showing all pockets found in the mammalian structural proteome with initial volumes less than 800 Å3 in proteins of more than 100 amino acids (about 10 kDa in weight). The vertical and horizontal white lines indicate the 160 Å3 volume cut-off. While the modeling method likely overpredicts volume increases in pockets, the majority of pockets that increase in volume increase by less than 50 Å3. The color bar on the right side indicates the number of pockets in each 2D histogram bin.
The range of druggability scores for the known oral tablet drug set versus all pockets is shown in
To assess the effect of our flexibility modeling approach on pocket volumes, we looked at all pockets at intermolecular interfaces before and after flexibility modeling and show the results in
Pockets found where a bound ligand would disrupt a protein-protein interaction are shown in the blue circle, and pockets found where a bound ligand would disrupt a protein-ligand interaction are shown in red. The overlap region shown in purple indicates where a protein or structure contains a pocket at both a protein-ligand and protein-protein interface. The blue region indicates proteins or structures containing only protein-protein interfacial pockets, and the red region indicates proteins or structures containing only protein-ligand pockets. Ligands are defined as any molecule with molecular weight ≤1000 Da.
In
While the analysis provides a good set of putative druggable proteins in the mammalian structural proteome, we are not blind to deficiencies in this analysis. The prediction error rate in the large mammalian structural proteome analysis is hard to know, and we discuss the limitations in the next section.
The automated approach to protein flexibility we report here is useful for identifying druggable targets in the structural proteome. We are aware of three areas for further improvement.
The first is related to pocket selection. Pocket selection is based on geometric considerations, and the pockets are subsequently scored for druggability using Dscore+ as well as, potentially, protein flexibility modeling. Ideally, the pocket selection and scoring would be done simultaneously to yield pockets that maximized the druggability score
(a) gray spheres defining the full automatically identified site, (b) purple spheres depicting the edited subsite. The crystal structure is of AMP-bound PDE-4D.
A second area relates to the false positive rate, that is, the fraction of pockets identified as druggable that are not truly druggable. Even though we restrict protein flexibility to side-chain motion and localized backbone movement, the protein flexibility modeled and our selection of probe molecules are biased towards increasing the hydrophobicity of the pocket under analysis, and relaxation of the resultant structures may improve results. In addition, the degree of protein flexibility modeled is probably more than that present in reality. In this work, we empirically compensated for these issues by measuring the impact of flexibility modeling in
Lastly, we need to consider that the protein structures observed in crystal structures, in a minority of cases, may not be the biologically relevant constructs or complexes. Crystal structures may be synthetic constructs or portions of proteins, which, in the context of the full-length protein, have predicted binding sites occluded. Similarly, biological obligate dimers not seen in the crystal structures can occlude the binding site. Co-factors can also affect the druggability of binding sites; here, we only account for selected, particularly tight-binding co-factors such as metals and hemes. We analyze both the biological assemblies defined in the PDB and the individual monomer components to account for binding to intact complexes as well as unbound partners. Other partially dissociated complexes may exist, however. In addition, we are looking at binding, and not functional effects of binding; weak binding at an allosteric site is sometimes sufficient to generate the desired inhibition or activation of biological activity
We leverage advances in druggability assessment and modeling of protein flexibility to create an approach that allows light flexibility in the protein backbone and side-chains. The method improves the accuracy of druggability assessments when tested on two validation sets representing general pharmaceutical targets and protein-protein interactions of pharmaceutical interest. Combining this with the wealth of crystal structures available in the PDB allows us to find new protein binding sites that are potentially druggable by small molecules. Searching for such sites is thought to be like finding needles in a large haystack, and a systematic, automated approach is thus useful. Accurate modeling of protein flexibility continues to be difficult and the subject of substantial research. Even so, our approach is useful in exploring induced druggable pockets and provides a substantial number of hypotheses. For applications focused on analysis of protein pockets, the approach we take is computationally efficient and may be complementary to comprehensive analyses of static crystal structures
Protein structures were downloaded from the biounits repository at the RCSB based on criteria that the structure (1) contains protein, (2) is categorized as deriving from the class
Ligands, defined as having molecular weight ≤1000 Da, were removed with the exception of heme groups, zinc, and magnesium (PDB het groups HEM, MHM, HEV, VER, SRM, HEO, HEB, HEC, HDM, HDD, DDH, ZN, MG). Protein structures were prepared using Schrödinger Protein Preparation Wizard (version 2012, Schrödinger LLC, New York, NY), on the command line with the following options: -watdist 0, -fillsidechains, -rehtreat, -mse, -noepik, -noimpref. These options assign bond orders, add hydrogens, remove all waters, create zero-order bonds to metals, create disulfide bonds for close cysteines, mutate selenomethionines to methionine, fill in any missing side-chains with Prime (v3.1, Schrödinger LLC, New York, NY), and optimize hydrogen placement and polar residue flips using PropKa. Validation test runs using restrained minimization to a heavy atom RMSD of 0.3 Å, a procedure known as “Impref”, did not change which sites were found and did not significantly change druggability scores on the validation dataset proteins, so we chose to increase workflow speed by avoiding this step in the protein preparation.
Next, initial potential druggable surface patches were identified using Schrödinger SiteMap (v2.6, Schrödinger LLC, New York, NY), the results of which are used to compute Dscore+. We ran SiteMap with a fine grid (0.35 Å spacing) and a “loose” definition of hydrophobicity. In this study, all calculations were performed from the command line with options that return the 5 largest SiteMap sites, in order of the number of site points they contain. Our modified settings allow more shallow binding sites to be found and include binding site regions with slightly weaker vdW interaction energy. We used the following non-default SiteMap parameters: maxdist = 10, enclosure = 0.4, maxvdw = 1.0, dthresh = 5.0, mingroup = 7, nthresh = 7, grid = 0.35, modphobic = 0. The smaller value of maxvdw (default is 1.1 kcal/mol) and the less restrictive definition for modphobic of zero together allow gridpoints with slightly weaker vdW interaction energy to be included as sitepoints. The smaller enclosure score (default is 0.5) and larger maxdist value (default 8.0 Å) allow more shallow binding sites to be found. The enclosure score is computed by drawing radial rays from each sitepoint, and the score is the fraction of rays that strike the receptor surface within a distance of 10 Å (maxdist), averaged over the sitepoints. Decreasing dthresh from the default (6.5 Å) and increasing nthresh from the default (3) cause SiteMap to return smaller, more compact sites than it otherwise would when using a fine grid. When considering a gridpoint for inclusion in a site, there must be at least nthresh other points within 1.76 Å (square root of d2thresh) for it to be considered. When considering whether two sites should be joined, the closest points in the two sites must be closer than dthresh. The parameter “mingroup” is the only parameter here that limits the number of sites found; this is the minimum number of points in a site-point group required to constitute a site (default = 7).
We found that including sites with fewer than seven points in combination with a fine grid of 0.35 Å resulted in merging of many very small pockets to form long, stringy sites that were not realistic as small molecule binding sites. Overall, these modified SiteMap settings allow us to find shallow pockets with less hydrophobic character than is possible to find with default settings.
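The two grouping rules described above (the nthresh neighbor requirement and the dthresh merge distance) can be illustrated with a short sketch. This is a simplified reading of the rules as stated in the text, not SiteMap's actual implementation; the function names and the example grid are hypothetical, and the default arguments are the values used in this study.

```python
from math import dist


def has_enough_neighbors(point, grid_points, nthresh=7, radius=1.76):
    """True if at least `nthresh` other grid points lie within `radius` angstroms of `point`."""
    count = sum(1 for q in grid_points if q != point and dist(point, q) <= radius)
    return count >= nthresh


def should_merge(site_a, site_b, dthresh=5.0):
    """True if the closest pair of points across the two sites is nearer than `dthresh` angstroms."""
    return min(dist(p, q) for p in site_a for q in site_b) < dthresh


# A 3 x 3 x 3 block of points on the 0.35 angstrom grid used in the text.
block = [(0.35 * i, 0.35 * j, 0.35 * k)
         for i in range(3) for j in range(3) for k in range(3)]
center = block[13]  # the middle point of the block
```

On a fine grid, a lower dthresh keeps nearby small groups from being joined into one elongated site, which matches the motivation given for the non-default settings.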
From the SiteMap results, sites identified with a druggability score, Dscore+, of greater than 1.3 are taken as candidate sites regardless of volume, where Dscore+ is defined as Dscore + 0.3*hydrophobic, as previously described
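The Dscore+ definition and the candidate-site cutoff translate directly into code. The sketch below is illustrative: the function names are hypothetical, and the inputs stand for the Dscore and hydrophobic terms reported for a site.

```python
def dscore_plus(dscore: float, hydrophobic: float) -> float:
    """Dscore+ = Dscore + 0.3 * hydrophobic, as defined in the text."""
    return dscore + 0.3 * hydrophobic


def is_candidate_site(dscore: float, hydrophobic: float, cutoff: float = 1.3) -> bool:
    """A site is carried forward when Dscore+ exceeds the cutoff, regardless of its volume."""
    return dscore_plus(dscore, hydrophobic) > cutoff
```

For instance, a site with Dscore 1.0 and a hydrophobic term of 2.0 has Dscore+ 1.6 and would be kept as a candidate.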
To identify binding sites with potential flexibility, we used an iterative protein-modeling and docking approach
The tetra-substituted naphthalene compound, 2, was designed to facilitate opening of pockets.
To analyze the druggability and protein-protein interaction validation data, we automatically compared each SiteMap site to the corresponding ligand-bound structure using the Phase (version 3.4, Schrödinger LLC, New York, NY) command-line utility phase_volcalc to compute the overlap (measured in Å3) between the SiteMap sitepoints and the bound crystal ligand. After the IFD steps, we used the same utility to compute the overlap between the tetra-substituted naphthalene and the bound crystal ligand. This value is positive when there is direct overlap between the two sets of atoms. For the validation studies only, we identified the relevant protein biological assembly based on the known literature, and only retained those assemblies or protein monomers that are biologically meaningful. The calculations were otherwise performed automatically.
For calculations run on all mammalian PDB structures, we used a purely automated procedure applying the method to the first “biological unit” as defined in the PDB. Calculations were performed on commodity cluster hardware running RedHat Enterprise Linux. Failed calculations were re-run up to five times, including at both Amgen and Schrödinger facilities, to ensure that failures were not the result of compute infrastructure issues. To identify protein-protein interaction interfaces, we checked whether any of the TSN molecules modeled into a predicted druggable site also overlapped with another protein chain in the crystal structure. Overlap was defined as at least one atom of the TSN molecule being within 2 Å of the additional protein chain, where hydrogens were included. To identify protein-ligand interfaces, we used the previously-described volume overlap calculation. Finally, to analyze the results of the mammalian proteins in the PDB for obligate dimers, we used the Interevol database, publicly available at
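The protein-protein interface check described above, in which a modeled TSN molecule is taken to overlap another chain when at least one of its atoms lies within 2 Å of that chain, can be sketched as a simple distance test. The function name and the coordinate format are assumptions made for illustration; this is not the code used for the proteome analysis.

```python
from math import dist


def overlaps_other_chain(tsn_atoms, chain_atoms, cutoff=2.0):
    """True if any probe atom lies within `cutoff` angstroms of any atom of the other chain.
    Both arguments are sequences of (x, y, z) coordinates, hydrogens included."""
    return any(dist(a, b) <= cutoff for a in tsn_atoms for b in chain_atoms)
```

A brute-force loop over all atom pairs is adequate for a single probe molecule; a spatial index would only matter for much larger coordinate sets.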
To map MDDR drugs to PDB co-crystal structures, we first identified all oral drugs in MDDR that were annotated as ‘marketed’ and delivered orally as tablets or pills. PipelinePilot (ver. 8.5, Accelrys Software, San Diego, CA) was used to identify identical compounds based on structural identity when compared with the SMILES strings included in the HET code file downloaded from RCSB LigandDepot
Matlab version 7.9 (R2009b, The Mathworks Inc., Natick, MA) was used to generate
Calculations were performed on Intel Xeon CPU (2.7GHz) multi-core processors running RedHat Enterprise version 6. CPU timings quoted in the paper are per single core.
We thank Nigel Walker, Philip Tagari, Yax Sun, Paul Kassner, Mike Ollman, and Astrid Ruefli-Brasse at Amgen, and Woody Sherman and Alessandro Monge at Schrödinger for their support and helpful discussions.