Conceived and designed the experiments: MH HZ HS SW GS. Performed the experiments: MH HZ MW. Analyzed the data: MH HZ EP SW HS GS. Contributed reagents/materials/analysis tools: MR FR EP. Wrote the paper: MH GS.
The authors have declared that no competing interests exist.
We present a computational method for the reaction-based
The computer program DOGS aims at the automated generation of new bioactive compounds. Only a single known reference compound is required to have the computer come up with suggestions for potentially isofunctional molecules. A specific feature of the algorithm is its capability to propose a synthesis plan for each designed compound, based on a large set of readily available molecular building blocks and established reaction protocols. The
Most of the approaches to
Only a small fraction of all molecules amenable to virtual construction can in fact be synthesized in a reasonable time frame and with acceptable effort.
Here, we present a new approach to computer-assisted
DOGS grows new molecules in a deterministic and stepwise process: in each step, complete enumeration of a subspace of all possible solutions is performed. Following a greedy strategy, top-scoring intermediate products are submitted to subsequent growing steps. The quality of designed (intermediate) products is assessed by a ligand-based scoring scheme. Similarity to a reference ligand is computed by a graph kernel method. Two different graph representations of molecules (
In a recently published work, we have successfully applied DOGS in a first prospective study to designing a selective inhibitor of human Polo-like kinase 1 (Plk1) in its inactive (DFG-out activation-loop) conformation
The DOGS algorithm builds up new candidate structures by mimicking a multi-step synthesis pathway. This strategy is supposed to deliver a direct blueprint for the actual synthesis of proposed candidate structures. For this approach, established reaction protocols need to be formalized in order to make them processable by a computer. Reactions were encoded using the formal language Reaction-MQL
Example of a Paal-Knorr pyrrole reaction encoded as Reaction-MQL expression (
DOGS implements 83 reactions (termed
A subset of the Sigma-Aldrich (Sigma-Aldrich Co., 3050 Spruce St, St. Louis, MO 63103, USA) catalog containing 56,878 chemical building blocks was downloaded from the ZINC database
In the first step, building blocks were standardized, and unsuitable entries were eliminated. For this purpose, a preprocessing routine was developed and implemented using the software MOE (version 2009.10; Chemical Computing Group, Suite 910, 1010 Sherbrooke Street West, Montreal, Quebec, Canada):
Compounds with a molecular mass of less than 30 Da or more than 300 Da were removed.
Compounds containing more than four rings were removed.
Compounds with any element type other than C, N, O, S, P, F, Cl, Br, I, B, Si, Se were removed.
Compounds containing more than three fluorine atoms were removed.
Compounds featuring atoms with incorrect valences were removed.
Compounds containing unwanted substructures were removed according to the recommendations by Hann
Protonation states and formal charges were set according to MOE's washing routine (carboxylic acids were deprotonated; most of the primary, secondary and tertiary amines were protonated).
Duplicate entries were removed.
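The property-based filters listed above can be illustrated with a minimal sketch. It is not the MOE routine used in the study: the `BuildingBlock` record, its field names, and the use of an identifier string for duplicate detection are assumptions made for illustration. The valence check, the unwanted-substructure filter, and the protonation step are omitted because they require a cheminformatics toolkit.

```python
from dataclasses import dataclass, field

# Element whitelist as stated in the text. Hydrogen is assumed to be implicit,
# so element_counts below holds non-hydrogen atoms only (assumption).
ALLOWED_ELEMENTS = {"C", "N", "O", "S", "P", "F", "Cl", "Br", "I", "B", "Si", "Se"}


@dataclass
class BuildingBlock:
    """Hypothetical record holding precomputed properties of one building block."""
    identifier: str            # e.g. a canonical structure string (assumption)
    mass: float                # molecular mass in Da
    ring_count: int
    element_counts: dict = field(default_factory=dict)


def passes_filters(bb: BuildingBlock) -> bool:
    """Apply the mass, ring, element, and fluorine filters from the text."""
    if bb.mass < 30.0 or bb.mass > 300.0:
        return False
    if bb.ring_count > 4:
        return False
    if any(element not in ALLOWED_ELEMENTS for element in bb.element_counts):
        return False
    if bb.element_counts.get("F", 0) > 3:
        return False
    return True


def filter_library(blocks):
    """Keep blocks that pass all filters and drop duplicates by identifier."""
    seen = set()
    kept = []
    for bb in blocks:
        if not passes_filters(bb) or bb.identifier in seen:
            continue
        seen.add(bb.identifier)
        kept.append(bb)
    return kept
```

In this sketch duplicates are removed in the same pass as the property filters; in the published workflow, duplicate removal follows standardization of protonation states, which the sketch does not model.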
In the second step, the filtered compound set was subjected to a collection of preprocessing reactions. A set of 15 functional group addition (FGA) and functional group interconversion (FGI) reactions was compiled from the literature and encoded as Reaction-MQL expressions (for a complete list of preprocessing reactions see Table S2 in
The third and final step of the preparation process comprises the annotation of reactive substructures (
DOGS generates new molecules by iterative fragment assembly. The design cycle comprises the modification of a current intermediate product by applying one of the chemical reactions from the library,
For extending an intermediate compound
The algorithm evaluates every building block processed by the dummy reaction steps according to the scoring function. Each of the
Once the design of a new compound based on a selected starting building block is initiated, it is continued until one of two stop criteria is fulfilled.
The first stop criterion controls the molecular mass of the designed compounds. The reference compound's mass (100%) defines a relative lower (70%) and upper (130%) bound, and a constructed molecule must have a molecular mass within these bounds to be accepted as a valid final product. During the design of a new molecule, the algorithm continues to add building blocks until the intermediate product exceeds the lower mass bound. Up to this point, an extension of the intermediate product is accepted even if it decreases the score. Once the molecular mass of the intermediate product exceeds the lower mass bound, the algorithm accepts a subsequent extension step only if it improves the score. If the addition of a building block lowers the score or causes the molecular mass to exceed the upper mass bound, the last reaction step is discarded and the previous intermediate product is added to the list of final products.
The second stop criterion limits the number of synthesis steps to keep proposed synthesis pathways short. A pathway is terminated regardless of any other condition when it exceeds a user-defined maximal number of synthesis steps (set to a value of four steps in all runs presented in this study). In this case, the intermediate product formed by the last valid reaction step is added to the list of final products, and a new synthesis pathway is initiated based on another starting building block.
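The interplay of the two stop criteria can be sketched as a small loop. This is an illustrative reading of the description, not the DOGS implementation: the function `grow` and the callables `extend`, `score`, and `mass` are hypothetical, `extend` is assumed to return the top-scoring extension of the current intermediate (or `None` if no reaction applies), and each extension is assumed to count as one synthesis step. The returned flag, which reports whether the final product lies inside the mass window, is likewise an assumption.

```python
def grow(start, extend, score, mass, ref_mass, max_steps=4):
    """Greedy growth of one pathway with the mass and step-count stop criteria.

    start     -- starting building block (any object)
    extend    -- hypothetical callable: current product -> best extension or None
    score     -- hypothetical callable: product -> similarity score (higher is better)
    mass      -- hypothetical callable: product -> molecular mass
    ref_mass  -- molecular mass of the reference compound
    """
    lower, upper = 0.7 * ref_mass, 1.3 * ref_mass
    current = start
    for _ in range(max_steps):            # second stop criterion: step limit
        candidate = extend(current)
        if candidate is None:
            break                         # no applicable reaction (assumption)
        if mass(candidate) > upper:
            break                         # discard the step: upper bound exceeded
        if mass(current) > lower and score(candidate) <= score(current):
            break                         # discard the step: score did not improve
        current = candidate               # accept the extension
    is_valid = lower <= mass(current) <= upper
    return current, is_valid
```

Below the lower mass bound the loop accepts extensions even when the score drops, which matches the description; an extension with an unchanged score is rejected here because the text requires an improvement.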
DOGS tries to construct at least one compound starting from each of the
The scoring function assesses the quality of a molecule with respect to the design objective. Products of each stage of a virtual synthesis pathway (dummy products, intermediate products, final products) are evaluated by the same scoring function. DOGS employs a two-dimensional (2D) graph kernel method (ISOAK
Briefly, ISOAK computes similarity values for two molecules based on their 2D topological structures. Molecules are interpreted as graphs, where atoms are represented as vertices and covalent bonds as edges between vertices (
In addition to the molecular graph described in the previous section, a
Vertices of reduced graphs are labeled with bit vectors that store information about the atoms they represent. These bit vectors consist of ten bits (one for each of the eight atom types, and two additional bits standing for ‘ring’ and ‘amalgamated ring system’, respectively). A vertex bit is set if the corresponding feature is present in the set of atoms the vertex encodes. Vertices not only store the bit vector but also the number of atoms they represent. Accordingly, a benzene substructure would be converted to a single vertex which is labeled by a bit vector with bits for ‘ring’ and ‘aromatic’ set to 1, and stores an atom count of six. Pyridine would be encoded in the same way, except for the bit ‘hydrogen-bond acceptor’ being also set to 1.
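The vertex labels described above can be sketched as a ten-bit integer plus an atom count. The bit order and the class and function names are assumptions. Only the feature names 'aromatic', 'hydrogen-bond acceptor', 'ring', and 'amalgamated ring system' come from the text; the remaining six atom types are represented by placeholder names.

```python
from dataclasses import dataclass

# Ten features: eight atom types plus two ring flags. Only the named entries
# appear in the text; "atom_type_3" to "atom_type_8" are placeholders, and the
# bit order is an arbitrary choice for this sketch.
FEATURES = (
    "aromatic",
    "hydrogen-bond acceptor",
    "atom_type_3", "atom_type_4", "atom_type_5",
    "atom_type_6", "atom_type_7", "atom_type_8",
    "ring",
    "amalgamated ring system",
)


@dataclass(frozen=True)
class ReducedVertex:
    bits: int        # ten-bit feature vector stored as an integer
    atom_count: int  # number of atoms collapsed into this vertex


def make_vertex(features, atom_count):
    """Set one bit for each feature present in the atoms the vertex encodes."""
    bits = 0
    for name in features:
        bits |= 1 << FEATURES.index(name)
    return ReducedVertex(bits, atom_count)


def has_feature(vertex, name):
    """Return True if the bit for the given feature is set."""
    return bool((vertex.bits >> FEATURES.index(name)) & 1)


# Examples from the text: benzene and pyridine each collapse to a single vertex.
benzene = make_vertex({"ring", "aromatic"}, 6)
pyridine = make_vertex({"ring", "aromatic", "hydrogen-bond acceptor"}, 6)
```

With this encoding the two example vertices differ in exactly one bit, the one for 'hydrogen-bond acceptor', and both store an atom count of six.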
Bit vectors (
All other components of ISOAK including the edge comparison are identical to the molecular graph comparison. ISOAK can only process graphs with a maximum vertex connectivity of six,
The molecular representation used in a design run is selected by the user,
The DOGS software was implemented in the programming language Java (Oracle Corporation, 500 Oracle Parkway, Redwood Shores, CA 94065, USA) version 1.6 and uses the Chemistry Development Kit (CDK, version 1.0.2)
Our initial theoretical analyses of the algorithm were based on
Five trypsin inhibitors served as reference compounds for DOGS design runs (Camostat
Although successful
Comparison of property distributions between compounds designed by DOGS (
Lipophilicity is considered a relevant physicochemical property for drug candidate molecules
It is of critical importance that molecules designed
In summary, the majority of the DOGS designs possess drug-like properties and are chemically plausible. Most compounds are deemed amenable to chemical synthesis. The proposed molecules resemble the reference compounds in properties that are not explicitly considered by the scoring function.
Bioisosteric replacement
Side-chains addressing the S1 pocket present in the reference compounds (
Starting at rank position 78 (compounds on higher ranks exhibit one of the fragments present in the references), DOGS suggested eleven different side-chains replacing the reference fragments. Most of them offer the possibility to interact with the negatively charged aspartate side-chain of the S1 binding pocket of trypsin by a positively ionizable nitrogen atom. The terminal urea group and the two aromatic fragments (pyrimidin-2-amine and pyridin-2-amine) are exceptions, where the nitrogen will likely not carry a positive charge. The formation of this salt bridge is a known key interaction inside the S1 pocket
Known inhibitors of trypsin exhibiting pyrimidin-2-amine
In summary, DOGS was able to suggest reasonable bioisosteres for parts of the reference ligands addressing the S1 pocket of trypsin, including experimentally validated examples.
Two examples selected from the list of structures proposed by DOGS as potential trypsin inhibitors are presented in
Compounds
Compound
DOGS was employed to propose candidate structures as new modulators of γ-secretase, an aspartic protease that cleaves the amyloid precursor protein (APP) and generates potentially toxic amyloid-β (Aβ) peptides
Four different reference ligands known to modulate γ-secretase were selected. For each reference compound, two DOGS runs (molecular graph representation, α = 0.875; reduced graph representation, α = 0.4) were performed. The resulting eight compound lists were visually inspected, and two appealing ligand candidates
Compounds
Synthesis plans were readily traceable as suggested by the software. One-step reactions yielded the desired products in both cases. Hence, DOGS demonstrated its ability to propose compounds that medicinal chemists considered promising candidates and that proved to be chemically accessible as suggested (
Histamine is a biogenic amine involved in a plethora of signaling pathways as a messenger. Four subtypes of histamine receptors (
We applied DOGS to provide ideas for new selective antagonists or inverse agonists of
Compounds
The attempt to follow the synthesis scheme proposed for compound
Compound
Highlighted features of known H4R ligands (compound
In order to test for the hypothesis that a combination of the features – as found in compound
A reason for the weak affinity of
Additionally, compound
Although the DOGS design approach is capable of suggesting compounds of practical relevance, a potential improvement to scoring would be to directly incorporate knowledge of a particular pharmacophore,
In conclusion, we present a detailed description of a new method for automated
Supplementary material comprises coupling reactions, preprocessing reactions, unwanted substructures, description of pharmacophore substructures, synthesis protocols and analytical data.
The authors thank Tim Kottke and Stephan Schwed for determining biological activity data of the H4 receptor ligand. We are thankful to Franca Klingler and Dr. Udo Meyer for their help on compiling the reaction database.