Conceived and designed the experiments: FS LIF. Performed the experiments: ABM LIF. Analyzed the data: ABM LIF. Wrote the paper: ABM LIF. Contributed the drug metabolism data: EAH SB. Designed and developed the cglAlertService web services: MCC RGS JM. Designed and developed the adrPathService web services and the ADR-S workflow: ABM LIF. Designed and developed the ADR-FM workflow: PA GD. Designed and developed the ADR-FD workflow: EMvM BS JAK. Contributed to the overall development of web services, schema and workflows: PL JLO. Performed the statistical analysis: ABM JP.
The authors have declared that no competing interests exist.
Drug safety issues pose serious health threats to the population and constitute a major cause of mortality worldwide. Due to the prominent implications for both public health and the pharmaceutical industry, it is of great importance to unravel the molecular mechanisms by which an adverse drug reaction can be potentially elicited. These mechanisms can be investigated by placing the pharmaco-epidemiologically detected adverse drug reaction in an information-rich context and by exploiting all currently available biomedical knowledge to substantiate it. We present a computational framework for the biological annotation of potential adverse drug reactions. First, the proposed framework investigates previous evidence on the drug-event association in the context of biomedical literature (signal filtering). Then, it seeks to provide a biological explanation (signal substantiation) by exploring mechanistic connections that might explain why a drug produces a specific adverse reaction. The mechanistic connections include the activity of the drug, related compounds and drug metabolites on protein targets, the association of protein targets to clinical events, and the annotation of proteins (both protein targets and proteins associated with clinical events) to biological pathways. Hence, the workflows for signal filtering and substantiation integrate modules for literature and database mining,
Adverse drug reactions (ADRs) constitute a major cause of morbidity and mortality worldwide. Due to the relevance of ADRs for both public health and the pharmaceutical industry, it is important to develop efficient ways to monitor ADRs in the population. In addition, it is also essential to comprehend why a drug produces an adverse effect. To unravel the molecular mechanisms of ADRs, it is necessary to consider the ADR in the context of current biomedical knowledge that might explain it. Nowadays there are plenty of information sources that can be exploited in order to accomplish this goal. Nevertheless, the fragmentation of information and, more importantly, the diverse knowledge domains that need to be traversed, pose challenges to the task of exploring the molecular mechanisms of ADRs. We present a novel computational framework to aid in the collection and exploration of evidence that supports the causal inference of ADRs detected by mining clinical records. This framework was implemented as publicly available tools integrating state-of-the-art bioinformatics methods for the analysis of drugs, targets, biological processes and clinical events. The availability of such tools for
Drug safety issues can arise during pre-clinical screening, clinical trials and, more importantly, after the drug is marketed and tested for the first time on the population
In 1998, Lazarou et al. estimated that about 2 million patients per year in the US are affected by serious adverse drug reactions (ADRs), resulting in approximately 100 000 fatalities and ranking ADRs between the fourth and sixth leading cause of death in the US, not far behind cancer and heart disease
Due to the important implications of an ADR in both public health and the pharmaceutical industry, unraveling the molecular mechanisms by which the ADR is elicited is of great relevance. Understanding the molecular mechanisms of ADRs can be achieved by placing the adverse drug reaction in the context of current biomedical knowledge that might explain it. Due to the huge amounts of data generated by “omics” experiments, and the ever-increasing volume of data and knowledge stored in databases related to ADRs, the application of bioinformatics analysis tools is essential in order to study and analyze the molecular and biological basis of ADRs.
Although the factors that determine the susceptibility to ADRs are not completely understood, accumulating evidence over the years indicates an important role of genetic factors
Other cases of ADRs may arise as a consequence of drug-drug interactions, or the interplay between the effect of the drug and environmental factors
From the above paragraphs, it is clear that the study of the molecular and biological mechanisms underlying ADRs requires achieving a synthesis of information across multiple disciplines. In particular, it requires the integration of information from a variety of knowledge domains, ranging from the chemical to the biological up to the clinical. Different resources cover information about these different knowledge domains, and many of them are freely available on the web, such as biological and chemical databases and the biomedical literature. At the same time, new data is produced continuously, and the list of resources and published papers that a researcher interested in ADRs needs to cope with is turning more into a problem than into a solution. It has been recognized that the adequate management of knowledge is becoming a key factor for biomedical research, especially in the areas that require traversing different disciplines and/or the integration of diverse and heterogeneous pieces of information
On the other hand, approaching current biomedical research questions by computational analysis requires a combination of different methods. An attractive approach that has emerged in recent years is the combination of different bioinformatics analysis modules by means of pipelines or workflows
In this article we present a general framework developed in the context of the EU-ADR project for a systematic analysis of adverse drug reactions. The entry point of the system is a potential drug safety signal, which is composed of the drug and its associated adverse reaction. In the process of
The framework presented here for the filtering and substantiation of drug safety signals consists of placing the potential signal in the context of current knowledge of biological mechanisms that might explain it. Essentially, we are searching for evidence that supports causal inference of the signal, i.e. feasible paths that connect the drug with the clinical event of the adverse reaction. The signal filtering analysis looks for evidence reporting the drug-event association in the biomedical literature and biomedical databases. The signal substantiation process considers two scenarios able to provide a causal inference of the signal (see
The signal substantiation process involves the automatic search for evidence that supports the causal inference of the potential signal. A. Signal substantiation through proteins. The profile of targets of the drug and its metabolites is obtained by
Our approaches for
In the following section we describe the results of the analysis of potential drug safety signals as a proof of concept of the framework and tools proposed here.
In the 1990s, the occurrence of several cases of serious, life-threatening ventricular arrhythmias and sudden cardiac deaths secondary to the use of non-cardiac drugs raised concerns among regulators
Risk of QTPROL | Drug Name | ATC code | MeSH (ADR-FM) | Medline (ADR-FD) | DailyMed (ADR-FD) | DrugBank (ADR-FD) |
|
Sulpiride | N05AL01 | 7 | 6 | NA | 0 |
Quetiapine | N05AH04 | 7 | 18 | 2 | 0 |
Olanzapine | N05AH03 | 14 | 20 | 1 | 0 |
|
Ziprasidone | N05AE04 | 15 | 38 | 3 | 0 |
Pimozide | N05AG02 | 0 | 16 | 0 | 0 |
Haloperidol | N05AD01 | 23 | 55 | 12 | 0 |
For the ADR-FD, the individual results obtained from the three different sources used (Medline, DailyMed and DrugBank) are shown. The table shows the number of records found in each case. NA: Not Available.
Risk of QTPROL | Drug Name | ATC code | Events | Drug-event linking proteins | p-value |
|
Sulpiride | N05AL01 | None | None | None |
Quetiapine | N05AH04 | LONG QT SYNDROME 1/2, 2, 2/5 and 2/3, TIMOTHY SYNDROME, Torsades de Pointes, Romano-Ward Syndrome | HERG (KCNH2, pKi 5.24) | 0.0190 | |
Olanzapine | N05AH03 | LONG QT SYNDROME 1/2, 2, 2/5 and 2/3, TIMOTHY SYNDROME, Torsades de Pointes, Romano-Ward Syndrome | HERG (KCNH2, pKi 4.64, pIC50 6.18) | 0.0190 | |
|
Ziprasidone | N05AE04 | LONG QT SYNDROME 1/2, 2, 2/5 and 2/3, TIMOTHY SYNDROME, Torsades de Pointes, Romano-Ward Syndrome | HERG (KCNH2, pKi 6.77, pIC50 6.36) | 0.1979 |
Pimozide | N05AG02 | LONG QT SYNDROME 1/2, 2/3, 2 and 2/5, TIMOTHY SYNDROME, Torsades de Pointes, Romano-Ward Syndrome, cardiac arrhythmia | HERG (KCNH2, pKi 6.99, pIC50 6.73), Cav1.2 (CACNA1C, pKi 6.7), hEAG1 (KCNH1, pIC50 6.2) | 0.0025 | |
Haloperidol | N05AD01 | LONG QT SYNDROME 2/3, 2, 2/5 and 1/2, TIMOTHY SYNDROME, Torsades de Pointes, Romano-Ward Syndrome | HERG (KCNH2, pKi 6.99, pIC50 6.73), Cav1.2 (CACNA1C, pKi 6.7), hEAG1 (KCNH1, pIC50 6.2) | 0.0025 |
The columns display the risk of producing QTPROL for each drug, the drug name, the ATC code of the drug, the clinical events associated with the drug-event linking proteins (Events), the proteins that explain the connection between the drug and the event (Drug-event linking proteins), and the p-values. For the drug-event linking proteins, the common protein name is given, and the Gene Symbol and the drug activity values of each drug-event linking protein (pKi or pIC50, average of the multiple values from different sources) are shown in parentheses.
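The activity values in the table are reported on a negative logarithmic scale. As a minimal sketch of that convention, the snippet below converts Ki or IC50 values given in µM to pKi or pIC50 and averages several reports. The function names are hypothetical, and averaging on the logarithmic scale is an assumption: the text only states that multiple values from different sources were averaged, not how.

```python
import math
from statistics import mean


def p_value_from_molar(concentration_molar: float) -> float:
    """Return -log10 of a concentration expressed in mol/L (pKi, pIC50)."""
    if concentration_molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(concentration_molar)


def average_activity(values_micromolar):
    """Convert each Ki/IC50 value (in µM) to the p-scale, then average.

    Assumption: the mean is taken over the log-scale values, not over
    the raw concentrations.
    """
    return mean(p_value_from_molar(v * 1e-6) for v in values_micromolar)


# A Ki of 0.2 µM corresponds to a pKi of about 6.70.
print(round(p_value_from_molar(0.2e-6), 2))
```

On this scale a higher value means stronger binding, so a difference of one unit corresponds to a tenfold difference in Ki or IC50.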
We furthermore explored the mechanisms underlying the association between QTPROL and the selected antipsychotics using the substantiation workflow. The results are summarized in
The results of the ADR-S workflow can be visualized as a graph in which the nodes are proteins, compounds and clinical events. A: Detail of the network depicting the haloperidol targets, the proteins associated with QTPROL and the connection between them. The proteins encoded by the genes KCNH1, KCNH2 and CACNA1C constitute Drug-Event linking proteins between haloperidol and the terms corresponding to QTPROL. B: Detail of the targets of haloperidol, showing the adrenergic receptors (light blue) and the drug transporter encoded by the gene ABCB1 (purple). In both graphs, the multiple edges between two nodes represent different pieces of evidence for the corresponding association between the nodes.
Gene Symbol | Approved name (HGNC) | Other names | UniProt Accession | UniProt Identifier | NCBI Entrez Gene |
KCNH1 | potassium voltage-gated channel, subfamily H (eag-related), member 1 | hEAG1, Kv10.1, eag, eag1, h-eag | O95259 | KCNH1_HUMAN | 3756 |
KCNH2 | potassium voltage-gated channel, subfamily H (eag-related), member 2 | HERG, Kv11.1, erg1 | Q12809 | KCNH2_HUMAN | 3757 |
CACNA1C | calcium channel, voltage-dependent, L type, alpha 1C subunit | Cav1.2, CACH2, CACN2, TS | Q13936 | CAC1C_HUMAN | 775 |
ABCB1 | ATP-binding cassette, sub-family B (MDR/TAP), member 1 | Multidrug resistance protein 1, ABC20, CD243, GP170, P-gp | P08183 | MDR1_HUMAN | 5243 |
HGNC: HUGO Gene Nomenclature Committee (
Interestingly, our analysis also indicates that the antipsychotics in our study have an important activity on adrenergic receptors (
Moreover, haloperidol shows activity on the drug transporter encoded by the gene ABCB1 (Ki 0.2 µM,
Regarding the substantiation through pathways, for haloperidol and pimozide we found several Reactome pathways (Integration of energy metabolism, Axon guidance, Synaptic transmission, Signaling by GPCRs and Diabetes pathways), which connect the drug and the event, and where the involved proteins are expressed in cardiac tissues. It is likely that the effect of a drug on its target proteins will affect proteins in their direct neighborhood in the biological pathway. Hence, we computed the average shortest path length between pairs of drug and event associated proteins in the Reactome pathways and compared them to the average shortest path length between randomly selected drug and event proteins. Interestingly, for all five antipsychotic drugs, the drug and event proteins are in close proximity in the Reactome pathways with average shortest path lengths between 2 and 3, which are significantly shorter than the average shortest path length of 5 for randomly selected drug and event proteins (p-value ≤ 0.05).
In summary, the ADR-S workflow provides different hypotheses explaining the antipsychotics-induced QTPROL, including the direct action of the drug on proteins associated with the clinical event (e.g. HERG), the cross-talk between different biological processes (adrenergic signaling and cardiac action potential), and the differential distribution of drugs among tissues (due to inhibition of transporters exerted by the drug). Moreover, it also highlights several interesting pieces of evidence that might explain the differences between low and high-risk antipsychotics.
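The proximity comparison can be illustrated with a small sketch. It computes the average shortest path length between drug and event proteins in an undirected interaction graph by breadth-first search and compares it with the same quantity for randomly drawn protein sets. Everything here is an assumption made for illustration: the toy graph, the function names, the choice to draw random sets of the same size from all nodes, and the empirical p-value estimate. The text does not specify the exact null model or test that was used.

```python
import random
from collections import deque
from statistics import mean


def build_graph(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_shortest_path(graph, drug_proteins, event_proteins):
    """Mean hop distance over all connected, distinct drug/event pairs."""
    lengths = []
    for d in drug_proteins:
        dist = shortest_path_lengths(graph, d)
        lengths.extend(dist[e] for e in event_proteins if e in dist and e != d)
    return mean(lengths) if lengths else float("inf")


def empirical_p_value(graph, drug_proteins, event_proteins, n_random=1000, seed=0):
    """Fraction of random protein sets that are at least as close as observed."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    observed = average_shortest_path(graph, drug_proteins, event_proteins)
    hits = 0
    for _ in range(n_random):
        rand_drug = rng.sample(nodes, len(drug_proteins))
        rand_event = rng.sample(nodes, len(event_proteins))
        if average_shortest_path(graph, rand_drug, rand_event) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_random + 1)


# Toy chain of hypothetical proteins P1-P2-...-P8 (not Reactome data).
toy = build_graph([(f"P{i}", f"P{i + 1}") for i in range(1, 8)])
observed, p = empirical_p_value(toy, ["P1"], ["P2"], n_random=200)
print(observed, p)
```

In a real analysis the graph would be the protein-protein interaction representation of the pathways, and the random sets would typically be drawn from proteins annotated to at least one pathway.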
In addition to the example case presented above, the ADR-S workflow was evaluated on a large-scale data set. The SIDER database was used to extract drug-event pairs (see
Recent studies highlight the use of disparate data sets in the study of ADRs, enabled by bioinformatics methodologies. Combining the study of protein–drug interactions on a structural proteome-wide scale with protein functional site similarity search, small molecule screening, and protein–ligand binding affinity profile analysis, Xie and colleagues
All these examples illustrate how computational approaches are paving the way toward elucidating the molecular mechanisms of ADRs. The framework presented here follows this direction, by traversing and integrating information from the chemical domain, through genes and proteins, molecular and cellular networks, and finally to the clinical domain. The filtering workflows interrogate specialized databases and literature repositories in order to determine the novelty of a drug-event association. On the other hand, the substantiation framework seeks to find hypotheses that might explain drug-induced clinical events by looking for evidence supporting causative connections between the drug, its targets, and their direct or indirect (through biological pathways) association to the clinical event. The signal substantiation process can be framed as a closed knowledge discovery process, analogous to the Swanson model based on hidden literature relationships
We illustrate our approach by analyzing a clinically relevant drug safety signal: prolongation of the QT interval (QTPROL) leading to cardiac arrhythmias produced by a set of antipsychotic drugs. The results of the filtering workflows show that the association of QTPROL with the antipsychotic drugs has been extensively discussed in the literature and is documented in specialized databases. On the other hand, the substantiation workflow provides different hypotheses explaining the antipsychotics-induced QTPROL. First, we were able to confirm the widely accepted mechanism proposed for drug-induced QTPROL, in which the drug blocks the potassium channel HERG (encoded by the KCNH2 gene) and this blockade leads to a prolongation of the QT interval
We furthermore found that activities of haloperidol and pimozide on the drug transporter encoded by the gene ABCB1 (Ki 0.2 µM,
Regarding the analysis through biological pathways, our workflow does not provide novel hypotheses that might explain drug-induced QTPROL in addition to the above presented hypotheses. Nevertheless, it is interesting that the drug target proteins and event-associated proteins are closely located in the Reactome pathways. All in all, a detailed analysis of the generated paths might add valuable information about the mechanism underlying the adverse drug reaction. Ultimately, the usefulness of the pathway module strongly depends on the drug-safety signal of interest. For example, the cholesterol-lowering drug cerivastatin was withdrawn from the market in 2001 due to the risk of fatal rhabdomyolysis leading to kidney failure
In summary, using antipsychotics and their risk of inducing QTPROL, we showed that the filtering workflows are able to extract relevant information from the literature and dedicated databases. We also showed that the substantiation workflow provides different hypotheses explaining the antipsychotics-induced QTPROL. These hypotheses include the direct action of the drug on proteins associated with the clinical event (e.g. HERG), the cross-talk between different biological processes (adrenergic signaling and cardiac action potential), and the differential distribution of drugs among tissues (due to inhibition of transporters exerted by the drug). Moreover, the analysis also highlights several interesting pieces of evidence that might explain the differences between low and high-risk antipsychotics. In addition, we provide the results of a large-scale analysis of drug-side effect pairs from SIDER and show that about 22% of the known side effects of drugs might involve direct effects of drugs on proteins associated with the events. This relatively small number is not surprising because not all drug side effects can be attributed to the direct action of the drug on its targets, such as on-target and off-target pharmacological effects. Other mechanisms of drug toxicity have been discussed. For example, metabolites can react with nucleophiles including DNA, which can trigger regulatory processes leading to inflammation, apoptosis and necrosis
Both filtering and substantiation workflows are available to the community and allow a systematic and automatic analysis of drug safety signals detected by mining clinical records, providing a user-friendly framework for the analysis of drug-event combinations. We believe that with the availability of such tools for
The signal filtering and substantiation framework has been implemented by means of software modules that perform specific tasks of the processes. To allow access and integration of the modules in high-level analysis pipelines, the modules were implemented as web services and combined into data processing workflows to achieve the aforementioned signal filtering and signal substantiation. To standardize data exchanges between the different web services, we have developed two complementary schemas using XSD to define a common XML interoperability structure. The first one describes general data types (
We have implemented two workflows for signal filtering. The ADR-FM workflow is a MeSH®-based approach to find drug-event pairs in Medline® citations. The ADR-FD workflow uses text-mining to find the drug-event pairs in Medline® abstracts, databases such as DrugBank and drug labels available at DailyMed®.
The aim of this signal filtering workflow is to automate the search of publications related to a given drug-adverse event association. It is based on an approach that uses the MeSH® annotations of Medline® citations, in particular the subheadings “chemically induced”, “adverse effects” and “Pharmacological Action”
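To make the MeSH-based idea concrete, the sketch below builds a PubMed query string that pairs a drug heading qualified by the subheading "adverse effects" with an event heading qualified by "chemically induced". It only illustrates the heading/subheading search syntax. The function name is hypothetical, the mapping from ATC codes and EU-ADR events to MeSH headings is not shown, and the actual query issued by ADR-FM is not given in the text, so this should not be read as the workflow's implementation.

```python
def mesh_query(drug_heading: str, event_heading: str) -> str:
    """Build a PubMed query from MeSH heading/subheading combinations.

    Assumption: one MeSH heading per drug and per event; a real mapping
    may expand an event into several headings joined with OR.
    """
    drug_term = f'"{drug_heading}/adverse effects"[mh]'
    event_term = f'"{event_heading}/chemically induced"[mh]'
    return f"{drug_term} AND {event_term}"


print(mesh_query("Haloperidol", "Long QT Syndrome"))
```

The resulting string can be pasted into the PubMed search box or passed to a literature search client of choice.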
URL | Description | Type |
|
XSD schema defining common data types. | XSD schema |
|
XSD schema defining specific types used in the EU-ADR project. | XSD schema |
|
Web service with the method getListPublis | Web service endpoint |
|
Web service with the method getFilteredRelations | Web service endpoint |
|
Web service with the methods getSmileFromATC and getUniprotListFromSmile | Web service endpoint |
|
Web service with the methods getDiseaseAssociatedProteins and getPathways | Web service endpoint |
|
ADR-FM workflow | Workflow |
|
ADR-FD workflow | Workflow |
|
ADR-S workflow | Workflow |
The ADR-FM workflow accepts two inputs, the ATC (Anatomical Therapeutic Chemical,
Event code | Event name |
BE | Bullous Eruptions |
AS | Anaphylactic Shock |
ARF | Acute Renal Failure |
AMI | Acute Myocardial Infarction |
ALI | Acute Liver Injury |
CARDFIB | Cardiac Valve Fibrosis |
UGIB | Upper gastrointestinal bleeding |
RHABD | Rhabdomyolysis |
PANCYTOP | Aplastic anemia/Pancytopenia |
NEUTROP | Neutropenia/Agranulocytosis |
QTPROL | QT Prolongation |
The workflow returns an XML file and an HTML page summarizing the results, showing the PubMed identifiers of the retrieved citations grouped by publication type. A chart of the number of retrieved citations per year is generated using Google Charts Tools (
This workflow looks for associations between drugs and side effects that have been recorded in the literature (Medline®) or in databases (DailyMed® and DrugBank). These resources have been indexed, and co-occurrences of drugs (corresponding to ATC codes) and side effects as defined in the EU-ADR project were captured and stored in a database. Briefly, all abstracts in the Medline database were split into sentences, and all sentences were indexed by the concept-recognition tool Peregrine
The ADR-FD workflow accepts three inputs: the ATC code of the drug at the 7-digit level (e.g., M01AH01 for celecoxib), the event as defined in the EU-ADR project (
The output of the workflow consists of a list of links to entries in the input data sources (Medline® abstracts, DailyMed® SPCs, or DrugBank cards) in which the input drug-event association is mentioned. The output is generated in XML format and in HTML format.
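The indexing behind ADR-FD (splitting text into sentences and recording drug-event co-occurrences) can be sketched as follows. This is a deliberately naive stand-in: plain substring matching against tiny hypothetical term lists replaces the concept-recognition tool, the regular-expression sentence splitter is an assumption, and the dictionaries and function names are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical term lists keyed by ATC code and EU-ADR event code.
DRUG_TERMS = {"N05AD01": ["haloperidol"]}
EVENT_TERMS = {"QTPROL": ["qt prolongation", "torsades de pointes"]}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_codes(sentence, dictionary):
    """Return the codes whose terms occur in the sentence (case-insensitive)."""
    lowered = sentence.lower()
    return {code for code, terms in dictionary.items()
            if any(term in lowered for term in terms)}


def cooccurrences(documents):
    """Map (drug code, event code) to the sentences mentioning both."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for sentence in split_sentences(text):
            for drug in find_codes(sentence, DRUG_TERMS):
                for event in find_codes(sentence, EVENT_TERMS):
                    index[(drug, event)].append((doc_id, sentence))
    return dict(index)


docs = {"doc1": "Haloperidol was given. QT prolongation occurred after "
                "haloperidol infusion. No other events."}
print(cooccurrences(docs))
```

A lookup in such an index by ATC code and event code then yields the list of supporting records that the workflow reports.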
The ADR substantiation (ADR-S) workflow seeks to establish a connection between the clinical event and the drug through (i) proteins targeted by the drug (or by its metabolites) and associated with the clinical event and (ii) biological pathways. In the first connecting path, the link between the drug and the event is established through the set of proteins in common between the Drug-Target-Profile and the Event-Protein-Profile (
This method accepts as input a drug encoded by its ATC code at the 7-digit level and provides as output the chemical structure by means of SMILE (Simplified Molecular Input Line Entry Specification).
This method accepts as input a drug or metabolite encoded by a SMILE and returns a list of proteins that are related to the drug (Drug-Target-Profile). We use known drug-target associations (
Database | Description | URL |
AffinDB | The Affinity Database (AffinDB) contains affinity data for protein-ligand complexes of the PDB. |
|
BindingDB | BindingDB is a public, web-accessible database of measured binding affinities for biomolecules, genetically or chemically modified biomolecules, and synthetic compounds. |
|
ChemblDB | ChEMBL is a database of bioactive drug-like small molecules; it contains 2-D structures, calculated properties (e.g. logP, Molecular Weight, Lipinski Parameters, etc.) and abstracted bioactivities (e.g. binding constants, pharmacology and ADMET data). |
|
DrugBank | DrugBank is a unique bioinformatics and chemoinformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. |
|
hGPCRlig | hGPCRlig is a bank of 3-D human G-Protein Coupled Receptor models and their known ligands. |
|
IUPHARdb | IUPHARdb incorporates detailed pharmacological, functional and pathophysiological information on G Protein-Coupled Receptors, Voltage-Gated Ion Channels, Ligand-Gated Ion Channels and Nuclear Hormone Receptors. |
|
MOAD | Binding MOAD's goal is to be the largest collection of well resolved protein crystal structures with clearly identified biologically relevant ligands annotated with experimentally determined binding data extracted from literature. |
|
NRacl | NRacl is an annotated compound library directed to nuclear receptors as a means for integrating the chemical and biological data being generated within this family. All data incorporated in NRacl were collected from public sources of information, mainly reviews and medicinal chemistry journals of the last 10 years |
|
PDSP | This service provides screening of novel psychoactive compounds for pharmacological and functional activity at cloned human or rodent CNS receptors, channels, and transporters. |
|
PubChem | PubChem provides information on the biological activities of small molecules. It is a component of NIH's Molecular Libraries Roadmap Initiative. |
|
This method accepts as input a clinical event (encoded as a list of UMLS® concept identifiers or as a string as defined in
This method assesses if proteins associated with the drug and the event are annotated to the same biological pathway by interrogating Reactome
The substantiation workflow has five input ports, called
The output of the signal substantiation workflow consists of 7 ports representing different layers of the results. Besides the raw outputs from the individual web services (
Entity | ID | SMILE | styleName | nodeType |
|
Internal identifier for the node in the network. | The SMILE string corresponding to the drug structure. | Common name for the node. | Drug |
The ATC code for the drug. | The generic drug name. | |||
|
Internal identifier for the node in the network. | Not provided | Common name for the node. | Drug |
Internal identifier for the metabolite. | Numbered metabolite. | |||
|
Internal identifier for the node in the network. | Not applicable | Common name for the node. | Event |
The UMLS® CUI for the event. | Name of the UMLS® CUI concept extracted from UMLS®. | |||
|
Internal identifier for the node in the network. | Not applicable | Common name for the node | Protein |
The UniProt accession number for the protein. | Gene symbol for the protein as in UniProt. |
ID | bindingValue | evidenceLink | evidenceSource | evidenceType | relationshipType | |
|
Internal identifier constructed of the ATC code of the drug and the UniProt identifier of the protein. | The binding affinity value as reported in the original database. | Not applicable | Database providing the association. | OBSERVATIONAL for associations taken from databases. SIMILARITY for associations from |
BINDS for drug-target binding |
|
Internal identifier constructed of the metabolite identifier and the UniProt identifier for the protein. | The binding affinity value as reported in the original database or transferred during |
Not applicable | Database providing the association. | OBSERVATIONAL for associations taken from databases. SIMILARITY for associations from |
BINDS for metabolite-target binding. |
|
Internal identifier constructed of the UMLS® CUI concept and the UniProt identifier of the protein. | Not applicable | PubMed identifier of the publication supporting the association, empty if not available. | Database providing the association. | OBSERVATIONAL for associations from curated databases. TEXT-MINING for text-mining derived associations. | Association type according to the gene-disease association ontology available in |
The different web services run in parallel. The drug ATC code is first processed by the module getSmileFromATC, which returns the SMILE code of the drug. The SMILE code is then further processed by the module getUniprotListFromSmile, which returns the relationships between the drug and its targets, including targets of the metabolites of the drug. The event is processed by the module getDiseaseAssociatedProteins, which returns relationships between the event and associated proteins. The lists of proteins associated with drug or event are extracted by means of Java scripts using XPath queries and are further processed to remove duplicates. The module ConvertToCytoscapeGraph converts the output of the web services to a Cytoscape graph for user-friendly visualization by means of XSL transformation. For the signal substantiation through proteins, the two protein profiles are combined to determine the proteins in common between the two profiles (module CheckIntersection). For the signal substantiation through pathways, the two protein profiles are subjected to the module getPathways, which returns a list of pathways to which at least one drug protein and one event protein that are expressed in the same tissue are annotated. The output is further processed by module ConvertToHTML, which generates an HTML file listing the pathways that connect the drug and the event.
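The two substantiation steps reduce to simple set operations once the protein profiles are available. The sketch below mirrors the logic described for CheckIntersection and getPathways using plain Python sets. The function names, the pathway and tissue dictionaries, and the placeholder pathway labels are hypothetical, and the sketch does not call the actual web services. The UniProt accessions are those listed for KCNH1, KCNH2, CACNA1C and ABCB1 in the gene table.

```python
def check_intersection(drug_profile, event_profile):
    """Proteins shared by the Drug-Target-Profile and Event-Protein-Profile."""
    return sorted(set(drug_profile) & set(event_profile))


def connecting_pathways(drug_profile, event_profile, pathway_members, expression):
    """Pathways annotated with a drug protein and an event protein that
    share at least one tissue of expression.

    Assumption: a protein present in both profiles counts as a valid
    pair with itself if it has any tissue annotation.
    """
    hits = []
    for pathway, members in pathway_members.items():
        drug_side = set(drug_profile) & members
        event_side = set(event_profile) & members
        if any(expression.get(d, set()) & expression.get(e, set())
               for d in drug_side for e in event_side):
            hits.append(pathway)
    return sorted(hits)


# Illustrative profiles built from the accessions in the gene table.
drug_profile = ["O95259", "Q12809", "Q13936", "P08183"]
event_profile = ["O95259", "Q12809", "Q13936"]
print(check_intersection(drug_profile, event_profile))

# Hypothetical pathway membership and tissue expression annotations.
pathway_members = {"pathway_A": {"P08183", "Q13936"}, "pathway_B": {"P08183"}}
expression = {"P08183": {"heart"}, "Q13936": {"heart"}}
print(connecting_pathways(drug_profile, event_profile, pathway_members, expression))
```

Keeping the profiles as sets also removes duplicate protein entries, which corresponds to the deduplication step mentioned above.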
A dataset of drug-side effects was downloaded from SIDER (December 2011)
We used the protein-protein interaction representation of the Reactome pathways (
The EU-ADR project focuses on a selection of adverse drug reactions that are monitored in electronic health records and further analyzed by the filtering and substantiation workflows
The availability of web services and workflows presented in this work is detailed in
Results of the large-scale analysis of drug-side effects from SIDER using the module ADR-S through proteins.
(TXT)
Results of the large-scale analysis of drug-side effects from SIDER using the module ADR-S through pathways.
(TXT)
Tutorial for the ADR-S workflow.
(PDF)
The authors wish to thank the NLM® for making UMLS® and MeSH® available free of charge.