The authors have declared that no competing interests exist.
Conceived and designed the experiments: JDD XL SDH YJ JTC MJO DAF RMS SKQ LL. Performed the experiments: JDD XH ZW AS SDK LL. Analyzed the data: XH ZW LL. Wrote the paper: JDD XL SKQ LL. Proposed the whole picture of the DDI research project: LL. Guided all the data analyses: LL. Finalized the writing of the paper: LL. Contributed to the pharmaco-epidemiology design: JDD XL RMS SKQ. Contributed to the interpretation of the DDI results: JDD RMS SKQ. Performed the DDI/myopathy association analysis: XH. Provided the mechanistic interpretation of DDIs: SDH. Curated the data: XH AS SDK. Performed the EMR data mapping, data extraction and merging, and analyzable data preparation: ZW. Performed the literature mining: AS SDK. Contributed to the initial ideas formulation: SDH YJ JTC MJO. Contributed to the analysis strategy: XL.
Drug-drug interactions (DDIs) are a common cause of adverse drug events. In this paper, we combined a literature discovery approach with the analysis of a large electronic medical record (EMR) database to predict and evaluate novel DDIs. We predicted an initial set of 13197 potential DDIs based on substrates and inhibitors of cytochrome P450 (CYP) metabolism enzymes identified from published
Drug-drug interactions are a common cause of adverse drug events. In this paper, we developed an automated search algorithm that can predict new drug interactions based on published literature. Using a large electronic medical record database, we then analyzed the correlation between concurrent use of these potentially interacting drugs and the incidence of myopathy as an adverse drug event. Myopathy comprises a range of musculoskeletal conditions including muscle pain, weakness, and tissue breakdown (rhabdomyolysis). Our statistical analysis identified 5 drug interaction pairs: (loratadine, simvastatin), (loratadine, alprazolam), (loratadine, duloxetine), (loratadine, ropinirole), and (promethazine, tegaserod). When taken together, each drug pair showed a significantly increased risk of myopathy compared with the expected additive myopathy risk from taking either of the drugs alone. Further investigation suggests that two major drug metabolism proteins, CYP2D6 and CYP3A4, are involved in the interactions of these five drug pairs. Overall, our method is robust in that it can incorporate all published literature, all FDA-approved drugs, and very large clinical datasets to generate predictions of clinically significant interactions. The interactions can then be further validated in future cell-based experiments and/or clinical studies.
Drug-drug interactions (DDIs) are a major cause of morbidity and mortality and lead to increased health care costs
Several methodological approaches are currently used to identify and characterize new DDIs.
Finally,
The aforementioned
In this paper, we present a novel approach using literature mining for screening of potential DDIs based on mechanistic properties, followed by EMR-based validation to identify those interactions that are clinically significant. We focus on clinically and statistically significant DDIs that increase the risk of myopathy.
Our initial drug dictionary consisted of 6937 drugs. Of these, 1492 drugs were validated as FDA approved drugs (
These drugs' metabolism and inhibition enzymes were experimentally determined by probe substrates and inhibitors recommended by the FDA Drug-Drug Interaction guidelines. Their categorizations are reported in
Among 232 drugs with known metabolism and/or inhibition enzyme information (
Predicted DDIs were obtained from literature mining. “DDIs with EMR data” are predicted DDIs with non-zero frequency among the co-medication data in the EMR.
Among those 3670 predicted DDI pairs from
In our CDM dataset, there were medication records on 817,059 patients. Among these patients, 59,572 (7.2%) experienced myopathy events (
Variables | Characteristics |
Myopathy: Yes | 59,572 (7.2%) |
Myopathy: No | 769,333 (92.8%) |
Age | 40.2 +/− 23.0 (11,846 missing) |
Sex: Female | 489,669 (59.1%) |
Sex: Male | 327,390 (39.5%) |
Sex: missing | 11,846 (1.4%) |
| 3.8 +/− 2.5 |
Race: White | 185,675 (22.4%) |
Race: Black | 65,484 (7.9%) |
Race: Asian | 1,741 (0.2%) |
Race: Hispanic | 30,670 (3.7%) |
Race: Native American | 61 (0.0073%) |
Race: Missing | 545,277 (65.8%) |

Myopathy Concept ID | Myopathy Concept Name | Frequency |
446370 | Antilipemic and antiarteriosclerotic drugs causing adverse effects in therapeutic use | 206 |
4262118 | Other myopathies | 7 |
80800 | Polymyositis | 372 |
73001 | Myositis | 53 |
84675 | Myalgia and myositis | 48877 |
4217978 | Myalgia and myositis, unspecified | 185 |
439142 | Myoglobinuria | 52 |
4147768 | Myopathy, unspecified | 1 |
4345578 | Rhabdomyolysis | 52 |
4248141 | Rhabdomyolysis | 1 |
79908 | Muscle weakness | 12720 |
4218609 | Muscle weakness (generalized) | 22 |
Note: some of the myopathy Concept ID categories overlapped.
Variables | Effect | |
Sex: Male | 0.054 (0.00045) | |
Sex: Female | 0.086 (0.00067) | |
OR | 1.64 +/− 0.0039 | p-value < 2e-16 |
| 1.0015 +/− 0.000012 | p-value < 2e-16 |
The 3670 DDI pairs identified in the CDM database were tested using the additive model, i.e., whether an inhibitor increases the myopathy risk of the substrate compared with the substrate alone. The logistic regression was adjusted for age and sex. The p-value threshold was set at 0.05/3670 = 0.0000136 after Bonferroni correction, with OR greater than 1. There were 124 and 287 significant DDI pairs for the CYP2D6 and CYP3A4/5 enzymes, respectively (
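The Bonferroni screen described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each test result is available as a (substrate, inhibitor, odds ratio, p-value) tuple; the function names and the example drug names are hypothetical and not part of the original pipeline.

```python
# Minimal sketch of the Bonferroni screen (hypothetical helper names).
ALPHA = 0.05
N_TESTS = 3670  # predicted DDI pairs observed in the CDM database


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate."""
    return alpha / n_tests


def significant_pairs(results, alpha=ALPHA, n_tests=N_TESTS):
    """Keep pairs with p below the Bonferroni cutoff AND odds ratio above 1.

    `results` is an iterable of (substrate, inhibitor, odds_ratio, p_value).
    """
    cutoff = bonferroni_threshold(alpha, n_tests)
    return [(sub, inh) for sub, inh, odds_ratio, p in results
            if p < cutoff and odds_ratio > 1]


print(f"{bonferroni_threshold(ALPHA, N_TESTS):.7f}")  # 0.0000136

# Hypothetical results: only drugA/drugB passes both criteria.
demo = [("drugA", "drugB", 1.8, 1e-7),   # significant, risk-increasing
        ("drugC", "drugD", 0.7, 1e-9),   # significant but OR < 1
        ("drugE", "drugF", 2.0, 1e-3)]   # OR > 1 but not significant
print(significant_pairs(demo))  # [('drugA', 'drugB')]
```

Requiring OR > 1 in addition to the p-value cutoff keeps only associations in the risk-increasing direction.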
Both the x- and y-axes represent the drugs of a DDI pair. A red dot highlights a DDI pair showing a strong association with myopathy risk (p<0.0000136, odds ratio>1).
To remove the effect of the myopathy risk of the inhibitor itself, a synergistic DDI test was conducted to determine whether the substrate and inhibitor together carry a higher risk than the combined additive risk of taking the substrate or the inhibitor alone. Age and sex were adjusted for as covariates. DDI pairs were removed if either drug was prescribed to treat symptoms of myopathy. We set the significance threshold at p = 0.0000136, correcting for the multiple primary hypotheses on the 3670 predicted DDI pairs.
drug 1 | drug 2 | enzymes | Risk1 | Risk2 | Risk12 | Risk Ratio | p-value | sample size (m1/n1, m2/n2, m12/n12) |
| | CYP3A4 | 0.022 | 0.033 | 0.093 | 1.69 | 2.03E-07 | (1264/44245, 4197/102345, 137/1223) |
| | CYP3A4 | 0.022 | 0.029 | 0.095 | 1.86 | 2.44E-08 | (1257/43341, 2251/52341, 176/1448) |
| | CYP2D6 | 0.020 | 0.047 | 0.130 | 1.94 | 5.60E-07 | (1220/43552, 1385/23470, 90/631) |
| | CYP2D6 | 0.020 | 0.018 | 0.122 | 3.21 | 2.60E-07 | (1218/43491, 164/6531, 17/123) |
| | CYP2D6 | 0.011 | 0.020 | 0.093 | 3.00 | 8.22E-07 | (1332/78334, 109/3745, 23/224) |
Note: Risk1 and risk2 are myopathy risks for drug 1 and drug 2 respectively. The risk-ratio is calculated as risk12/(risk1+risk2). The p-value is calculated from a multivariate logistic regression, in which age and sex were included. (n1, n2, n12) are sample sizes for drug exposure groups of drug 1 alone, drug 2 alone, and both drugs, respectively; and (m1, m2, m12) are myopathy frequencies for drug exposure groups of drug 1 alone, drug 2 alone, and both drugs, respectively.
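The risk ratio defined in the note can be checked directly against the tabulated risks. The short Python sketch below (the function name is hypothetical) recomputes the Risk Ratio column from Risk1, Risk2, and Risk12; a value above 1 means the joint risk exceeds the sum of the single-drug risks.

```python
def synergy_risk_ratio(risk1: float, risk2: float, risk12: float) -> float:
    """Risk ratio as defined in the table note: risk12 / (risk1 + risk2)."""
    return risk12 / (risk1 + risk2)


# (Risk1, Risk2, Risk12) for the five rows of the table, in order.
rows = [(0.022, 0.033, 0.093),
        (0.022, 0.029, 0.095),
        (0.020, 0.047, 0.130),
        (0.020, 0.018, 0.122),
        (0.011, 0.020, 0.093)]

for r1, r2, r12 in rows:
    print(round(synergy_risk_ratio(r1, r2, r12), 2))
# prints 1.69, 1.86, 1.94, 3.21, 3.0 (one per line)
```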
Additional analyses of myopathy were performed for these five DDI pairs. In the first myopathy analysis, the total number of medications ordered during the drug exposure window was added as a covariate in the logistic regression. This variable was used as a surrogate marker for the comorbidities of a patient. The average number of medications used by individuals during the drug exposure window was 3.6 with SD = 2.4.
drug 1 | drug 2 | Enzymes | Risk1 | Risk2 | Risk12 | Risk Ratio | p-value |
| | CYP3A4 | 0.0085 | 0.0016 | 0.027 | 2.72 | 2.95E-12 |
| | CYP3A4 | 0.0086 | 0.0041 | 0.045 | 3.58 | <2.00E-16 |
| | CYP2D6 | 0.0084 | 0.019 | 0.080 | 2.89 | <2.00E-16 |
| | CYP2D6 | 0.0083 | 0.0028 | 0.078 | 7.00 | <2.00E-16 |
| | CYP2D6 | 0.0040 | 0.013 | 0.089 | 5.10 | <2.00E-16 |
Note: Risk1 and risk2 are myopathy risks for drug 1 and drug 2 respectively. The risk-ratio is calculated as risk12/(risk1+risk2). The p-value is calculated from a multivariate logistic regression, in which age, sex, and co-medications were included.
In the second myopathy analysis, only the first myopathy event of each patient was considered, because co-medications administered after the first myopathy event but before follow-up myopathy events were potential confounders. In other words, it was difficult to determine whether such co-medication exposure resulted from the myopathy or caused it.
Unlike DDI signal detection from AERS by Dr. Altman's group
Among the 13197 predicted DDIs from literature mining of in vitro pharmacokinetic (PK) studies, 3670 may have clinical relevance, i.e., they were taken as co-medications by at least some of the 2.2 million patients in our clinical dataset. However, only 196 of them (5.3%) have been tested in clinical pharmacokinetic DDI trials. Among these 196 clinically tested DDIs, 123 (62.7%) showed a significant increase in substrate drug exposure when co-administered with the inhibitor. This striking finding calls for further evaluation of the predicted DDIs that have not been subjected to rigorous study. Indeed, all five DDI pairs that showed an increased myopathy risk in our pharmaco-epidemiology study lack clinical pharmacokinetic studies.
The FDA labels of all 7 of the drugs which comprise the five significant DDI pairs report myopathy related side effects (
The extent of a drug's metabolism by an enzyme is categorized as major, minor (partial), or not involved. The inhibition potency of a drug is categorized as strong (Ki < 10 μM), moderate (10 < Ki < 100 μM), or weak (Ki > 100 μM).
Drug 1 | Drug 2 | Enzymes | Metabolism Routes | Inhibition potency | DDI Prediction |
| | CYP3A | major | strong | Strong |
| | CYP3A | minor | moderate | Moderate |
| | CYP2D6 | major | moderate | Moderate |
| | CYP2D6 | major | moderate | Moderate |
| | CYP2D6 | minor | strong | Strong |
Two DDI data analysis strategies were implemented to identify drug-drug interactions associated with an increased risk of myopathy. The first approach employed an additive model coupled with a CYP metabolism pathway enrichment analysis. This strategy reflects the discovery-oriented nature of bioinformatics research, i.e., searching for commonality among many hypothesis tests. The second strategy employed a synergistic model coupled with extensive adjustment for confounders. This strategy follows the more stringent pharmaco-epidemiology considerations, which strongly control for false positives. Unlike the additive model, the synergistic model can account for the myopathic risk of the inhibitor itself in the presence of other potential confounders. Therefore, the additive model potentially identifies more false-positive DDIs; however, it is more powerful than the synergistic model in identifying true-positive DDIs. Many more DDIs were identified by the additive-model analysis than by the synergistic strategy. Because pathway enrichment analysis is more tolerant of false-positive DDIs, the additive-model analysis identified CYP3A4/5 and CYP2D6 as the enzymes enriched for significant DDI pairs. Although the synergistic-model analysis inferred only five significant DDI pairs, additional literature review showed that these pairs also involve the CYP2D6 and CYP3A4/5 enzymes mechanistically. The consistency of the mechanistic interpretations from the two separate DDI analysis strategies is encouraging: the bioinformatics approach and the pharmaco-epidemiology approach are complementary and mutually supportive.
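The specific enrichment test is not stated in this section. A common way to ask whether one enzyme is over-represented among significant DDI pairs is a one-sided Fisher's exact test, whose p-value is the upper tail of a hypergeometric distribution. The Python sketch below assumes that test; the function name and the example counts are hypothetical and do not reproduce the paper's numbers.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value, P(X >= k).

    N: all tested DDI pairs          K: significant pairs among them
    n: tested pairs for one enzyme   k: significant pairs for that enzyme
    """
    if not (0 <= k <= min(n, K) and n <= N and K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return tail / total


# Hypothetical counts: 150 of one enzyme's 600 tested pairs are significant,
# versus 400 significant pairs among all 3670 tested pairs.
print(enrichment_p_value(k=150, n=600, K=400, N=3670))
```

Python's exact integer arithmetic avoids overflow in the binomial coefficients; a statistics package (for example SciPy's `fisher_exact`) would normally be used instead.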
Our synergistic DDI test is a very stringent approach, compared to the additive approach used by the other investigators
Like many pharmaco-epidemiology studies using observational data, our analysis of the DDI effect on myopathy has several limitations. Phenotypic definitions based on billing codes may be unreliable, with both false positives and false negatives likely to occur. Our dataset also lacked clinical notes from which more detailed symptom data could be extracted. Further research, including validation with manual chart review, is necessary to establish optimal phenotypic definitions for myopathy, as well as more granular definitions for myotoxicity and rhabdomyolysis using a combination of ICD9 codes, lab tests, and clinical notes.
Another limitation of our analysis is that it is subject to several potential population biases introduced by the EMR database itself. Our retrospective observational data do not allow for control of many potential covariates that a traditional prospective study offers. In particular, race data are incomplete in our database (65.8% missing). It is also challenging to design a prospective study to validate results from a pharmaco-epidemiology study. Clinical pharmacokinetic studies or further in vitro metabolism/inhibition studies of the DDI pairs found to increase myopathy risk may provide further validation of an interaction between the drugs. We also look forward to validating our results in another large EMR database.
Our text mining and DDI prediction are based on CYP metabolism enzymes. Therefore, our interpretation of the five significant drug interactions focuses only on CYP-mediated drug-drug interaction mechanisms. However, this does not preclude the involvement of other DDI mechanisms, such as drug transporter interactions or pharmacodynamic interactions. In a recent GWAS, genetic variation in SLCO1B1, the gene encoding the OATP1B1 transporter, was shown to predict myopathy risk associated with simvastatin
The concomitant use of CYP3A-metabolized statins (atorvastatin, lovastatin, and simvastatin) with strong CYP3A inhibitors (e.g., ketoconazole and itraconazole) reportedly increases the risk of statin-induced myopathy. In addition, case reports of increased myopathy in transplant recipients being treated with tacrolimus or cyclosporine
Drug 1 | Drug |
| Atorvastatin | Lovastatin | Pravastatin | Simvastatin |
| 0.53 (0.22, 1.27); (4113/156140, 614/26961, 6/194) | 0.39 (0.16, 1.02); (437/16612, 662/28349, 5/256) | 0.38 (0.10, 1.34); (597/20974, 663/28324, 5/278) | 0.43 (0.10, 1.76); (10057/445885, 570/24234, 2/100) |
| 0.95 (0.30, 2.96); (4164/157745, 53/2764, 3/69) | 0.07 (0.00, 102.7); (442/16833, 56/2825, 0/2) | 0.05 (0.00, 24.9); (510/21220, 56/2817, 0/7) | 0.26 (0.03, 1.92); (10154/449828, 54/2659, 1/89) |
| 0.93 (0.66, 1.32); (4130/157280, 424/28661, 32/835) | 1.22 (0.46, 3.24); (436/16778, 452/29352, 4/79) | 1.63 (0.78, 3.40); (499/21147, 441/29328, 7/111) | 0.70 (0.40, 1.21); (10115/448703, 407/27583, 13/499) |
| 2.25 (0.99, 3.89); (4156/157704, 40/3832, 11/133) | 0.23 (0.09, 22.2); (442/16828, 51/3958, 0/9) | 0.06 (0.00, 29.6); (510/21225, 51/3957, 0/7) | 0.29 (0.09, 1.05); (10154/449790, 48/3689, 3/286) |
Note: The p-values of the synergistic drug interaction tests among these drug pairs are larger than 0.05. In each cell, the reported numbers represent relative risk (95% CI) and (m1/n1, m2/n2, m12/n12), where (n1, n2, n12) are sample sizes for drug exposure groups of drug 1 alone, drug 2 alone, and both drugs, respectively; and (m1, m2, m12) are myopathy frequencies for drug exposure groups of drug 1 alone, drug 2 alone, and both drugs, respectively.
As described in the introduction, an
The Indiana Network for Patient Care (INPC) is a health information exchange data repository containing medical records on over 11 million patients throughout the state of Indiana. The Common Data Model (CDM) is a derivation of the INPC containing coded prescription medications, diagnoses, and observation data on 2.2 million patients between 2004 and 2009. The CDM contains over 60 million drug dispensing events, 140 million patient diagnoses, and 360 million clinical observations such as laboratory values. These data have been anonymized and structured specifically for research on adverse drug reactions through collaboration with the Observational Medical Outcomes Partnership project
The CDM is a de-identified electronic medical record database. All research was conducted with institutional review board (IRB) approval.
Our drug dictionary consists of 6,837 drug names that include all brand, generic, and drug group names. They were primarily derived from DrugBank
The INPC CDM data set has 54490 unique drug “Concept IDs”. A Concept ID in the CDM typically maps to an RxNorm clinical drug (e.g., simvastatin 20 mg) or ingredient (simvastatin). Some Concept IDs may contain multiple drug components (e.g., lisinopril/hydrochlorothiazide). Our drug dictionary was mapped to CDM Concept IDs using regular expression matching and manual review. In total, 1293 unique drugs identified from DrugBank were mapped successfully, while 199 drugs could not be matched. The unmatched drugs were categorized as follows: banned drugs, illicit drugs, organic compounds, herbicides/insecticides, functional group derivatives, herbal extracts, DrugBank drugs not covered by the CDM, and literature-only drug names. In our CDM dataset, 817,059 patients had medication records available.
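The automated part of this mapping can be sketched as whole-word, case-insensitive regular-expression matching of dictionary names against concept names; the manual review step is not modeled. The function name, concept IDs, and concept names below are hypothetical stand-ins, not real CDM entries.

```python
import re


def map_drugs_to_concepts(drug_names, concepts):
    """Map dictionary drug names to concept IDs by whole-word regex matching.

    concepts: dict of concept_id -> concept name.
    Returns (mapped, unmatched), where mapped is drug -> sorted concept IDs.
    A multi-ingredient concept can be matched by several drugs.
    """
    mapped, unmatched = {}, []
    for drug in drug_names:
        # \b anchors stop "statin" from matching inside "simvastatin".
        pattern = re.compile(r"\b" + re.escape(drug) + r"\b", re.IGNORECASE)
        hits = sorted(cid for cid, name in concepts.items() if pattern.search(name))
        if hits:
            mapped[drug] = hits
        else:
            unmatched.append(drug)
    return mapped, unmatched


concepts = {  # hypothetical concept IDs and names
    101: "simvastatin 20 MG Oral Tablet",
    102: "lisinopril / hydrochlorothiazide Oral Tablet",
    103: "Simvastatin",
}
mapped, unmatched = map_drugs_to_concepts(
    ["simvastatin", "hydrochlorothiazide", "exampledrug"], concepts)
print(mapped)     # {'simvastatin': [101, 103], 'hydrochlorothiazide': [102]}
print(unmatched)  # ['exampledrug']
```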
Literature mining was conducted on 10 CYP enzymes: (CYP1A2, CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP2E1, CYP3A4/CYP3A5) (
A drug's ability to be metabolized or inhibited by a specific CYP enzyme is categorized by its published enzyme-based
The information retrieval (IR) step is a two-step, rule-based approach. In step one, a template of key terms was constructed to retrieve PubMed abstracts. The template included required terms (the targeted drug name, the targeted enzyme name, enzyme-specific probe substrates or inhibitors, experiment key terms such as cell systems and equipment set-up, and experiment type such as experiment design and parameters) and prohibited terms, mostly related to cancer studies. In step two, a natural language processing (NLP) based filter was developed to check the expression patterns in each sentence and decide whether an abstract contains DDI-relevant sentences.
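Step one of the IR approach can be sketched as a boolean filter over an abstract. This is a simplified illustration, not the original template: matching is plain case-insensitive substring search, and the function name and all term lists are hypothetical and far shorter than a real template.

```python
def passes_template(abstract, drug, enzyme, probes, experiment_terms,
                    experiment_types, prohibited):
    """Step-one filter: every required term group present, no prohibited term."""
    text = abstract.lower()

    def any_in(terms):
        return any(term.lower() in text for term in terms)

    required_ok = (drug.lower() in text and enzyme.lower() in text
                   and any_in(probes) and any_in(experiment_terms)
                   and any_in(experiment_types))
    return required_ok and not any_in(prohibited)


# Illustrative template for a hypothetical drug "drugX" and CYP3A4.
kwargs = dict(drug="drugX", enzyme="CYP3A4",
              probes=["midazolam", "testosterone"],
              experiment_terms=["human liver microsomes", "recombinant"],
              experiment_types=["inhibition constant", "IC50"],
              prohibited=["cancer", "tumor"])
abstract = ("DrugX inhibited CYP3A4-mediated midazolam hydroxylation in human "
            "liver microsomes; the inhibition constant was determined.")
print(passes_template(abstract, **kwargs))                            # True
print(passes_template(abstract + " Relevance to cancer.", **kwargs))  # False
```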
In describing the IR process, we use the following symbols: O1 denotes inhibitor/inhibit; O2 denotes substrate, probe, metabolized by, or catalyze; O3 denotes inducer/induce; INT denotes interaction, interference, affect, or impact; D denotes a drug; and E denotes an enzyme. Using these symbols, the patterns are defined as follows: [DEO]: D <D, D…> <not> E O, e.g., “drug is (not) enzyme's substrate”; [DOE]: D <not> O E, e.g., “drug inhibits enzyme” or “drug is an inhibitor of enzyme”; [EOD]: E <not> O1 <O3> by D, e.g., “enzyme is induced by drug”; [IDD1]: <not> INT between D and D, e.g., “there is no interaction between drug A and B”; [IDD2]: <not, no> INT D on D, e.g., “no impact of drug A on B”; [DID]: D <not, no> INT D, e.g., “drug A does not interact with drug B”. The permutations [OED], [ODE], and [EDO] were also included. Using these expression patterns, a search algorithm was developed that scans each sentence of an abstract for these patterns and outputs the sentence together with any DDI patterns/instances found.
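A simplified regular-expression version of one pattern, [DOE] (a drug, then an operator, then an enzyme), is sketched below. It is not the original algorithm: the operator vocabularies are abbreviated and hypothetical, O3 (inducer) terms are omitted, and negation is detected only as a standalone "not" or "no" between the drug and the operator.

```python
import re

# Abbreviated, hypothetical operator vocabularies (O1: inhibition, O2: metabolism).
O1 = ["inhibitor", "inhibits", "inhibit", "inhibited"]
O2 = ["substrate", "probe", "metabolized by", "catalyze", "catalyzed"]
NEGATIONS = {"not", "no"}


def _alternation(words):
    """Regex alternation with longer phrases tried first."""
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


def find_doe(sentence, drugs, enzymes):
    """Return (drug, operator, enzyme, negated) matches of the [DOE] order."""
    pattern = re.compile(
        rf"\b(?P<drug>{_alternation(drugs)})\b(?P<gap>.*?)"
        rf"\b(?P<op>{_alternation(O1 + O2)})\b.*?"
        rf"\b(?P<enz>{_alternation(enzymes)})\b",
        re.IGNORECASE,
    )
    matches = []
    for m in pattern.finditer(sentence):
        # Negation is looked for only between the drug and the operator.
        negated = any(word in NEGATIONS for word in m.group("gap").lower().split())
        matches.append((m.group("drug"), m.group("op").lower(), m.group("enz"), negated))
    return matches


print(find_doe("Quinidine is a potent inhibitor of CYP2D6.", ["quinidine"], ["CYP2D6"]))
# [('Quinidine', 'inhibitor', 'CYP2D6', False)]
print(find_doe("DrugY is not metabolized by CYP3A4.", ["drugY"], ["CYP3A4"]))
# [('DrugY', 'metabolized by', 'CYP3A4', True)]
```

The other patterns ([DEO], [EOD], [IDD1], and so on) would follow the same idea with a different ordering of the D, O/INT, and E groups.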
All abstracts identified in the IR step (true positives) were combined with a random subset of PubMed abstracts (n = 10,000) (false positives); abstracts present in both sets were counted as true positives. The recall rate was calculated as the percentage of true-positive abstracts that were selected by the IR algorithm.
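Recall as defined above reduces to a set computation. A minimal sketch, with a hypothetical function name and hypothetical abstract IDs:

```python
def recall(true_positive_ids, selected_ids):
    """Fraction of true-positive abstracts that the IR algorithm selected."""
    positives = set(true_positive_ids)
    if not positives:
        raise ValueError("no true positives to evaluate")
    return len(positives & set(selected_ids)) / len(positives)


# Hypothetical IDs: 3 of the 4 true positives were selected; the extra
# selections (99, 100) do not affect recall.
print(recall({1, 2, 3, 4}, {2, 3, 4, 99, 100}))  # 0.75
```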
Enzyme E's substrates and inhibitors that were mined from the literature were paired to establish the predicted enzyme E DDIs. At this point, the DDI prediction is based only on the text mining results.
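The pairing step can be sketched as a per-enzyme Cartesian product of mined substrates and mined inhibitors. The function name and the input dictionaries below are illustrative assumptions, not the mined results, and excluding a drug paired with itself is also an assumption.

```python
from itertools import product


def predict_ddis(substrates, inhibitors):
    """Pair each enzyme's mined substrates with its mined inhibitors.

    substrates, inhibitors: dict of enzyme -> set of drug names.
    Returns a set of (enzyme, substrate, inhibitor) triples.
    """
    predicted = set()
    for enzyme in substrates.keys() & inhibitors.keys():
        for sub, inh in product(substrates[enzyme], inhibitors[enzyme]):
            if sub != inh:  # assumption: a drug is not paired with itself
                predicted.add((enzyme, sub, inh))
    return predicted


# Illustrative inputs only.
substrates = {"CYP3A4": {"simvastatin", "alprazolam"}, "CYP2D6": {"duloxetine"}}
inhibitors = {"CYP3A4": {"ketoconazole"}, "CYP2D6": {"quinidine"}}
print(sorted(predict_ddis(substrates, inhibitors)))
# [('CYP2D6', 'duloxetine', 'quinidine'), ('CYP3A4', 'alprazolam', 'ketoconazole'),
#  ('CYP3A4', 'simvastatin', 'ketoconazole')]
```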
Each drug's metabolism enzyme information was further reviewed in the full-text papers, and the extent of metabolism by each enzyme was categorized into one of three groups: major, minor, or not involved. The inhibition enzyme information for each drug was likewise categorized into one of three groups (strong, moderate, or not involved) based on the numerical value of Ki: <10 μM, 10–100 μM, or >100 μM, respectively. A DDI is classified as a strong DDI with respect to enzyme E if enzyme E is the major metabolism route for at least one drug of the pair and the other drug shows strong inhibition potency against enzyme E.
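The categorization and the strong-DDI rule above can be written as two small functions. This is a sketch for a single enzyme E; the function names are hypothetical, and treating a Ki of exactly 10 or 100 μM as moderate is an assumption, because the stated ranges leave the boundaries open.

```python
def inhibition_potency(ki_um: float) -> str:
    """Categorize inhibition potency from Ki in micromolar.

    Assumption: exactly 10 or 100 uM is treated as moderate.
    """
    if ki_um < 10:
        return "strong"
    if ki_um <= 100:
        return "moderate"
    return "not involved"  # Ki > 100 uM


def is_strong_ddi(metabolism_a: str, metabolism_b: str,
                  potency_a: str, potency_b: str) -> bool:
    """Strong DDI for enzyme E: E is the major metabolism route of one drug
    and the other drug is a strong inhibitor of E."""
    return ((metabolism_a == "major" and potency_b == "strong")
            or (metabolism_b == "major" and potency_a == "strong"))


print(inhibition_potency(2.5), inhibition_potency(40), inhibition_potency(250))
# strong moderate not involved
print(is_strong_ddi("major", "not involved", "not involved", inhibition_potency(2.5)))
# True
```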
If the systemic exposure of drug A was shown to increase with co-administration of drug B, then A and B have a pharmacokinetic drug interaction. The increase in systemic drug exposure is usually measured by the ratio of the area under the drug concentration-time curve (AUCR), the half-life ratio, the Cmax ratio, the metabolic ratio, or the steady-state drug concentration ratio.
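As a worked example of the first measure, the AUC ratio can be computed with the linear trapezoidal rule. This sketch integrates only over the sampled interval (no extrapolation to infinity); the function names, sampling times, and concentrations are hypothetical.

```python
def auc_trapezoid(times, concentrations):
    """Area under a concentration-time curve by the linear trapezoidal rule,
    over the sampled interval only (no extrapolation to infinity)."""
    if len(times) != len(concentrations) or len(times) < 2:
        raise ValueError("need matching time/concentration lists of length >= 2")
    return sum((t1 - t0) * (c0 + c1) / 2
               for t0, t1, c0, c1 in zip(times, times[1:],
                                         concentrations, concentrations[1:]))


def auc_ratio(times, conc_with_inhibitor, conc_alone):
    """AUCR: substrate AUC with the inhibitor divided by AUC without it."""
    return auc_trapezoid(times, conc_with_inhibitor) / auc_trapezoid(times, conc_alone)


times = [0, 1, 2, 4]             # hours (hypothetical sampling schedule)
alone = [0.0, 4.0, 2.0, 1.0]     # substrate alone (hypothetical units)
with_inh = [0.0, 8.0, 6.0, 4.0]  # substrate plus inhibitor
print(auc_ratio(times, with_inh, alone))  # 2.625 (AUC 21.0 vs 8.0)
```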
The
The
All abstracts identified in the IE step (true positives) were combined with a random subset of PubMed abstracts (n = 10,000) (false positives); abstracts present in both sets were counted as true positives. This combined set was subjected to our proposed IR step, and the recall rate was calculated as the percentage of true-positive abstracts selected by the IR algorithm.
Our health outcome of the interest (HOI) for this task is
Among patients having a myopathy event, the drug-condition relationship is anchored to the date of myopathy. Any drug exposure occurring within a one-month window before the diagnosis of myopathy is considered a positive exposure. If a substrate falls within this window but no inhibitor is present, the event is categorized as “substrate alone” exposure; if both a substrate and an inhibitor fall within this window, it is categorized as “substrate+inhibitor” exposure (
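The exposure categorization can be sketched as follows. Assumptions: "one month" is taken as 30 days, the window includes the diagnosis date, and an "inhibitor alone" category is added because the synergistic test also needs a drug-2-alone group. The function name and drug names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: one month = 30 days; the window includes the diagnosis date.
WINDOW = timedelta(days=30)


def classify_exposure(event_date, prescriptions, substrate, inhibitor):
    """Classify one myopathy event by the drugs ordered in the preceding window.

    prescriptions: iterable of (drug_name, order_date) for the patient.
    Returns "substrate+inhibitor", "substrate alone", "inhibitor alone", or None.
    """
    in_window = {drug for drug, d in prescriptions
                 if event_date - WINDOW <= d <= event_date}
    has_sub, has_inh = substrate in in_window, inhibitor in in_window
    if has_sub and has_inh:
        return "substrate+inhibitor"
    if has_sub:
        return "substrate alone"
    if has_inh:
        return "inhibitor alone"  # drug-2-alone group for the synergistic test
    return None


event = date(2008, 6, 30)
orders = [("substrateA", date(2008, 6, 10)), ("inhibitorB", date(2008, 6, 20))]
print(classify_exposure(event, orders, "substrateA", "inhibitorB"))
# substrate+inhibitor
```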
Because of well-defined cases and controls in this cohort study, a logistic regression model was used to analyze the data. Two logistic regression analyses were performed to test each DDI effect on myopathy. The first is an additive model (
(A) Additive DDI Model; and (B) Synergistic DDI Model.
The additive model cannot differentiate whether the increased myopathic risk is inherent to the inhibitor or results from a drug interaction that increases substrate drug exposure. The synergistic model can identify a greater-than-additive risk of myopathy from the two drugs, indicating a drug-drug interaction. On the other hand, the synergistic model is less powerful than the additive model in identifying true DDIs.
Our primary goal was to identify clinical DDIs resulting in increased risk of myopathy, based on the CYP-mediated DDIs identified from literature abstract data. Our hypothesis was that individuals treated with a combination of interacting drugs would have an increased risk of myopathy compared with individuals treated with either drug alone (additive model). These hypotheses were tested in the EMR data set, and a Bonferroni correction was applied to control the family-wise type I error. DDI effects on myopathy were also tested for any other drug combination; these tests serve for hypothesis generation rather than hypothesis testing. In addition, a statistical enrichment analysis was performed to identify over-represented CYP enzymes compared with the rest of the enzymes
Demographic variables (age and sex) were adjusted for in the DDI association analyses. The total number of different medications ordered during the one-month drug exposure window was used as a covariate in the logistic regression. It serves as a surrogate for the patient's overall health status and accounts for myopathy effects from medications other than the hypothesized DDI drug pair. An individual patient can experience multiple myopathy events. Our drug-condition model considered two situations: all myopathy events and the first myopathy event only. The advantage of selecting the first myopathy event is that it is not confounded by other medications taken between the first and follow-up myopathy events. However, limiting the data to the first myopathy event reduces the sample size, and thus the power to identify a DDI. DDI pairs in which at least one drug was prescribed to treat symptoms of myopathy (e.g., narcotic and non-steroidal analgesics) were excluded from the DDI tests. However, patients prescribed these drugs were kept in the data analysis.
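The first-event restriction and the medication-count covariate can be sketched as two helpers. The function names and records are hypothetical, and the 30-day inclusive window is an assumption standing in for "one month".

```python
from datetime import date, timedelta

WINDOW = timedelta(days=30)  # assumption: "one month" = 30 days, inclusive


def first_events(events):
    """Keep only each patient's earliest myopathy event.

    events: iterable of (patient_id, event_date). Returns patient_id -> date.
    """
    first = {}
    for pid, d in events:
        if pid not in first or d < first[pid]:
            first[pid] = d
    return first


def n_medications(prescriptions, event_date):
    """Count distinct drugs ordered in the window before event_date
    (the surrogate covariate for overall health status)."""
    return len({drug for drug, d in prescriptions
                if event_date - WINDOW <= d <= event_date})


events = [("p1", date(2008, 5, 1)), ("p1", date(2007, 1, 1)), ("p2", date(2009, 3, 3))]
print(first_events(events))
# {'p1': datetime.date(2007, 1, 1), 'p2': datetime.date(2009, 3, 3)}

orders = [("drugA", date(2008, 1, 10)), ("drugA", date(2008, 1, 20)),
          ("drugB", date(2008, 1, 25)), ("drugC", date(2007, 12, 1))]
print(n_medications(orders, date(2008, 1, 31)))  # 2 (drugC is outside the window)
```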
(TIF)
CYP pathway based categorizations of text mined drug from published
(TIF)
CYP pathway enrichment analysis of DDI associations of the myopathy risk.
(TIF)
Significant synergistic DDI effects on the myopathy risk. Only the first drug exposure/myopathy event was counted for each subject. Risk1 and risk2 are myopathy risks for drug 1 and drug 2 respectively. The risk-ratio is calculated as risk12/(risk1+risk2). The p-value is calculated from a multivariate logistic regression, in which age and sex were included.
(GIF)
Myopathy related adverse drug reactions from FDA labels.
(TIF)
Literature review on drug metabolism and inhibition of the seven drugs. We included both
(XLSX)
Myopathy Concept IDs in the Common Data Model.
(XLS)