Conceived and designed the experiments: RCD LH FPR. Performed the experiments: RCD LH. Analyzed the data: RCD LH GP DC. Contributed reagents/materials/analysis tools: GDL RSV TJW REG. Wrote the paper: RCD RSV DC TJW REG FPR.
The authors have declared that no competing interests exist.
Human disease is heterogeneous, with similar disease phenotypes resulting from distinct combinations of genetic and environmental factors. Small-molecule profiling can address disease heterogeneity by evaluating the underlying biologic state of individuals through non-invasive interrogation of plasma metabolite levels. We analyzed metabolite profiles from an oral glucose tolerance test (OGTT) in 50 individuals, 25 with normal (NGT) and 25 with impaired glucose tolerance (IGT). Our focus was to elucidate underlying biologic processes. Although we initially found little overlap between changed metabolites and preconceived definitions of metabolic pathways, the use of unbiased network approaches identified significant concerted changes. Specifically, we derived a metabolic network with edges drawn between reactant and product nodes in individual reactions and between all substrates of individual enzymes and transporters. We searched for “active modules”—regions of the metabolic network enriched for changes in metabolite levels. Active modules identified relationships among changed metabolites and highlighted the importance of specific solute carriers in metabolite profiles. Furthermore, hierarchical clustering and principal component analysis demonstrated that changed metabolites in OGTT naturally grouped according to the activities of the System A and L amino acid transporters, the osmolyte carrier SLC6A12, and the mitochondrial aspartate-glutamate transporter SLC25A13. Comparison between NGT and IGT groups supported blunted glucose- and/or insulin-stimulated activities in the IGT group. Using unbiased pathway models, we offer evidence supporting the important role of solute carriers in the physiologic response to glucose challenge and conclude that carrier activities are reflected in individual metabolite profiles of perturbation experiments. 
Given the involvement of transporters in human disease, metabolite profiling may contribute to improved disease classification via the interrogation of specific transporter activities.
Human disease is complex, arising from the interaction of many genetic and environmental factors. Efforts to personalize treatment have been thwarted by “phenotypic heterogeneity”, the apparent similarity of disease states with diverse underlying causes. One approach to resolve this heterogeneity is to redefine diseases on the basis of abnormal physiologic activities, which should allow grouping patients into categories with similar treatment response and prognosis. Physiologic activities can be identified and assessed through quantitative measurements of biomolecules—proteins, mRNAs, metabolites—in individual patient samples. The field of metabolomics involves the analysis of a broad array of metabolite levels from clinical fluid samples such as blood or urine and can be used to evaluate disease states. Because metabolic profiles are complex, we have taken an integrative network-based approach to understand them in terms of abnormal activities of enzymes and small molecule transporters. We have focused on the oral glucose tolerance test, used to diagnose diabetes, and have found that multiple transporters play an important role in the normal response to ingesting sugar. Many of these transporter activities are abnormal in individuals with impaired glucose tolerance and differing activities among them may reflect the diverse underlying causes and variable clinical courses of such patients.
Disease heterogeneity has challenged the practice of medicine. Individuals with the same apparent disease at our current diagnostic resolution often show remarkable variation in prognosis and treatment responsiveness, presumably because a superficially similar disease state can arise from diverse combinations of genetic and environmental factors
Using tumor biopsy samples, oncologists are now exploring the incorporation of genomewide expression profiling into therapy
Metabolomics data analysis may be facilitated by techniques applied to other high-throughput ‘omic data types. For microarray data, the integration of network information from protein-protein interaction data or predefined biologic pathways has greatly assisted elucidation of underlying processes and led to the development of increasingly robust and accurate gene-based classifiers for disease
We use data derived from oral glucose tolerance tests (OGTT) in 25 individuals with normal (NGT) and 25 with impaired (IGT) glucose tolerance
We examined metabolite profiles from a previously described oral glucose tolerance experiment (OGTT)
We were thus interested in further elucidating the underlying biologic processes leading to the observed pattern of changes. Analyzing the OGTT metabolite profiles of the 25 NGT and 25 IGT Framingham Heart Study participants (see
We evaluated NGT and IGT individually (comparing metabolite abundance before and after oral glucose load) and found enrichment solely in NGT for Bile Acid Biosynthesis at an adjusted
The low yield of pathway enrichment could arise in part from the sparseness of our metabolome coverage or from the fact that most metabolites are implicated in multiple pathways. Furthermore, even if a pathway has uniformly increased flux, this will not generally lead to uniform increases in metabolite abundance. The relationship between enzymatic activity and metabolite concentration can be understood in terms of the relative contribution of “metabolic regulation” and “hierarchical regulation”. Metabolic regulation involves control of reaction flux through the interaction of enzymes with the rest of the metabolic network, such as changing substrate, product or modifier concentrations
We based our analysis on the fact that metabolites are linked via chemical reactions. We hypothesized that OGTT is a physiologic stimulus that alters flux through specific metabolic reactions. Since products from one reaction may serve as reactants for and drive other reactions, we sought groups of metabolites that are connected through metabolic reactions and collectively show a high degree of change. Furthermore we hypothesized that a perturbation such as OGTT would increase the activity of enzymes and transporters, many of which have multiple substrates. Thus, we were also interested in groups of changed metabolites linked by virtue of being substrates of a common enzyme or transporter.
We framed the search for functionally-linked, highly changed metabolites in OGTT in terms of the discovery of active modules (or subnetworks). Active module approaches have previously been applied in bioinformatics analysis to elucidate underlying biologic processes in gene expression data. In such analyses, the investigators typically overlaid gene scores based on differential expression in microarray experiments onto protein-protein interaction
We first built a Metabolic Reaction Network (MRN) using the 3338 metabolic reactions in Recon 1. Although Recon 1 includes most known transport reactions, the specific transporters were not always explicitly mentioned. Thus we expanded this list with 737 additional reactions explicitly modeling transport processes for the metabolites measured in this experiment (see
We converted experimental measures of significance of change (
Distributions of active module scores were evaluated for statistical significance relative to those obtained from random networks, where metabolite scores were permuted randomly amongst measured nodes. At an FDR threshold of 0.01, all of the solutions were highly significant (
We selected all metabolites that appeared with sufficient frequency (see
We next sought to characterize whether the AMG metabolites are active in any particular human tissue. To do so, we exploited recent predictions of which metabolic reactions in the Recon 1 network were likely to be active in ten specific human tissues, using constraint-based flux modeling
An inspection of the AMGs for NGT samples (
Panels (a) and (b) correspond to NGT-EMRN and NGT-CMRN, respectively. Nodes in the AMGs correspond to metabolites in chemical reactions and edges are drawn between reactant-product pairs or shared substrates of enzymes/transporters. A gradient from gold to blue was used to denote reduced percentage change in metabolite abundance after glucose challenge. For clarity, changes were truncated at ±60%. Unmeasured nodes are shown in grey. Edges corresponding to different types of functional links between metabolites are indicated. Cellular locations for metabolites in (a) are assumed to be extracellular unless denoted by [c] for cytoplasmic. Likewise, cellular locations in (b) are assumed to be cytoplasmic unless denoted by [e] for extracellular. The lac-pyr-cit-akg group of metabolites in (a) is connected to the remainder of the set via metabolites with relative frequencies<0.20 across solutions; the same is true of the bile salts cluster in (b).
Amino acid transport activities have historically been grouped into “Systems” that describe the chemical properties of the transported molecules (e.g. cationic or small/neutral) and the response to specific inhibitors
Enzyme or Transporter Family | Enzyme or Transporter Family Member | System | Measured Substrates in Active Module Groups | Reaction | Tissue Distribution |
SLC6 | SLC6A14 | B(0,+) | Citr-L, Leu-L, Ile-L, Met-L, Lys-L, Val-L, Phe-L, Tyr-L, Trp-L, His-L, Orn-L, Ser-L, (reduced transport for Thr-L, Hom-L, Asn-L, Gln-L) | Facilitated | lung, trachea, salivary gland, mammary gland, pituitary, stomach, colon |
SLC6 | SLC6A15 | NA | Val-L, Leu-L, Met-L, Ile-L | Facilitated | brain |
SLC6 | SLC6A19 | B(0) | Citr-L, Leu-L, Ile-L, Phe-L, Trp-L, Tyr-L, Gln-L, Met-L, Asn-L, Hom-L, Thr-L, Ser-L | Facilitated | kidney, intestine |
SLC3/SLC7 | SLC7A1 | y+ | Lys-L, Arg-L, Orn-L, His-L | Facilitated | Ubiquitous except liver |
SLC3/SLC7 | SLC7A2 | y+ | Lys-L, Arg-L, Orn-L, His-L | Facilitated | liver, skeletal muscle, pancreas |
SLC3/SLC7 | SLC7A3 | y+ | Lys-L, Arg-L, Orn-L, His-L | Facilitated | thymus, ovary, testis, brain |
SLC3/SLC7 | SLC3A2/SLC7A5 | L | Tyr-L, Phe-L, Trp-L, Leu-L, Ile-L, Val-L, His-L, Citr-L | Exchange | brain, ovary, testis, placenta |
SLC3/SLC7 | SLC3A2/SLC7A8 | L | Citr-L, Gln-L, Leu-L, Ile-L, Met-L, Val-L, Phe-L, Thr-L, Asn-L, Trp-L, Ser-L, Tyr-L, Hom-L | Exchange | kidney, intestine, brain, placenta, ovary, testis, muscle, epithelium |
SLC3/SLC7 | SLC3A1/SLC7A9 | b(0,+) | Lys-L, Val-L, Orn-L, Met-L, Ile-L, Leu-L | Exchange | kidney, intestine, lung, placenta, brain, liver, endothelium |
SLC43 | SLC43A1 | L | Val-L, Ile-L, Citr-L, Leu-L, Phe-L, Met-L | Facilitated | kidney |
SLC43 | SLC43A2 | L | Val-L, Ile-L, Citr-L, Leu-L, Phe-L, Met-L | Facilitated | kidney |
SLC38 | SLC38A4 | A | Met-L, Lys-L, His-L, Arg-L, Asn-L, Ser-L | Facilitated | liver, skeletal muscle, kidney, pancreas |
SLCO1 | SLCO1A2 | NA | taurochenodeoxycholate, glycocholate, glycochenodeoxycholate | Facilitated | brain, kidney, liver, ciliary body |
SLCO1 | SLCO1B1 | NA | taurochenodeoxycholate, glycocholate, glycochenodeoxycholate | Facilitated | liver |
SLCO1 | SLCO1B3 | NA | taurochenodeoxycholate, glycocholate, glycochenodeoxycholate | Facilitated | liver |
The FuncAssociate program
*Transporters/enzymes with
†transporters with
In addition to the core cluster of amino acids, the AMGs include additional changed metabolites on their periphery. These peripheral metabolites are connected to the amino acid core via unmeasured metabolites, which represent potential functional links. For example, in the NGT-CMRN AMG (
The other peripheral metabolite clusters in the NGT-CMRN and NGT-EMRN AMGs capture other insulin-stimulated activities including glycolysis, triglyceride biosynthesis, and an increase in bile salt plasma levels (by unknown mechanisms). Although these were commented upon previously
For the NGT group, there is a significant drop in L-Proline and N,N-dimethylglycine levels and an increase in glycine betaine levels. All three amino acids appear in the NGT-EMRN and NGT-CMRN AMGs with the edges between them representing shared transport by the SLC6A12 carrier
The connections among the 3 metabolites (and proline) in the NGT-EMRN and NGT-CMRN AMGs are shown, along with the Recon 1 betaine-homocysteine methyltransferase catalyzed reaction.
The IGT-EMRN and IGT-CMRN (
Although the AMGs convincingly illustrate that changed metabolites are common substrates of small molecule transporters, they cannot establish coordinated activity of these cotransported substrates. To explore whether metabolite substrates of individual transporters are in fact coregulated, we performed hierarchical clustering across the 25 individuals from the NGT and IGT groups, looking to identify metabolites that show a similar absolute percentage of change across individuals.
Heatmaps of the results of hierarchical clustering (
Grouping is according to 1−|ρ|, where ρ is the Spearman correlation coefficient for percentage change in metabolite abundance. Metabolite clusters that correspond to established transporter activities are highlighted. Cluster I corresponds to the SLC25A13 transporter (liver variant); Cluster II corresponds to SLC6A12; Cluster III corresponds to the small aliphatic system A transport system (SLC6, SLC7 and SLC38 transporters); and cluster IV corresponds to the hydrophobic/aliphatic system L transport system (SLC6, SLC7, SLC43).
Grouping is according to 1−|ρ|, where ρ is the Spearman correlation coefficient for percentage change in metabolite abundance. Metabolite clusters that correspond to established transporter activities are highlighted. Cluster numbering is as in
Cluster II in NGT includes all 3 of the measured SLC6A12 substrates (proline, glycine betaine and dimethylglycine), which demonstrate absolute pairwise correlation coefficients ranging from 0.23 (for dimethylglycine and glycine betaine) to 0.57 (for glycine betaine and proline). Proline and glycine betaine also are strongly correlated and co-cluster in IGT (Spearman correlation coefficient = 0.60). The proline-betaine correlation likely reflects the fact that these metabolites are cotransported by at least three carriers (SLC6A12, SLC6A20 and SLC36A2). By contrast, these two metabolites are not known to participate in any common metabolic pathways, supporting the hypothesis that coordination of measured plasma levels of proline and glycine betaine is via regulation of their common transporters.
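The grouping metric used for the clustering, 1−|ρ| with ρ the Spearman correlation of percentage changes across individuals, can be illustrated with a minimal sketch. The function names and the toy vectors below are assumptions made for illustration only; the linkage method and software used for the actual clustering are not specified here, so only the pairwise distance is shown.

```python
def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cluster_distance(x, y):
    """Distance 1 - |rho|: strongly correlated or anti-correlated profiles are close."""
    return 1 - abs(spearman(x, y))


# Toy percentage changes for two metabolites across five individuals (invented values).
metabolite_a = [-12.0, -5.0, -20.0, -8.0, -15.0]
metabolite_b = [30.0, 12.0, 41.0, 25.0, 33.0]
distance = cluster_distance(metabolite_a, metabolite_b)
```

Because the absolute value of ρ is used, two metabolites that move in opposite directions across individuals (as proline and glycine betaine do in NGT) are treated as just as close as two that move together.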
In Cluster I, bile salts are found with citrulline in both NGT and IGT. In IGT, malate also clusters closely with citrulline. We searched PubMed (
To further examine the relationship between distinct transport activities in OGTT metabolite profiling, we analyzed changes in plasma levels of metabolites for NGT and IGT using principal component analysis (PCA). This analytic technique attempts to find linear combinations of metabolites that best explain the interindividual variation seen in metabolite profiles. PCA revealed that the top two eigenvectors for NGT coincided with SLC25A13 and amino acid transport activities, respectively, explaining a total of 39% of interindividual variance in metabolite changes (see
Panels (a) and (b) correspond to NGT and IGT, respectively. Principal component #1 largely corresponds to pathways regulated by hepatic SLC25A13 activity, including glycolysis (lac, pyr) and gluconeogenesis (ala, ser), nucleotide biosynthesis (OMP, r1p, hxan, xan, xtsn, ncam), bile salt (gchol, tdchol) and citrulline (citr) accumulation, and NAD+/NADH balance by malate shuttling (glu, akg, mal). Principal component #2 largely corresponds to System A and L amino acid transport.
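As a rough illustration of what a principal component and its explained variance represent, the sketch below computes the leading component of a small individuals-by-metabolites matrix. It uses plain power iteration on the covariance matrix, which is a stand-in technique chosen to keep the example dependency-free and is not necessarily how the PCA reported here was computed. All names and numbers are hypothetical.

```python
def covariance_matrix(rows):
    """Sample covariance of columns; rows are individuals, columns are metabolites."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    return [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]


def leading_component(cov, iterations=500):
    """Approximate the top eigenvalue and eigenvector of cov by power iteration.

    Assumes cov is non-zero and the start vector is not orthogonal to the
    leading eigenvector; a production implementation would check both.
    """
    p = len(cov)
    vector = [float(j + 1) for j in range(p)]  # arbitrary non-uniform start
    for _ in range(iterations):
        product = [sum(cov[a][b] * vector[b] for b in range(p)) for a in range(p)]
        norm = sum(c * c for c in product) ** 0.5
        vector = [c / norm for c in product]
    product = [sum(cov[a][b] * vector[b] for b in range(p)) for a in range(p)]
    eigenvalue = sum(v * w for v, w in zip(vector, product))
    return eigenvalue, vector


def explained_variance_fraction(rows):
    """Fraction of total variance captured by the first principal component."""
    cov = covariance_matrix(rows)
    eigenvalue, loadings = leading_component(cov)
    total_variance = sum(cov[j][j] for j in range(len(cov)))
    return eigenvalue / total_variance, loadings


# Invented changes for four individuals and three metabolites: the first two
# columns vary together, the third varies independently of them.
toy_changes = [
    [1.0, 2.0, 1.0],
    [2.0, 4.0, -1.0],
    [3.0, 6.0, -1.0],
    [4.0, 8.0, 1.0],
]
fraction, loadings = explained_variance_fraction(toy_changes)
```

In this toy matrix the first component loads only on the two co-varying metabolites, which mirrors how a component can be read as a shared underlying activity when its high-loading metabolites are substrates of the same carrier.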
Given that metabolite profiling of perturbation experiments can interrogate specific underlying transporter activities, we investigated to what extent transporters are involved in human disease. We consulted the OMIM database of Mendelian diseases (
Given that we have measured plasma levels for only a small fraction of the human metabolome, the pathway models that we have discovered may be smaller than the actual enriched pathway. Conversely, for those AMGs that include unmeasured metabolites, measurement of additional metabolites may show that other pathways more convincingly explain the observed physiological changes. A further limitation is that our scoring method considered all significant changes in plasma levels equivalently, without considering direction of change. Additionally, we expect there are alterations in metabolic reaction flux within the cell in response to glucose challenge that may be difficult to decipher from plasma metabolite levels. Finally, metabolites that are significantly changed but which are not closely linked to other metabolites via chemical reactions or shared enzymes/transporters are unlikely to appear in AMGs, but may still reflect important altered tissue activities during OGTT.
We have directly integrated metabolic reaction connectivity and a collection of shared transporter relationships, extended here by manual literature curation, into metabolite profile interpretation to identify biologic processes relevant to a physiological perturbation experiment. Our method uses a deterministic approach to identify active modules and directly integrates plasma measurements of metabolites with a unipartite graph capturing interrelationships among metabolic substrates. Through this method, we have uncovered a potentially important contribution of transporter activities to plasma metabolite profiles, one that is overlooked by more traditional analyses of metabolic pathways.
A prior application of active modules to metabolism relied on integrating microarray-derived gene expression information for enzymes with enzyme connectivity in metabolic graphs to identify clusters of functionally connected enzymes that collectively show a high degree of change in a perturbation experiment
One motivation for this approach is to redefine human disease in terms of aberrant metabolic activities. Given groups of affected and control individuals, active module analysis can achieve this in a number of ways. Baseline differences in each metabolite's abundance can be scored between affected and unaffected individuals
As the relationship of metabolite abundance with disease incidence and outcomes is better understood, we may ultimately be able to use integrated analyses of metabolic profiles to subclassify disease on the basis of distinct enzymatic/transporter activities, thus allowing a more individualized approach to clinical medicine.
Metabolite abundance measurements from an oral glucose tolerance test have been described
Metabolite peak intensities were determined as described previously
In order to identify significantly changed clusters of functionally connected metabolites, we converted experimental
Using this statistic, nodes for which the
A Metabolic Reaction Network (MRN) was constructed based on the 3338 metabolic reactions in Recon 1. We treated all 1500 reactant and product metabolites in Recon 1 as nodes. Cellular locations were assigned to each metabolite as specified in Recon 1, and metabolites were split into multiple nodes (each corresponding to a different location). This process resulted in 2779 total nodes.
For each metabolic reaction, edges in the MRN were drawn between each reactant node and each product node, and between all metabolites that are common substrates (reactants and products) of an enzyme or transporter. Furthermore, since transporter annotation was not complete, for each of the measured metabolites we manually searched the literature to identify transporter-substrate relationships and identified all substrates for any transporter found. Finally, we expanded the list of reactions so that each transporter-reaction relationship was reflected in an independent entry. This approach resulted in an additional 737 reactions (see
To improve the specificity of the active module discovery process, nodes and edges involving the following 28 ‘promiscuous’ and/or buffered metabolites were also eliminated: UMP, UDP, UTP, FAD, FADH2, Na+, K+, SO4, NH4, CO2, Phosphate, O2, Pyrophosphate, H2O, H+, OH−, ATP, ADP, AMP, CTP, CDP, CMP, NAD, NADH, NADP, NADPH, H2O2, and HCO3−. As many of these metabolites serve as cofactors, their inclusion would contribute non-specific bridges between metabolites in active modules and would thus be of limited use for identifying biological processes. We did, however, include nodes and edges for reactions where nucleotides serve as primary reactants and products, such as those involved in nucleotide biosynthesis or catabolism, NAD metabolism, and riboflavin metabolism.
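The two edge types and the exclusion of promiscuous metabolites can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the reaction records, the `catalyst` field, the identifier style, and the short exclusion set are all hypothetical, and the toy reactions are not curated entries.

```python
from itertools import combinations

# Illustrative subset of excluded metabolites (hypothetical identifiers).
EXCLUDED = {"atp", "adp", "h2o", "h", "na1", "pi"}


def base_name(node):
    """Strip the compartment tag, e.g. 'leu-L[e]' -> 'leu-L'."""
    return node.split("[")[0]


def build_mrn_edges(reactions, excluded=EXCLUDED):
    """Return undirected edges (frozensets) of a unipartite metabolite graph.

    Each reaction is a dict with 'reactants', 'products' and an optional
    'catalyst' naming the enzyme or transporter.
    """
    edges = set()
    substrates_by_catalyst = {}
    for reaction in reactions:
        reactants = [m for m in reaction["reactants"] if base_name(m) not in excluded]
        products = [m for m in reaction["products"] if base_name(m) not in excluded]
        # Edge type 1: every reactant-product pair within a single reaction.
        for reactant in reactants:
            for product in products:
                if reactant != product:
                    edges.add(frozenset((reactant, product)))
        # Remember which metabolites each enzyme or transporter handles.
        catalyst = reaction.get("catalyst")
        if catalyst:
            substrates_by_catalyst.setdefault(catalyst, set()).update(reactants + products)
    # Edge type 2: all pairs of substrates that share an enzyme or transporter.
    for substrates in substrates_by_catalyst.values():
        for first, second in combinations(sorted(substrates), 2):
            edges.add(frozenset((first, second)))
    return edges


# Two toy transport reactions attributed to one carrier (illustrative only).
toy_reactions = [
    {"reactants": ["leu-L[e]", "na1[e]"], "products": ["leu-L[c]", "na1[c]"], "catalyst": "SLC6A19"},
    {"reactants": ["phe-L[e]", "na1[e]"], "products": ["phe-L[c]", "na1[c]"], "catalyst": "SLC6A19"},
]
toy_edges = build_mrn_edges(toy_reactions)
```

In the toy output, extracellular leucine and phenylalanine become neighbors only because they share a carrier, while sodium contributes no edges at all, which is the intended effect of removing promiscuous metabolites.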
Because abundance measurements were only available for a small fraction of the metabolic network, we limited the MRN to the union of measured metabolites and all nodes found on paths (up to path-length three) between two measured nodes. This filtering, used to reduce computing cost, did not alter downstream results because to be included in an active module, a metabolite must either be measured or lie on a path between measured metabolites.
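One way to express this filter is to enumerate simple paths of at most three edges that start and end at measured nodes and to keep every node they pass through. The sketch below does this by depth-limited search; the helper names and the toy graph are assumptions for illustration, and the search is written for clarity rather than speed.

```python
def to_adjacency(edges):
    """Build a symmetric adjacency map from an iterable of (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def filter_by_path_length(adjacency, measured, max_len=3):
    """Keep measured nodes plus nodes on simple paths (<= max_len edges)
    between two distinct measured nodes."""
    keep = set(measured)

    def extend(path):
        for neighbor in adjacency.get(path[-1], ()):
            if neighbor in path:  # simple paths only: no repeated nodes
                continue
            new_path = path + [neighbor]
            if neighbor in measured:
                keep.update(new_path)
            # A path with k nodes has k - 1 edges; stop growing at max_len.
            if len(new_path) - 1 < max_len:
                extend(new_path)

    for start in measured:
        extend([start])
    return keep


# Toy graph: A, B, C are measured; u bridges A and B; x-y-z links A to C
# through four edges; d hangs off u and leads nowhere.
toy_adjacency = to_adjacency([
    ("A", "u"), ("u", "B"), ("u", "d"),
    ("A", "x"), ("x", "y"), ("y", "z"), ("z", "C"),
])
kept = filter_by_path_length(toy_adjacency, {"A", "B", "C"})
```

Here `u` is retained because it lies on a two-edge path between measured nodes, whereas `x`, `y` and `z` are dropped because the only route from A to C needs four edges, and `d` is dropped because it lies on no path between measured nodes.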
Scores generated above were assigned to measured nodes in the MRN. We built both a Scored Extracellular MRN (EMRN) and a separate Scored Cytoplasmic MRN (CMRN). For the EMRN, if a metabolite had two cellular locations, the metabolite score was assigned to the extracellular metabolite, modeling extracellular levels; for the CMRN, the score in that situation was assigned to the cytoplasmic metabolite. After applying the above steps, the EMRN had 297 nodes and 5515 edges, and the CMRN had 344 nodes and 6089 edges.
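The difference between the two scored networks comes down to which compartment-specific node receives a measured metabolite's score. A minimal sketch follows, assuming node names carry a bracketed compartment tag such as `[e]` or `[c]`. The function name, the score values, and the handling of metabolites present in only one location (the score goes to that node) are assumptions of this sketch rather than details stated above.

```python
def assign_scores(nodes, scores, preferred):
    """Map measured-metabolite scores onto compartment-specific nodes.

    nodes:     iterable of names like 'gln-L[e]' or 'gln-L[c]'
    scores:    {metabolite name without compartment: score}
    preferred: 'e' for the EMRN or 'c' for the CMRN
    """
    locations = {}
    for node in nodes:
        name, _, rest = node.partition("[")
        locations.setdefault(name, {})[rest.rstrip("]")] = node

    scored = {}
    for name, score in scores.items():
        compartments = locations.get(name, {})
        if preferred in compartments:
            # Metabolite exists in the preferred compartment: score that node.
            scored[compartments[preferred]] = score
        elif len(compartments) == 1:
            # Sketch assumption: a single-location metabolite is scored there.
            scored[next(iter(compartments.values()))] = score
        # Otherwise the metabolite is left unscored in this sketch.
    return scored


# Hypothetical nodes and scores, for illustration only.
toy_nodes = ["gln-L[e]", "gln-L[c]", "pyr[c]", "glyb[e]"]
toy_scores = {"gln-L": 2.4, "pyr": 1.1, "glyb": 3.0}

emrn_scores = assign_scores(toy_nodes, toy_scores, preferred="e")
cmrn_scores = assign_scores(toy_nodes, toy_scores, preferred="c")
```

Only glutamine, which has both locations in the toy node list, is scored differently between the two networks; the single-location metabolites receive the same node assignment in both.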
To identify active modules in our MRN, we used a recently developed method
For the purpose of evaluating statistical significance of observed active modules, we generated random solutions by repeating the active module discovery process for 100 random scored MRNs. Although the topology was preserved, the scores for each random MRN were randomly permuted among measured nodes and
For each scored MRN, the frequency of appearance in the corresponding 100 solutions was measured for all nodes. Nodes with ≥0.20 relative frequency were grouped together to form an Active Module Group (AMG), which was examined for significant overlap with metabolite sets corresponding to predefined pathways. To identify predefined pathway enrichment in the AMGs, we used a modified version of the FuncAssociate program
We generated pathway and enzyme/transporter-to-metabolite mappings for Recon 1, limiting our analysis to pathways, enzymes or transporters that included at least three metabolites in the metabolite universe. We were interested in enrichment analysis for both the AMGs and our ranked list of changed metabolites. To look for pathway enrichment in the ranked list of changed metabolites, we used the “ordered” setting in FuncAssociate (
For assessing pathway and enzyme/transporter enrichment in our AMGs, we used as our universe of metabolites all nodes in the reduced networks. To address the fact that the AMGs may be biased in composition towards measured nodes, we modified FuncAssociate so that the null distribution of
In a recent manuscript
Hierarchical clustering and PCA were performed on the subset of metabolites that changed significantly in either NGT or IGT at an FDR<0.05, determined using the
In our analysis, three parameters determined the balance between measured and unmeasured metabolites in our active module solutions: 1) the false-discovery rate (FDR) threshold used in the determination of scores for measured metabolites; 2) the path-length threshold between measured metabolites used in filtering unmeasured metabolites from the MRN; and 3) the frequency threshold used in selecting active module solution metabolites for inclusion into active module groups. Although unmeasured metabolites are useful for hypothesis generation in terms of identifying potentially novel markers of insulin function, such hypotheses should not be so numerous as to dominate the analysis. At the stringent FDR of 0.01 selected, the active module discovery process was relatively insensitive to changes in the other two parameters, with little difference in the observed results at higher or lower path lengths. In fact, all unmeasured metabolites in active module groups were directly connected to one or more measured metabolites. For the frequency cutoff for active module group metabolites, our primary conclusions were robust to all thresholds from 0.10 to 0.50.
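The third parameter, the frequency threshold, lends itself to a short sketch: count how often each node appears across the active module solutions, convert to relative frequencies, and keep nodes at or above the cutoff. The function names and the five toy solutions below are hypothetical and serve only to show how group membership responds to the threshold.

```python
from collections import Counter


def relative_frequencies(solutions):
    """Fraction of solutions in which each node appears."""
    counts = Counter(node for solution in solutions for node in set(solution))
    return {node: count / len(solutions) for node, count in counts.items()}


def active_module_group(solutions, threshold=0.20):
    """Nodes whose relative frequency across solutions is >= threshold."""
    frequencies = relative_frequencies(solutions)
    return {node for node, freq in frequencies.items() if freq >= threshold}


# Five invented solutions (the analysis described here used 100 per network).
toy_solutions = [
    {"leu-L", "ile-L", "val-L"},
    {"leu-L", "ile-L"},
    {"leu-L", "val-L", "pro-L"},
    {"leu-L", "ile-L"},
    {"leu-L", "glyb"},
]

# Sweep thresholds to see which nodes are stable members of the group.
groups = {t: active_module_group(toy_solutions, t) for t in (0.10, 0.20, 0.50)}
```

A sweep of this kind is one simple way to check whether the core membership of a group is stable across cutoffs, in the spirit of the 0.10 to 0.50 range reported above.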
Metabolite profiles for NGT and IGT groups. Recon 1 symbols and names for measured metabolites are listed alongside magnitude (fraction of baseline) and significance (
(0.13 MB XLS)
Manually curated transport reactions used in addition to Recon 1 reactions to derive Metabolic Reaction Networks. The first tab corresponds to 190 reactions already described in Recon 1 - a column was added that makes explicit the mapping to a single solute transporter. The second tab corresponds to 737 additional transport reactions identified by manual literature curation - these have also each been mapped to a single transporter.
(0.22 MB XLS)
Relative frequencies for metabolites in Active Module Solutions. A frequency threshold of 0.20 for appearance of metabolites was used for subsequent analysis and figure construction.
(0.07 MB XLS)
Metabolic reactions and/or enzymes/transporters corresponding to edges in AMGs. Edges correspond to reactant-product pairs in individual reactions and/or substrates of common enzymes/transporters.
(0.12 MB XLS)
Active Module Group from a) IGT-EMRN and b) IGT-CMRN. For details see legend for
(0.46 MB EPS)
The authors would like to acknowledge helpful discussions with Gunnar Klau regarding the