Conceived and designed the experiments: BvO HD CAD JLM HMR. Performed the experiments: ACT ST IMFG PPM. Analyzed the data: MJM. Wrote the paper: MJM. Reviewed the manuscript: MJM ACT BvO HD ST IMFG ICG PPM CAD JLM HMR.
The authors have declared that no competing interests exist.
Understanding the molecular link between diet and health is a key goal in nutritional systems biology. As an alternative to pathway analysis, we have developed a joint multivariate and network-based approach to analysis of a dataset of habitual dietary records, adipose tissue transcriptomics and comprehensive plasma marker profiles from human volunteers with the Metabolic Syndrome. With this approach we identified prominent co-expressed sub-networks in the global metabolic network, which showed correlated expression with habitual n-3 PUFA intake and urinary levels of the oxidative stress marker 8-iso-PGF2α. These sub-networks illustrated inherent cross-talk between distinct metabolic pathways, such as between triglyceride metabolism and production of lipid signalling molecules. In a parallel promoter analysis, we identified several adipogenic transcription factors as potential transcriptional regulators associated with habitual n-3 PUFA intake. Our results illustrate advantages of network-based analysis, and generate novel hypotheses on the transcriptomic link between habitual n-3 PUFA intake, adipose tissue function and oxidative stress.
A fundamental goal in the field of nutritional genomics is defining the molecular link between diet and health. Human nutritional genomic studies are frequently hindered by a high level of unexplained variation in gene expression, protein and metabolite abundance, and clinical parameters – potentially attributable to variation in genotype, background diet, anthropometry, physical activity and health status. In our present study, relationships between adipose tissue gene expression, habitual diet and clinical markers of metabolic health were investigated in a cohort of individuals with impaired metabolic health, typical of the Metabolic Syndrome (MetS). Using multivariate statistics in conjunction with a novel approach to metabolic network analysis, we identified regions of the human metabolic network showing coordinated transcriptomic response to variation in n-3 PUFA intake and correlation with markers of metabolic health.
Dietary fat intake has profound effects on molecular processes of metabolic health. These effects are diverse and often subtle, representing a considerable analytical challenge in reaching system-level understanding. Transcriptomics has become a central technology in the development of molecular nutrition, having the capacity to produce expression data for every gene in a given genome. However, the major challenge is to apply appropriate techniques for extracting information from high-throughput datasets. Differentially expressed gene lists are an intuitive first choice, but they are hard to interpret in a biological context. Pathway analysis – typically implemented using gene set enrichment analysis – has become a standard method in the field of transcriptomic analysis
Network-level analysis has revealed detailed insight on metabolic regulation in type 2 diabetes and insulin resistance
A number of methods exist for analyzing transcriptomic data in the context of a global interaction network
The LIPGENE human dietary intervention study was a randomized, controlled trial that complied with the 1983 Helsinki Declaration and was approved by the local ethics committees of the 8 intervention centres (Dublin, Ireland; Reading, UK; Oslo, Norway; Marseille, France; Maastricht, The Netherlands; Cordoba, Spain; Krakow, Poland; Uppsala, Sweden). Written informed consent was obtained from every participant, as approved by each institutional ethics committee.
The current study was conducted within the framework of the LIPGENE Integrated Project “Diet, genomics and the metabolic syndrome: an integrated nutrition, agro-food, social and economic analysis” (ClinicalTrials.gov number: NCT00429195) and NuGO, The Nutrigenomics Organization (
Fatty acid profile | Lipids | Apolipoproteins | IVGTT | Inflammatory markers |
C14:0 | Triglycerides | ApoA1 | Glucose AUC* | C-Reactive protein |
C16:0 | Cholesterol | ApoB | | IL-6 |
C16:1 | NEFA | ApoCII | | TNFα |
C18:0 | TRL-TG | ApoCIII | | sICAM |
C18:1 | TRL-C | ApoE | | sVCAM |
C18:2 n-6 | LDL-C | TRL Apo B | | Resistin |
C18:3 n-6 | T-HDL | | | Adiponectin |
C18:4 n-3 | | | | PAI-1 |
C20:1 | | | | tPA |
C20:3 n-6 | | | | Fibrinogen |
C20:4 n-6 | | | | Leptin |
C20:4 n-3 | | | | 8-iso-PGF2α (urinary) |
C20:5 n-3 | | | | 15-keto-PGF2α (plasma) |
C22:4 n-6 | | | | |
C22:5 n-3 | | | | |
C22:6 n-3 | | | | |
*Derived from relative area under the curve (AUC) of plasma glucose measurements (mmol/L) at 12 time points from 0 to 180 minutes following intravenous glucose challenge.
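As an illustration of the footnote above, an area under the curve for serial glucose measurements can be approximated with the trapezoidal rule. This is a minimal sketch only: the function name `trapezoid_auc` and the optional `baseline` argument are assumptions for illustration, and whether "relative" AUC in this study means a baseline-subtracted area is not stated in the text.

```python
def trapezoid_auc(times, values, baseline=0.0):
    """Approximate the area under a sampled curve with the trapezoidal rule.

    times    -- sampling times in minutes, strictly increasing
    values   -- glucose concentrations (mmol/L) measured at those times
    baseline -- value subtracted from every measurement before integration
                (assumption: 0.0 gives the total area; the fasting value
                would give an incremental area)
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired time/value measurements")
    adjusted = [v - baseline for v in values]
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        if width <= 0:
            raise ValueError("times must be strictly increasing")
        # Area of one trapezoid: width times the mean of its two heights.
        area += width * (adjusted[i - 1] + adjusted[i]) / 2.0
    return area
```

With hypothetical measurements of 5, 7 and 5 mmol/L at 0, 10 and 20 minutes, `trapezoid_auc([0, 10, 20], [5, 7, 5])` gives the total area, and passing `baseline=5` gives the area above the starting value.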
Subcutaneous adipose tissue samples were taken from the periumbilical area of 19 volunteers from the Norwegian and Spanish cohorts (10 female, 9 male) after an overnight fast. Needle biopsies were obtained after a 5 mm transdermal incision under local anaesthesia. Samples were rinsed in saline, placed in RNAlater and frozen immediately (−80°C) for subsequent analysis. Total RNA was extracted from adipose tissue using the RNeasy lipid tissue mini kit (Qiagen, U.K.). Briefly, 100 mg of adipose tissue was homogenised in Qiazol lysis reagent. After addition of chloroform, the homogenate was centrifuged to separate the aqueous and organic phases. Ethanol was added to the upper aqueous phase, which was then applied to the RNeasy spin column, where the total RNA bound to the membrane and phenol and other contaminants were washed away. RNA was then eluted in RNase-free water.
Extracted RNA was sent to ServiceXS (a high-throughput data service provider;
Raw microarray data were first assessed for quality using a set of standard QC tests, including array intensity distribution, positive and negative border element distribution, GAPDH and β-actin 3’/5’ ratios, centre of intensity and array-array correlation check. All QC tests were implemented in the R programming language (version 2.11.1, R Foundation for Statistical Computing), using the affyQCReport library. A batch effect was noted because the arrays were hybridized on two separate days; all subsequent analyses therefore accounted for this effect by including batch number as a covariate in statistical models. It was also noted that the β-actin 3’/5’ ratios were higher than recommended (
Diet and plasma marker variables were first normalized with log or square root transformation as appropriate to reduce skewness and kurtosis. Sparse partial least squares regression (sPLS;
The mixOmics library was used for rCCA modelling of plasma marker and gene expression data. The
We used the Edinburgh human metabolic network reconstruction
Coexpression was assessed for each gene-gene pair in the metabolic network reconstruction using Akaike's information criterion (AIC), a criterion used to select an optimal model among competing possibilities
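To make the role of AIC concrete, the sketch below compares two Gaussian models for one gene pair: an intercept-only model and a linear model that includes the partner gene's expression. This particular pair of competing models is an assumption chosen for illustration; the text does not specify which models were compared, and the function names are hypothetical. For a least-squares fit, AIC is computed here up to an additive constant as n·ln(RSS/n) + 2k.

```python
import math


def gaussian_aic(residuals, n_params):
    """AIC (up to an additive constant) for a least-squares Gaussian model."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    if rss == 0:
        raise ValueError("perfect fit: AIC is undefined for zero residual variance")
    return n * math.log(rss / n) + 2 * n_params


def coexpression_delta_aic(x, y):
    """Return (AIC_null - AIC_linear, slope) for expression vectors x and y.

    A positive difference means the linear model is preferred; combined with
    a positive slope, this could be read as positive coexpression.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    null_residuals = [yi - mean_y for yi in y]
    linear_residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Parameter counts: mean + variance, versus intercept + slope + variance.
    aic_null = gaussian_aic(null_residuals, 2)
    aic_linear = gaussian_aic(linear_residuals, 3)
    return aic_null - aic_linear, slope
```

The penalty term 2k is what distinguishes this from simply thresholding a correlation coefficient: the extra slope parameter must reduce the residual sum of squares enough to pay for itself.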
To identify paths of interest in the global interaction network, Dijkstra's shortest paths
The network analysis algorithm used in this study is a two-step process: 1) extraction of connected paths from the node of interest to all other nodes in the network; and 2) evaluation of the metabolic feasibility of each candidate path. Given a candidate (
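Step 1 of this process, extraction of shortest paths from a node of interest, can be sketched with a standard heap-based implementation of Dijkstra's algorithm. The adjacency-dictionary representation, the function name `dijkstra_paths` and the unit edge weights in the usage note are assumptions for illustration; the edge weighting used in the study is not given in this section.

```python
import heapq


def dijkstra_paths(graph, source):
    """Shortest paths from source to every reachable node of a directed graph.

    graph  -- dict mapping node -> dict of {neighbour: non-negative weight}
    source -- node of interest (e.g. a diet-sensitive gene)
    Returns a dict mapping each reachable node to its path as a list of nodes.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter route was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    paths = {}
    for target in dist:
        # Walk the predecessor links back to the source, then reverse.
        path = [target]
        while path[-1] != source:
            path.append(prev[path[-1]])
        paths[target] = path[::-1]
    return paths
```

For a hypothetical network `{"A": {"B": 1}, "B": {"C": 1}, "C": {"D": 1}}`, calling `dijkstra_paths(network, "A")["D"]` returns the path through B and C. Each returned path would then be passed to the feasibility evaluation in step 2.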
The TFM-explorer tool
Results from sPLS indicated that, among all dietary variables, recorded dietary intake of n-3 PUFA showed the strongest covariance with adipose tissue gene expression. Of the 53 n-3 PUFA-correlated genes identified in the sPLS analysis, 41 correlated positively and 12 correlated negatively with n-3 PUFA intake (
Green nodes: dietary variables; yellow: lipid, fatty acid and apolipoprotein variables; red: inflammatory and oxidative stress markers; blue: genes (enzymes). Solid lines: positive correlation (rCCA)/covariance (sPLS); dashed lines: negative correlation/covariance.
rCCA results showed that among the measured plasma lipids, fatty acids and apolipoproteins, plasma DHA, stearic acid and EPA correlated most strongly with adipose tissue gene expression (
The complete metabolic network included 1371 nodes and 65637 directed edges; the transcriptionally coexpressed (TC) subset contained 602 nodes and 5414 directed edges (supplementary
Diet-sensitive path extraction from the TC network revealed 755 unique paths of length greater than 2 originating from 30 n-3 PUFA-sensitive genes, although paths leading from each diet-sensitive gene collapsed into tree-like structures (
Green nodes: dietary variables; yellow: lipid, fatty acid and apolipoprotein variables; red: inflammatory and oxidative stress markers; blue: genes (enzymes). Dashed lines indicate negative correlation. A: Path linked to
The
The
The
To assess whether a similar group of paths would be extracted from any TC network –
To compare our network analysis with a standard approach to pathway analysis, hypergeometric tests were performed to identify KEGG pathways significantly enriched (using the
Term | Expected count | Observed count | Pathway size | P value |
Biosynthesis of plant hormones | 3.607 | 9 | 60 | 0.007 |
Biosynthesis of terpenoids and steroids | 2.886 | 7 | 48 | 0.020 |
Biosynthesis of alkaloids derived from terpenoid and polyketide | 3.066 | 7 | 51 | 0.028 |
3-Chloroacrylic acid degradation | 0.361 | 2 | 6 | 0.046 |
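For readers unfamiliar with the test behind the table above, the sketch below computes the expected count and the upper-tail hypergeometric P value for one pathway. The function names and the toy numbers in the usage note are assumptions for illustration; the gene universe and gene list sizes used in the study are not given in this section.

```python
from math import comb


def expected_count(universe_size, pathway_size, list_size):
    """Expected number of pathway genes in a random gene list of list_size."""
    return list_size * pathway_size / universe_size


def hypergeom_upper_tail(observed, universe_size, pathway_size, list_size):
    """P(X >= observed) when list_size genes are drawn without replacement
    from a universe containing pathway_size pathway members."""
    max_overlap = min(pathway_size, list_size)
    favourable = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, list_size - i)
        for i in range(observed, max_overlap + 1)
    )
    return favourable / comb(universe_size, list_size)
```

As a toy example, with a universe of 10 genes, a pathway of 4 genes and a gene list of 3, the expected overlap is 1.2, and `hypergeom_upper_tail(2, 10, 4, 3)` evaluates (6·6 + 4·1)/120 = 1/3.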
The promoter region of each gene is depicted, with coloured boxes denoting binding site location(s) of transcription factors displayed at right. TSS: transcription start site.
SP1 is a broadly acting transcription factor operating in conjunction with NF-YA, SREBP and PPARγ in promoting lipogenesis. The NF-YA and PPARγ TFBSs were also significantly over-represented in our group of n-3 PUFA-sensitive genes, although SREBP TFBS was not. E2F1 is a transcription factor involved in early adipogenesis, and positively regulates transcription of PPARγ
An emerging limitation to pathway analysis of transcriptomic data is that documented pathway models tend to overlap and intersect, yielding analytical results that are biased, incomplete or both
Previous work has described an inverse relationship between n-3 PUFA intake and n-6 fatty acid-derived prostaglandins (
To understand the potential regulatory consequences of dietary n-3 PUFA intake on adipose tissue biology, we analysed the promoter regions of n-3 PUFA-correlated genes to identify significantly over-represented transcription factor binding sites. Results from this analysis highlighted significantly over-represented transcription factors related to adipogenesis. The most strongly over-represented transcription factors were KLF4, SP1 and E2F1. SP1 and KLF4 share similar GC-rich target binding sites
In conclusion, we have taken a joint multivariate and network-based approach to transcriptomic analysis, relying on known metabolic reaction information to reveal coordinated paths of metabolite conversion. This approach highlighted coexpressed regions of the metabolic network with opposing direction of correlation with habitual n-3 PUFA intake and urinary isoprostane levels, relationships that were not identified using a traditional pathway enrichment test. Promoter analysis further highlighted adipogenic transcription factors as potential transcriptional regulators of n-3 PUFA-correlated genes.
Schematic illustration of the pathway extraction algorithm; the data frame at right shows an example network data file at each step of the algorithm. The key goal of this algorithm is to assess a linked path of nodes (in this case A→B→C→D) and identify whether an unbroken path of metabolite conversion can be traced from the first node to the last. Given a node of interest (A) and a network path leading from A (as determined by Dijkstra's algorithm; step 1), the algorithm examines each reaction pair in sequence, starting with the pair linked to the node of interest (step 2). The full list of interactions from A→B and B→C is extracted from the network file (step 3). Metabolites (and associated reactions) are removed from the pair of reactions if they cannot be associated with a path of conversion linking reaction 1 (A→B) to reaction 2 (B→C) (step 4). Self-linked reactions are included, e.g., B→B, where node B produces a metabolite through interaction with A and converts the same metabolite to a different one that is further metabolized by node C (the algorithm considers only a single self-linked loop within each reaction pair, if present). If an unbroken path remains after removal of extraneous metabolites (step 4a), the reaction pair is valid and the algorithm continues to the next pair of reactions (step 5). If not (step 4b), the function exits.
(EPS)
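The feasibility check described in the caption above can be sketched in simplified form. This is not the authors' implementation: the data structures `edge_metabolites` and `conversions` and the function name are hypothetical, and the sketch omits the self-linked reaction handling. It only tracks which metabolites can still be traced as the path is walked, and exits as soon as none remain.

```python
def path_is_feasible(path, edge_metabolites, conversions):
    """Check whether an unbroken chain of metabolite conversion spans a path.

    path             -- list of enzyme nodes, e.g. ["A", "B", "C", "D"]
    edge_metabolites -- dict mapping (node1, node2) to the set of metabolites
                        produced by node1 and consumed by node2
    conversions      -- dict mapping a node to the set of
                        (substrate, product) pairs it catalyses
    """
    if len(path) < 2:
        return False
    # Metabolites handed from the node of interest to the second node.
    live = set(edge_metabolites.get((path[0], path[1]), set()))
    for i in range(1, len(path) - 1):
        node, next_node = path[i], path[i + 1]
        handed_on = edge_metabolites.get((node, next_node), set())
        # Keep only products made from a still-traceable substrate that are
        # also passed on to the next node; all others are extraneous.
        live = {
            product
            for substrate, product in conversions.get(node, set())
            if substrate in live and product in handed_on
        }
        if not live:
            return False  # broken chain: exit early
    return bool(live)
```

Propagating a set of traceable metabolites, rather than a single metabolite, allows several alternative conversion routes through the same sequence of enzymes to be considered at once.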
Diet-gene and phenotype-gene relationships, and modular partitioning mapped to the transcriptionally coordinated human metabolic network. Green nodes: dietary variables; yellow: lipid, fatty acid and apolipoprotein variables; red: inflammatory and oxidative stress markers; blue: genes (enzymes). Enzyme nodes are connected if they meet two conditions: enzyme 1 produces a metabolite that is metabolized by enzyme 2, and genes encoding enzymes 1 and 2 show positive coexpression in the adipose tissue transcriptomic data. Dashed lines connecting diet-gene and plasma marker-gene pairs indicate negative correlation. Node shape indicates assignment in the 3 primary topological modules. Diamond: module 1; triangle: module 2; square: module 3.
(EPS)
Summary of anthropometric characteristics and habitual dietary patterns in the LIPGENE transcriptomic study cohort.
(DOCX)
Summary of plasma and urinary markers of metabolic health in the LIPGENE transcriptomic study cohort.
(DOCX)
Results from sPLS of adipose tissue gene expression and components of recorded habitual diet. Diet-gene pairs passing the similarity threshold of 0.7 are shown.
(DOCX)
Results from rCCA of adipose tissue gene expression and plasma fatty acids, lipids and apolipoproteins. Plasma marker-gene pairs passing the similarity threshold of 0.75 are shown.
(DOCX)
Results from rCCA of adipose tissue gene expression and plasma cytokines, IVGTT measurements, prostaglandin and urinary isoprostane. Plasma marker-gene pairs passing the similarity threshold of 0.7 are shown.
(DOCX)
Significantly overrepresented Gene Ontology ‘biological process’ terms in the adipose tissue TC network modules. Top 10 terms for each module are shown.
(DOCX)
Paths detected by applying network analysis algorithm to test muscle tissue dataset (GEO accession GSE474).
(DOCX)
We would like to sincerely thank Peadar Ó Gaora for informative feedback on the methodological approach.