Introduction
Genome sequencing has enabled a more comprehensive characterization of biological systems. Recent research in bioinformatics and systems biology has produced numerous systematic approaches for analyzing cellular physiology, which have been reviewed elsewhere [1]–[4]. Among these, constraint-based reconstruction and analysis (COBRA), a mathematical framework for integrating sequence data with a plethora of experimental ‘omics’ data, has been shown to be successful in the genome-wide analysis of cellular physiology [5]–. This approach has also been used to explore the metabolic potential and gene essentiality of several organisms across different kingdoms of life [8]–[13]; however, it has not yet been implemented for Dehalococcoides or any other known dechlorinating bacterium.
Dehalococcoides are small, disc-shaped anaerobic bacteria that use acetate as a carbon source and hydrogen as an electron donor, and that dehalogenate a variety of halogenated organic compounds as electron acceptors, many of which are problematic groundwater pollutants [14]–[17]. Dehalococcoides ethenogenes strain 195 (strain 195) was the first member of this phylogenetic branch to be grown as an isolate [18]. Subsequently, a number of other Dehalococcoides strains were isolated: strain CBDB1 [19], strain BAV1 [20], strain FL2 [21], [22], strain GT [23], and strain VS [24]. The strains respire through a membrane-bound electron transport chain (ETC) [25]–[27], which is incompletely defined. Reductive dehalogenases (RDases), encoded by reductive dehalogenase homologous (rdh) genes, are pivotal membrane-associated enzymes of the ETC [26], [27]. Genome sequencing has revealed the presence of multiple non-identical putative rdh genes in each strain [28]–[31]. Since these microbes respire chlorinated pollutants through RDase-catalyzed reductive dechlorination reactions, rdh genes determine a significant part of Dehalococcoides' phenotypes. Functional characterization of only 5 of the more than 190 rdh genes has shown that cobalamin, a corrinoid compound, is an essential cofactor for the corresponding RDases [32]–[36]. Hydrogenase (H2ase) is another key enzyme of the Dehalococcoides ETC [26], [27], [29], [30]. Interestingly, the genomes of Dehalococcoides strains encode 5 different types of H2ases: membrane-bound hup, ech, hyc, hym, and cytoplasmic vhu [29], [30], [37], [38]. The presence of multiple types of H2ases emphasizes the importance of H2 in their energy metabolism [18]–[21]. This multiplicity of H2ases and RDases further points to redundancy in the organisms' energy conservation process, which may ensure a rapid and efficient response of their energy metabolism to changing growth conditions [39], [40].
In addition to RDase and H2ase, the ETC likely requires an in vivo electron carrier to mediate electron transport between these enzymes. Previous studies have shown that the reductive dechlorination reaction requires an in vivo electron donor with a redox potential (E0′) ≤ −360 mV [25], [27], similar to other dechlorinating bacteria [17], [41], [42]. The cob(II)alamin of the corrinoid cofactor in the RDase enzyme is reduced to cob(I)alamin during the reductive dechlorination reaction, which necessitates a low-potential donor because the redox potential (E0′) of the Co(II)/Co(I) couple is between −500 and −600 mV [17], [41], [43]. While quinones such as menaquinone or ubiquinone could act as electron carriers in anaerobes [44]–[46], experimental evidence suggests this is not the case in Dehalococcoides [27], [47]. Moreover, the redox potentials of quinones (menaquinone ox/red E0′ = −70 mV, ubiquinone ox/red E0′ = +113 mV; [48]) are not compatible with the RDases' requirement for a low-potential donor. Furthermore, cytochrome b, a typical donor that allows quinones to participate in the redox reactions of anaerobic ETCs [49], [50], appears to be absent from the genomes of Dehalococcoides [29], [30]. However, the genomes encode ferredoxin, an iron-sulphur protein that can act as the low-potential donor for RDases because ferredoxin is the most electronegative electron carrier yet found in bacterial ETCs [42], [48], [51]–[55].
Although Dehalococcoides are capable of harnessing free energy from the RDase-catalyzed exergonic reductive dechlorination reactions by coupling them to ATP generation for growth [14], [17], their growth in pure culture is much less robust than their growth in mixed cultures [24], [56], [57]; even in mixed cultures, their growth yield is not as high as that predicted from the free energy of reductive dechlorination [26], [58]. Thus, to better understand dechlorination metabolism, and given that the Dehalococcoides genomes sequenced to date are more than 85% identical at the amino acid level [38], [59], we developed a pan-genome-scale constraint-based in silico metabolic model of Dehalococcoides. The model was constructed from the complete genome sequences of 4 geographically distinct strains: strain CBDB1 from the Saale river near Jena, Germany [60], [61], strain BAV1 from Oscoda, Michigan, USA [62], [63], strain 195 from a wastewater treatment plant in Ithaca, New York, USA [18], [64], [65], and strain VS from Victoria, Texas, USA [24], [66]. Although the model comprises multiple genomes, it considers metabolic genes only. It also does not include information about cellular regulation, owing to the lack of adequate knowledge about Dehalococcoides regulatory networks. Nonetheless, the model was used primarily to investigate intrinsic metabolic limitations and to address open questions regarding Dehalococcoides physiology, such as the incomplete nature of various metabolic pathways and the attendant implications for metabolism and growth. From the model simulations, we also identified the environmental conditions that resulted in faster in silico growth of Dehalococcoides.
Furthermore, the constraint-based model, along with the comparative analysis of 4 genomes, clarifies both similarities and differences among the strains in terms of their core metabolism and other biosynthetic processes leading to an improved understanding of metabolism and evolution in Dehalococcoides.
Materials and Methods
Dehalococcoides Pan-Genome
To develop the pan-genome of Dehalococcoides, we obtained the strain CBDB1 genome sequence from JCVI (http://cmr.jcvi.org/tigr-scripts/CMR/CmrHomePage.cgi), while the strain 195 and strain BAV1 genome sequences were downloaded from the IMG database (http://img.jgi.doe.gov/cgi-bin/pub/main.cgi). The strain VS genome sequence was obtained from Alfred Spormann at Stanford University, CA. The genome sequences were compared using OrthoMCL [96], a widely accepted method for finding orthologs across different genomes [97]. OrthoMCL is based on reciprocal best BLAST hits (RBH), but recognizes co-ortholog groups using a Markov graph clustering (MCL) algorithm [98]. The Dehalococcoides pan-genome was developed following a previously described approach [67], [68], outlined in Figures 1–4 in Text S2 and in the following section.
First, we identified putative orthologs between a reference genome and a subject genome, both selected arbitrarily from the 4 genomes compared. The analysis was conducted with OrthoMCL using the default parameter settings. Subsequently, the genes present only in subject genome 1 were identified and combined with the reference genome to create augmented genome 1 (Figure 1 in Text S2). Augmented genome 1 was then compared with subject genome 2 in the same way to construct augmented genome 2, and the pan-genome was obtained by comparing augmented genome 2 with subject genome 3. Because the number of genes in a pan-genome has been reported to depend on both the order in which the genomes are analyzed and the choice of reference genome [67], we constructed 6 pan-genomes for 6 different genome-order combinations. Of these 6 pan-genomes, we selected the one with the highest number of genes (2061) as the Dehalococcoides pan-genome in order to capture the entire gene repertoire of Dehalococcoides species [74]. We also identified the core, dispensable and unique genomes of the Dehalococcoides pan-genome using methods described previously [2], [67] (Figures 2–4 in Text S2).
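The iterative augmentation described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the toy genomes, the gene identifiers, the pairwise ortholog set `PAIRS`, and the three genome orders are all hypothetical, and the `is_ortholog` lookup merely stands in for the ortholog calls that OrthoMCL would provide. The ortholog relation is deliberately non-transitive so that the pan-genome size depends on the order of comparison.

```python
def augment(current, subject, is_ortholog):
    """Append the subject genes that have no ortholog in the current gene list."""
    unmatched = [g for g in subject
                 if not any(is_ortholog(g, h) for h in current)]
    return current + unmatched


def build_pan_genome(genomes, order, is_ortholog):
    """Start from the reference (first in order) and add each subject genome in turn."""
    pan = list(genomes[order[0]])
    for name in order[1:]:
        pan = augment(pan, genomes[name], is_ortholog)
    return pan


# Hypothetical toy data: three tiny "genomes" and a non-transitive ortholog relation
# (a1~b1 and b1~c1, but a1 and c1 are not called orthologs of each other).
GENOMES = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1"]}
PAIRS = {frozenset(("a1", "b1")), frozenset(("b1", "c1"))}


def is_ortholog(g, h):
    """Stand-in for an OrthoMCL ortholog call (assumption, not the real tool)."""
    return frozenset((g, h)) in PAIRS


# Hypothetical subset of genome orders; the first entry of each is the reference.
orders = [("A", "B", "C"), ("A", "C", "B"), ("B", "A", "C")]
pan_genomes = {o: build_pan_genome(GENOMES, o, is_ortholog) for o in orders}
sizes = {o: len(p) for o, p in pan_genomes.items()}

# Keep the largest pan-genome, mirroring the selection rule in the text.
best_order = max(sizes, key=sizes.get)
print(sizes, best_order)
```

In this toy case the order starting from genome B yields a smaller pan-genome than the other two, which is the kind of order dependence that motivates building several pan-genomes and keeping the largest.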
Reconstructing the Metabolic Network of Dehalococcoides
The pan-genome was used to reconstruct the pan-genome-scale metabolic network, and the constraint-based model of Dehalococcoides metabolism was developed from this reconstruction. Since the strains of Dehalococcoides share a high degree of sequence identity, we arbitrarily chose the strain CBDB1 genome as a reference and constructed the metabolic network from its annotated genome sequence [29], publications regarding its physiology, and various genomic and biochemical databases [7]. We then added to the reconstructed network the other metabolic genes from the pan-genome that were missing from the strain CBDB1 genome. Five gene correspondence tables for the four genomes were prepared (Tables 3–7 in Text S1) to facilitate gene identification and cross-referencing regardless of the genome of interest. We developed and manually curated the reconstructed network using procedures described previously [3], [7], [99], [100] with the SimPheny platform (Genomatica Inc., San Diego, CA). Since genome annotations are error prone [101], the annotated genes of strain CBDB1, as well as the pan-genome genes with defined metabolic functions, were verified with BLAST [102] by identifying their homologs in other well-characterized organisms, including Escherichia coli, Bacillus subtilis, Geobacter sulfurreducens and Saccharomyces cerevisiae. Subsequently, confidence levels were assigned based on the degree of sequence identity or on reciprocal best BLAST hits. For instance, Dehalococcoides genes having >40% amino acid sequence identity with homologs in the protein databases (SWISSPROT [103], IMG [71], PDB [72], GO [104]) were given a confidence level of 3, and genes with >30% and <30% identity were assigned confidence levels of 2 and 1, respectively.
In addition, these genes were evaluated on the basis of gene order or conserved synteny [105], along with phylogenetic analysis using updated versions of biological databases such as UniProt [70], IMG [71], GO [104], and PDB [72]. Afterwards, elementally and charge-balanced biochemical reactions were assigned to the genes to create the gene-protein-reaction (GPR) associations [3]. These reactions were further verified against the biochemical literature as well as enzyme databases such as KEGG [106], BRENDA [107], MetaCyc [108], and ENZYME [109]. In some instances, genes could not be identified for biosynthetic reactions that are essential for producing all the precursor metabolites of cell biomass. Such reactions (21 in total, detailed in Table 1 in Text S1) were added to the reconstructed network as non-gene-associated reactions.
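The identity thresholds used for the confidence levels can be written as a small Python function. This is only a sketch of the rule as stated in the text: the function name and the example identities are hypothetical, and because the text does not say how a gene with exactly 30% identity is treated, the sketch assumes it falls into level 1.

```python
def confidence_level(identity_pct):
    """Map percent amino acid identity to a confidence level (3, 2, or 1).

    Thresholds follow the text (>40% -> 3, >30% -> 2, <30% -> 1).
    Assumption: exactly 30% identity is assigned level 1.
    """
    if identity_pct > 40.0:
        return 3
    if identity_pct > 30.0:
        return 2
    return 1


# Hypothetical identities for three illustrative genes.
examples = {"gene_x": 62.5, "gene_y": 34.0, "gene_z": 21.3}
levels = {gene: confidence_level(pct) for gene, pct in examples.items()}
print(levels)
```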
Estimation of Biomass Composition and Maintenance Energy Requirements
The biomass composition (dry basis) of 1 gram of Dehalococcoides cells was calculated from various published and experimental data and expressed in mmol (millimoles)/g DCW (dry cell weight) (Tables 19–24 in Text S2). Owing to the lack of detailed experimental data on the cellular composition of Dehalococcoides, the weight fractions of protein, lipid, carbohydrate, soluble pools and ions of the cell were estimated from the published genome-scale model of Methanosarcina barkeri [83]. We chose to use data from the model of M. barkeri, an archaeon, because Dehalococcoides cells are enclosed by an archaeal S-layer-like protein instead of a typical bacterial cell wall [18]–[20]. The weight percent of DNA was estimated from the cell morphology, the length of the genome sequence [110] and the molar mass of the DNA, and the weight percent of RNA was calculated from experimental data on a Dehalococcoides-containing mixed microbial culture (see Text S2 for details). The detailed composition of each macromolecule, as well as the composition of cofactors and other soluble pools and ions, is presented in Tables 19–24 in Text S2. The distribution of amino acids, nucleotides and cofactors in the biomass was calculated from data reported previously [111], [112], while the weight fractions of different fatty acids were estimated from White et al. [113]. These compositions were then integrated into the model as a biomass synthesis reaction, BIO_DHC_DM_61 (see Text S2 for additional details).
Maintenance energy accounts for the ATP requirements of cellular processes, such as turnover of the amino acid pools, polymerization of cellular macromolecules, and ion transport, which are not included in the biomass synthesis reaction [114]–[116]. These ATP requirements can be either growth associated (GAM), i.e., related to the assembly and polymerization of macromolecules (e.g., proteins, DNA, etc.), or non-growth associated (NGAM), i.e., related to maintaining the membrane potential needed for cellular integrity [114], [115], [117]. Owing to the lack of the experimental chemostat data required for calculating both maintenance parameters [118], the NGAM for a Dehalococcoides cell (1.8 mmol ATP·g DCW−1·h−1) was calculated from the experimental decay rate (0.09 day−1) [24] and the average of pure-culture experimental growth yields (0.69 g DCW/eeq; Table 25 in Text S2) following procedures described previously [114], [116]. The GAM was estimated by regression, starting from an initial estimate of 26 mmol ATP/g DCW for a typical bacterial cell (Table 28 in Text S2) [111]. The initial estimate of GAM and the calculated NGAM were then used to simulate (using flux balance analysis, described below) the average of reported pure-culture experimental growth rates (0.014 h−1; Table 26 in Text S2). A GAM of 61 mmol ATP/g DCW gave the best prediction of the experimental growth rate.
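The idea of adjusting GAM until the simulated growth rate matches the experimental one can be sketched in Python. This is an illustration only, under several assumptions: bisection is used here in place of the regression mentioned in the text, the full FBA simulation is replaced by a toy ATP balance (`toy_growth`), and `ATP_SUPPLY` is a hypothetical placeholder value. The sketch therefore does not reproduce the reported GAM of 61 mmol ATP/g DCW; only the NGAM and the target growth rate are taken from the text.

```python
def fit_gam(simulate_growth, target_mu, lo, hi, tol=1e-6):
    """Find the GAM at which simulate_growth(GAM) equals target_mu by bisection.

    Assumes growth decreases monotonically with GAM and that the target
    lies between simulate_growth(hi) and simulate_growth(lo).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if simulate_growth(mid) > target_mu:
            lo = mid  # predicted growth too fast: GAM must be larger
        else:
            hi = mid  # predicted growth too slow: GAM must be smaller
    return 0.5 * (lo + hi)


NGAM = 1.8          # mmol ATP / g DCW / h, value from the text
TARGET_MU = 0.014   # 1/h, average experimental growth rate from the text
ATP_SUPPLY = 2.5    # mmol ATP / g DCW / h, HYPOTHETICAL placeholder


def toy_growth(gam):
    """Toy stand-in for an FBA run: ATP left after NGAM is spent on growth."""
    return (ATP_SUPPLY - NGAM) / gam


# 26 mmol ATP/g DCW (the initial estimate in the text) is used as the lower bracket;
# the upper bracket of 200 is an arbitrary assumption.
gam = fit_gam(toy_growth, TARGET_MU, lo=26.0, hi=200.0)
print(round(gam, 3))
```

In practice `toy_growth` would be replaced by a call that sets the GAM coefficient in the biomass reaction and solves the FBA problem for the growth rate.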
In Silico Analysis of Dehalococcoides Metabolism
Flux balance analysis (FBA) relies on the imposition of a series of constraints, including stoichiometric mass balance constraints derived from the metabolic network, thermodynamic reversibility constraints and any available enzyme capacity constraints [3], [119], [120]. Imposing these constraints leads to a linear optimization (linear programming, LP) problem formulated to maximize a cellular objective function such as the growth rate. Accordingly, the biomass synthesis reaction was taken as the objective function to be maximized when solving the LP problem in SimPheny. In addition, a number of reversible reactions were added to the network for exchanging external metabolites, such as acetate (ac), chloride (Cl−), carbon dioxide (CO2), sulphate (SO4−2), etc., to represent the in silico minimal medium (Table 2) for Dehalococcoides. As discussed earlier, cobalamin is essential for Dehalococcoides growth, but these organisms are unable to synthesize it de novo; hence, they salvage cobalamin from the medium. To analyze whether cobalamin flux can limit Dehalococcoides growth, we performed a robustness analysis on the cobalamin exchange reaction for different weight fractions of cobalamin in the biomass. We also simulated growth rates after incorporating all the pathways required for de novo cobalamin synthesis into iAI549 in order to analyze the cost of cobalamin synthesis and its effect on Dehalococcoides growth. Finally, to identify whether the growth of Dehalococcoides was carbon or energy limited, growth simulations were conducted by varying acetate fluxes and energy transfer efficiencies, since acetate and H2 are the carbon and energy sources of these microbes, respectively [18]–[21]. Energy transfer efficiencies were calculated by normalizing the ATP fluxes to the maximum ATP that could be generated from H2, based on the Gibbs free energy of H2 oxidation and the energetic cost of ATP synthesis (mol ATP/mol H2) (see Table 30 and Text S2 for additional details).
The set of constraints used to simulate Dehalococcoides growth is listed in Table 18 in Text S1, and the SBML file for the reconstructed network (iAI549) is provided as Dataset S1.
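For reference, the FBA problem described in this section is commonly written as the following linear program. The notation is an assumption introduced here and is not taken from the original text: S is the stoichiometric matrix of the reconstructed network, v is the vector of reaction fluxes, c is a vector that selects the biomass synthesis reaction, and the bounds on each flux encode reversibility and capacity constraints (for example, a lower bound of zero for an irreversible reaction).

```latex
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_{j} \le v_{j} \le v^{\max}_{j} \quad \text{for every reaction } j .
\end{aligned}
```

Under this formulation, a robustness analysis such as the one performed on the cobalamin exchange reaction amounts to fixing one flux at a series of values and re-solving the same program for the maximal biomass flux at each value.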