We warmly thank the referees for their invaluable suggestions and comments. DC and RM conceived and designed the experiments. DC performed the experiments. DC, PD, and RM analyzed the data and wrote the paper.
The authors have declared that no competing interests exist.
Comparative genomics usually involves managing the functional aspects of genomes, by simply comparing gene-by-gene functions. Following this approach, Mushegian and Koonin proposed a hypothetical minimal genome, Minimal Gene Set (MGS), aiming for a possible oldest ancestor genome. They obtained MGS by comparing the genomes of two simple bacteria and eliminating duplicated or functionally identical genes. The authors raised the fundamental question of whether a hypothetical organism possessing MGS is able to live or not. We attacked this viability problem by specifying in silico the metabolic pathways of the MGS-based prokaryote. We then performed a dynamic simulation of cellular metabolic activities in order to check whether the MGS-prokaryote reaches some equilibrium state and produces the necessary biomass. We assumed these two conditions to be necessary for a living organism. Our simulations clearly show that the MGS does not express an organism that is able to live. We then iteratively proceeded with functional replacements in order to obtain a genome composition that gives rise to equilibrium. We ruled out 76 of the original 256 genes in the MGS, because they resulted in duplication from a functional point of view. We also added seven genes not present in the MGS. These genes encode enzymes involved in critical nodes of the metabolic network. These modifications led to a genome composed of 187 elements (256 − 76 + 7) expressing a virtually living organism, Virtual Cell (ViCe), that exhibits homeostatic capabilities and produces biomass. Moreover, the steady-state distribution of the concentrations of virtual metabolites that resulted was similar to that experimentally measured in bacteria. We conclude then that ViCe is able to “live in silico.”
The origins of life represent a fascinating problem that has been addressed using different approaches and a wide variety of technologies. A theoretical approach consists of inferring a possible oldest ancestor genome from a well-defined comparison of current ones. A crucial problem concerns the validation of the proposed genome. The direct solution of synthesizing such a genome in a laboratory is often extremely difficult, due to the great complexity of a biological cell. In this paper, we present an approach for evaluating the chances a hypothetical organism has to be really viable, relying on computer simulations. Our method is based on a certain formal language, through which we specify a whole metabolic network, and we study its dynamics, in particular for verifying if a living organism has some fundamental properties, e.g., homeostasis. This approach is not equivalent to a wet-lab one, but it allows for early pruning of most of the inconsistently designed hypothetical organisms, thus saving biologists time and resources.
The search for LUCA, the Last Universal Common Ancestor, is an open problem in evolutionary theory, which has been addressed using many different approaches. After the completion of several bacterial genomes, some authors tried to infer a possible minimal genome by ruling out nonessential genes from existing small bacterial genomes. Dispensable genes were detected using both wet-lab techniques (e.g., see [
This hypothetical minimal genome was claimed to specify a very essential prokaryote, but no argument was provided to address the fundamental question of whether a cell equipped with MGS (call it MGS-prokaryote) is able to live or not.
A direct, biological approach to answer this question could consist in synthesizing this genome, in cloning it in a ghost bacterium, and in evaluating the overall cell viability. However, there are many severe technical problems along this way, which make it hard to get an answer quickly.
We instead described this hypothetical cell as a computer program and simulated its behavior in silico. We then tested whether it shows some fundamental properties of living organisms. First of all, we checked whether it enjoys homeostasis, i.e., the capability to reach a steady state in which the concentration of all the chemical species inside the cell fluctuates within a narrow range. We also investigated the capability of an MGS-prokaryote to produce biomass.
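As an illustration only, the homeostasis criterion just described (a concentration that fluctuates within a narrow range) can be sketched in Python. The function name `is_steady`, the window length, and the tolerance are assumptions introduced here; the text does not state the numerical criterion that was actually applied, and the two traces below are made-up numbers, not simulation output.

```python
def is_steady(series, window=100, tolerance=0.1):
    """Return True if the last `window` samples stay within a narrow band.

    The band is +/- `tolerance` (as a fraction) around the window mean.
    A window whose mean is zero is treated as depleted, not as steady.
    Both `window` and `tolerance` are illustrative defaults.
    """
    if len(series) < window:
        return False
    tail = series[-window:]
    mean = sum(tail) / window
    if mean <= 0:
        return False
    return all(abs(x - mean) <= tolerance * mean for x in tail)


# Toy traces (hypothetical numbers).
# A metabolite that is consumed and never replenished:
depleting = [max(0, 500 - 5 * t) for t in range(300)]
# A metabolite that oscillates slightly around a constant level:
homeostatic = [200 + (3 if t % 2 else -3) for t in range(300)]

print(is_steady(depleting))
print(is_steady(homeostatic))
```

Treating a trace that has collapsed to zero as "not steady" is a deliberate choice in this sketch: a depleted metabolite is formally constant, but it is not what homeostasis means for a living cell.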
To model the MGS-prokaryote, we used (a variant of) the
To specify the metabolites and their relationships in terms of biochemical reactions, we used an enhanced version of the π-calculus, which has already been shown to be suitable for describing biological entities [
The π-calculus was designed to express, run, and reason about
In addition to communications, the
The main difference between the standard π-calculus and the enhanced version we used in this work is the notion of
The enhanced π-calculus shares with other language-based approaches a number of advantages with respect to other formal descriptions. The very specification of the cell is actually a program and can be executed, giving rise to a virtual experiment, unlike other static descriptions such as the SBML [
We specified in the π-calculus all the elements of the molecular machinery of the cell. Each element is specified in isolation, only defining its potential interactions with the environment. Then these pieces are put together in a compositional, holistic fashion. We wrote an interpreter for the enhanced π-calculus in Java, and we used it to run simulations. Simulations play the role of virtual experiments, performed according to given different initial conditions. The input file contains the definitions of all the metabolites inside the cell, the initial inner concentrations of the metabolites, and the rates of enzymatic activities, derived from the available real experimental data. The interpreter stores and displays some information about the virtual experiment, typically the concentrations of all the virtual metabolites (i.e., the number of the corresponding processes) or the usage of the different enzymes (i.e., the number of accesses of each channel) at given instants. With the first output, we determined the time course of the concentration of any virtual metabolite during the simulation; with the second one, we inspected the usage rate of the enzymes specified in the definitions, and, therefore, we tested the presence of unused metabolic pathways.
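The kind of bookkeeping described above (metabolite counts over time and per-channel usage) can be illustrated with a small Python sketch. This is not the enhanced π-calculus and not the Java interpreter mentioned in the text: it is a simplified stand-in that picks the next reaction at random with weight proportional to its rate times the reactant copy numbers, and counts one time step per fired transition. All species names, reaction names, and rates are hypothetical.

```python
import random


def simulate(counts, reactions, steps, seed=0):
    """Run up to `steps` transitions; one time step = one fired reaction.

    counts:    dict mapping species name -> number of copies
    reactions: list of (name, rate, reactants, products), where reactants
               and products are lists of distinct species names
    Returns (history, usage): a snapshot of the counts after every step,
    and the number of times each reaction (channel) fired.
    """
    rng = random.Random(seed)
    counts = dict(counts)
    usage = {name: 0 for name, _, _, _ in reactions}
    history = [dict(counts)]
    for _ in range(steps):
        # Weight of a reaction: its rate times the reactant copy numbers.
        weights = []
        for _, rate, reactants, _ in reactions:
            w = rate
            for species in reactants:
                w *= counts.get(species, 0)
            weights.append(w)
        if sum(weights) == 0:
            break  # no transition is enabled any more
        idx = rng.choices(range(len(reactions)), weights=weights)[0]
        name, _, reactants, products = reactions[idx]
        for species in reactants:
            counts[species] -= 1
        for species in products:
            counts[species] = counts.get(species, 0) + 1
        usage[name] += 1
        history.append(dict(counts))
    return history, usage


# Hypothetical toy network: an enzyme converts glucose into a product;
# a second reaction has rate 0 and therefore can never fire.
initial = {"glucose": 50, "enzyme": 5, "product": 0}
reactions = [
    ("convert", 1.0, ["glucose", "enzyme"], ["product", "enzyme"]),
    ("idle_path", 0.0, ["product"], ["glucose"]),
]
history, usage = simulate(initial, reactions, steps=200)

# Channels that were never used point at unused pathways.
unused = [name for name, fired in usage.items() if fired == 0]
print(history[-1], usage, unused)
```

The `history` list plays the role of the first output described above (time course of each metabolite), and `usage` plays the role of the second (how often each channel was accessed), so unused pathways show up as entries with a count of zero.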
The MGS-prokaryote has been exhaustively described in the enhanced π-calculus. We represented the 256 genes, their relative products, and the metabolic pathways expressed and regulated by the genes, as the corresponding processes and channels. In particular, we modeled the Glycolytic Pathway, the Pentose Phosphate Pathway, and the pathways involved in nucleotide, amino acid, coenzyme, lipid, and glycerol metabolism. Moreover, MGS genes encode a set of membrane carriers for metabolite uptake, including the PTS carrier. We placed this virtual cell in an optimal virtual environment, in which all nutrients and water were available, and where no problems were present in eliminating waste substances.
A large number of simulations (about 5,000) have been run, differing in the values of the initial parameters. We independently varied the amount of glucose in the extracellular environment (the number of copies was in the range 100–5,000) and the time interval of observation T (in the range 10–10,000). Recall that in our model time steps correspond to the occurrence of transitions, so we set T establishing the length of the computations performed by the simulator. In all the studied cases, the MGS-prokaryote could not reach a steady state; most of the essential metabolites fell to zero in a short period, as is clearly shown in
Time course of the concentration of ATP in the MGS-prokaryote: this metabolite falls to zero in a short period of time. Abscissa: the simulated time; ordinate: the ATP concentration in arbitrary units.
Time course of the concentration of 2AG in the MGS-prokaryote: this metabolite falls to zero in a short period of time. Abscissa: simulated time; ordinate: 2AG concentration in arbitrary units.
These results lead us to the conclusion that this MGS-based cell was not able to live, at least in silico.
We approached the problem of establishing which genes present in the MGS were really necessary and which were missing for the cell's life. We manually inspected all the metabolic pathways, examining all the possible situations of missing or duplicated functions. In the case of a suspected functional deletion or duplication, we modified the MGS in two possible ways. On the one hand, we tried to recover a “broken pipe” by inserting the orthologous gene performing the required function. The added gene is taken from the genome of
Once a modification had been made, we iteratively performed several simulations on the newly proposed genome, and we evaluated the time course of the metabolites.
Finally, our efforts converged to the genome of a hypothetical virtual cell, called ViCe. It is able to reach, after a while, a steady state in all its metabolites. Comparing the genome of ViCe with MGS, we note that the most important difference is due to the insertion in ViCe of seven genes which play fundamental roles (those beginning with “mg” come from
As said above, we ruled out 76 genes present in the MGS. We considered some of them dispensable, as their suppression seems to have no influence on the overall behavior of the virtual cell. Among the other excluded genes are mg049, mg382, and mg052, which encode three enzymes involved in de novo nucleotide biosynthesis. In this case we felt free to rule them out because ViCe possesses the “salvage pathways” for nucleotide synthesis. The three enzymes mentioned above turned out to be functionally redundant, and so did the corresponding genes.
The modifications described in the previous section resulted in the specification of the virtual prokaryote ViCe. It includes the following components:
1. A complete glycolytic pathway that allows the oxidation of glucose to pyruvate and reduced-NAD. Pyruvate is then converted to acetate which, being a catabolite, can diffuse out of the cell. A transmembrane reduced-NAD dehydrogenase complex catalyzes the oxidation of reduced-NAD; this reaction is coupled with the synthesis of ATP through the ATP synthase/ATPase transmembrane system. This set of reactions enables the cell to manage its energetic metabolism.
2. A Pentose Phosphate Pathway, composed of enzymes leading to the synthesis of ribose phosphate and 2-deoxyribose phosphate.
3. Enzymes for glycerol-fatty acids condensation, but no pathways for fatty acids synthesis. So, the latter metabolites must be taken from the outside.
4. The so-called “salvage pathways” for nucleotide biosynthesis. Thymine is the only nucleotide the cell is able to synthesize de novo.
5. A proper set of carriers for metabolite uptake: (a) a Glycerol Uptake Facilitator Protein; (b) a PtsG System for sugar uptake; (c) an ACP carrier protein for fatty acid uptake; (d) a broad-specificity amino acid uptake ATPase; (e) broad-specificity permeases for the uptake of other essential metabolites.
6. The necessary enzymes for protein synthesis, including DNA transcription and translation. The whole machinery necessary for DNA synthesis is also included in ViCe.
ViCe has no pathways for amino acid synthesis. All the necessary amino acids are taken up from the external environment. All the nucleotide biosynthetic pathways are present in our model, so the cell is equipped with the basic means necessary for cell reproduction; however, at the present stage we have neither designed nor implemented in silico this activity. Some metabolites are considered to be ubiquitous, among which are water, inorganic phosphate, some metal ions, and nicotinamide. Their concentration in an external or internal environment is assumed to be constant and not to be significantly affected by cellular metabolism.
Summing up, the cell can take metabolites from an external environment using the set of permeases and ATPases specified above. Among the pathways of our virtual cell, there is Glycolysis: glucose and fructose taken from the outside are oxidized, yielding energy in the form of ATP and reduced-NAD. Pyruvate, the last metabolite of conventional Glycolysis, then becomes acetate, which, in turn, diffuses out of the cell. The cell “imports” fatty acids, glycerol, and some other metabolites, e.g., Choline, and uses them for the synthesis of triglycerides and phospholipids; these are essential components of the plasma membrane. Our virtual cell is also able to synthesize DNA, RNA, and proteins; the needed metabolites are mostly taken from the external environment or synthesized along its own pathways (e.g., Thymine and Ribose).
The detailed genome of ViCe, composed of 187 different genes, is listed in
The ViCe genome is expressing a cell able to reach a steady state:
Time course of the concentration of ATP in ViCe: this metabolite reaches a steady state. Abscissa: simulated time; ordinate: ATP concentration in arbitrary units.
Time course of the concentration of 2AG in ViCe: this metabolite reaches a steady state. Abscissa: simulated time; ordinate: 2AG concentration in arbitrary units.
Time course of the concentration of Phosphatidylethanolamine (PEA) in ViCe. The concentration of this metabolite increased with time. Abscissa: simulated time; ordinate: Phosphatidylethanolamine concentration in arbitrary units.
Time course of the concentration of Phosphatidylglycerol (PG) in ViCe. The concentration of this metabolite increased with time. Abscissa: simulated time; ordinate: Phosphatidylglycerol concentration in arbitrary units.
Time course of the concentration of the total amount of proteins in ViCe. The concentration increases with time. Abscissa: simulated time; ordinate: concentration in arbitrary units.
Moreover, we considered the concentrations of the metabolites involved in the glycolytic pathway and we computed their respective proportions. We note that the results are compatible with those measured for real bacteria (see
Relative concentrations of virtual (striped columns) and real (black columns) metabolites of the glycolysis pathway. The χ2 test reveals a significant (
Some authors recently used a wet-lab approach to characterize the minimum set of genes necessary to sustain bacterial life [
The genome of
The intersection between the three sets contains 169 elements.
Because these genes turned out to be indispensable according to all three approaches, they are likely to be necessary for a minimal bacterium. Note that these genes represent 90% (169/187) of ViCe's genome. This can be seen as an argument supporting the validity of our method. Moreover, consider that all but one of ViCe's genes (namely 186/187) are also included in R-genome or in MGS. In other words, the probability of obtaining a false positive (i.e., a gene that ViCe retains as necessary but that neither of the other two approaches includes) can be estimated to be on the order of 1/187.
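The two proportions quoted above can be checked with a few lines of Python. The variable names are introduced here for illustration; the counts (187, 169, and 186) are the ones given in the text.

```python
from fractions import Fraction

VICE_GENES = 187       # size of the ViCe genome
SHARED_BY_ALL = 169    # genes common to ViCe, R-genome, and MGS
IN_R_OR_MGS = 186      # ViCe genes also found in R-genome or in MGS

# Share of ViCe's genome supported by all three approaches.
core_fraction = Fraction(SHARED_BY_ALL, VICE_GENES)

# Share of ViCe's genome supported by neither of the other two approaches.
unsupported = Fraction(VICE_GENES - IN_R_OR_MGS, VICE_GENES)

print(f"core fraction: {float(core_fraction):.1%}")
print(f"unsupported:   {unsupported} (about {float(unsupported):.2%})")
```

Exact fractions are used so that the 1/187 estimate is reproduced literally rather than as a rounded decimal.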
Additionally, if we assume all the genes contained in the intersection of R-genome with MGS (namely 169 + 48) are not dispensable, the probability of obtaining a false negative, i.e., the probability of considering an essential gene as dispensable, can be estimated to be on the order of 48/187. Summing up, referring again to
Our approach to a functional screening of genomes was shown to be valid. In particular, our results have been obtained very cheaply with respect to a possible wet-lab approach involving de novo synthesis of the examined genome. Clearly, if a hypothetical genome does not pass the in silico test, it will be unlikely to give rise to a living organism. It is hard to sustain the opposite: we cannot affirm that a hypothetical genome passing the test is able to sustain a living organism, and only a wet-lab approach can validate the proposal. Nevertheless, in silico experiments can help us to select which proposals are coherent, and thus more promising. As evidence of this, our work shows that the minimal genome we proposed for ViCe is more biologically reliable than the MGS.
Detailed list of the genome of ViCe.
2AG: 2-Acyl-Glycerol
MGS: Minimal Gene Set
ViCe: Virtual Cell