Conceived and designed the experiments: WN JBP JD. Performed the experiments: WN. Analyzed the data: WN. Wrote the paper: WN JBP JD.
The authors have declared that no competing interests exist.
The mechanisms by which adaptive phenotypes spread within an evolving population after their emergence are understood fairly well. Much less is known about the factors that influence the evolutionary accessibility of such phenotypes, a prerequisite for their emergence in a population. Here, we investigate the influence of environmental quality on the accessibility of adaptive phenotypes of
Adaptation involves the discovery by mutation of traits (or “phenotypes”) that have high fitness under prevailing environmental conditions, and the spread of those traits through populations. While the spread of adaptive phenotypes through populations is mediated by natural selection, the likelihood of their discovery by mutation depends primarily on the relationship between genetic information and phenotypes (the genotype-phenotype mapping, or GPM). Elucidating the factors that influence the structure of the GPM is therefore critical to understanding the adaptation process. We investigated the influence of environmental quality on GPM structure for a well-studied model of
During adaptation, a population “moves” in genotype space in search of genotypes associated with high-fitness phenotypes. The success of adaptation depends crucially on the accessibility of such adaptive phenotypes. While adaptive phenotypes rely on natural selection for their fixation, their accessibility depends, primarily, on the structure of the genotype to phenotype mapping (GPM) and, secondarily, on the forces that move a population in genotype space – i.e. selection and genetic drift (see
By studying the factors that influence biologically relevant GPMs, we may gain insight into the accessibility of adaptive phenotypes. To that end, we have taken advantage of recent advances in the understanding of bacterial metabolic networks
We define a genotype's phenotype (equivalent, for our purposes, to fitness) using a model of metabolic flux. Specifically, a growing body of experimental and theoretical work
The protein products of the genes b0116, b0726, and b0727 combine to form a protein complex that catalyzes production of succinyl coenzyme A (SUCCOA) from alpha-ketoglutarate (AKG) and coenzyme A, with the concomitant reduction of nicotinamide adenine dinucleotide (NAD) and release of carbon dioxide (CO2). A matrix
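Because the reaction above requires a complex built from three gene products, deleting any one of the three genes removes the reaction from the network. The sketch below illustrates this as a Boolean AND rule over the genes named in the text. It is a minimal illustration only: the reaction label `AKGDH` and the function `reaction_active` are hypothetical names chosen for this example, not identifiers taken from the reconstruction analyzed here.

```python
# Hypothetical gene-protein-reaction (GPR) rule for a reaction catalysed by a
# multi-subunit complex: the reaction is available only if every subunit gene
# is present in the genotype (logical AND). The gene IDs come from the text;
# the reaction label "AKGDH" is an illustrative name.
COMPLEX_SUBUNITS = {"AKGDH": ("b0116", "b0726", "b0727")}


def reaction_active(reaction: str, present_genes: set) -> bool:
    """Return True if all genes encoding the reaction's complex are present."""
    return all(gene in present_genes for gene in COMPLEX_SUBUNITS[reaction])


wild_type = {"b0116", "b0726", "b0727"}
print(reaction_active("AKGDH", wild_type))              # True
print(reaction_active("AKGDH", wild_type - {"b0726"}))  # False: one subunit deleted
```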
Below, we describe the results of our analyses of the influence of the environment on aspects of the structure of the
Before presenting our results, we find it useful to put them into perspective. The structure of an organismal GPM changes on both ecological and evolutionary time scales; changes to the GPM's structure may result from, among other factors, changes to the environment and the outcomes of interactions among individuals within a population. For a given GPM, our ability to make meaningful predictions about its structure by considering only a subset of the factors that determine that structure will depend on the degree of coupling between the underlying factors. The first set of results we describe below takes into account the effects of the environment on the GPM's structure, independently of population-level processes. For a particular environment, these results give insights into (static) statistical structures of the GPM, and they should be interpreted in that light. Subsequently, we show that some of these static insights are consistent with population-level simulations of the adaptation process and with analytic predictions of the relative speed of adaptation to different environments.
We begin by asking: how does the phenotype change as we move in genotype space, in search of genotypes associated with adaptive phenotypes? To answer this question, we computed the PPD, that is, the probability that two genotypes that are separated by a Hamming distance
The PPD was computed in acetate and glucose environments.
To gain further insight into the dependence of phenotype changes on genotype changes, we computed the CL of phenotype differences, which quantifies the robustness of the phenotype to genotype changes. The longer the CL, the more robust the phenotype. Longer CLs are also characteristic of GPMs that have a relatively smooth structure
Shown are the correlation length (CL) of phenotype differences, the normalized mutual information (NMI) of genotype differences relative to phenotype differences, and the number of essential genes (essentiality) found in the metabolic network, under different environmental conditions. The environments are listed in increasing order of quality, except in the case of lactose whose position in the rank-ordering is not known precisely. The NMI was computed as described in
In addition to the CL, we defined another statistic called the NMI of genotype differences relative to phenotype differences (see
We will use a simple example to explain what the NMI measures. Consider a hypothetical population of individuals with known fitnesses. Suppose we wish to know the difference
It is important to keep in mind that here we are concerned with measuring the amount of information that genotype differences convey about phenotype differences, on which natural selection acts during adaptive evolution, and not, as is often the case (e.g., see
Additional information about the structure of the GPM and its potential impact on the accessibility of adaptive phenotypes is provided by the sizes of neutral networks. Neutral networks are important because they allow the search for adaptive phenotypes to proceed (by neutral drift) even if the GPM has a rugged structure. We estimated the distribution of the sizes of neutral networks by performing neutral walks on the GPM (see
Neutral walks were performed as described in
In the preceding section we inferred, based on static pictures of the structure of the metabolic GPM, that the GPM has a less rugged structure in qualitatively better environments, suggesting that adaptive phenotypes could be comparatively more accessible in such environments. To gain further insight into the possible impact of environmental quality on the dynamics of adaptation, we simulated the evolutionary search for the highest-fitness phenotype in different environments. Specifically, we simulated the adaptive evolution of a population of size 1000, starting at randomly chosen genotypes with fitnesses ≤20% of the highest possible fitness (i.e., 1.0) (see
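The kind of population-level simulation described above can be illustrated with a generic Wright-Fisher model. The sketch below is not the simulation used in this study: it assumes fitness-proportional sampling of parents (selection plus drift), independent per-site bit flips with a hypothetical mutation rate `mu`, a monomorphic starting population, and a hypothetical smooth fitness function `toy_fitness` that stands in for the metabolic model's normalized biomass yield. Only the population size of 1000, the fitness cap of 1.0, and the ≤20% starting fitness come from the text.

```python
import random


def toy_fitness(g):
    """Hypothetical smooth landscape: fraction of 1s (maximum 1.0 at all-ones)."""
    return sum(g) / len(g)


def wright_fisher(length=20, pop_size=1000, mu=0.01, max_gen=2000, seed=0):
    """Return the first generation in which the highest fitness (1.0) appears,
    or None if it is not found within max_gen generations."""
    rng = random.Random(seed)
    # Random starting genotype with fitness <= 0.2 (at least one 1, so fitness > 0).
    k = rng.randint(1, max(1, int(0.2 * length)))
    start = [0] * length
    for i in rng.sample(range(length), k):
        start[i] = 1
    pop = [list(start) for _ in range(pop_size)]  # monomorphic start
    for gen in range(max_gen + 1):
        fits = [toy_fitness(g) for g in pop]
        if max(fits) >= 1.0:
            return gen
        if sum(fits) == 0:
            fits = [1.0] * pop_size  # fall back to neutral sampling
        # Selection and drift: sample parents in proportion to fitness.
        parents = rng.choices(pop, weights=fits, k=pop_size)
        # Mutation: each site flips independently with probability mu.
        pop = [[1 - x if rng.random() < mu else x for x in p] for p in parents]
    return None
```

Running `wright_fisher()` with different seeds and recording the fraction of runs that return a generation (rather than `None`), and how quickly they do so, mirrors the two quantities compared across environments in the results.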
All evolving populations found the highest-fitness phenotype during adaptation to acetate, while 82% and 78% of the populations did so during adaptation to glycerol and succinate, respectively. In contrast, the highest-fitness phenotype was found by only 67% of populations adapting to glucose and by 63% of populations adapting to lactose. In addition, the populations that found the highest-fitness phenotype did so at a much faster rate in acetate, glycerol, and succinate than in either glucose or lactose (see
The fraction
The results presented above suggest the existence of a positive correlation between the NMI and the speed of adaptation. To shed additional light on this result, we now describe a simple mathematical model that makes explicit the relationship between the NMI and the speed of adaptation to a given environment, under the assumptions of Fisher's fundamental theorem of natural selection (e.g., see
Mathematically, we can express the relationship between the genotype and fitness differences as:
The relationship between genotype and fitness differences for all types of individuals found in the population can be written as:
The mutual information of genotype differences relative to fitness differences is given by (e.g., see
Bacterial evolution experiments have demonstrated that the environment can exert an important influence on the structure of the genotype-phenotype map (GPM). For example, Remold and Lenski
We found that in all environments (except acetate) large genotype changes (>∼30) induce phenotype differences that follow a bimodal distribution. This bimodal distribution is characteristic of the expected distribution of phenotype differences between randomly sampled genotypes, suggesting that in the considered environments the
In spite of the predicted ruggedness of the GPM in acetate, the poorest of the considered environments, very long (∼74% of the genotype length) neutral walks could still be performed on the GPM, suggesting that neutral drift can alter a substantial fraction of the genotype during evolution without changing the phenotype. In other words, a population evolving in acetate could explore large portions of genotype space by drifting on neutral networks, increasing its likelihood of discovering adaptive phenotypes. Furthermore, the NMI was largest in acetate and smallest in lactose, suggesting that the information-transmission capacity of the GPM does not necessarily increase in better environments.
In order to gain further intuition about how qualitative changes to the environment could influence the dynamics of adaptation, we simulated the adaptation of
Note that previous work
We conclude by pointing out some limitations of our empirical GPM model, and we discuss possible directions for future work. Firstly, our approach to analyzing
The GPM model we studied will add to the suite of available models (e.g., see
In addition, since the NMI affords an analytically tractable measure of evolvability, it could be useful to the mathematical investigation of the evolutionarily important relationship between evolvability and robustness (e.g., see
A number of reconstructions of the
A genotype of the metabolic network corresponds to a particular state of the network's genome (defined above). Mathematically, we represent the genotype as an ordered list of binary values (0 or 1), with a “1” at position
A genotype (respectively phenotype) space refers to a structural arrangement of genotypes (respectively phenotypes) based on the Hamming (respectively Euclidean) distances between those genotypes (respectively phenotypes). A GPM is a mapping from genotype space onto phenotype space. When the phenotype is fitness, as is the case in the present study, the geometric structure of the GPM is called a fitness landscape.
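As a concrete illustration of these definitions, the sketch below assumes that a genotype is a tuple of 0/1 values of fixed length, as described above, and that a phenotype is a vector of real numbers (a one-element vector when the phenotype is a scalar fitness, in which case the Euclidean distance reduces to an absolute difference). The helper names `hamming` and `euclidean` are illustrative.

```python
import math
from typing import Sequence, Tuple

Genotype = Tuple[int, ...]  # ordered list of binary values (0 or 1)


def hamming(g1: Genotype, g2: Genotype) -> int:
    """Number of positions at which two genotypes differ."""
    if len(g1) != len(g2):
        raise ValueError("genotypes must have equal length")
    return sum(a != b for a, b in zip(g1, g2))


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two phenotype vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


print(hamming((1, 0, 1, 1), (1, 1, 1, 0)))  # 2
print(euclidean([0.0, 0.0], [3.0, 4.0]))    # 5.0
```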
The probability
1. Choose a reference genotype at random.
2. Sample exactly
3. Compute the phenotype/fitness (i.e., the optimal biomass yield) of each genotype sampled in step 2. Normalize the computed fitnesses by dividing by the highest-possible fitness in the current environment (this facilitates the comparison of fitnesses across environments). Calculate the absolute difference between the computed fitnesses and the fitness of the reference genotype.
4. Arrange the fitness differences computed in step 3 into (
5. Repeat the above steps until convergence of
The above algorithm converges relatively fast (i.e.,
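The sampling procedure above can be sketched for a single Hamming distance `d` as follows. Several assumptions are made because the corresponding details are not given here: `toy_fitness` is a hypothetical stand-in for the optimal biomass yield (no flux balance analysis is performed), the number of genotypes sampled per reference (`n_samples`), the bin width, and the tolerance are illustrative values, and convergence is declared when the largest change between the cumulative distributions of successive iterations falls below `tol`, loosely following the Kolmogorov-Smirnov check described for the supplementary convergence figure.

```python
import random


def toy_fitness(g):
    """Hypothetical stand-in for optimal biomass yield (NOT flux balance analysis)."""
    if not all(g[:3]):  # first three positions act as "essential" genes
        return 0.0
    return sum(g) / len(g)


def mutate_exactly(g, d, rng):
    """Return a copy of g with exactly d distinct positions flipped."""
    out = list(g)
    for i in rng.sample(range(len(g)), d):
        out[i] = 1 - out[i]
    return tuple(out)


def estimate_ppd(d, length=20, n_samples=50, bin_width=0.05,
                 max_iter=500, tol=1e-3, seed=0):
    """Estimate P(phenotype difference | Hamming distance d) as a histogram."""
    rng = random.Random(seed)
    f_max = 1.0  # highest possible toy fitness, used for normalization
    n_bins = int(round(1 / bin_width)) + 1
    counts = [0] * n_bins
    prev_cdf = None
    for it in range(1, max_iter + 1):
        ref = tuple(rng.randint(0, 1) for _ in range(length))  # step 1
        f_ref = toy_fitness(ref) / f_max
        for _ in range(n_samples):                              # step 2
            mutant = mutate_exactly(ref, d, rng)
            de = abs(toy_fitness(mutant) / f_max - f_ref)       # step 3
            counts[min(int(de / bin_width), n_bins - 1)] += 1   # step 4
        total = sum(counts)
        pmf = [c / total for c in counts]
        cdf = [sum(pmf[:k + 1]) for k in range(n_bins)]
        if prev_cdf is not None and max(
                abs(a - b) for a, b in zip(cdf, prev_cdf)) < tol:
            break                                               # step 5
        prev_cdf = cdf
    return pmf, it
```

Calling `estimate_ppd(d)` for each `d` from 1 to the genotype length yields one conditional distribution per Hamming distance, i.e., the full PPD for one (toy) environment.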
The correlation function describes, for example, how the similarity between the phenotype of a given genotype and that of an ancestral genotype decays as the two genotypes diverge. The correlation function of phenotype differences can be obtained directly from the quantity
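The idea of a correlation function that decays with genotype divergence can be illustrated with a generic estimator: the Pearson correlation between the fitness of random genotypes and the fitness of mutants at exactly Hamming distance `d`. This is not necessarily how the quantity is derived from the PPD in this work. The 1/e threshold used to define a correlation length is a common convention adopted here as an assumption, and the additive landscape `additive_fitness` (with random weights) is hypothetical and chosen only because its correlation decays smoothly, roughly as 1 - 2d/L.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_function(fitness, length, d, n_pairs=2000, rng=None):
    """Correlation between f(g) and f(g'), where g' differs from g at d sites."""
    if rng is None:
        rng = random.Random(0)
    fs, fs_mut = [], []
    for _ in range(n_pairs):
        g = [rng.randint(0, 1) for _ in range(length)]
        m = list(g)
        for i in rng.sample(range(length), d):
            m[i] = 1 - m[i]
        fs.append(fitness(g))
        fs_mut.append(fitness(m))
    return pearson(fs, fs_mut)


def correlation_length(fitness, length, threshold=1 / math.e, seed=0):
    """Smallest Hamming distance at which the correlation drops below threshold."""
    rng = random.Random(seed)
    for d in range(length + 1):
        if correlation_function(fitness, length, d, rng=rng) < threshold:
            return d
    return length  # correlation never fell below the threshold


# Hypothetical smooth (additive) landscape, used only for illustration.
_weight_rng = random.Random(1)
WEIGHTS = [_weight_rng.random() for _ in range(20)]


def additive_fitness(g):
    return sum(w * x for w, x in zip(WEIGHTS, g))
```

A rugged landscape would give a short correlation length (the correlation collapses after a few mutations), whereas the additive landscape above gives a comparatively long one.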
Note that the above statistical methods are applicable to any mapping from a combinatorial set (e.g., the set of possible metabolic genotypes, which consist of sequences defined on a binary alphabet) onto a set consisting of either continuous- (e.g., the set of possible metabolic phenotypes/maximum biomass yields) or discrete-valued entities, whenever both the domain and range of the mapping are equipped with appropriate metrics (e.g.,
The mutual information is a standard information-theoretic quantity
When computing the NMI,
In this work, we estimated the value of
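One way to compute an NMI from the quantities defined above is sketched below. It assumes that genotype differences follow a binomial distribution with parameter `p`, the mutation rate per genotype position (as in the supplementary NMI figure), that the conditional distribution of binned phenotype differences given each genotype difference is supplied (for example from a PPD estimate), and that the mutual information is normalized by the entropy of the phenotype differences. The normalization choice is an assumption of this sketch, as are the function names.

```python
import math


def binomial_pmf(n, p):
    """P(dh = k) for k = 0..n when each of n positions mutates with probability p."""
    return [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in probs if q > 0)


def normalized_mutual_information(p_de_given_dh, p_dh):
    """p_de_given_dh[k][j] = P(de in bin j | dh = k); p_dh[k] = P(dh = k).
    Returns I(dh; de) / H(de) (normalization assumed for this sketch)."""
    n_bins = len(p_de_given_dh[0])
    p_de = [sum(p_dh[k] * p_de_given_dh[k][j] for k in range(len(p_dh)))
            for j in range(n_bins)]
    mi = 0.0
    for k, pk in enumerate(p_dh):
        for j, pjk in enumerate(p_de_given_dh[k]):
            if pk > 0 and pjk > 0:
                mi += pk * pjk * math.log(pjk / p_de[j])
    h_de = entropy(p_de)
    return mi / h_de if h_de > 0 else 0.0
```

With this normalization the NMI is 0 when phenotype differences are independent of genotype differences and 1 when they are fully determined by them.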
A neutral walk proceeds as follows
1. A “walker” starts at an initial, randomly chosen viable genotype,
2. A genotype,
3. The walker moves to
4. Steps 2 and 3 are repeated until it becomes impossible for the walker to move further.
We ran 100
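The neutral walk procedure can be sketched as below. Because the selection criterion in step 2 is not fully specified here, the sketch adopts a common formulation as an assumption: the walker considers only one-mutant neighbors that have the same fitness as the starting genotype (within a tolerance `tol`) and that increase its Hamming distance from the starting genotype, and the walk length is the final distance from the start. The fitness function `viability_fitness` is hypothetical (viable if three "essential" positions are present).

```python
import random


def viability_fitness(g):
    """Hypothetical fitness: viable (1.0) only if the first three positions are 1."""
    return 1.0 if all(g[:3]) else 0.0


def random_viable_genotype(length, fitness, rng):
    """Sample random genotypes until one with positive fitness is found."""
    while True:
        g = tuple(rng.randint(0, 1) for _ in range(length))
        if fitness(g) > 0:
            return g


def neutral_walk(start, fitness, rng, tol=0.0):
    """Move through neutral one-mutant neighbors that increase the Hamming
    distance from `start`; return that distance when no move is possible."""
    f0 = fitness(start)
    current = list(start)
    while True:
        candidates = []
        for i in range(len(current)):
            if current[i] != start[i]:
                continue  # flipping back would reduce the distance
            neighbour = list(current)
            neighbour[i] = 1 - neighbour[i]
            if abs(fitness(neighbour) - f0) <= tol:
                candidates.append(neighbour)
        if not candidates:
            break
        current = rng.choice(candidates)
    return sum(a != b for a, b in zip(current, start))
```

Repeating `neutral_walk(random_viable_genotype(...), ...)` 100 times gives a distribution of walk lengths, whose upper tail indicates how large a neutral network can be.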
Computer implementations of the methods and algorithms described above are available upon request.
Reactions found in the E. coli central metabolic network analyzed in this study
(0.14 MB DOC)
Conditional probability distribution of phenotype differences. The distributions were computed as described in the main text. Phenotype differences were binned using bins of size 0.01.
(1.05 MB TIF)
Conditional probability distribution of phenotype differences. The distributions were computed as described in the main text. Phenotype differences were binned using bins of size 0.05.
(1.72 MB TIF)
Convergence of the conditional probability distribution of phenotype differences. Shown is the Kolmogorov-Smirnov distance between the distribution of phenotype differences de conditioned on genotype differences dh obtained after t iterations of the uniform sampling algorithm described in the main text (denoted p(de|dh,t)) and the distribution p(de|dh,t-10), for three values of dh spanning a wide range. The Kolmogorov-Smirnov distance is given by max{abs(p(de|dh,t)-p(de|dh,t-10))}. The data were collected in a glucose environment.
(0.21 MB TIF)
Rank-ordering of metabolic environments based on the normalized mutual information (NMI). The environments are listed in increasing order of quality, except in the case of lactose whose position in the rank-ordering is not known precisely. The NMI was computed as described in the main text, using different values of p, the mutation rate per genotype position. The measurement scales of NMI values corresponding to different values of p were adjusted in order to facilitate their presentation on the same graph.
(0.20 MB TIF)
We are grateful to Simon Levin, Ned Wingreen, and three anonymous reviewers for very constructive comments on an earlier version of this manuscript; Michael Desai for very helpful discussions; and the laboratory of Bernhard Palsson at the University of California, San Diego, CA, for providing access to the latest reconstruction of the