Conceived and designed the experiments: EV RA AS. Performed the experiments: EV RA AS. Analyzed the data: EV RA AS. Contributed reagents/materials/analysis tools: EV RA AS. Wrote the paper: EV RA AS.
The authors have declared that no competing interests exist.
Yeast successfully adapts to an environmental stress by altering physiology and fine-tuning metabolism. This fine-tuning is achieved through regulation of both gene expression and protein activity, and it is shaped by various physiological requirements. Such requirements impose a sustained evolutionary pressure that ultimately selects a specific gene expression profile, generating a suitable adaptive response to each environmental change. Although some of the requirements are stress specific, it is likely that others are common to various situations. We hypothesize that an evolutionary pressure for minimizing biosynthetic costs might have left signatures in the physicochemical properties of proteins whose gene expression is fine-tuned during adaptive responses. To test this hypothesis we analyze existing yeast transcriptomic data for such responses and investigate how several properties of proteins correlate to changes in gene expression. Our results reveal signatures that are consistent with a selective pressure for economy in protein synthesis during adaptive response of yeast to various types of stress. These signatures differentiate two groups of adaptive responses with respect to how cells manage expenditure in protein biosynthesis. In one group, significant trends towards downregulation of large proteins and upregulation of small ones are observed. In the other group we find no such trends. These results are consistent with resource limitation being important in the evolution of the first group of stress responses.
Although different environmental stresses trigger specific sets of protective changes in the gene expression of yeast, the adaptive responses to these stresses also share some common features. We hypothesize that minimization of metabolic costs may contribute to shaping such adaptive responses. If this is so, then such pressure should be more noticeable in the costliest biosynthetic processes. One of these is protein synthesis. Thus, we analyze the set of genes and proteins whose expression changes during the responses and look for evidence to support or falsify our hypothesis. We find that protein properties that are indicative of protein cost correlate to changes in gene expression in a way that is consistent with that hypothesis for a large number of adaptive responses. However, if changes in gene expression are small during the adaptive response, we find no evidence of protein cost as a factor in shaping the adaptive response.
Unicellular organisms are sensitive to environmental challenges. Their internal milieu acts as a buffer against such changes by mounting an adaptive response involving modifications at different cellular levels. Appropriate adaptive responses require intracellular signaling, changes in the conformation and activity of proteins, changes in transcription and translation of genes, etc.
Such fine tuning is shaped by various functional requirements and physiological constraints. The functional requirements are a result of the specific demands that are imposed on cell survival by the environment. On the other hand, the physiological constraints are defined by the limits within which the cell is physically capable of changing the activity of its component parts to meet the functional requirements. From a global point of view, adaptive responses can be seen as a multi-optimization problem because cells evolved appropriate responses to cope with different types of stress, while optimizing different parts of their metabolism for each of those responses.
With these arguments in mind, it is important to identify the functional requirements and quantitative physiological constraints that may significantly shape adaptive responses. Among others, minimization of energetic expenditure plays an important role in cells growing exponentially in a rich medium. Several signatures that are consistent with minimization of metabolic cost have already been identified in the properties of the set of proteins that is expressed when cells are growing in rich media.
For example, genes coding for proteins that are highly abundant under basal conditions have a pattern of synonymous codon usage that is well adapted to the relative abundance of synonymous tRNAs in the yeast
Another signature that is found in genes that are highly expressed under basal conditions is a sequence bias that minimizes transcriptional and translational costs.
A final example of a general signature is the codon bias of long genes. This bias is such that the probability of missense errors is reduced during translation.
This body of results strongly supports the notion that metabolic cost acts as a selective pressure in shaping the properties of cells growing in a rich medium, in the absence of environmental stresses. Thus, one might ask if minimization of metabolic cost is also an important factor in the evolution of adaptive responses to stress conditions. This evolutionary pressure might be expected to leave stronger signatures in adaptive responses that require the use of higher ATP amounts by the cell, such as adaptation to heat, weak organic acids, or NaCl. In these three cases, it has been reported that ATP concentrations decrease due to a high energy demand.
Given that protein synthesis is one of the costliest biosynthetic efforts for the cell
We address these questions by investigating how the values of several protein properties (size and molecular weight of proteins, codon adaptation index, aromaticity, average cost per amino acid, etc.) relate to changes in gene expression levels during various environmental changes.
We find that genes whose expression is upregulated during different types of adaptive responses tend to code for proteins that are small, while genes whose expression is downregulated during the same responses tend to code for proteins that are large. This is a signature that is consistent with a selective pressure for minimizing metabolic cost in protein synthesis. It is more significant in adaptive responses where changes in gene expression levels affect a large fraction of the genome. To our knowledge, this is the first general and global signature that has been identified for the properties of proteins involved in adaptive responses to stress.
Data from 249 published microarray experiments that measure changes in yeast gene expression under a battery of different environmental stresses
Categorization of protein function, biological process, and location was done using Gene Ontology (GO) terms provided by the Saccharomyces Genome Database (SGD,
Whole-genome data for basal protein abundance
The physicochemical properties of proteins as well as the list of protein complexes in yeast were obtained from the SGD ftp site
According to the available data
The microarray data we analyze provide information regarding relative up and downregulation (UpCF and DownCF, respectively) of gene expression with respect to a pre-stress control condition. To facilitate comparison between upregulated and downregulated genes, we use the inverse of the ratio for downregulated genes. Thus, all values for the ratios of changes in gene expression discussed below are greater than 1.
Changes in gene expression during stress responses are dynamic and, for the most part, transient. Because of this, we take the maximum value of up or downregulation as an approximate measure of the maximal change in gene expression during the transient stress response.
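The two conventions above (inverting the ratio for downregulated genes and keeping the maximum change over the time course) can be illustrated with a minimal sketch. The function name `max_change_folds` is hypothetical, and treating a gene that moves in both directions during the time course as contributing to both UpCF and DownCF is an assumption made here for illustration, not something stated in the text.

```python
def max_change_folds(ratios):
    """Summarize one gene's time course of stress/control expression ratios.

    Returns (up_cf, down_cf): the largest upregulation ratio and the largest
    inverted downregulation ratio. Each value is > 1, or None when the gene
    never moves in that direction. Handling of genes that move in both
    directions is an assumption of this sketch.
    """
    if any(r <= 0 for r in ratios):
        raise ValueError("expression ratios must be positive")
    ups = [r for r in ratios if r > 1]
    # Downregulated ratios (< 1) are inverted so that they are also > 1.
    downs = [1.0 / r for r in ratios if r < 1]
    up_cf = max(ups) if ups else None
    down_cf = max(downs) if downs else None
    return up_cf, down_cf
```

With this convention a gene that is halved and a gene that is doubled both get a change-fold of 2, which makes the two directions directly comparable.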
Changes in gene expression are underestimated for genes that undergo very strong up or downregulation, due to intrinsic limitations of the microarray technology
Spearman rank correlations are used to characterize the dependencies between properties of proteins and changes in gene expression to a first approximation. However, this statistical index has some constraints that limit its usefulness for our analysis. First, the high number of observations may lead to statistically significant results even with low correlation values. Second, it is very sensitive to noisy data. Third, distributions that are asymmetric and have heavy long tails, such as those of our datasets, may influence the correlations and produce false results. All these constraints may lead to erroneous interpretation of the results. Thus, although correlation analysis gives a global description of the possible trends, such an analysis needs to be complemented with more detailed methods in order to support an interpretation of the set of results.
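For reference, the Spearman coefficient is the Pearson correlation computed on ranks. The sketch below is a plain-Python illustration of that definition, not the authors' implementation; the helper names `average_ranks` and `spearman_rho` are hypothetical, and ties receive their average rank.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)
```

Because only ranks enter the calculation, a single extreme change-fold cannot dominate the coefficient, but with thousands of genes even a small coefficient can reach statistical significance, which is the first limitation noted above.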
Thus, to further assess the biological relevance of the correlations we use the following procedure. First, and because the distribution of each of the considered properties has long tails, we select the values that fall within the 80% interquantile range for the property of interest. Then, we divide this range into three groups. Finally, we compute the ranks of the change in gene expression between the two extreme groups obtained by this criterion, discarding the middle group, and test for distribution differences by using the Mann-Whitney U rank-sum test. In the set of UpCF proteins, a positive z for this test means that the group with high values for the property is less upregulated than the group with low values for the property. In the set of DownCF proteins, a positive z for this test means that the group with high values for the property is more downregulated than the group with low values for the property (see for instance
Environmental condition | UpCF sign | UpCF z | UpCF p | DownCF sign | DownCF z | DownCF p | Lower threshold | Upper threshold |
ST25 | + | 4.19 | *** | + | 21.54 | *** | 7.79 | 14.91 |
ST30 | + | 7.63 | *** | + | 19.20 | *** | 6.17 | 11.70 |
Heat | + | 8.69 | *** | + | 10.54 | *** | 7.27 | 13.89 |
↓N | + | 3.88 | *** | + | 18.95 | *** | 6.90 | 13.12 |
Peroxide | + | 4.19 | *** | + | 13.35 | *** | 6.17 | 11.70 |
NaCl | + | 5.23 | *** | + | 6.49 | *** | 7.04 | 13.39 |
Diauxic | + | 1.82 | *** | + | 16.70 | *** | 6.96 | 13.26 |
↓AA | + | 1.04 | 0.15 | + | 14.88 | *** | 6.13 | 11.60 |
Sorbitol | + | 4.04 | *** | + | 15.91 | *** | 7.27 | 13.89 |
Alkali | + | 1.74 | *** | + | 6.54 | *** | 5.96 | 11.22 |
DTT | − | 11.66 | *** | + | 6.53 | *** | 6.98 | 13.31 |
Diamide | − | 2.77 | *** | + | 11.66 | *** | 8.63 | 16.65 |
Menadione | − | 4.14 | *** | + | 3.47 | *** | 7.56 | 14.49 |
Acid | + | 6.29 | *** | + | 6.36 | *** | 7.37 | 14.07 |
C Source | − | 12.66 | *** | + | 6.09 | *** | 7.03 | 13.40 |
↓Sorbitol | − | 7.87 | *** | − | 6.53 | *** | 5.50 | 10.32 |
We identify the extreme group values for abundance and use the Mann-Whitney analysis for characterizing positive or negative associations with gene expression levels, as detailed in the
Environmental condition | UpCF sign | UpCF z | UpCF p | DownCF sign | DownCF z | DownCF p | Lower threshold | Upper threshold |
ST25 | + | 5.76 | *** | + | 2.15 | *** | 408 | 653 |
ST30 | + | 4.23 | *** | − | 2.08 | *** | 416 | 665 |
Heat | + | 1.67 | *** | + | 4.53 | *** | 419 | 672 |
↓N | + | 9.91 | *** | + | 3.69 | *** | 415 | 662 |
Peroxide | + | 4.72 | *** | + | 8.11 | *** | 421 | 677 |
NaCl | + | 3.05 | *** | + | 4.20 | *** | 415 | 671 |
Diauxic | + | 9.47 | *** | + | 3.62 | *** | 405 | 639 |
↓AA | + | 1.61 | 0.05 | + | 2.65 | *** | 416 | 667 |
Sorbitol | + | 1.58 | 0.06 | − | 1.19 | 0.12 | 416 | 666 |
Alkali | + | 2.89 | *** | + | 2.73 | *** | 434 | 703 |
DTT | + | 13.69 | *** | + | 5.64 | *** | 411 | 658 |
Diamide | + | 11.07 | *** | + | 7.15 | *** | 408 | 660 |
Menadione | + | 7.51 | *** | − | 4.91 | *** | 435 | 702 |
Acid | + | 1.03 | 0.15 | − | 1.57 | 0.06 | 431 | 691 |
C Source | − | 15.20 | *** | − | 8.79 | *** | 413 | 662 |
↓Sorbitol | − | 7.72 | *** | − | 8.64 | *** | 415 | 656 |
We identify the extreme group values for length and use the Mann-Whitney analysis for characterizing positive or negative associations with gene expression levels, as detailed in the
All analyses were done using our own functions implemented in
As discussed in the
Characterize how the selected protein properties (see
Use changes in gene expression as a proxy for changes in protein level and investigate how such changes correlate with the protein properties considered in this work. In order to ensure consistency of the results, and because the signal-to-noise ratio is low for our purposes, a three-tier analysis of the data is required. First, we perform a correlation analysis between changes in gene expression and the different values of the protein properties. However, even a statistically significant correlation coefficient can be misleading because these coefficients are almost always significant in large datasets. Furthermore, a correlation coefficient describes an inhomogeneous set of data with a point measure, which is an important limitation in our case. Second, and to overcome this limitation, we use the Mann-Whitney test to compare the bulk differences in gene expression between proteins that have extreme values for the property of interest. This test enables us to appropriately deal with the asymmetrical and heavy-tailed nature of the distributions found in our datasets. Finally, we use moving-quantile plots to represent changes in gene expression as a function of the different properties. This allows us both to resolve any apparent contradictions that may arise between the correlation analysis and the Mann-Whitney analysis and to obtain a finer-grained representation of our data.
Results from step 2 are consistent with economy being an important factor in shaping different stress responses. To further investigate this issue we define a quantitative index that estimates the cost of changing protein expression. We use clustering analysis and discriminant analysis to investigate how the different stress responses behave with respect to this index.
The results from the previous steps suggest that there are two types of stress responses with regard to the amount of change in gene expression observed during the response. Therefore, we pool together the gene expression changes in stress responses of the same type, in order to have a stronger signal. We reanalyze the pooled data in order to ensure consistency between this set of responses and the results for the individual responses. Then, we use the pooled dataset to investigate how molecular complexes and protein function might influence our results by analyzing the proteins in different Gene Ontology (GO) categories.
We now discuss the results of the analysis in detail.
Some of the protein properties we consider are strongly correlated (see
The only type of data that is available for both the entire genome and a comprehensive set of yeast adaptive responses is gene expression data from microarray experiments.
As a first step in the analysis of the relationship between changes in gene expression and each of the different protein properties, we evaluate how the maximum change in mRNA level for the microarray data of each stress response correlates to the property of interest. This is done by calculating the Spearman rank correlation coefficient. Upregulated and downregulated genes are analyzed independently. The results are summarized in
Only the results with statistical significance (p<0.05) are shown. Green bars correspond to upregulation. Purple bars correspond to downregulation.
Those results reveal substantial diversity in the properties of the proteins that are induced or repressed during the various responses to stress. Despite this, for most adaptive responses we found a similar pattern for the relationship between changes in gene expression and protein abundance, protein length, codon adaptation index, or mRNA abundance. The value for each of these properties tends to decrease if the gene is more upregulated and increase if the gene is more downregulated.
In 11 stress responses, the most upregulated proteins tend to be less abundant under basal conditions. In 15 stress responses, the more downregulated proteins tend to be more abundant under basal conditions. As expected because of the high correlation between protein abundance and CAI or basal mRNA abundance, the correlations between these properties and changes in gene expression are similar to those for abundance. Surprisingly, and although abundance and length are negatively correlated (
Correlations between changes in gene expression and GRAVY, Aromaticity, IP, ACPA or protein half-life are either non-significant or weak.
An analysis of the results finds that the properties that are most strongly correlated with changes in gene expression are those that can be considered as a proxy for cost of protein synthesis. Because of this we focus the next step of our analysis on those properties, which are protein length, protein abundance, CAI and T1/2. The relationship between each of these properties and cost of protein synthesis can be explained as follows. First, abundant proteins require more resources to synthesize and maintain than proteins that are present in low copy numbers. The same is true for proteins with low T1/2. Second, longer proteins are metabolically more expensive to synthesize than shorter proteins because they use more amino acids per peptide chain. Third, the codon adaptation index can also be a proxy for the rate of synthesis of a protein, given that proteins with a high CAI are more likely to be highly expressed than proteins with a low CAI.
If cost of protein synthesis is an issue that influences evolution of stress response, then proteins that are more expensive should be more strongly repressed and proteins that are cheaper should be more strongly upregulated. Therefore, one needs to analyze if changes in gene expression are different between the cheapest and the most expensive proteins. The Mann-Whitney analysis of the extreme groups for each property, although less intuitive than the correlation analysis, allows us to perform such a comparison. Thus, we are analyzing the groups of proteins in which the signal is likely to be strongest.
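The extreme-group comparison (80% interquantile range, three groups, Mann-Whitney test on the two outer groups) could be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the three groups are taken as equal-width thirds of the retained range, the z score uses the normal approximation without tie or continuity correction, and all function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def mann_whitney_z(a, b):
    """z > 0 when values in a tend to exceed values in b (normal approximation)."""
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u1 - mean_u) / sd_u


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def extreme_group_z(prop, change_fold, direction):
    """Compare change-folds of the low and high property groups.

    direction="up":   z > 0 if the high group is LESS upregulated.
    direction="down": z > 0 if the high group is MORE downregulated.
    Returns (z, lower_threshold, upper_threshold).
    """
    s = sorted(prop)
    q10, q90 = quantile(s, 0.10), quantile(s, 0.90)
    width = (q90 - q10) / 3.0  # assumption: equal-width thirds
    lower, upper = q10 + width, q90 - width
    low = [c for p, c in zip(prop, change_fold) if q10 <= p < lower]
    high = [c for p, c in zip(prop, change_fold) if upper < p <= q90]
    if direction == "up":
        return mann_whitney_z(low, high), lower, upper
    if direction == "down":
        return mann_whitney_z(high, low), lower, upper
    raise ValueError("direction must be 'up' or 'down'")
```

With this sign convention, a positive z in both the upregulated and the downregulated sets points in the direction expected if expensive proteins are repressed more strongly and induced less strongly.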
The results for abundance and length are summarized in
Results for CAI are almost identical to those for abundance and we do not find any clear trend for T1/2 (data not shown).
In order to have a finer detail representation of our data and resolve any apparent contradictions between the correlation analysis and the Mann-Whitney analysis (for example compare the results for heat shock response between
Plots show the moving median using a window of 300 elements. Colors: Green for upregulation and purple for downregulation. Length unit is 10^2 amino acids. The lines represent the moving medians. The shaded areas represent the regions from quantile 0.25 to quantile 0.75. Note that in most cases there is an upper limit to the length of upregulated proteins. This limit is smaller than the limit found for the length of downregulated proteins.
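A moving-quantile curve of this kind can be computed with a short routine such as the one below. It is a sketch only: genes are sorted by the property, a fixed-size window slides one gene at a time, and each window is summarized by its quartiles. Placing each window at the median property value and using the default quantile method of Python's `statistics.quantiles` are assumptions; the name `moving_quantiles` is hypothetical.

```python
import statistics


def moving_quantiles(prop, change_fold, window=300):
    """Return a list of (x, q25, q50, q75) tuples, one per window position.

    x is the median property value in the window (an assumption of this
    sketch); q25, q50 and q75 are the quartiles of the change-folds of the
    genes in that window.
    """
    if window < 2:
        raise ValueError("window must contain at least 2 genes")
    pairs = sorted(zip(prop, change_fold))
    if len(pairs) < window:
        raise ValueError("fewer genes than the window size")
    curve = []
    for start in range(len(pairs) - window + 1):
        chunk = pairs[start:start + window]
        x = statistics.median(p for p, _ in chunk)
        q25, q50, q75 = statistics.quantiles([c for _, c in chunk], n=4)
        curve.append((x, q25, q50, q75))
    return curve
```

Plotting `q50` against `x` gives the line and the band between `q25` and `q75` gives the shaded area.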
Those plots also permit identifying responses where downregulation spans a longer range of protein lengths and abundances than upregulation. For example, in
By and large, only weak relationships are found between change-fold and T1/2 (
Short proteins with a high relative composition of metabolically cheaper amino acids are highly abundant under basal conditions, which is consistent with the hypothesis that lowering protein cost is a driving force in shaping the protein complement of yeast in those conditions
It has been proposed that gene expression profiles have signatures that are specific to the conditions under which they have evolved
To find support for this hypothesis we must estimate that cost for the different stress responses. Changes in protein levels can be roughly estimated over the whole genome by the changes in the levels of gene expression
In this equation
It is likely that specific functional requirements during any given stress response will lead to the synthesis of new proteins whose functionality is required for survival under the new conditions. By calculating a cost index for each of the twenty five Gene Ontology (GO) categories of cellular components defined in the SGD Slim Mapper Tool, we can analyze if the requirement for new functions is restricted to specific categories of the GO classification or not. Such a discriminating cost index can be defined as:
A cluster analysis of the twenty five dimensional vectors built for each adaptive response with the index
Basal Cluster corresponds to adaptive responses that may occur under energy or resource shortage. Trends in up- and downregulation of genes after stress. (A) Upregulation trend with respect to abundance, (B) Downregulation trend with respect to abundance, (C) Upregulation trend with respect to length, (D) Downregulation trend with respect to length. In each case, (+) indicates a significant result in the expected direction, (−) indicates a significant result opposite to the expected one, and (o) indicates a non-significant result in the Mann-Whitney analysis. All results shown here have p<0.05, or p≤0.06 if marked with *.
The normalized values of each component of the vector
The values are normalized so that the maximum calculated value of the index in the whole dataset is 1 and the minimum is 0. The basal condition is rescaled to 0.97 and would plot as a circle.
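The rescaling described in this legend is a standard min-max normalization. A minimal sketch, applicable to any list of index values (the function name `min_max_normalize` is hypothetical, and the separate rescaling of the basal condition is not covered here):

```python
def min_max_normalize(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant list")
    return [(v - lo) / (hi - lo) for v in values]
```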
Altogether, the results presented in this section suggest two broad types of adaptive responses. In one type, corresponding to responses in the Basal Cluster, the changes in gene expression are small. In this group of responses, we find no correlation between protein properties and gene expression. In the other type of stress, responses have evolved in a way that is consistent with a significant pressure to minimize the metabolic cost of the response.
The previous results suggest that the stress conditions considered can be classified in two broad types with respect to metabolic economy. On one hand, we have the Basal Cluster in
As a control for the adequacy of the lumped dataset, we need to make sure that it has the same characteristics as those of its individual constituent datasets. To do this, we compared the gene expression changes between groups of proteins with high and low values for each property in the lumped set of responses (
Property | UpCF sign | UpCF z | UpCF p | DownCF sign | DownCF z | DownCF p | Lower threshold | Upper threshold |
Molecular Weight | + | 8.68 | *** | + | 5.29 | *** | 46.95 | 75.23 |
Length | + | 8.47 | *** | + | 5.62 | *** | 414.33 | 663.67 |
Pr Abundance | + | 6.16 | *** | + | 20.88 | *** | 6.99 | 13.33 |
Pr Half-life (T1/2) | + | 0.48 | 0.32 | + | 3.42 | *** | 69.67 | 125.33 |
Isoelectric Point | − | 0.87 | 0.19 | − | 0.76 | 0.22 | 6.52 | 8.40 |
CAI | + | 3.41 | *** | + | 21.76 | *** | 0.17 | 0.24 |
CodonBias | + | 1.22 | 0.11 | + | 20.31 | *** | 0.11 | 0.23 |
FOP | + | 1.49 | 0.07 | + | 20.53 | *** | 0.47 | 0.54 |
GRAVY | − | 1.76 | *** | + | 4.69 | *** | −0.62 | −0.35 |
Aromaticity | − | 4.06 | *** | − | 3.42 | *** | 0.07 | 0.10 |
ACPA | − | 1.06 | 0.15 | − | 3.08 | *** | 22.99 | 24.02 |
[mRNA]A | + | 4.82 | *** | + | 13.55 | *** | 2.56 | 4.29 |
[mRNA]H | + | 5.20 | *** | + | 19.80 | *** | 2.07 | 3.83 |
Lower and Upper Thresholds indicate the cutoff limits for selecting proteins with low and high values for each of the protein properties. A (+) z indicates that proteins in the Lower group present higher up-expression and lower down-expression than those in the Upper group as compared by the Mann-Whitney analysis. A (−) z indicates the opposite result.
We confirm a strong trend to repress highly abundant proteins and upregulate only proteins that were less abundant under basal conditions. As expected, the other properties follow a coherent pattern that depends on their correlation with abundance (
The plot shows the moving quantiles 0.75, 0.5 and 0.25 computed with a window of 300 elements. Green for up-expressed genes and purple for down-expressed genes. Length unit is 10^2 amino acids.
An interesting result of this lumped analysis is that both upregulation and downregulation of genes are inversely correlated with CAI. CAI is a proxy for the rate of protein synthesis. This suggests that rate of protein synthesis (affected by CAI) may not be a significant pressure in shaping the responses we are studying. However, it must be stressed that we use CAI (or CB or FOP) estimates for the basal state. These measurements indicate adaptation to the basal tRNA complement of the cell. This complement is likely to change under varying conditions. Therefore, until genome-wide estimates of CAI during adaptive responses are available, rate of protein synthesis cannot be definitively excluded as an important selective pressure in shaping stress responses.
As stated earlier, results for length and abundance appear to be counterintuitive if one considers that a) large proteins are not abundant under basal conditions, and yet they tend to be more strongly repressed than short proteins, and b) short proteins are abundant under basal conditions, and yet they tend to be more up-expressed than large proteins.
Moving-median plots were calculated using a window of 300 elements. Green - upregulated genes; Purple - downregulated genes. (A) Plot by bins of abundance: (A.1) for proteins with abundance <876 proteins per cell, (A.2) abundance between 876 and 2253, (A.3) abundance between 2253 and 6232, and (A.4) abundance ≥6232. (B) Results for all bins separated by upregulation (B.1) and downregulation (B.2). Length unit is 10^2 amino acids and Abundance unit is 10^3 proteins per cell.
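The four abundance bins in this legend can be assigned with a simple lookup, sketched below. The cutoffs are those given in the legend; treating the intermediate bins as half-open intervals (lower cutoff included, upper cutoff excluded) is an assumption, and the names `ABUNDANCE_CUTOFFS`, `BIN_LABELS` and `abundance_bin` are hypothetical.

```python
from bisect import bisect_right

# Cutoffs in proteins per cell, taken from the figure legend.
ABUNDANCE_CUTOFFS = [876, 2253, 6232]
BIN_LABELS = ["A.1", "A.2", "A.3", "A.4"]


def abundance_bin(abundance):
    """Return the panel label of the abundance bin a protein falls into.

    Assumed intervals: A.1 < 876 <= A.2 < 2253 <= A.3 < 6232 <= A.4.
    """
    return BIN_LABELS[bisect_right(ABUNDANCE_CUTOFFS, abundance)]
```

Within each bin the moving-median analysis of length can then be repeated, which separates the effect of length from the effect of basal abundance.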
Also, by dividing proteins into four different bins of basal abundance,
The Mann-Whitney analysis was also performed for the proteins classified in each GO category (for function, process and cellular component). This helps to evaluate if the energetic constraints on gene expression are a general pattern and allows us to check whether specific sets of proteins, with a common GO category, contribute disproportionately to the observed correlations.
Because each GO category contains a much lower number of proteins than the whole genome, the impact of the noise can be bigger. Even so,
Category | UpCF sign | UpCF z | UpCF p | DownCF sign | DownCF z | DownCF p | Lower threshold | Upper threshold |
Molecular function unknown | + | 4.73 | *** | + | 2.98 | *** | 323 | 548 |
Catalytic activity | + | 5.36 | *** | + | 2.30 | *** | 493 | 767 |
Transporter activity | + | 1.62 | 0.05 | + | 2.69 | *** | 435 | 710 |
Structural molecule activity | + | 1.47 | 0.07 | − | 2.54 | *** | 351 | 595 |
Transcription regulator activity | + | 1.70 | *** | + | 3.19 | *** | 480 | 738 |
Other | + | 3.82 | *** | + | 2.27 | *** | 447 | 730 |
Cellular physiological process | + | 7.18 | *** | + | 3.70 | *** | 443 | 713 |
Metabolism | + | 5.33 | *** | + | 2.75 | *** | 425 | 689 |
Biological process unknown | + | 0.03 | 0.49 | + | 0.53 | 0.30 | 306 | 505 |
Transport | + | 2.72 | *** | + | 4.18 | *** | 480 | 773 |
Transcription | + | 1.66 | *** | + | 2.61 | *** | 499 | 792 |
Cell cycle | + | 3.10 | *** | + | 2.37 | *** | 543 | 843 |
Amino acid metabolism | + | 1.23 | 0.11 | + | 1.10 | 0.14 | 467 | 674 |
Signal transduction | + | 2.40 | *** | + | 1.33 | 0.09 | 548 | 885 |
Other | + | 2.16 | *** | − | 0.97 | 0.17 | 385 | 642 |
Cytoplasm | + | 5.67 | *** | + | 4.08 | *** | 416 | 672 |
Nucleus | + | 6.10 | *** | + | 3.66 | *** | 462 | 743 |
Cellular component unknown | − | 4.07 | *** | − | 1.86 | *** | 275 | 446 |
Mitochondrion | + | 5.77 | *** | + | 7.59 | *** | 436 | 719 |
Endoplasmic reticulum | + | 2.29 | *** | + | 2.32 | *** | 384 | 597 |
Cytosol | − | 2.83 | *** | − | 6.73 | *** | 306 | 505 |
Other | + | 3.85 | *** | + | 0.24 | 0.40 | 473 | 741 |
To further investigate the negative results, we analyzed both the basal abundance and the frequency of proteins involved in molecular complexes for each category (
Category | Complexes N | Complexes Freq | Abundance mean | Abundance quantile 0.25 | Abundance quantile 0.5 | Abundance quantile 0.75 |
Molecular function unknown | 137 | 0.05 | 4.31 | 0.72 | 1.71 | 3.42 |
Catalytic activity | 392 | 0.20 | 15.90 | 1.18 | 3.04 | 8.36 |
Transporter activity | 86 | 0.21 | 18.71 | 0.91 | 2.97 | 8.50 |
Structural molecule activity | 280 | | 30.17 | 1.82 | 6.22 | 31.59 |
Transcription regulator activity | 134 | 0.41 | 3.05 | 0.54 | 1.36 | 3.51 |
Other | 307 | 0.34 | 11.98 | 0.86 | 2.18 | 6.14 |
Cellular physiological process | 1239 | 0.28 | 13.89 | 1.04 | 2.58 | 7.08 |
Metabolism | 1059 | 0.35 | 16.01 | 1.17 | 2.87 | 7.82 |
Biological process unknown | 14 | 0.01 | 3.11 | 0.59 | 1.44 | 3.26 |
Transport | 201 | 0.21 | 12.69 | 1.11 | 2.75 | 6.92 |
Transcription | 246 | 0.49 | 4.28 | 0.77 | 1.73 | 4.49 |
Cell cycle | 140 | 0.34 | 4.00 | 0.53 | 1.38 | 3.69 |
Amino acid metabolism | 19 | 0.10 | 30.86 | 2.02 | 6.90 | 26.52 |
Signal transduction | 13 | 0.07 | 5.68 | 0.72 | 1.52 | 3.95 |
Other | 1 | 0.01 | 3.43 | 0.52 | 1.44 | 5.54 |
Cytoplasm | 670 | 0.20 | 14.04 | 1.08 | 2.73 | 7.39 |
Nucleus | 664 | 0.35 | 7.94 | 0.91 | 2.25 | 5.41 |
Cellular component unknown | 2 | 0.00 | ||||
Mitochondrion | 242 | 0.24 | 10.31 | 1.08 | 2.54 | 6.86 |
Endoplasmic reticulum | 30 | 0.09 | 10.54 | 1.21 | 2.84 | 6.76 |
Cytosol | 188 | | 45.80 | 3.46 | 13.67 | 52.33 |
Other | 86 | 0.15 | 12.05 | 0.63 | 1.73 | 6.07 |
For each group we computed the number (N) and frequency of genes in molecular complexes, and the mean and quartiles of protein concentrations.
We also made the analysis using more detailed GO terms. As shown in Supplementary
One of the six categories in which that relationship is inconsistent with the hypothesis is “Ribosome”. The other five categories are “Structural molecule activity”, “Helicase activity”, “Sporulation”, “Molecular function unknown”, and “Cellular component unknown”. We could expect that ribosomal proteins would contribute strongly to the hypothesized trends because they are highly abundant under basal conditions and highly repressed during stress. However, the results rule out these proteins as a major contributor to the general trends observed for the whole genome.
Several factors may explain the exceptions for some GO categories. First, the category may include mostly proteins whose specific function is required for the response. Such a situation could overcome a pressure for economy in protein synthesis. Interestingly, the consistency of the “Response to stress” category with our hypothesis suggests that such cases may be rare. Second, the relevant category may contain a high proportion of genes that code for proteins of very low basal abundance. Because the proteins in these groups contribute poorly, if at all, to the total cell mass, one could expect that the selective pressure for economy in protein synthesis is weak. Third, a high proportion of genes in a functional group may be involved in complexes. Whenever a GO category contains more than 50% of genes that are involved in molecular complexes, no correlation is found between protein length and gene expression changes (
Understanding if and how the size and abundance of proteins in complexes are affected by a pressure to save metabolic costs in protein synthesis would require taking into account the size of the individual complexes. This, in turn, requires that the stoichiometry of those complexes is known with confidence. Because this information is not available for most protein complexes of yeast, a detailed analysis must await accurate data regarding such stoichiometry.
In
To test this hypothesis in the absence of data about stoichiometry of each complex, we selected genes coding for proteins that are flagged in SGD as being part of a protein complex. The analysis confirms that genes coding for proteins involved in complexes are more strongly repressed than other genes (z = 9.46, p<0.05).
Similarly, genes coding for proteins involved in complex formation are less upregulated than those coding for proteins not involved in complex formation (z = 16.22, p<0.05). This can be seen in the quantile-quantile plots of the change-fold shown in
Quantile-quantile plots show the divergence between the two lists by the deviation of the points from the line with a slope of 1. (A) Tendencies of the up-expression change-folds; (B) Tendencies of the down-expression change-folds.
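The comparison between genes that code for complex-forming proteins and all other genes can be illustrated with a short sketch. It is only an illustration under stated assumptions: the text here does not say which test produced the reported z scores, so the code assumes a rank-sum (Mann-Whitney) comparison with a normal approximation and no tie correction in the variance. The function names `qq_pairs` and `rank_sum_z` and the toy numbers are hypothetical.

```python
import math

def qq_pairs(sample_a, sample_b, n_points=99):
    """Return matched quantile pairs (a_q, b_q) for two samples of any size."""
    def quantile(sorted_vals, p):
        # Linear interpolation between the two nearest order statistics.
        pos = p * (len(sorted_vals) - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, len(sorted_vals) - 1)
        return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

    a, b = sorted(sample_a), sorted(sample_b)
    probs = [(k + 1) / (n_points + 1) for k in range(n_points)]
    # Points on the line of slope 1 mean the two distributions agree.
    return [(quantile(a, p), quantile(b, p)) for p in probs]

def rank_sum_z(sample_a, sample_b):
    """z score of the Mann-Whitney U statistic for sample_a (assumed test)."""
    pooled = sorted((v, g) for g, s in enumerate((sample_a, sample_b)) for v in s)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (u_a - mean_u) / sd_u

# Toy fold changes (hypothetical values, not data from the study).
complex_genes = [10, 11, 12]
other_genes = [1, 2, 3]
z = rank_sum_z(complex_genes, other_genes)
pairs = qq_pairs([1, 2, 3], [1, 2, 3], n_points=3)
```

A positive z indicates that the first list tends to hold larger values than the second. In the quantile-quantile pairs, systematic departure from equality between the two coordinates corresponds to the deviation from the slope-1 line described in the caption.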
What type of general selective pressure might explain the correlations we find between changes in gene expression and protein abundance or length during stress response? One answer to this question is that minimizing the cost of protein synthesis is a significant pressure that shapes changes in gene expression during adaptive responses. Why would minimizing metabolic costs improve fitness of
Calculations based on the typical cellular composition of yeast and bacteria predict that protein synthesis uses more metabolic resources and ATP molecules than the formation of other macromolecules and it is a limiting step for yield
Under stress, availability of resources may be significantly limited, and the cell must adapt quickly in order to survive. For challenging stress conditions, resource limitation may severely constrain the adaptive response. Exposure to these kinds of stress causes the cell to divert considerable resources from its steady-state metabolism towards the adaptive response and imposes important constraints on cell economy
Further support for the importance of protein cost as a selective pressure in the evolution of adaptive changes in gene expression is found in different studies. For example, pathways appear to have evolved to maximize flux for a minimum amount of protein, because the enzyme concentration may be limited by both the protein synthesizing capacity and the solvent capacity of a cell
There are three aspects that the cell can tune to decrease the cost of protein synthesis. First, it can decrease the amount of protein that it synthesizes per unit time. If we take changes in gene expression as a proxy for changes in protein synthesis, we find that in many cases the overall protein synthesis during stress response is decreased (the yij index defined above is negative). Second, the cell may decrease the cost of protein synthesis by expressing small proteins at higher levels. This would decrease the biosynthetic cost per protein chain and is consistent with our results. Finally, the cell may decrease the cost of protein synthesis by increasing the half-life of proteins. We find no evidence for this strategy.
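The first two strategies can be made concrete with a small numerical sketch. It rests on a simplifying assumption that is not taken from the analysis itself: the cost of a chain is taken to be proportional to its length in amino acids, so total cost is the sum of length times the number of chains made. The function `synthesis_cost` and all numbers are hypothetical, and this proxy is not the yij index referred to above. Protein half-life, the third strategy, is left out of the sketch.

```python
def synthesis_cost(lengths, copies):
    """Total amino acids polymerised: chain length times number of chains made."""
    return sum(length * n for length, n in zip(lengths, copies))

# Hypothetical proteome of three proteins (lengths in amino acids).
lengths = [100, 400, 1200]

# Chains synthesised per unit time in the basal state (hypothetical).
basal = [1000, 1000, 1000]

# Strategy 1: synthesise fewer chains of every protein.
fewer = [800, 800, 800]

# Strategy 2: same total number of chains, shifted toward the small protein.
shifted = [1600, 1000, 400]

basal_cost = synthesis_cost(lengths, basal)
fewer_cost = synthesis_cost(lengths, fewer)
shifted_cost = synthesis_cost(lengths, shifted)
```

Under these assumptions, shifting expression toward the small protein lowers the total cost even though the number of chains synthesised is unchanged, which is the sense in which the second strategy reduces the cost per protein chain.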
In summary, if decreasing the cost of protein synthesis significantly contributes to shaping the gene expression profile of an adaptive response, we should find trends in the composition of the changing protein complement that are consistent with the following predictions:
Because long proteins are more expensive to make than small proteins, protein length is an important component of the cost of protein synthesis. If cost of protein synthesis is minimized during the response we would expect that:
The results of our analysis are broadly consistent with these predictions (see
Further analysis that would directly establish whether there are limitations on resources and energy usage during a given adaptive response would require data about ATP usage and production under each relevant condition. Such data would allow us to better understand which constraints are important in shaping the evolution of those responses.
Change-folds of genes with respect to basal abundance. Plots show the moving-quantiles using a window of 300 elements. Colors: Green for upregulation and purple for downregulation. Abundance unit is 10^4 proteins/cell.
(0.35 MB TIF)
Change-folds of genes with respect to protein half-life. Plots show the moving-quantiles using a window of 300 elements. Colors: Green for upregulation and purple for downregulation.
(0.40 MB TIF)
Change-folds of genes with respect to CAI. Plots show the moving-quantiles using a window of 300 elements. Colors: Green for upregulation and purple for downregulation.
(0.45 MB TIF)
Discriminant analysis. Environmental conditions were classified in four groups: 1) Basal cluster: basal vector, menadione, acid, change in carbon source, and sorbitol depletion; 2) NaCl, diauxic, amino acid depletion, presence of sorbitol, alkali, DTT, and diamide; 3) heat shock, peroxide, and nitrogen depletion; 4) stationary phase at 25°C and 30°C.
(0.07 MB TIF)
Spearman rank correlation matrix between different physical properties of genes and proteins. 0: not statistically significant.
(0.08 MB DOC)
Comparison of changes in gene expression between short and large proteins for different functional Yeast GO Slim categories.
(0.05 MB DOC)
Comparison of changes in gene expression between short and large proteins for different process Yeast GO Slim categories.
(0.07 MB DOC)
Comparison of changes in gene expression between short and large proteins for different cell component Yeast GO Slim categories.
(0.06 MB DOC)
Categorization by Function (Yeast GO Slim): Molecular complexes and protein concentrations. For each group we computed the number and frequency of genes related to any molecular complex, and the mean and quartiles of protein concentrations.
(0.06 MB DOC)
Categorization by Process (Yeast GO Slim): Molecular complexes and protein concentrations. For each group we computed the number and frequency of genes related to any molecular complex, and the mean and quartiles of protein concentrations.
(0.07 MB DOC)
Categorization by Molecular Component (Yeast GO Slim): Molecular complexes and protein concentrations. For each group we computed the number and frequency of genes related to any molecular complex, and the mean and quartiles of protein concentrations.
(0.06 MB DOC)
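The moving-quantile curves mentioned in the supporting figure captions above (window of 300 elements) can be approximated as follows. This is a minimal sketch, not the procedure actually used: it assumes that genes are sorted by the property of interest, that the window slides one gene at a time, and that quantiles are obtained by linear interpolation. The function name `moving_quantiles` and its parameters are hypothetical.

```python
import math

def moving_quantiles(prop, fold, window=300, probs=(0.25, 0.5, 0.75)):
    """Sort genes by a property, then take quantiles of fold change per window.

    Returns a list of (property value at window centre, tuple of quantiles).
    """
    order = sorted(range(len(prop)), key=lambda i: prop[i])
    x = [prop[i] for i in order]
    y = [fold[i] for i in order]
    out = []
    # Slide the window one gene at a time (assumed step size).
    for start in range(0, len(y) - window + 1):
        win = sorted(y[start:start + window])
        centre = x[start + window // 2]
        qs = []
        for p in probs:
            # Linear interpolation between neighbouring order statistics.
            pos = p * (window - 1)
            lo = math.floor(pos)
            hi = min(lo + 1, window - 1)
            qs.append(win[lo] + (win[hi] - win[lo]) * (pos - lo))
        out.append((centre, tuple(qs)))
    return out

# Toy data (hypothetical): ten genes, window of five.
toy_prop = list(range(10))
toy_fold = [2 * i for i in range(10)]
curve = moving_quantiles(toy_prop, toy_fold, window=5)
```

Each entry of `curve` pairs the property value at the centre of a window with the requested quantiles of the change-fold inside that window. If fewer genes than the window size are supplied, the result is an empty list.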
We thank Dr. Armindo Salvador and the anonymous reviewers for invaluable suggestions, discussions and comments about the results of this paper. This paper is dedicated to the memory of Prof. Ruy E. Pinto.