OL, JMN, ZW, and SCP conceived and designed the experiments. OL, JMN, and ZW performed the experiments. OL developed the theoretical model and analyzed the data. ZW and LH contributed reagents/materials/analysis tools. OL, JMN, and SCP wrote the paper.
The authors have declared that no competing interests exist.
In many biological systems, the interactions that describe the coupling between different units in a genetic network are nonlinear and stochastic. We study the interplay between stochasticity and nonlinearity using the responses of Chinese hamster ovary (CHO) mammalian cells to different temperature shocks. The experimental data show that the mean value response of a cell population can be described by a mathematical expression (empirical law) which is valid for a large range of heat shock conditions. A nonlinear stochastic theoretical model was developed that explains the empirical law for the mean response. Moreover, the theoretical model predicts a specific biological probability distribution of responses for a cell population. The prediction was experimentally confirmed by measurements at the single-cell level. The computational approach can be used to study other nonlinear stochastic biological phenomena.
The structure of an unknown biological system is uncovered by experimentally perturbing the system with a series of input signals. The response to these perturbations is measured as output signals. Then, the mathematical relation between the input and the output signals constitutes a model for the system. As a result, a classification of biological molecular networks can be devised using their input–output functional relation. This article studies the input–output functional form for the response to heat shocks in mammalian cells. Chinese hamster ovary (CHO) cells were perturbed with a series of heat pulses of precise duration and temperature. The experimental data, taken at the single-cell level, revealed a simple and precise mathematical law for the time evolution of the heat shock response. The parameters of the mathematical law can be measured experimentally and can be used by heat shock biologists to classify the heat shock response under different experimental conditions. Since the response to heat shock is the outcome of transcription factor control, it is highly probable that the empirical law is valid for other biological systems. The mathematical model explains not only the mean value of the response but also the time evolution of its probability distribution in a cell population.
Complex biological systems are built out of a huge number of components. These components are diverse: DNA sequence elements, mRNA, transcription factors, etc. The concentration of each component changes over time. One way to understand the functions of a complex biological system is to construct a quantitative model of the interactions present in the system. These interactions are usually nonlinear in terms of the concentrations of the components that participate in the interaction process. For example, the concentration of a dimer is proportional to the product of the concentrations of the molecules that dimerise. Besides being nonlinear, the interactions are also stochastic. The production process of a molecule is not deterministic, and it is governed by a probability rate of production. In what follows, a nonlinear stochastic model for the response to heat shocks in CHO mammalian cells will be developed. Heat stress is just one example of the many ways a molecular system can be perturbed. From a general perspective, the structure of a molecular system is uncovered by imposing different perturbations (input signals) on the system under study, and then the responses of the system (output signals) are measured. From the experimental collection of pairs of input–output signals, laws that describe the system can be uncovered. This is the fundamental idea in Systems and Synthetic Biology [
To acquire the experimental data, we used a reporter gene system in which the expression of the green fluorescent protein (GFP) is under the control of the promoter region of the mouse
First, we describe the time course of the mean response to a heat shock. At elevated temperatures (39 °C to 47 °C), the HSP70 heat shock promoter is active and GFP starts to be synthesized. The input signals were chosen in the form of a pulse at a temperature (
(A) The accumulation of GFP is monitored for 18 h after heat shock. The fold induction is defined as the ratio of the mean value of GFP at different times (mean GFP) over the mean value at 30 min after the shock (mean GFP0).
(B) The logarithm of the fold induction saturates exponentially in time. The last 15 samples were predicted by the fit on the first 25 points.
(C) The formula
The fold induction of GFP with respect to a reference (
The reference is the first measured sample away from the end of the heat shock (30 min after the shock in
The time
The empirical law for the response of the cells to the heat pulse can thus be cast into the form:
The same law appeared in repeated measurements of pulses at 42 °C for 30 min duration (unpublished data). Parameter
These findings suggest that the same law is valid for other heat shock pulses, parameters
To find the range of validity for the empirical law, measurements were taken for the responses to heat shocks at various heat pulse parameters
For each heat shock pulse (T, D), 13 time samples were taken. At each time sample, the intensity of GFP in at least 10,000 cells was recorded. The groups A, B, and C represent weak, moderate, and strong heat shocks, respectively.
The law was again present in all responses for temperatures between 41.5 °C and 42.5 °C (examples selected in
(A) For weak shocks (39.5 °C to 40.5 °C), the fits are less tight than they are for moderate shocks (41.5 °C and 42.5 °C).
(B) For strong heat shocks (duration greater than 15 min in this figure), the response starts at a slow pace. Later, the response grows faster, overcoming those responses produced by less strong shocks. The time origin and the reference value for fold induction, GFP0, is the mean response at 2 h after the shock.
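The fitting procedure behind the empirical law can be sketched in code. The following Python sketch assumes that the law has the form ln(GFP/GFP0) = a(1 − exp(−b t)), which is one reading of the statement that the logarithm of the fold induction saturates exponentially in time. The symbols a and b, the function names, and the synthetic data are illustrative assumptions, not measured values. As in the prediction test described above, the fit uses the first 25 samples and is then evaluated on the last 15.

```python
import math
import random


def saturating_log_fold(t, a, b):
    """Assumed empirical law: ln(GFP(t) / GFP0) = a * (1 - exp(-b * t))."""
    return a * (1.0 - math.exp(-b * t))


def fit_saturating_law(times, log_fold, b_min=1e-3, b_max=5.0, steps=2000):
    """Least-squares estimate of (a, b).

    For a fixed b the best a follows from linear least squares,
    so only b is scanned, here on a logarithmic grid.
    """
    best = None
    for k in range(steps + 1):
        b = b_min * (b_max / b_min) ** (k / steps)
        f = [1.0 - math.exp(-b * t) for t in times]
        denom = sum(fi * fi for fi in f)
        if denom == 0.0:
            continue
        a = sum(yi * fi for yi, fi in zip(log_fold, f)) / denom
        sse = sum((yi - a * fi) ** 2 for yi, fi in zip(log_fold, f))
        if best is None or sse < best[0]:
            best = (sse, a, b)
    _, a, b = best
    return a, b


# Synthetic data with hypothetical parameters (not taken from the experiments).
rng = random.Random(0)
times = [0.5 * k for k in range(40)]  # hours after the reference sample
a_true, b_true = 3.0, 0.25
log_fold = [saturating_log_fold(t, a_true, b_true) + rng.gauss(0.0, 0.02)
            for t in times]

# Fit on the first 25 samples, then predict the last 15.
a_fit, b_fit = fit_saturating_law(times[:25], log_fold[:25])
errors = [abs(saturating_log_fold(t, a_fit, b_fit) - y)
          for t, y in zip(times[25:], log_fold[25:])]
print(f"a = {a_fit:.3f}, b = {b_fit:.3f}, max prediction error = {max(errors):.3f}")
```

Scanning b and solving for a in closed form keeps the sketch free of external dependencies; a general nonlinear least-squares routine would serve equally well.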
In the following, a theoretical model will be developed to explain the experimentally discovered law. The exponential accumulation of the GFP shows that the derivative with respect to time of the mean GFP is proportional to the mean GFP itself:
There must thus be a molecular process, described by the exponential term
The accumulation rate of
The “accumulation” variable (
The theoretical model contains two parameters:
It is interesting to notice that the above time evolution can be re-expressed as a conservation law which is independent of any reference time. For any two time points
At this point, there is no more information in the activation–accumulation description above than is in the empirical law. However, one can search for more information hidden in the above two-component description by turning attention to the full data available, not only to the mean value of GFP. For each sampled time, the full data available consists of measured GFP levels for at least 10,000 single cells. These 10,000 single-cell measurements are typically distributed as in
The experimental GFP fluorescence intensities are Gamma-distributed, as predicted by the activation–accumulation model.
The fact that the levels of proteins in gene networks tend to follow a Gamma distribution, which is a continuum version of a discrete negative-binomial distribution, was presented in [
As time progresses, the biological heterogeneity increases. At all times, the heterogeneity is Gamma-distributed. Gamma distribution parameters
To further check the validity of the Gamma distribution for the heat shock response, a comparison of the Gamma fit with the lognormal fit is presented in
At 37 °C, the lognormal distribution fits the data better than the Gamma distribution. As the heat shock is increased from low to moderate, the Gamma distribution becomes a better fit. For strong heat shocks (at 44.5 °C for 30 min), there is no clear separation between a Gamma distribution and a lognormal one.
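A comparison of this kind can be sketched as follows. The sketch fits both distributions to a set of single-cell intensities and compares their log-likelihoods. It is an illustration only: the Gamma parameters are estimated by the method of moments rather than by the fitting tool used for the measurements, and the sample is synthetic with hypothetical parameters.

```python
import math
import random


def gamma_loglik_moments(data):
    """Fit a Gamma distribution by the method of moments.

    Returns (shape, scale, log-likelihood of the data under the fit).
    """
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    shape = mean * mean / var
    scale = var / mean
    ll = sum((shape - 1.0) * math.log(x) - x / scale for x in data)
    ll -= n * (math.lgamma(shape) + shape * math.log(scale))
    return shape, scale, ll


def lognormal_loglik(data):
    """Fit a lognormal distribution by maximum likelihood.

    Returns (mu, sigma, log-likelihood of the data under the fit).
    """
    n = len(data)
    logs = [math.log(x) for x in data]
    mu = sum(logs) / n
    s2 = sum((v - mu) ** 2 for v in logs) / n
    ll = -sum(logs) - 0.5 * n * math.log(2.0 * math.pi * s2) - 0.5 * n
    return mu, math.sqrt(s2), ll


# Synthetic "cell population": 10,000 Gamma-distributed intensities
# (shape 3 and scale 200 are hypothetical values).
rng = random.Random(1)
sample = [rng.gammavariate(3.0, 200.0) for _ in range(10000)]

shape, scale, ll_gamma = gamma_loglik_moments(sample)
mu, sigma, ll_lognormal = lognormal_loglik(sample)
print(f"Gamma: shape = {shape:.2f}, scale = {scale:.1f}, logL = {ll_gamma:.1f}")
print(f"Lognormal: mu = {mu:.2f}, sigma = {sigma:.2f}, logL = {ll_lognormal:.1f}")
```

Both models have two parameters, so the log-likelihoods can be compared directly without a penalty for model complexity. Because the moment estimates are not the maximum-likelihood estimates, the comparison is slightly biased against the Gamma fit.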
The law
(A) The contours for
(B) In the lower left region, the contours for
The conclusion of this section will be rephrased using a control theory perspective. The end result of this paper is an input–output relation for the response of the CHO cells to heat shocks, together with a theoretical model that explains it. The input signals are pulses of a precise time duration
Parameters
The theoretical model is based on an activation variable
The components
At this point, the theoretical model is fixed and what comes next is a sequence of computations to extract information out of it. This information will be compared with the experimental results. Given the transition rates, the equation for the probability
The above equation for
The equation for the function
The goal is to find the time variation of the mean value and standard deviation for the activation and accumulation variable: 〈
The equations for
The activation–accumulation model being nonlinear, the equations for the factorial cumulants cannot be reduced to a finite system of equations, unless some approximation technique is employed. All third-order cumulants were discarded to obtain the above system of equations. In [
The solution to
The origin of time,
The probability
To find the solution, an initial condition
Here
The mean and variance for
Although the assumption that all the cells contain the same number of molecules at
The second step in choosing the probability distribution
The number
The time evolution of the mean 〈
To connect the theory with the experimental results, the probability distribution for the GFP intensity is needed. This distribution is the continuum limit of the distribution for
The change from the integer variable
In the last step, we used the approximation 1 −
To go from the discrete variable
This is a Gamma distribution for GFP ≡
The mean value of the Gamma distribution is
The way the material is organized and presented in this paper is the outcome of a series of guiding principles imposed upon the project. These guiding principles were formulated to keep the experimental data in balance with both the mathematical and the biological models. The guiding principles are: 1) start from experimental measurements and discover an empirical law from data using signal generators as input into the system; 2) build a simple mathematical model with as few parameters as possible to explain the empirical law; 3) check the mathematical model using additional experimental information; 4) use a general mathematical technique, likely to be applicable to other experimental designs; 5) keep the biological model and the mathematical model at a level of complexity commensurate with the richness of the experimental data.
These guiding principles filtered out other possible presentation formats. For example, the fifth principle will prevent the development of a complex mathematical model built on a complex biological model, although many molecules involved in the heat shock response are known. One outcome of the strategy outlined above is the discovery of a new variable,
At a deeper level, the double exponential law and the activation–accumulation model need to be extended by simultaneously measuring the GFP production and the HSF1 activity. Through successive rounds of modelling and data acquisition, more and more molecules can be reliably added to a quantitative description of the heat shock response.
Narrowing the discussion from general views to the specifics of this project, a natural question arises: why would cells evolve such a double exponential response? We can only speculate that cells need a very fast response immediately after the shock. However, cells cannot sustain such a fast exponential accumulation for long, so this initial exponential growth must be stopped. A compromise between these two requirements is the double exponential law for the mean heat shock response.
Another aspect to be noted is the time evolution of the stochastic process that describes the heat shock response. Not only can the time evolution of the mean value be modelled mathematically, but so can the time evolution of the probability distribution.
The time evolution of the GFP distribution is well explained by a negative-binomial distribution with a time-dependent parameter. This behavior is obtained by neglecting the statistical correlation between the activation and the accumulation variables in the stochastic activation–accumulation model. It will be interesting to reach a level of experimental accuracy at which the statistical correlation becomes detectable, and then to measure the deviation of the probability distributions from the negative-binomial.
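The stochastic behavior discussed here can be explored with a small Gillespie-type simulation. The transition rates below are an assumption chosen to match the verbal description of the model: the activation variable decays one molecule at a time at rate b·x1, and the accumulation variable grows one molecule at a time at rate c·x1·x2. The names b, c, x1, and x2 and all numerical values are hypothetical. The sketch compares the simulated population mean with the double exponential obtained when the correlation between the two variables is neglected.

```python
import math
import random


def simulate_cell(x1, x2, b, c, t_end, rng):
    """One Gillespie trajectory; returns the accumulation count at t_end.

    Assumed transitions (illustrative, see the text above):
      activation decay:  x1 -> x1 - 1  at rate b * x1
      accumulation:      x2 -> x2 + 1  at rate c * x1 * x2
    """
    t = 0.0
    while True:
        rate_decay = b * x1
        rate_birth = c * x1 * x2
        total = rate_decay + rate_birth
        if total == 0.0:
            return x2  # activation exhausted, nothing can change any more
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            return x2
        if rng.random() * total < rate_decay:
            x1 -= 1
        else:
            x2 += 1


rng = random.Random(0)
b, c = 1.0, 0.02          # hypothetical rate constants
x1_0, x2_0 = 100, 5       # hypothetical initial molecule numbers per cell
t_end = 8.0

final = [simulate_cell(x1_0, x2_0, b, c, t_end, rng) for _ in range(2000)]
mean = sum(final) / len(final)
var = sum((x - mean) ** 2 for x in final) / len(final)

# Mean response when the activation-accumulation correlation is neglected.
a = c * x1_0 / b
predicted = x2_0 * math.exp(a * (1.0 - math.exp(-b * t_end)))
print(f"simulated mean = {mean:.2f}, uncorrelated prediction = {predicted:.2f}")
print(f"variance / mean = {var / mean:.2f}")
```

With these rates the simulated mean lies slightly above the uncorrelated prediction, and the variance exceeds the mean, as expected for a negative-binomial-like population. Increasing c relative to b makes the effect of the neglected correlation more visible.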
From a mathematical point of view, we chose to work with the discrete master equation because it is simple to relate to a biological model. The transition probability rates can be easily connected with biological phenomena at the molecular level. The ease of building the model is offset by the difficulty of solving the discrete master equation. To overcome this difficulty, we employ the method outlined in [
The biological significance of the approach can also be expressed using a control theory perspective. The structure of an unknown physical system is uncovered by perturbing the system with a series of input signals. The response to these perturbations is measured as output signals. Then the mathematical relation between the input and the output signals constitutes a model for the system. As much as possible, this theoretical model must also incorporate the molecular components of the system. The activation–accumulation model belongs to the category of input–output models. It is possible that other biological systems can be described by other simple models. A classification of molecular networks can thus be devised using their input–output functional relation. Moreover, if the biological system is decomposed into subsystems, there is hope that the global properties of each subsystem can also be described by a coarse-grained model. In this way, a hierarchy of models can be built to explain more and more details of a complex system.
A 5.3-kilobase DNA containing promoter and 5′-untranslated region of the mouse
CHO-K1 cells (ATCC) were grown in MEM-alpha (Cellgro) containing penicillin, streptomycin, and amphotericin (Cellgro) and supplemented with 10% FBS (Gemini Bio-Products). Cells were transfected by lipofection using Lipofectamine (Invitrogen) as previously described. After 10 d of selection in hygromycin (500
The cells were detached with trypsin and allowed to recover in suspension in complete growth medium for 3 to 4 h at 1 × 10⁶ cells/mL at 37 °C in a CO₂ incubator. The cells were then aliquoted into 50 mL conical tubes, one for each experimental condition (temperature and duration of heat shock). Up to five different temperatures were tested simultaneously, with one water-bath used for each temperature. The temperature of each water-bath was monitored with a precision Hg thermometer (accuracy ±0.1 °C). The cells were then centrifuged, the medium was aspirated, and the heat shock was initiated by quickly resuspending the cell pellet at 5 × 10⁵ cells/mL in medium prewarmed to the temperature selected for the heat shock. The tube was then placed in the same water-bath for the remainder of the heat shock, after which it was placed in ice-cold water and agitated for the time previously determined to be necessary to bring the temperature back to 37 °C (from 2 to 14 s). The tube containing the cells was then placed in a water-bath set at 37 °C. From that point on, samples were taken every 30 min or every 2 h for up to 26 h. All experiments included a control in which the cells were kept at 37 °C for the whole time. The exact duration of each heat shock was monitored with a stopwatch. This protocol allowed strict control over the input applied to the cells. The cells were kept in suspension in the 50 mL tubes in a CO₂ incubator at 37 °C for the rest of the experiment.
At each time point, 1 mL of cell suspension was removed from each tube and placed in a 5 mL tube. The cells were centrifuged for 2 min at 300 g, the supernatant was aspirated, and the cell pellet was resuspended in 500
The samples were analyzed by flow cytometry on an LSR II (Becton-Dickinson) equipped with a 488 nm solid state laser. The performance of the system was routinely checked with fluorescent beads (8-peak beads, Sphero Rainbow, Spherotech), and the same instrument settings were used in all experiments, yielding almost identical fluorescence intensities every time for the cells kept at 37 °C. The cells were gated based on their forward scatter (FSC) and side scatter (SSC), and the same gate was used for all the samples. The fluorescence of each cell was measured based on the area of the corresponding pulse. The data were analyzed with the Diva software (Becton-Dickinson) for the mean fluorescence. The flow cytometry binary FCS files were converted to an ASCII text format with the FCSExtract utility (Stowers Institute for Medical Research). The data were subsequently analyzed with cftool and dfittool from MATLAB (MathWorks).
The time evolution of the mean GFP expressed with respect to a reference initial time
The above time evolution can be reexpressed as a conservation law which is independent of any reference time. For any two time points
As the promoter is activated by increasing temperature pulses, 41.5 °C to 43.5 °C, the Gamma distribution becomes a better description of the biological variation (
The response at strong shocks can also be explained with the help of the activation–accumulation two-component model, by the following scenario. At the beginning of the heat shock, the activation component
In view of the above discussion, for strong shocks the mean GFP is given by a modification of
OL is grateful to W. H. Wong for helpful comments on the manuscript and continuous encouragement. Many thanks go to F. Vaida, Y. Zhang, B. L. Adam, and to E. F. Glynn for FCSExtract software.
Abbreviations: CHO, Chinese hamster ovary; GFP, green fluorescent protein