Conceived and designed the experiments: SRH. Performed the experiments: SRH. Analyzed the data: SRH PM SCM. Contributed reagents/materials/analysis tools: SRH PM. Wrote the paper: SRH PM SCM.
The authors have declared that no competing interests exist.
The response of cells to changing environmental conditions is governed by the dynamics of intricate biomolecular interactions. Because proteins are the dominant macromolecules that carry out routine cellular functions, it is reasonable to assume that understanding the dynamics of protein∶protein interactions might yield useful insights into cellular responses. Large-scale protein interaction data sets, however, do not capture changes in the profile of protein∶protein interactions. In order to understand how these interactions change dynamically, we have constructed conditional protein linkages for
Many cellular processes and the response of cells to environmental cues are determined by the intricate protein∶protein interactions. These cellular protein interactions can be represented in the form of a graph, where the nodes represent the proteins and the edges signify the interactions between them. However, the available protein functional linkage maps do not incorporate the dynamics of gene expression and thus do not portray the dynamics of true protein∶protein interactions in vivo. We have used gene expression data as well as the available protein functional interaction information for
The gene expression pattern of any organism depends on the environmental conditions in which it grows. The expression of a large number of genes is turned on or off conditionally and temporally, allowing organisms to adapt to different growth conditions or a changing environment. While some genes are constitutively expressed under many different conditions, presumably because they are essential for the basic cellular processes of growth and sustenance, many genes are expressed only under defined conditions. DNA microarrays offer a powerful tool for such gene expression profiling. Studying the gene expression pattern under different conditions therefore offers an attractive approach to studying the response of an organism to changing environmental conditions.
The traditional analysis of microarray data involves measuring differential expression between two samples after background elimination and data normalization. An unsupervised classification method such as clustering or principal component analysis is popularly used to identify genes that have a similar regulation pattern
A few attempts have been made to analyze differences in gene expression arising out of different conditions of growth. The gene expression profiling in
While the analysis of gene expression data provides useful insights into the adaptation process, it is believed that the response of organisms is dictated by the dynamics of the biomolecular interaction profile. One of the motivations for the present study was to understand the changing landscape of protein∶protein interactions under different environmental conditions. The protein∶protein interaction studies carried out experimentally usually represent only a fraction of all the possible interactions among different cellular proteins
There have been a few attempts to combine protein∶protein interaction networks and gene expression data
We have used gene expression information of
Predicted functional interaction network for
As a case study, we have chosen to study the gene expression data from UV exposure in wild type and SOS deficient
The
It is anticipated that the effect of turning genes on or off under the four conditions will be reflected in the conditional networks. While this is likely to lead to many local perturbations in the network, the global properties of the four networks are not likely to change significantly. Various topological properties of the conditional networks under the perturbations such as mutation (
The four conditional networks exhibit similar network parameters (
Property | Parent Network | UWT | TWT | UML | TML |
Nodes | 3,682 | 1,899 | 1,865 | 1,957 | 1,947 |
Edges | 78,048 | 34,893 | 34,680 | 31,900 | 33,513 |
Percentage core nodes | 96.9 | 97.4 | 97.9 | 96.1 | 95.5 |
Average degree | 42.4 | 36.7 | 37.2 | 32.6 | 34.4 |
Degree exponent | 1.2 | 1.1 | 1.1 | 1.1 | 1.1 |
Diameter | 11 | 8 | 8 | 8 | 9 |
Mean eccentricity | 7.99 | 5.66 | 5.78 | 5.89 | 6.07 |
Average clustering coefficient | 0.23 | 0.21 | 0.21 | 0.22 | 0.22 |
Fractal dimension | 3.9 | 3.5 | 3.4 | 3.5 | 3.5 |
Network efficiency | 0.36 | 0.37 | 0.38 | 0.36 | 0.37 |
Global network parameters for the parent network (15) and the conditional networks. UWT, UV Untreated Wild Type; TWT, UV Treated Wild Type; UML, UV Untreated
Each of the four conditional networks possesses unique nodes corresponding to the genes that are expressed differentially. Interestingly, the uniquely expressed genes include a few hubs and transcription factors. The proteins identified as uniquely expressed are listed in the
Mapping of the unique nodes of each of the four comparison sets to different metabolic pathways for
One of the interesting genes that we observed to be expressed only in the UV treated cells is the
Another interesting example is the unique expression of genes involved in the iron uptake system in the untreated wild type cells. The proteins EntA, EntB and EntF function in the pathway of enterobactin synthesis and the proteins FepA and FepB form a part of the channel to transport Fe-enterobactin complex inside the cell. When cells are UV treated, reactive oxygen species (ROSs) are synthesized via photo-Fenton reaction which leads to oxidative damage of structural proteins, enzymes, DNA and lipids. Thus, it is likely that cells repress iron uptake to protect cellular macromolecules from damage. The absence of these iron uptake proteins from UV treated wild type network supports this idea.
The analysis of uniquely expressed nodes under one condition, but not in another condition, indicates some of the possible effects of UV radiation on
An interesting aspect of systems analysis is the effect of selective removal of nodes on the shortest path lengths in the conditional networks. The shortest path lengths in a network signify the efficiency of communication between the nodes, and any alteration in these paths might indicate the significance of these nodes under the two conditions. Importantly, the overall diameter of the four conditional networks is nearly identical (8 in three of the networks and 9 in the fourth), indicating that diameter as a global property of the network changes little. Moreover, all the networks have the small-world property: almost all nodes can be reached from every other node in a small number of steps. This is not surprising, considering the biological robustness that is reflected in these networks. Thus, analysis of shortest path lengths might yield interesting insights into the relative importance of communication paths in the four sub-networks.
In order to analyze local changes in pathlength, reduced pathlength matrices were constructed for the nodes common to each pair of networks under study. A pathlength difference of 3 or more for a node pair between the two reduced networks was considered significant. As expected, for most of the node pairs there is no change in the pathlength, as there are multiple paths from one node to another even in the event of a collapse of a particular path. Interestingly, we observe considerable variation in the path for some node pairs, indicating reduced connectivity in terms of efficient information exchange; two examples are discussed in detail below.
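The comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes unweighted, undirected networks given as edge lists, uses breadth-first search (which gives the same shortest path lengths as Dijkstra's algorithm when every edge has unit weight), and the function names and the toy node labels are hypothetical.

```python
from collections import deque


def adjacency(edges):
    """Build an undirected adjacency map from an iterable of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_lengths(adj, source):
    """Shortest path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def pathlength_changes(edges_1, edges_2, min_diff=3):
    """Node pairs, restricted to nodes common to both networks, whose shortest
    path length differs by at least min_diff (network 2 minus network 1).
    Pairs that are disconnected in either network are skipped (an assumption)."""
    adj_1, adj_2 = adjacency(edges_1), adjacency(edges_2)
    common = sorted(set(adj_1) & set(adj_2))
    changes = {}
    for i, u in enumerate(common):
        d1, d2 = bfs_lengths(adj_1, u), bfs_lengths(adj_2, u)
        for v in common[i + 1:]:
            if v in d1 and v in d2:
                diff = d2[v] - d1[v]
                if abs(diff) >= min_diff:
                    changes[(u, v)] = diff
    return changes


# Toy example: node "m" provides a shortcut between "a" and "f" in network 1
# and is absent from network 2.
chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
net_1 = chain + [("a", "m"), ("m", "f")]
net_2 = chain
changes = pathlength_changes(net_1, net_2)
```

In the toy example the only pair that crosses the threshold is ("a", "f"), whose distance grows from 2 to 5 once the shortcut node is missing, which mirrors the kind of local change discussed in the text.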
The shortest pathlength from AmyA, a cytoplasmic α-amylase, to many of the glycogen metabolism enzymes is observed to be increased in the UV treated wild type network (
In the untreated wild type network, the subgraph for starch and sucrose metabolism pathway proteins is well connected. The absence of MalZ in the treated wild type network significantly increases the path from AmyA to some of the glycogen metabolism proteins.
Another interesting example pertains to the phosphotransferase system in
The absence of CmtB in the UV treated
It has been reported that the highly connected nodes of the network (hubs) are three times more likely to be essential than the poorly connected nodes
It is likely that the importance of a gene's functional role differs according to the prevailing condition of growth. In graph theory, the relative importance of a node can be assessed by calculating various centrality measures. We have therefore analyzed different centrality measures with respect to their relevance to the four sub-networks.
Degree centrality is based on how well the node is connected in a graph. Degree centrality thus states that a node tends to be essential in a network if it is highly connected and its removal has severe impact on the overall topology and connectedness of the network
To address the conditional or relative criticality of a node, we calculated the difference in the centrality measures for the common nodes in the comparison set. The centrality measure difference is approximately normally distributed, so about 99.7% of the values are expected to lie within 3 standard deviations of the mean. As expected, for most of the nodes there is no change in the centrality value. We have chosen to study those proteins whose centrality measure difference is more than 3 times the standard deviation of the distribution. When untreated wild type and UV treated wild type networks are compared, the proteins belonging to carbohydrate metabolism and energy metabolism such as BglX, Dld, GatB, GlgA, CydA, CydB and YneH have a greater centrality measure in the untreated wild type network. The replication and repair proteins, namely RecN, RecO, Tag, HepA and HolC, on the other hand have greater centrality values in the UV treated wild type network. Likewise, DnaA, DnaE, Mfd, RecJ and SbcB, which function in the replication and repair machinery, possess significantly higher centrality values in UV treated mutant networks compared to their untreated counterparts. We observe no considerable change in the centrality measure for the proteins of pathways such as polyketide biosynthesis, cell motility and xenobiotics biodegradation. A detailed list of proteins with significant difference in centrality along with their functions in the UWT-TWT and UML-TML comparison sets is given as
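A minimal sketch of the 3-standard-deviation filter described above is given here. It is an illustration under stated assumptions rather than the authors' implementation: the deviation is measured from the mean of the differences, the population standard deviation is used, and the function name and the toy centrality values are hypothetical.

```python
from statistics import mean, pstdev


def significant_shifts(cent_1, cent_2, n_sd=3.0):
    """Return {node: difference} for nodes common to both networks whose
    centrality difference (network 2 minus network 1) lies more than n_sd
    standard deviations from the mean difference."""
    common = sorted(set(cent_1) & set(cent_2))
    diffs = {node: cent_2[node] - cent_1[node] for node in common}
    values = list(diffs.values())
    mu = mean(values)
    sd = pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        return {}  # no variation at all, so nothing stands out
    return {node: d for node, d in diffs.items() if abs(d - mu) > n_sd * sd}


# Toy example: 21 shared nodes, only "n0" changes its centrality.
cent_untreated = {f"n{i}": 0.10 for i in range(21)}
cent_treated = dict(cent_untreated)
cent_treated["n0"] = 0.60
shifted = significant_shifts(cent_untreated, cent_treated)
```

A positive difference marks a node that is more central in the second network, a negative one a node that is more central in the first. Note that with very few shared nodes no single value can exceed 3 standard deviations, so the filter is only meaningful for networks of realistic size.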
Further, to study the essentiality of the nodes depending on the UV treatment or the
The high degree nodes of each of the four conditions are classified as critical in a UV treatment dependent or independent manner and
Conditional Networks | Number of High Degree Nodes | Networks Compared | Common Nodes | Criticality | Number of Proteins |
UWT | 570 | UWT-TWT | 527 | Mutation independent | 104 |
TWT | 560 | UML-TML | 523 | Mutation dependent | 100 |
UML | 587 | UWT-UML | 465 | UV independent | 42 |
TML | 584 | TWT-TML | 480 | UV dependent | 57 |
The analysis of the top 30% nodes in terms of degree centrality alone in each conditional network revealed the nodes that are proposed to be essential depending on the UV treatment or the mutation. UWT, UV Untreated Wild Type; TWT, UV Treated Wild Type; UML, UV Untreated
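The selection of high degree nodes can be sketched as below. This is a hedged illustration only: the text states that the top 30% of nodes by degree were analyzed, but the rounding of the node count, the tie-breaking rule, and the way the resulting sets are compared are assumptions made here, and all names and toy edges are hypothetical.

```python
def high_degree_nodes(edges, fraction=0.30):
    """Return the top `fraction` of nodes ranked by degree.
    Assumptions: the count is rounded to the nearest integer and ties are
    broken alphabetically by node name."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    k = max(1, round(fraction * len(degree)))
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    return set(ranked[:k])


# Toy networks: "h3" is a high degree node only in the first network,
# and "a" gains connections in the second.
edges_untreated = [
    ("h1", "a"), ("h1", "b"), ("h1", "c"), ("h1", "d"),
    ("h2", "e"), ("h2", "f"), ("h2", "g"),
    ("h3", "h1"), ("h3", "h2"), ("h3", "a"),
]
edges_treated = [
    ("h1", "a"), ("h1", "b"), ("h1", "c"), ("h1", "d"),
    ("h2", "e"), ("h2", "f"), ("h2", "g"),
    ("a", "b"), ("a", "c"), ("a", "h2"),
]

top_untreated = high_degree_nodes(edges_untreated)
top_treated = high_degree_nodes(edges_treated)

shared = top_untreated & top_treated          # high degree in both conditions
only_untreated = top_untreated - top_treated  # high degree without treatment only
only_treated = top_treated - top_untreated    # high degree with treatment only
```

Under this reading, nodes in `shared` would be candidates for treatment-independent criticality and nodes in the two difference sets for treatment-dependent criticality; the same comparison applies to the wild type versus mutant pairs.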
We are able to identify many repair proteins such as DinG, DnaN, MutM, MutS, RuvC, Rep and RecF that are likely to be indispensable for the UV treated networks in terms of degree centrality. The criticality of some of the proteins that belong to lipid metabolism and to cofactor and vitamin metabolism seems to be UV treatment dependent. One of the proteins that appears from our analysis to be important in UV treated cells is UspA, the universal stress protein. An earlier study has shown the role of UspA in resistance to DNA damaging agents and that its regulation is
The analysis carried out by us is based on the predicted genome-wide functional linkages
The conditional networks derived from the experimental interactions show topological robustness similar to their parent networks. Interestingly, in agreement with the conclusions drawn from the functional linkage network, we observe the UV-dependent criticality of many of the replication and repair proteins through network centrality analysis as well as through the analysis of unique nodes of the networks. We also observe the expression of ∼65% of the hubs in conditional networks derived from Arifuzzaman
Some of the cutoffs applied in our study might appear arbitrary at first glance. For example, a gene was considered to be expressed if the net signal intensity corresponding to its spot was greater than or equal to the median signal intensity of the spots within the sector. The rationale for using the median was based on the observation that gene expression is a stochastic event, and hence the expression of a gene as well as the copy number of the expressed protein differs from cell to cell even in an isogenic cell population
We further tested the effect of different cutoffs on the overall conclusions of our analysis. With cutoffs of 0.9 and 1.1, we observe that our earlier conclusions remain unchanged: the increased importance of replication and repair proteins and of cofactor metabolism proteins in the UV treated cells, the repression of carbohydrate metabolism upon UV treatment, and the importance of unique nodes of the conditional networks. With a further increase of the cutoff to 1.2, we observe approximately 1600 genes being expressed, which might be considered fewer than anticipated
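The median-based expression call can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each spot is given as a (gene, sector, net intensity) record, and it interprets the cutoffs 0.9, 1.1 and 1.2 as multiples of the sector median, which is an assumption. The function name and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import median


def expressed_genes(spots, cutoff=1.0):
    """spots: list of (gene, sector, net_intensity) tuples.
    A gene is called expressed if its net intensity is at least
    cutoff * median(net intensities of all spots in the same sector)."""
    by_sector = defaultdict(list)
    for _gene, sector, intensity in spots:
        by_sector[sector].append(intensity)
    threshold = {s: cutoff * median(vals) for s, vals in by_sector.items()}
    return {gene for gene, sector, intensity in spots
            if intensity >= threshold[sector]}


# Toy data: two sectors with different overall signal levels.
spots = [
    ("g1", 0, 10.0), ("g2", 0, 20.0), ("g3", 0, 30.0),
    ("g4", 1, 5.0), ("g5", 1, 7.0),
]
called_default = expressed_genes(spots)              # cutoff at the sector median
called_strict = expressed_genes(spots, cutoff=1.2)   # stricter cutoff
```

Using a per-sector threshold means that a spot is judged against its local neighbours on the array rather than against a single global value, and raising the cutoff shrinks the set of genes called expressed.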
Thus, the comparative analysis does indeed reveal physiologically important changes in the four networks. Some of these changes would not have been apparent by measuring gene expression alone, or by the standard analysis of microarray data. This is partly due to the fact that the levels of expression of many genes do not change under different conditions, but nonetheless the profile of interactions surrounding them changes significantly, thereby altering their significance in the broader picture of the cell. In this manner, studying the dynamics of protein∶protein interactions appears to hold promise for the systems level understanding of an organism.
The analysis proposed in this study can also potentially be applied to disease interaction networks. For example, understanding how the interactions within a pathogen or a host change during the disease process, and the implications of these changes, might yield useful insights into the disease. Further, this information can be used to derive novel therapies against the disease.
The raw microarray data for
The conditional protein interaction network was built for the expressed genes by mapping them onto an existing predicted functional interaction network for
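The mapping step can be sketched as a simple filter on the parent network. This is an illustration under assumptions, not the authors' code: an interaction is kept only when both partners are expressed, expressed genes left without any interaction are dropped from the node set, and the function names and toy gene labels are hypothetical.

```python
def conditional_network(parent_edges, expressed):
    """Keep only those parent-network edges whose two endpoints are both
    in the set of expressed genes. Edges are stored as frozensets so that
    (a, b) and (b, a) count as the same interaction."""
    expressed = set(expressed)
    return {
        frozenset((a, b))
        for a, b in parent_edges
        if a != b and a in expressed and b in expressed
    }


def nodes_of(edges):
    """Nodes that take part in at least one retained interaction."""
    return {node for edge in edges for node in edge}


# Toy parent network and a hypothetical set of expressed genes.
parent_edges = [
    ("geneA", "geneB"), ("geneA", "geneC"),
    ("geneC", "geneD"), ("geneE", "geneF"),
]
expressed = {"geneA", "geneC", "geneD", "geneF"}

network = conditional_network(parent_edges, expressed)
nodes = nodes_of(network)
```

Here geneF is expressed but its only partner, geneE, is not, so geneF does not appear in the conditional network. Whether such isolated nodes should be retained is a design choice that the sketch resolves by dropping them.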
Network properties such as average degree, degree exponent, diameter, average clustering coefficient were calculated according to
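Two of these properties can be illustrated with a short sketch. It uses the standard definitions for an undirected, unweighted graph and makes one assumption of its own: nodes with fewer than two neighbours contribute a clustering coefficient of zero to the average. The function names and the toy graph are hypothetical.

```python
def adjacency(edges):
    """Build an undirected adjacency map from an iterable of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def average_degree(adj):
    """Mean number of neighbours per node, equal to 2 * edges / nodes."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def average_clustering(adj):
    """Mean over all nodes of the fraction of neighbour pairs that are
    themselves connected."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # assumption: such nodes count as zero
        linked = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2.0 * linked / (k * (k - 1))
    return total / len(adj)


# Toy graph: a triangle a-b-c with a pendant node d attached to c.
toy = adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
toy_avg_degree = average_degree(toy)
toy_avg_clustering = average_clustering(toy)

# The same relation applied to the reported parent network counts.
parent_avg_degree = 2 * 78048 / 3682
```

For the parent network reported in the table of global parameters, 2 × 78,048 / 3,682 ≈ 42.4, which agrees with the listed average degree.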
The shortest paths for all pairs of nodes in the network were calculated by Dijkstra's algorithm
Network centrality measures like degree centrality, closeness centrality and betweenness centrality were calculated
Degree centrality of a node
Closeness centrality of a node
Betweenness centrality of a node
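For reference, the three measures are commonly written as follows. These are the standard textbook forms, given here as an assumption; the normalization used in this study may differ. The symbols are introduced for illustration: $N$ is the number of nodes, $\deg(v)$ the degree of node $v$, $d(v,u)$ the shortest path length between $v$ and $u$, $\sigma_{st}$ the number of shortest paths between $s$ and $t$, and $\sigma_{st}(v)$ the number of those paths that pass through $v$.

```latex
C_D(v) = \frac{\deg(v)}{N - 1}, \qquad
C_C(v) = \frac{N - 1}{\sum_{u \neq v} d(v, u)}, \qquad
C_B(v) = \sum_{\substack{s \neq v \neq t \\ s \neq t}} \frac{\sigma_{st}(v)}{\sigma_{st}}
```

Degree centrality thus rewards direct connectivity, closeness centrality rewards short distances to all other nodes, and betweenness centrality rewards nodes that lie on many shortest paths between other nodes.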
Sub-networks were visualized and analyzed using Cytoscape 2.4.1
Microarray data processing and network information for the conditional networks.
(0.05 MB RTF)
Uniquely expressed genes list: Four-way comparison.
(0.09 MB XLS)
Interacting partners of Hda in the UV treated wild-type network with their functions, classified according to functional classes.
(0.06 MB RTF)
Functions of the high centrality measure nodes in the comparison set UWT-TWT and UML-TML.
(0.07 MB RTF)
List of mutation dependent/independent and UV treatment dependent/independent proteins
(0.03 MB XLS)
Differential gene expression and comparison of networks. (A) Pictorial representation of differential gene expression in the network context. Red and green represent the nodes expressed uniquely under the defined conditions, whereas blue nodes are expressed under both the conditions. (B) Four-way comparison of the networks. UWT, wild type; TWT, UV treated wild type; UML, lexA mutant; TML, UV-treated lexA mutant.
(0.47 MB TIF)
The overlap of the interactions and the nodes in UWT-TWT and UML-TML.
(0.76 MB TIF)
Mapping of unique nodes to different metabolic pathways.
(0.45 MB TIF)
Expression of the hubs.
(0.23 MB TIF)
We thank the SUN Centre of Excellence in Medical Bioinformatics, Centre for DNA Fingerprinting and Diagnostics (CDFD), for access to computational facilities.