Conceived and designed the experiments: EZ TMP. Performed the experiments: EZ. Analyzed the data: EZ DPO TMP. Wrote the paper: EZ. Contributed the algorithm for efficient computation of edge-disjoint path integrity measure: JM. Contributed to designing the experiments and writing the paper: DPO. Participated in writing the paper: TP.
The authors have declared that no competing interests exist.
The
Analysis of protein interaction networks in the budding yeast
An intriguing question in the analysis of biological networks is whether biological characteristics of a protein, such as essentiality, can be explained by its placement in the network, i.e., whether topological prominence implies biological importance. One of the first connections between the two in the context of a protein interaction network, the so-called
Jeong and colleagues
Recently, He and colleagues challenged the hypothesis of essentiality being a function of a global network structure and proposed that the majority of proteins are essential due to their involvement in one or more
In this work we carefully evaluate each of the proposed explanations for the centrality-lethality rule. Recently several hypotheses that linked structural properties of protein interaction networks to biological phenomena have come under scrutiny, with the main concern being that the observed properties are due to experimental artifacts and/or other biases present in the networks and as such lack any biological implication. To limit the impact of such biases on the results reported in our study we use six variants of the genomewide protein interaction network for
To assess whether the essentiality of hubs is related to their role in maintaining network connectivity we performed two tests. First, if this were the case, then we would expect essential hubs to be more important for maintaining network connectivity than nonessential hubs. We found that this is not the case. Next, in addition to node degree, we consider several other measures of topological prominence, and we demonstrate that some of them are better predictors of the role that a node plays in network connectivity than node degree. Thus, if essentiality were related to maintaining network connectivity, then one would expect essentiality to be better correlated with these centrality measures than with the node degree. However, we found that node degree is a better predictor of essentiality than any other measure tested.
To reject the essential protein interaction model
Motivated by our findings we propose an alternative explanation for the centrality-lethality rule. Our explanation draws on a growing realization that phenotypic effect of gene-knockout experiments is a function of a group of functionally related genes, such as genes whose gene products are members of the same multiprotein complex
By the very definition, ECOBIMs contain, relative to their size, more essential nodes than a random group of proteins of the same size. But what fraction of all essential hubs are members of such ECOBIMs? How does this number relate to what is expected by chance? In fact, how does the enrichment of hubs that are members/nonmembers of ECOBIMs in essential proteins relate to the enrichment values expected by chance under a suitable randomization protocol? We propose that membership in ECOBIMs largely accounts for the enrichment of hubs in essential proteins. In support of this hypothesis, we found that the fraction of essential proteins among non-ECOBIM hubs is, depending on the network, only 13–35%, which is almost as low as the network average. Furthermore the essentiality of nodes that are not members of ECOBIMs is only weakly correlated with their degree. Finally, using a randomization experiment we demonstrated that these properties are characteristic of the protein interaction network and are unlikely in a corresponding randomized network.
Our source of protein interaction data for the yeast
It was suggested that the centrality-lethality phenomenon is an artifact of a possible bias present in the networks mainly derived from small-scale experiments
We also include two networks derived solely from high-throughput experimental data. The
Finally, we include a network of interactions predicted in silico using the computational approach of Jansen et al.
Network | Number of nodes | Number of edges | Average degree | Average clustering coefficient |
DIP CORE | 2,316 | 5,569 | 4.81 | 0.30 |
LC | 3,224 | 11,291 | 7.00 | 0.36 |
HC | 2,752 | 9,097 | 6.61 | 0.37 |
TAP-MS | 1,994 | 15,819 | 15.87 | 0.60 |
BAYESIAN | 4,135 | 20,984 | 10.15 | 0.26 |
Y2H | 400 | 491 | 2.45 | 0.09 |
Network | DIP CORE | LC | HC | TAP-MS | BAYESIAN | Y2H |
DIP CORE | - | 0.58 | 0.62 | 0.25 | 0.61 | 0.02 |
LC | 0.28 | - | 0.53 | 0.26 | 0.39 | 0.01 |
HC | 0.38 | 0.65 | - | 0.47 | 0.47 | 0.02 |
TAP-MS | 0.09 | 0.18 | 0.27 | - | 0.36 | 0.00 |
BAYESIAN | 0.16 | 0.21 | 0.20 | 0.27 | - | 0.02 |
Y2H | 0.26 | 0.18 | 0.31 | 0.10 | 0.97 | - |
Each row of the table corresponds to a single network and shows a fraction of its edges contained in other tested networks. Thus, for example, 58% of the edges in the DIP CORE network are also present in the LC network.
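The overlap fractions described in this caption can be computed from two edge lists. The following is a minimal sketch, assuming undirected interactions with self-interactions ignored; the function names and the toy networks are hypothetical and are not taken from the study.

```python
def normalize_edges(edges):
    """Return a set of undirected edges, ignoring orientation and self-loops."""
    return {frozenset((u, v)) for u, v in edges if u != v}


def overlap_fraction(edges_a, edges_b):
    """Fraction of the edges of network A that also occur in network B."""
    a = normalize_edges(edges_a)
    b = normalize_edges(edges_b)
    if not a:
        raise ValueError("network A has no edges")
    return len(a & b) / len(a)


# Hypothetical toy networks (not the yeast data).
net_a = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P1")]
net_b = [("P2", "P1"), ("P3", "P2"), ("P5", "P6")]

frac_a_in_b = overlap_fraction(net_a, net_b)  # 2 of the 4 edges of A occur in B
frac_b_in_a = overlap_fraction(net_b, net_a)  # 2 of the 3 edges of B occur in A
```

Because each fraction is normalized by the edge count of the row network, the measure is not symmetric, which is why the table above is not symmetric either.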
In their influential paper, Jeong et al.
To confirm the centrality-lethality rule in the tested networks we used the results of a systematic gene deletion screen
(A) For each tested network the fraction of essential nodes among nodes with highest degree (hubs) is shown. The horizontal axis shows the fraction of the total network nodes that were designated as hubs. (B) Correlation between degree and essentiality is assessed by Kendall's tau and Spearman's rho rank correlation coefficients.
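The quantity plotted in panel (A) can be illustrated with a short sketch. It assumes that hubs are the top fraction of nodes ranked by degree and that ties are broken by node name; the tie-breaking rule, the function names, and the toy data are assumptions made for illustration only.

```python
from collections import defaultdict


def degrees(edges):
    """Map each node to its number of distinct neighbors."""
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    return {n: len(s) for n, s in nbrs.items()}


def hub_enrichment(edges, essential, hub_fraction):
    """Fraction of essential nodes among the top `hub_fraction` of nodes by degree."""
    deg = degrees(edges)
    n_hubs = max(1, round(hub_fraction * len(deg)))
    # Ties in degree are broken by node name so the result is reproducible.
    ranked = sorted(deg, key=lambda n: (-deg[n], n))
    hubs = ranked[:n_hubs]
    return sum(1 for n in hubs if n in essential) / n_hubs


# Hypothetical toy network with two high-degree nodes, H1 and H2.
toy_edges = [("H1", "a"), ("H1", "b"), ("H1", "c"), ("H1", "d"),
             ("H2", "a"), ("H2", "b"), ("H2", "c"), ("e", "f")]
toy_essential = {"H1", "H2", "d"}

top_quarter = hub_enrichment(toy_edges, toy_essential, 0.25)  # hubs are H1 and H2
network_avg = hub_enrichment(toy_edges, toy_essential, 1.0)   # all eight nodes
```

Setting `hub_fraction` to 1.0 recovers the network average, which is the baseline against which hub enrichment is judged.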
From
It should be noted that in contrast to other networks the Y2H network exhibits only a weak correlation between degree and essentiality. This is in agreement with the study of Batada et al.
A network centrality index assigns a centrality value to each node in the network that quantifies its topological prominence. Topological prominence can be defined in a number of ways, and over the years many centrality indices were introduced that emphasize different aspects of network topology
Even though degree centrality is a local centrality index, in some networks hubs may play an important role in maintaining the overall connectivity of the network. For example, it was demonstrated that in some scale-free networks the removal of hubs affects the ability of other nodes to communicate much more than the removal of random nodes
Here we demonstrate the difference in the five centrality measures on a toy network. (A) The toy network consists of two cliques: K50 with nodes A1–A50 and K10 with nodes B1–B10. The two cliques are interconnected by an edge (A1, B1) and through an additional vertex D. Additional node C attaches to the network through A2. (B) As the measures assign centrality values based on different network properties they will rank nodes differently. Briefly, the eigenvector centrality measure (EC) will assign high-centrality values to nodes that are close to many other central nodes in the network. The subgraph centrality measure (SC) assigns centrality values to a node based on the number of closed walks that originate at the node. The shortest path betweenness centrality measure (SPBC) assigns the node centrality value based on the fraction of shortest paths that pass through the node averaged over all pairs of nodes in the network. The current-flow betweenness centrality measure (CFC) generalizes the SPBC measure by including additional paths, not just the shortest paths, in the computation. Here, the difference between the measures is exemplified by the rankings that they produce for the toy network nodes.
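To make the eigenvector centrality description concrete, the sketch below computes it by power iteration on a scaled-down, hypothetical analogue of the toy network (a 5-clique and a 3-clique joined by one edge; nodes C and D are omitted). Adding each node's own score in the update is a design choice made here to guarantee convergence on connected graphs; it does not change the resulting eigenvector, and it is not necessarily the normalization used in the study.

```python
import math
from collections import defaultdict
from itertools import combinations


def eigenvector_centrality(edges, iterations=200, tol=1e-10):
    """Power iteration for the dominant eigenvector of the adjacency matrix."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    nodes = sorted(adj)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # New score = own score + sum of neighbor scores, i.e. one step with A + I.
        # A + I has the same eigenvectors as A, so the ranking is unchanged.
        new = {n: score[n] + sum(score[m] for m in adj[n]) for n in nodes}
        norm = math.sqrt(sum(x * x for x in new.values()))
        new = {n: x / norm for n, x in new.items()}
        if max(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score


# Hypothetical scaled-down toy network: K5 on a1..a5, K3 on b1..b3, edge (a1, b1).
big = [f"a{i}" for i in range(1, 6)]
small = [f"b{i}" for i in range(1, 4)]
toy_edges = list(combinations(big, 2)) + list(combinations(small, 2)) + [("a1", "b1")]

ec = eigenvector_centrality(toy_edges)
```

In this example the bridge node b1 of the small clique scores well below every member of the large clique, even though its degree (3) is close to theirs (4), which illustrates how the measure rewards proximity to other central nodes rather than degree alone.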
Since betweenness indices rank nodes based on their role in mediating communication between pairs of other nodes in the network, it is interesting to compare the effectiveness of high-degree nodes and nodes with high betweenness centrality in disconnecting the network. One common way to measure the impact of the nodes' removal on the network connectivity is by monitoring the decrease in the size of the largest connected component.
(A–F) The impact of node removal is quantified by the fraction of nodes in the largest connected component. There is one curve for each centrality measure that shows the fraction of nodes in the largest connected component as a function of the fraction of the most central nodes removed. We also show the impact of node removal in a random order and the size of the largest connected component when all essential proteins are removed.
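The curves described in this caption can be reproduced in outline as follows. This is a sketch under stated assumptions: the fraction is taken relative to the original number of nodes (the caption does not specify the denominator), and the function names and toy graph are hypothetical.

```python
from collections import defaultdict, deque


def adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def largest_component_fraction(adj, removed):
    """Size of the largest connected component after deleting `removed`,
    as a fraction of the original number of nodes (an assumption)."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best / len(adj)


def removal_curve(adj, ranking, steps):
    """Largest-component fraction after removing the first k nodes of `ranking`."""
    return [largest_component_fraction(adj, ranking[:k]) for k in steps]


# Hypothetical toy graph: two triangles joined through a single node H.
toy = adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "H"),
                 ("H", "d"), ("d", "e"), ("e", "f"), ("d", "f")])
curve = removal_curve(toy, ["H", "c", "d"], [0, 1, 2])
```

Passing a ranking produced by any centrality index, or a random permutation, to `removal_curve` yields one curve of the kind shown in each panel.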
While the removal of a set of nodes may not disconnect various parts of the network, it may impair significantly the “quality of communication” between them. For example, there can be an increase in the length of the shortest path or decrease in the number of alternative paths between pairs of nodes in the network. Therefore, we introduced two additional measures, which we call network integrity measures, to capture various aspects of the effect of the nodes' removal on the ability of other nodes to communicate. (See
Next, we examined whether the disruption power of hubs comes mainly from essential hubs. First, we observe that the removal of all essential proteins from the huge connected component is less disruptive than the removal of an equivalent number of the most central nodes according to any index (
Network | Essential | Random nonessential |
DIP CORE | 0.519 | 0.504±0.007 |
LC | 0.578 | 0.551±0.010 |
HC | 0.521 | 0.525±0.005 |
TAP-MS | 0.512 | 0.512±0.011 |
BAYESIAN | 0.685 | 0.625±0.006 |
Y2H | 0.410 | 0.397±0.046 |
The impact of removal of a set of proteins is measured by the fraction of nodes in the largest connected component. For each network the effect of the removal of essential proteins and the removal of an equivalent number of random nonessential proteins with the same degree is shown.
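The degree-matched control in this table requires drawing, for each essential protein, a random nonessential protein of the same degree. The sketch below shows one way to do this; sampling without replacement, and raising an error when a degree class runs out of nonessential nodes, are assumptions, since the source does not describe how such cases were handled.

```python
import random
from collections import defaultdict


def degree_matched_sample(degree, essential, rng):
    """Pick, for every essential node, a distinct nonessential node of equal degree."""
    pool = defaultdict(list)
    for node, d in degree.items():
        if node not in essential:
            pool[d].append(node)
    sample = []
    for node in sorted(essential):
        candidates = pool[degree[node]]
        if not candidates:
            # Assumption: fail loudly rather than fall back to a nearby degree.
            raise ValueError(f"no unused nonessential node of degree {degree[node]}")
        sample.append(candidates.pop(rng.randrange(len(candidates))))
    return sample


# Hypothetical degrees and essentiality labels.
toy_degree = {"e1": 3, "e2": 1, "n1": 3, "n2": 3, "n3": 1, "n4": 2}
toy_essential = {"e1", "e2"}
matched = degree_matched_sample(toy_degree, toy_essential, random.Random(0))
```

Repeating the draw with different seeds and removing each sample from the network gives a mean and standard deviation of the kind reported in the "Random nonessential" column.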
Above we demonstrated that various centrality indices vary considerably in their ability to predict disruption in the overall connectivity of the network. Next we asked whether this difference is reflected in the enrichment levels.
Fraction of essential proteins among hubs and an equivalent number of most central nodes according to four other centrality measures. The fraction of essential proteins among the nodes of the network is shown as ntwk.avg.
Eigenvector centrality | Subgraph centrality | |||
DIP CORE | 0.15 (3.5e-19) | 0.064 (8.6e-05) | 0.17 (1.2e-24) | 0.059 (2.5e-04) |
LC | 0.23 (7.9e-56) | 0.094 (3.6e-11) | 0.23 (1.2e-55) | 0.093 (4.9e-11) |
HC | 0.24 (1.8e-54) | 0.107 (2.9e-12) | 0.24 (7.9e-55) | 0.102 (3.4e-11) |
TAP-MS | 0.12 (8.42e-11) | −0.007 (6.5e-01) | 0.12 (8.42e-11) | −0.007 (6.5e-01) |
BAYESIAN | 0.17 (5.7e-39) | 0.046 (1.5e-04) | 0.17 (5.1e-41) | 0.051 (3.1e-05) |
Y2H | 0.05 (1.1e-01) | 0.027 (2.5e-01) | 0.03 (2.0e-01) | −0.024 (7.2e-01) |
Shortest-path betweenness centrality | Current-flow betweenness | |||
DIP CORE | 0.15 (3.2e-18) | −0.002 (5.5e-01) | 0.19 (2.7e-27) | 0.012 (2.5e-01) |
LC | 0.21 (1.4e-46) | 0.003 (4.25e-01) | 0.26 (3.7e-70) | −0.007 (6.8e-01) |
HC | 0.20 (1.9e-36) | 0.005 (3.7e-01) | 0.24 (2.6e-53) | −0.005 (6.2e-01) |
TAP-MS | 0.12 (3.5e-11) | 0.018 (1.8e-01) | 0.16 (3.3e-18) | 0.017 (1.8e-01) |
BAYESIAN | 0.18 (2.4e-41) | 0.005 (3.43e-01) | 0.23 (2.7e-69) | 0.018 (8.1e-02) |
Y2H | 0.10 (1.2e-02) | 0.048 (1.4e-01) | 0.10 (1.4e-02) | 0.041 (1.8e-01) |
The correlation of centrality measures with essentiality (
As there is considerable correlation between degree centrality and other centrality indices, we used Kendall's tau partial rank correlation coefficient to see whether any of the indices is correlated with essentiality beyond its correlation with degree centrality index. We found that, controlling for the correlation with degree, the correlation with essentiality is reduced to statistically insignificant values for betweenness centrality indices and is greatly reduced for local indices (
The above observations indicate that the main topological determinant of essentiality is the node's local neighborhood rather than its role in maintaining the overall connectivity of the network. In particular, even though removing the nodes with high betweenness centrality indices is much more effective in shattering some of our protein interaction networks, their correlation with essentiality is reduced to statistically insignificant levels by subtracting their correlation with degree centrality.
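For reference, the standard definition of the partial rank correlation coefficient is given below, where X denotes a centrality index, Y essentiality, and Z degree centrality. The symbol names are chosen here for illustration, and whether exactly this form (and which treatment of ties) was used in the analysis is an assumption.

```latex
\tau_{XY \cdot Z} = \frac{\tau_{XY} - \tau_{XZ}\,\tau_{YZ}}
                         {\sqrt{\left(1 - \tau_{XZ}^{2}\right)\left(1 - \tau_{YZ}^{2}\right)}}
```

When an index is strongly correlated with degree (large \tau_{XZ}) and degree is itself correlated with essentiality, the numerator shrinks toward zero, which is the pattern reported above for the betweenness indices.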
Recently He and colleagues
We note that from the assumptions of the essential protein interaction model it follows that if two proteins do not interact then the essentiality of one protein in such a pair does not depend on the essentiality of the other protein. Furthermore, this independence should also be observed when proteins share interaction neighbors. To test whether this holds in real data, we computed the number of nonadjacent protein pairs, with three or more neighbors (one or more neighbors in the Y2H network), that are either both essential or both nonessential in the tested networks and compared these numbers to the expected number of such pairs under the model. (The model parameters were estimated using three different strategies as described in the
Network | Total number of pairs | Number of pairs of the same type | Expected number of pairs of the same type (simulation) | Expected number of pairs of the same type (line fitting) | Expected number of pairs of the same type (weighted line fitting) |
DIP CORE | 1,849 | 1,135 | 945 (3.6e-10) | 928 (8.6e-12) | 938 (8.0e-11) |
LC | 10,777 | 6,143 | 5,691 (6.6e-10) | 5,556 (1.1e-15) | 5,589 (3.9e-14) |
HC | 5,907 | 3,516 | 3,213 (2.0e-08) | 2,997 (2.2e-16) | 2,994 (2.2e-16) |
Y2H | 3,254 | 2,167 | 1,976 (9.6e-07) | 2,025 (2.6e-04) | 2,052 (3.3e-03) |
The total number of pairs refers to the number of nonadjacent protein pairs with three or more common neighbors in the network. (Due to the sparsity of the Y2H network, the statistics are calculated for nonadjacent pairs having one or more neighbors in common.) The nodes in the pair are of “the same type” if they are both essential or both nonessential.
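The observed counts in the table can be obtained by enumerating nonadjacent pairs and checking their shared neighbors. The following is a minimal sketch with hypothetical names and toy data; it enumerates all node pairs, which is quadratic in the number of nodes and intended only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def same_type_pairs(edges, essential, min_shared=3):
    """Count nonadjacent pairs with at least `min_shared` common neighbors.

    Returns (total number of such pairs, number of pairs of the same type),
    where "same type" means both essential or both nonessential.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    total = same = 0
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # adjacent pairs are excluded
        if len(adj[u] & adj[v]) >= min_shared:
            total += 1
            if (u in essential) == (v in essential):
                same += 1
    return total, same


# Hypothetical toy network: x, y, z are each linked to a, b, and c.
toy_edges = [(p, q) for p in ("x", "y", "z") for q in ("a", "b", "c")]
toy_essential = {"x", "y", "a"}

total, same = same_type_pairs(toy_edges, toy_essential)
```

For the Y2H network the same function would be called with `min_shared=1`, following the threshold stated in the caption.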
In the previous section we showed that proteins that share neighbors are more likely to have the same essentiality (be both essential or both nonessential) than expected under the essential PPI model. Moreover, it was observed in another study that essential proteins are not distributed uniformly among the set of automatically derived multiprotein complexes
To investigate the above question, we introduce a notion of
We developed an automatic method for extraction of ECOBIMs from a protein interaction network. In this work proteins are deemed to share biological function if they are annotated with the same GO biological process term from a set of 192 terms that were selected by a group of experts to represent relevant aspects of molecular biology
Here we demonstrate the major steps of the method on the HC network. The input to the method is a protein interaction network, GO annotation, and the set of essential nodes, which are shown in red. The method considers subnetworks induced by proteins annotated with the same GO biological process term, one subnetwork at a time, to identify densely connected regions or COBIMs. The COBIMs are shown by a COBIM intersection graph, where nodes correspond to COBIMs (the size of the node is proportional to the number of genes in the corresponding COBIM) and there is an edge between a pair of COBIMs if they have at least two proteins in common. The COBIMs that are enriched in essential proteins are selected as ECOBIMs, shown in green.
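The COBIM intersection graph described in this caption follows directly from the module memberships. The sketch below assumes that each COBIM is given as a set of protein names; the module names, protein names, and function name are hypothetical.

```python
from itertools import combinations


def cobim_intersection_graph(cobims, min_shared=2):
    """Return edges (name_a, name_b, n_shared) between COBIMs that share
    at least `min_shared` proteins."""
    edges = []
    for name_a, name_b in combinations(sorted(cobims), 2):
        shared = set(cobims[name_a]) & set(cobims[name_b])
        if len(shared) >= min_shared:
            edges.append((name_a, name_b, len(shared)))
    return edges


# Hypothetical modules; node size in the figure would be len(members).
toy_cobims = {
    "M1": {"A", "B", "C", "D"},
    "M2": {"C", "D", "E"},
    "M3": {"E", "F", "G"},
}
graph_edges = cobim_intersection_graph(toy_cobims)
```

Here M1 and M2 share two proteins and are connected, whereas M2 and M3 share only one and are not.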
To examine to what extent the membership in ECOBIMs accounts for the centrality-lethality rule we partitioned hubs into two groups, those that are members of one or more ECOBIMs (ECOBIM hubs) and those that are not (non-ECOBIM hubs), and compared their enrichment values. As shown in
Fraction of essential proteins among various types of hubs: all hubs, hubs that are members of ECOBIMs (ECOBIM hubs), and hubs that are not members of ECOBIMs (non-ECOBIM hubs). The fraction of essential proteins among all proteins in the network is also shown (ntwk.avg.). The numbers above the bars show the number of essential hubs out of the total number of hubs of this type for ECOBIM and non-ECOBIM hubs.
Network | Enrichment of ECOBIM hubs | | | Enrichment of non-ECOBIM hubs | | | Corr. degree vs. essentiality for non-ECOBIM hubs | | |
| Obs. | Rand. | p-value | Obs. | Rand. | p-value | Obs. | Rand. | p-value |
DIP CORE | 0.80 | 0.67 | 1.98e-03 | 0.26 | 0.43 | <1.00e-05 | 0.08 | 0.18 | <1.00e-05 |
LC | 0.80 | 0.69 | 1.88e-03 | 0.32 | 0.48 | <1.00e-05 | 0.17 | 0.27 | <1.00e-05 |
HC | 0.83 | 0.70 | 4.00e-05 | 0.35 | 0.51 | <1.00e-05 | 0.17 | 0.27 | <1.00e-05 |
TAP-MS | 0.76 | 0.62 | 1.00e-05 | 0.24 | 0.40 | <1.00e-05 | 0.12 | 0.20 | <1.00e-05 |
BAYESIAN | 0.77 | 0.65 | <1.00e-05 | 0.18 | 0.36 | <1.00e-05 | 0.09 | 0.20 | <1.00e-05 |
Y2H | 0.85 | 0.66 | 5.81e-02 | 0.13 | 0.25 | 2.00e-05 | −0.04 | 0.05 | 2.00e-04 |
For every quantity three values are shown: the value under the true assignment of essential proteins (Obs.), the mean value under the randomized assignment of essential proteins (Rand.), and the fraction of the randomized assignments that resulted in values stronger (either smaller or larger depending on the context) than those obtained with the true assignment of essential proteins (
One may ask to what extent the difference in the behavior of ECOBIM hubs and non-ECOBIM hubs is due to the particular selection procedure that we employ to identify the putative ECOBIMs. More specifically, there are two concerns that need to be addressed. First, our method is guided by the enrichment in essential proteins when selecting ECOBIMs from COBIMs. Therefore, it is expected that the fraction of essential proteins among ECOBIM hubs should be higher than that among non-ECOBIM hubs. Second, our method considers only annotated yeast genes. Therefore, one might argue that the difference in behavior is due to the fact that ECOBIM hubs are necessarily annotated while non-ECOBIM hubs may include both annotated and unannotated genes.
To address the first concern we performed a control experiment where essential proteins were assigned to a random set of nodes having the same degree distribution as the true set of essential proteins in the network. (A total of 100,000 random assignments were performed, which resulted in 100,000 sets of ECOBIMs.) To address the second concern, we restricted the random assignment to annotated genes only. As shown in
The identified ECOBIMs correspond mostly, but not exclusively, to large essential multiprotein complexes such as the anaphase promoting complex (APC) and the DAM1 protein complex. For example, one of the largest ECOBIMs identified in the LC network contains multiprotein complexes involved in the process of RNA polymerase II transcription
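GO biological process term | Number of essential genes | Number of genes | Fraction of essential genes |
GO:0006508 proteolysis | 27 | 35 | 0.77 |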
GO:0042254 ribosome biogenesis and assembly | 27 | 32 | 0.84 |
GO:0016192 vesicle mediated transport | 21 | 30 | 0.70 |
GO:0016071 mRNA metabolic process | 18 | 28 | 0.64 |
GO:0015931 nucleobase, nucleoside, nucleotide and nucleic acid transport GO:0051236 establishment of RNA localization | 15 | 24 | 0.62 |
GO:0016072 rRNA metabolic process | 18 | 21 | 0.86 |
GO:0008380 RNA splicing | 16 | 21 | 0.76 |
GO:0042254 ribosome biogenesis and assembly | 88 | 107 | 0.82 |
GO:0016071 mRNA metabolic process | 37 | 58 | 0.64 |
GO:0008380 RNA splicing | 35 | 52 | 0.67 |
GO:0015931 nucleobase, nucleoside, nucleotide and nucleic acid transport GO:0051236 establishment of RNA localization | 16 | 26 | 0.62 |
GO:0006508 proteolysis | 17 | 24 | 0.71 |
GO:0042254 ribosome biogenesis and assembly | 84 | 100 | 0.84 |
GO:0016071 mRNA metabolic process | 49 | 71 | 0.69 |
GO:0016072 rRNA metabolic process | 63 | 71 | 0.89 |
GO:0008380 RNA splicing | 46 | 63 | 0.73 |
GO:0006508 proteolysis | 28 | 35 | 0.80 |
GO:0042254 ribosome biogenesis and assembly | 90 | 120 | 0.75 |
GO:0016071 mRNA metabolic process | 46 | 66 | 0.70 |
GO:0008380 RNA splicing | 45 | 62 | 0.73 |
GO:0016072 rRNA metabolic process | 37 | 41 | 0.90 |
GO:0016072 rRNA metabolic process | 30 | 32 | 0.94 |
GO:0006508 proteolysis | 17 | 22 | 0.77 |
GO:0042254 ribosome biogenesis and assembly | 119 | 152 | 0.78 |
GO:0016072 rRNA metabolic process | 93 | 106 | 0.88 |
GO:0008380 RNA splicing GO:0016071 mRNA metabolic process | 40 | 50 | 0.80 |
GO:0006366 transcription from RNA polymerase II promoter | 23 | 42 | 0.55 |
GO:0006508 proteolysis | 28 | 37 | 0.76 |
GO:0006913 nucleocytoplasmic transport | 17 | 31 | 0.55 |
GO:0006412 translation | 18 | 27 | 0.67 |
GO:0051169 nuclear transport | 15 | 27 | 0.55 |
GO:0045184 establishment of protein localization | 15 | 27 | 0.55 |
GO:0007010 cytoskeleton organization and biogenesis | 9 | 11 | 0.82 |
GO:0006366 transcription from RNA polymerase II promoter | 7 | 11 | 0.64 |
GO:0045184 establishment of protein localization | 6 | 10 | 0.60 |
GO:0006913 nucleocytoplasmic transport GO:0051169 nuclear transport | 6 | 10 | 0.60 |
For every tested protein interaction network we list the ECOBIMs with at least 20 members; for the Y2H network, the ECOBIMs with at least 10 members are listed. For each ECOBIM the following information is shown: the corresponding GO biological process term, number of essential genes, number of genes, and fraction of essential genes. For a list of all ECOBIMs and their member genes see
Moreover, the ECOBIMs are remarkably different from non-ECOBIM COBIMs. As shown in
Network | Enrich. ECOBIM proteins | | | Enrich. non-ECOBIM COBIM proteins | | |
| Obs. | Rand. | p-value | Obs. | Rand. | p-value |
DIP CORE | 0.77 | 0.65 | <1.00e-05 | 0.06 | 0.21 | <1.00e-05 |
LC | 0.77 | 0.65 | 1.00e-05 | 0.10 | 0.17 | 1.56e-03 |
HC | 0.81 | 0.68 | <1.00e-05 | 0.12 | 0.18 | 2.31e-02 |
TAP-MS | 0.74 | 0.64 | <1.00e-05 | 0.09 | 0.17 | 1.87e-03 |
BAYESIAN | 0.76 | 0.65 | <1.00e-05 | 0.08 | 0.18 | <1.00e-05 |
Y2H | 0.79 | 0.63 | 9.93e-03 | 0.06 | 0.17 | 3.00e-05 |
For each network the enrichment in essential proteins of ECOBIM nodes and enrichment of COBIM nodes that are not members of one or more ECOBIMs is shown. For each group three values are listed: the fraction under the true assignment of essential proteins (Obs.), the mean fraction under the randomized assignment of essential proteins (Rand.), and
So far, we have demonstrated that the high correlation between degree and essentiality can be predominantly attributed to the ECOBIMs. In addition, it is well known that certain functions that are essential to the cell, for example, transcription regulation or cell-cycle regulation, rely on large multiprotein complexes. Indeed, many of the GO terms that are overrepresented among ECOBIM nodes are of this type, as seen in
For every network the GO terms that are overrepresented among ECOBIM nodes are shown. The overrepresentation of a GO term is quantified by the natural logarithm of a
To elucidate the role of the ECOBIMs we examined all GO processes that contain at least one ECOBIM.
GO term | Subnetwork nodes | ECOBIM nodes | Non-ECOBIM COBIM nodes |
GO:0016072 rRNA metabolic process | 0.83 | 0.91 | n/a |
GO:0006352 transcription initiation | 0.82 | 1.00 | n/a |
GO:0006383 transcription from RNA polymerase III pro | 0.77 | 1.00 | 0.00 |
GO:0042254 ribosome biogenesis and assembly | 0.72 | 0.87 | n/a |
GO:0008380 RNA splicing | 0.71 | 0.79 | 0.50 |
GO:0006839 mitochondrial transport | 0.64 | 0.80 | n/a |
GO:0006360 transcription from RNA polymerase I pro | 0.64 | 0.80 | 0.00 |
GO:0016071 mRNA metabolic process | 0.63 | 0.75 | 0.40 |
GO:0006260 DNA replication | 0.61 | 0.93 | n/a |
GO:0031123 RNA 3′-end processing | 0.59 | 0.93 | 0.29 |
GO:0006399 tRNA metabolic process | 0.50 | 1.00 | 0.00 |
GO:0007059 chromosome segregation | 0.49 | 0.76 | n/a |
GO:0006944 membrane fusion | 0.48 | 0.75 | 0.22 |
GO:0006508 proteolysis | 0.46 | 0.77 | n/a |
GO:0051169 nuclear transport | 0.44 | 0.80 | 0.47 |
GO:0006997 nuclear organization and biogenesis | 0.43 | 1.00 | 0.33 |
GO:0000278 mitotic cell cycle | 0.43 | 0.81 | 0.19 |
GO:0015931 nucleobase, nucleoside, nucleotide and n | 0.42 | 0.63 | n/a |
GO:0006913 nucleocytoplasmic transport | 0.42 | 0.80 | 0.41 |
GO:0051236 establishment of RNA localization | 0.42 | 0.63 | n/a |
GO:0006366 transcription from RNA polymerase II pro | 0.40 | 0.75 | 0.29 |
GO:0007010 cytoskeleton organization and biogenesis | 0.40 | 0.78 | 0.00 |
GO:0048308 organelle inheritance | 0.39 | 0.86 | n/a |
GO:0006401 RNA catabolic process | 0.38 | 0.83 | 0.41 |
GO:0006461 protein complex assembly | 0.38 | 1.00 | n/a |
GO:0045184 establishment of protein localization | 0.37 | 0.89 | 0.38 |
GO:0009100 glycoprotein metabolic process | 0.37 | 0.63 | n/a |
GO:0006412 translation | 0.36 | 0.85 | 0.00 |
GO:0007005 mitochondrion organization and biogenes | 0.35 | 0.91 | n/a |
GO:0006512 ubiquitin cycle | 0.34 | 0.82 | n/a |
GO:0051325 interphase | 0.33 | 0.83 | 0.00 |
GO:0016192 vesicle-mediated transport | 0.31 | 0.71 | 0.18 |
GO:0000074 regulation of progression through cell cycl | 0.31 | 0.73 | 0.18 |
GO:0000279 M phase | 0.30 | 0.80 | 0.17 |
GO:0006974 response to DNA damage stimulus | 0.28 | 0.67 | 0.11 |
GO:0006323 DNA packaging | 0.26 | 1.00 | 0.16 |
GO:0006417 regulation of translation | 0.26 | 0.80 | n/a |
GO:0016481 negative regulation of transcription | 0.25 | 1.00 | 0.13 |
GO:0007001 chromosome organization and biogenesi | 0.22 | 0.79 | 0.16 |
GO:0016458 gene silencing | 0.22 | 1.00 | 0.00 |
GO:0040029 regulation of gene expression, epigenet | 0.21 | 1.00 | 0.00 |
GO:0007047 cell wall organization and biogenesis | 0.17 | 0.75 | n/a |
For each GO subnetwork that contributed at least one ECOBIM, the fractions of essential proteins among the subnetwork nodes, subnetwork ECOBIM nodes, and subnetwork non-ECOBIM COBIM nodes are shown.
This last observation can also explain the poor correlation between degree and essentiality in Y2H networks, as it indicates that ECOBIMs are likely to contain large, stable multiprotein modules, typically multiprotein complexes. However, interactions recovered by the Y2H technique correspond to physical contacts and as such do not encompass all members of a complex. Moreover, due to its binary nature, the Y2H technique may completely miss interactions in complexes that require cooperative binding
The enrichment of high-degree nodes in essential proteins, known as the centrality-lethality rule, suggests that the topological prominence of a protein in a protein interaction network may be a good predictor of its biological importance. There exist numerous measures of topological prominence, called network centrality indices; local centrality indices assign centrality values based on the topology of the node's local neighborhood, whereas betweenness centrality indices assign centrality values based on the node's role in maintaining the connectivity between pairs of other nodes in the network. Even though by definition degree centrality is a local measure, depending on the structure of the network, hubs may play an important role in maintaining the overall connectivity of the network. In this paper we sought to identify the main topological determinant of essentiality and to give a biological explanation for the connection between the network topology and essentiality.
To address this question we performed a rigorous analysis of six protein interaction networks for
Next we examined whether the essential interactions model, recently proposed to explain the centrality-lethality rule, is valid in the tested networks. We found that the model's central assumption that the majority of proteins are essential due to their involvement in one or more essential protein interactions, which are distributed uniformly at random along the edges of the network, violates basic clustering patterns of essential proteins in the networks that we examined. The uniform distribution of essential protein interactions implies that, as long as two proteins do not interact, the essentiality of one protein in the pair is independent of the essentiality of the other protein. However, in real protein interaction networks the essentiality of pairs of proteins that share many neighbors is correlated, and the number of nonadjacent protein pairs that share three or more neighbors and are either both essential or both nonessential significantly deviates from the expected number of such pairs under the model. Consequently, we rejected the essential interactions explanation with high confidence. We stress that we do not reject the existence of essential protein interactions but rather the assumption that these interactions are evenly distributed along the edges of the network and explain the degree distribution of essential proteins.
The above observations led us to propose an alternative explanation for the centrality-lethality rule. Our explanation builds on a growing body of evidence that gene knock-out phenotypes for genes whose gene products are members of the same multiprotein complex are correlated
In the past, several attempts were made to classify high-degree nodes using additional biological data to obtain a deeper insight into biological and physiological properties that hubs were reported to possess. Here we discuss how our findings fit the results reported in two such studies
In the second study Kim et al. utilized structural data to classify hubs into
It is well known that certain biological functions essential for the cell depend on large multiprotein complexes. (Consider, for example, RNA Polymerase II transcription machinery
In this work we compare the degree centrality measure to two other local measures (eigenvector centrality (EC)
The computation of the eigenvector centrality values can be cast as an iterative process: (i) start with an initial vector of centrality scores
The subgraph centrality value of a node is equal to the number of closed walks that start and terminate at the node. As there is an infinite number of such walks, to obtain finite index values the number of closed walks of length
For the shortest-path betweenness index, the node's centrality value is equal to the average fraction of shortest paths that pass through the node.
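A common way to compute this index on unweighted networks is Brandes' accumulation scheme, sketched below. The normalization, which divides by the number of ordered node pairs that exclude the node itself, is one reading of "average fraction" and is an assumption; the function name and toy graph are hypothetical.

```python
from collections import defaultdict, deque


def shortest_path_betweenness(edges):
    """Average fraction of shortest paths between other node pairs that pass
    through each node (Brandes-style accumulation, unweighted graph)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    nodes = sorted(adj)
    score = {n: 0.0 for n in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes inward.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(nodes)
    pairs = (n - 1) * (n - 2)  # ordered pairs (s, t) that exclude the node itself
    return {v: (score[v] / pairs if pairs else 0.0) for v in nodes}


# Hypothetical toy graph: two triangles joined through a single node H.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "H"),
             ("H", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
bc = shortest_path_betweenness(toy_edges)
```

In this toy graph node H has degree 2, the lowest possible for a connecting node, yet it carries every shortest path between the two triangles, which illustrates how betweenness and degree can rank nodes very differently.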
The current-flow centrality measure extends the shortest-path centrality measure by taking into account other paths in addition to shortest paths. This is achieved through a current-flow paradigm where the network is viewed as a resistor network with each edge having a unit capacity. For every pair of nodes
We demonstrate the difference between the five centrality measures on a toy network in
We introduced two measures, which we call network integrity measures, to capture various effects of node removal on the ability of other nodes to communicate. An integrity measure maps a set of nodes,
To evaluate the model on the tested networks we used three strategies to estimate the model's parameters: a network simulation procedure, line fitting to points (log(1−
Our method for automatic extraction of putative ECOBIMs is applied to subnetworks induced by proteins annotated with the same biological process GO term. In this work we used a set of 192 biological process terms, which were selected by a group of experts to represent relevant aspects of molecular biology. Thus, the method was applied to 192 subnetworks, one subnetwork at a time.
From each GO subnetwork the method extracts groups of densely connected proteins. An ideal dense network is a
Our method utilizes the following approach to find regions of GO subnetworks that are
Once the COBIMs are computed, the method selects a subset of COBIMs based on the distribution of essential proteins among the COBIM nodes. Namely, the heuristic selects all COBIMs with a fraction of essential proteins that is significantly higher than what would be expected from a uniform distribution of essential genes among the COBIM nodes. More specifically, a COBIM with
Membership in COBIMs. The amount of overlap among COBIMs is quantified by showing the fraction of nodes that are members of several COBIMs.
(0.05 MB DOC)
Using network integrity measures to evaluate the effect of the removal of hubs and equivalent number of the most central nodes according to other centrality measures
(0.04 MB DOC)
ECOBIMs and their member genes. For every tested protein interaction network we list the automatically identified ECOBIMs. For each ECOBIM the following information is shown: the corresponding GO biological process term/terms, number of essential genes, number of genes, and the names of member genes.
(0.05 MB XLS)
Enrichment of ECOBIM and non-ECOBIM COBIM nodes for GO subnetworks in the LC, HC, TAP-MS, BAYESIAN, and Y2H networks. For each GO subnetwork that contributed at least one ECOBIM the fraction of essential proteins among the subnetwork nodes, subnetwork ECOBIM nodes and subnetwork non-ECOBIM COBIM nodes is shown.
(0.15 MB XLS)
The parameters of the essential protein interaction model. We use three strategies to estimate the parameters, α and β, of the essential protein interaction model: the network simulation as described in the original paper (simulation), line fitting to points for as described in the original paper (line fitting), and weighted line fitting to points for all values of
(0.03 MB DOC)
The number of COBIM and ECOBIM nodes as a function of the parameter . The number of nodes that belong to one or more COBIMs (ECOBIMs) depends on the value of the parameter
(0.03 MB DOC)
The authors thank Eugene Koonin (NCBI) for valuable comments on the manuscript.