Current address: Bioinformatics and Statistics Group, Netherlands Cancer Institute, Amsterdam, The Netherlands
Conceived and designed the experiments: MM GDB. Performed the experiments: MM. Analyzed the data: MM. Wrote the paper: MM GDB.
The authors have declared that no competing interests exist.
Genetic interactions help map biological processes and their functional relationships. A genetic interaction is defined as a deviation from the expected phenotype when combining multiple genetic mutations. In
Genetic interactions map functional dependencies between genes, under a given phenotype. In the budding yeast
A genetic interaction is defined as an unexpected phenotype for a combination of mutations given each mutation's individual effect
A handful of recent studies have examined parts of this question. Linden et al. developed a normalization method to maximize the similarity between genetic interaction networks mapped by different laboratories so they can be combined
While most genetic interaction studies in budding yeast assess cell fitness by measuring cell growth in standard laboratory conditions, an increasing number have mapped genetic interactions under other experimental conditions. These include environmental conditions such as DNA damage
We use this recently available data to conduct a systematic analysis of quantitative genetic interaction networks in budding yeast mapped under different conditions, phenotypic readouts and laboratories (
A) Genetic interaction experiments differ in the phenotypic readout used, the environmental conditions and the laboratory where the experiment was conducted. B) Every network is compared to a common reference, the SGA network
We collected seven different quantitative genetic interaction data sets (
We hypothesized that networks obtained using different phenotypic readouts or in different conditions would be more different than expected, whereas networks obtained in similar experimental conditions would be similar. To investigate the effect of using different phenotypic readouts on the resulting genetic interaction network, we compared two networks (PHENO) that used non-growth phenotypes to define genetic interactions (endocytosis defect
In quantitative genetic interaction networks, nodes represent genes and weighted edges quantify the deviation of the double mutant phenotype from what is expected from the single mutant phenotypes. Edge weight is positive if the phenotypic readout is significantly higher than expected and negative if it is significantly lower. We treated the networks as undirected and did not consider the query or array role. We used four measures to compare networks:
Correlation: Spearman correlation of quantitative interaction scores, where a high value indicates two networks with highly similar quantitative genetic interactions.
Overlap: Amount of qualitative interaction overlap (measured using Jaccard similarity), where interactions (positive or negative) are binarized with ‘interaction’ = one and ‘no interaction’ = zero. A high score indicates that two networks generally agree on whether a given gene pair interacts or not.
Unique: Number of unique interactions in each network. A high number signifies large disagreement between networks.
Disagree: Number of interactions that disagree on interaction sign (positive vs. negative).
These measures were computed only for genes and gene pairs present in both networks under comparison (network of interest vs. SGA) or in all three networks (SGA vs. CONTROL vs. PHENO/MMS). We also evaluated how much the resulting measures for a given network pair differ from what is expected under a statistical model that accounts for known experimental interaction detection error rates.
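The four measures can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary layout (sorted gene-pair tuples mapped to scores), the symmetric score `cutoff` used to call an interaction, and the toy scores are all assumptions. 'Unique' and 'disagree' are expressed as fractions, following the percentage-based definitions given in the figure legend and methods.

```python
from math import sqrt, nan

def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(x) < 2:
        return nan
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else nan

def sign_call(score, cutoff):
    """+1 / -1 for a positive / negative interaction, 0 for no interaction."""
    if score >= cutoff:
        return 1
    if score <= -cutoff:
        return -1
    return 0

def compare_networks(net_a, net_b, cutoff):
    """Compare two networks on the gene pairs tested in both.

    `cutoff` is a hypothetical symmetric threshold; real data sets
    define their own significance criteria.
    """
    shared = sorted(set(net_a) & set(net_b))
    corr = spearman([net_a[p] for p in shared], [net_b[p] for p in shared])
    calls = [(sign_call(net_a[p], cutoff), sign_call(net_b[p], cutoff)) for p in shared]
    either = [c for c in calls if c[0] != 0 or c[1] != 0]   # interaction in at least one network
    both = [c for c in either if c[0] != 0 and c[1] != 0]   # interaction in both networks
    overlap = len(both) / len(either) if either else nan    # Jaccard similarity
    unique = 1 - overlap                                    # seen in only one network
    disagree = sum(1 for a, b in both if a != b) / len(both) if both else nan
    return {"correlation": corr, "overlap": overlap, "unique": unique, "disagree": disagree}

# Toy scores, invented for illustration only.
net_a = {("A", "B"): -1.0, ("A", "C"): 0.1, ("B", "C"): 0.9, ("C", "D"): -0.8, ("A", "D"): 0.0}
net_b = {("A", "B"): -0.7, ("A", "C"): 0.6, ("B", "C"): -0.9, ("C", "D"): 0.05,
         ("A", "D"): 0.1, ("D", "E"): 1.0}
measures = compare_networks(net_a, net_b, cutoff=0.5)
```

In this toy case four shared pairs interact in at least one network and two interact in both, one of them with opposite signs, so overlap, unique and disagree all equal 0.5.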
Analyzing networks obtained using different phenotypic readouts, we find that SGA and PHENO networks have quantitative genetic interaction scores that are less correlated (0.037 on average) than SGA and CONTROL networks (0.13 on average) (
Each square represents the comparison of a network to the reference and is colored according to the group of the networks (CONTROL, PHENO, MMS). The comparison measures are: ‘correlation’ is Spearman's correlation coefficient; ‘overlap’ is the percentage of interactions in common among all observed interactions; ‘negative (resp. positive) overlap’ is the ratio of expected/observed overlap based on our statistical model for negative (resp. positive) networks; ‘unique’ is the percentage of interactions observed in only one network among all observed interactions; ‘negative (resp. positive) unique’ is the ratio of expected/observed unique percentage based on our statistical model for negative (resp. positive) networks; ‘disagree’ is the percentage of interactions of different type (positive, negative) among all interactions observed in common.
We also find that SGA and PHENO networks overlap less (0.10 on average) than SGA and CONTROL networks (0.19 on average) (
We repeated the analysis on networks obtained in different environmental conditions, and found similar results: SGA and MMS have a lower correlation, lower overlap, higher unique ratio and higher disagreement ratio than networks in the control set (
While we observe a consistent trend across PHENO and MMS vs. reference and CONTROL vs. reference comparisons, it is possible that function-based gene selection in PHENO, MMS and CONTROL networks could bias the data in a way that artificially causes the results we observe. To gain more confidence in our results, we additionally analyzed all gene pairs that were tested in the reference SGA network, in one of the PHENO/MMS networks and in one of the CONTROL networks. For the 48,499 gene pairs tested in these three categories (SGA, PHENO/MMS, CONTROL), we found that the correlation between the SGA reference and PHENO/MMS is lower than between SGA and CONTROL (paired t-test p<0.003,
Altogether, our results show that genetic interaction networks mapped using different phenotypic readouts and in different environmental conditions provide unique information.
We have shown that genetic interaction networks obtained under different experimental conditions (phenotype readout or environmental condition) provide unique information. We next examined if this unique information is complementary. Since a major goal of mapping genetic interactions is to discover new gene function information, we used gene function prediction performance as a measure of biological information contained in a genetic interaction network. Two genes that genetically interact with a similar set of genes (two genes with similar genetic interaction profiles) are more likely to be in the same pathway or complex
We reasoned that if gene function prediction performance improves when genetic interaction networks are combined then they must contain complementary information. To combine a network of interest with the reference network, we computed a genetic interaction profile similarity network for each one (using Spearman correlation) and then chose the maximum correlation value for a pair of genes to include in the ‘combined’ network. To make the comparison fair, we analyzed just the set of genetic interactions tested in all the networks we compared. We quantified the utility of the individual correlation networks and the combined correlation network for gene function prediction using GeneMANIA with all available Gene Ontology (GO) terms
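The 'max correlation' combination step can be sketched as below. This is a hedged illustration rather than the authors' code: `combine_max`, the dictionary layout, the gene names and the similarity values are assumptions. The optional `minimum` argument mirrors the 0.1 cutoff on correlation values mentioned in the methods, and pairs present in only one input keep their single value, which is also an assumption.

```python
def combine_max(*networks, minimum=None):
    """Combine profile-similarity networks by keeping, for every gene pair,
    the maximum correlation observed in any input network.

    Each network maps a gene pair (a 2-tuple, in any order) to a correlation.
    If `minimum` is given, only pairs whose combined value exceeds it are kept.
    """
    combined = {}
    for network in networks:
        for pair, value in network.items():
            key = tuple(sorted(pair))  # treat the network as undirected
            if key not in combined or value > combined[key]:
                combined[key] = value
    if minimum is not None:
        combined = {pair: v for pair, v in combined.items() if v > minimum}
    return combined

# Toy similarity values, invented for illustration only.
reference = {("geneA", "geneB"): 0.45, ("geneA", "geneC"): 0.05, ("geneC", "geneD"): 0.30}
other = {("geneB", "geneA"): 0.20, ("geneA", "geneC"): 0.35, ("geneE", "geneF"): 0.08}

combined = combine_max(reference, other, minimum=0.1)
```

Because the maximum exceeds the cutoff exactly when at least one input value does, filtering before or after the combination gives the same edges in this sketch. Taking the maximum needs no tuning parameter, and a pair is linked in the combined network as soon as any single data set supports it.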
We find that PHENO/MMS networks each enable a significant performance improvement in PR values when combined with the reference network (
The boxplots show the relative improvement of the area under the receiver operating characteristic (ROC) and the precision recall (PR) curves obtained when predicting gene function with the GeneMANIA algorithm on the Gene Ontology categories when combining each network with the reference, in comparison to predicting with each network separately. The red stars indicate a significant improvement (p-value<0.05). The networks are B–M Bandyopadhyay et al.
Group | | PHENO/MMS | PHENO/MMS | PHENO/MMS | CONTROL | CONTROL | CONTROL |
PR improvement | GLOBAL | BMS | BUR | JON | BUN | COL | SHU |
# terms | 496 | 81 | 49 | 47 | 81 | 179 | 59 |
# positive | 250 | 37 | 27 | 33 | 38 | 88 | 27 |
# negative | 243 | 44 | 22 | 14 | 43 | 88 | 32 |
mean | 0.044 | 0.071 | 0.118 | 0.093 | 0.014 | 0.028 | −0.004 |
p-value | | | | | 0.27 | 0.078 | 0.55 |
Significant p-values (<0.05) are bolded.
However the ROC results are less clear (
Group | | PHENO/MMS | PHENO/MMS | PHENO/MMS | CONTROL | CONTROL | CONTROL |
ROC improvement | GLOBAL | BMS | BUR | JON | BUN | COL | SHU |
# terms | 496 | 81 | 49 | 47 | 81 | 179 | 59 |
# positive | 300 | 55 | 28 | 33 | 54 | 94 | 36 |
# negative | 192 | 25 | 21 | 14 | 27 | 82 | 23 |
mean | 0.0024 | 0.0031 | 0.0016 | 0.0028 | 0.0019 | 0.0021 | 0.0039 |
p-value | | | 0.08 | 0.053 | | 0.057 | |
Significant p-values (<0.05) are bolded.
To investigate the differences between the combined networks and the reference, we selected the GO terms with the highest gene function prediction PR value differences (adjusted p-value<0.05) (
As noted above, it is possible that function-based gene selection in PHENO, MMS and CONTROL networks could bias our results. In particular, gene selection bias causes a different set of GO terms to be tested for each network. Thus, we repeated our gene function prediction analysis on triplets of gene pairs tested across SGA, PHENO/MMS and CONTROL networks. The combination of the PHENO/MMS correlation network with the reference correlation network tends to perform better in terms of gene function prediction as compared to that of the CONTROL and reference networks (
Altogether, our results show that genetic interactions mapped in different conditions provide complementary information.
The above results hinted that there may exist factors other than phenotypic readout or condition that explain genetic interaction data set differences. To gain a better understanding of these potential other factors, we generalized our analysis to compare all pairs of networks by clustering the all-by-all data set comparison matrices for our four measures: correlation, overlap, unique and disagree. The two networks obtained with different phenotypes (Burston and Jonikas) are clearly outliers in this analysis, in particular for the correlation values (
The comparison measures (A: Correlation, B: Overlap, C: Unique, D: Disagree) between all pairs of networks considered in the study are shown in a clustered heat map view.
To create a fair comparison, we previously reduced each set of networks analyzed to common tested gene pairs. However, all of the information available in all networks should be considered for gene function prediction. Thus, we repeated our analysis of gene function prediction performance using genetic interaction profile correlation networks computed using all genes in each data set and combined all seven of them using the same correlation network building methodology described above (max correlation). We find that the combined network provides substantially better results, on average, across GO terms for both ROC and PR performance measures (
The boxplots show the area under the receiver operating characteristic (ROC) curves obtained when predicting gene function with the GeneMANIA algorithm on the Gene Ontology categories for the networks separately and after combination, using all available genes and interactions (full networks): B–M Bandyopadhyay et al.
To illustrate the complementarity of the individual correlation networks, we examined the SWR1 complex, one of the annotation categories that the combined network predicts better than any individual network (
Nodes represent genes and edges represent genetic interaction profile correlations between the genes that are part of the SWR1 complex (GO:0000812). All of its 13 subunits are connected when combining all networks, whereas only subsets of those are connected in each individual network. Networks were visualized using Cytoscape
Genetic interaction experiments are performed using a particular phenotypic readout and set of experimental conditions in a given species. Using recently available data, we conducted a systematic analysis of quantitative genetic interaction networks in budding yeast mapped under different experimental conditions. We showed that genetic interaction networks mapped in different environmental and laboratory conditions or using different phenotypic readouts provide unique and complementary information. The functional interactions defined by genetic interaction profile correlations can be combined using a simple ‘max correlation’ procedure to aid gene function prediction.
Given the low overlap between the data sets, we adopted a reference-based comparison approach where each data set is in turn compared to a common high-confidence reference. While this enables a global comparison, it is possible that the reference network is biased towards certain gene sets present in only some compared networks and this could affect our results. Thus, we repeated our analysis on a set of gene pairs present across three networks under comparison. While these results agree, there are many fewer gene pairs tested across three networks than there are for two networks. The SGA dataset continues to grow and will be complete in the future. Also, we expect additional networks to be mapped under different conditions. Ideally, an additional global genetic interaction map of the scale of SGA in different conditions would be available to analyze, but this is unlikely to be available anytime soon, as SGA cost millions of dollars and has already taken more than a decade to achieve a 30% coverage rate of all interactions. Smaller genetic interaction networks mapped under different environments and phenotypic readouts among comparable gene sets are more likely to be available in the near future and would help test our results.
We propose a simple method to combine diverse genetic interaction networks and show that this improves gene function prediction. We chose to combine data sets at the level of genetic interaction profile correlations instead of individual genetic interactions for a number of reasons: correlation can be computed for all gene pairs in a sufficiently large genetic interaction map, not just those pairs tested in both maps; no tuning of parameters is needed; and no normalization of individual data sets is needed, as would be required if combining data at the level of genetic interactions
We expect our results to extend to other organisms, which are increasingly targeted for genetic interaction mapping
All genetic interaction data sets were downloaded from original publications or requested from the authors (
The measures used to compare a network to the reference are: ‘correlation’ is the Spearman correlation coefficient of genetic interaction scores for all compared pairs; ‘overlap’ is the percentage of binary interactions in common among all observed interactions; ‘unique’ is the percentage of interactions observed in only one network among all observed interactions; ‘disagree’ is the percentage of interactions of different type (positive, negative) among all interactions observed in common. Gene profile correlation is computed for a given gene as the Spearman correlation coefficient of the genetic interaction profiles of that gene in two data sets, limited to genetic interaction partners found in both data sets. The similarity between two data sets used for clustering is the mean of the gene profile correlation distribution (
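The gene profile correlation and the resulting data set similarity described here can be sketched as follows. This is an illustrative reading of the text, not the published code: the nested-dictionary layout (gene, then partner, then score), the function names, the `min_partners` guard and the toy scores are assumptions.

```python
from math import sqrt, nan

def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(x) < 2:
        return nan
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else nan

def gene_profile_correlations(ds_a, ds_b, min_partners=3):
    """For each gene found in both data sets, correlate its two interaction
    profiles over the partners tested in both (min_partners is an assumed guard)."""
    result = {}
    for gene in sorted(set(ds_a) & set(ds_b)):
        partners = sorted(set(ds_a[gene]) & set(ds_b[gene]))
        if len(partners) < min_partners:
            continue
        rho = spearman([ds_a[gene][p] for p in partners],
                       [ds_b[gene][p] for p in partners])
        if rho == rho:  # skip undefined (NaN) correlations
            result[gene] = rho
    return result

def dataset_similarity(ds_a, ds_b, min_partners=3):
    """Mean of the gene profile correlation distribution."""
    values = list(gene_profile_correlations(ds_a, ds_b, min_partners).values())
    return sum(values) / len(values) if values else nan

# Toy scores, invented for illustration only.
ds_a = {"g1": {"p1": -1.0, "p2": 0.0, "p3": 0.5},
        "g2": {"p1": 0.2, "p2": -0.4, "p3": 0.9},
        "g3": {"p1": 0.1}}
ds_b = {"g1": {"p1": -0.6, "p2": 0.1, "p3": 0.8, "p4": 0.3},
        "g2": {"p1": 0.9, "p2": 0.1, "p3": -0.5},
        "g3": {"p1": 0.4}}

correlations = gene_profile_correlations(ds_a, ds_b)
similarity = dataset_similarity(ds_a, ds_b)
```

A matrix of such pairwise similarities could then be passed to any hierarchical clustering routine; the choice of linkage criterion is left open here.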
To limit the analysis to the best associations, correlation networks only contain correlation values higher than 0.1. To assess each network, we use the command line version of the GeneMANIA Cytoscape plugin (version 2.11)
Similarity measures restricted to the sets of gene pairs tested in the reference, a CONTROL and a PHENO/MMS network. For a given measure, the difference between the PHENO/MMS and CONTROL values is tested by a paired t-test. For the specific case with Bandyopadhyay-MMS as PHENO/MMS and Schuldiner as CONTROL (BMS-SHU), no interactions are observed between the same gene pairs, thus the agreement coefficient is not available.
(EPS)
Performance of the combined and reference networks as measured by the area under the PR curve.
(EPS)
Correlation networks for the SGA and Burston data sets, limited to the gene pairs tested in both. The color of the edges indicates the network. The thicker the edge, the higher the correlation value.
(EPS)
Correlation networks for the SGA and Jonikas data sets, limited to the gene pairs tested in both. The color of the edges indicates the network. The thicker the edge, the higher the correlation value.
(EPS)
Correlation networks for the SGA and Bandyopadhyay networks, limited to the gene pairs tested in both. The color of the edges indicates the network. The thicker the edge, the higher the correlation value.
(EPS)
Performance of the combined and reference networks as measured by the area under the ROC curve.
(EPS)
Improvement in the gene function prediction when combining either the PHENO/MMS or the CONTROL correlation network with the SGA reference correlation network, on the exact same set of gene pairs for all three networks.
(EPS)
Clustering of the data sets based on the gene profile correlation values. The hierarchical clustering was done using different criteria (Ward, Complete, Average, Median).
(EPS)
This document contains more detailed information about the genetic interaction networks, the comparison measures and the gene function prediction performance.
(PDF)
The authors would like to thank Liz Conibear for sharing the quantitative genetic interaction data set on endocytosis defect and for interesting discussions, Jason Montojo for help with the GeneMANIA Cytoscape plugin and the continuous development of new features, and Michael Costanzo, Anastasia Baryshnikova and anonymous reviewers for constructive comments.