Conceived and designed the experiments: RD CLM. Performed the experiments: RD SS. Analyzed the data: RD SS WSH CLM. Contributed reagents/materials/analysis tools: RD. Wrote the paper: RD SS WSH CLM. Contributed to biological interpretation of results: CMV WSH.
The authors have declared that no competing interests exist.
Overlaying differential changes in gene expression on protein interaction networks has proven to be a useful approach to interpreting the cell's dynamic response to a changing environment. Despite successes in finding active subnetworks in the context of a single species, the idea of overlaying lists of differentially expressed genes on networks has not yet been extended to support the analysis of multiple species' interaction networks. To address this problem, we designed a scalable, cross-species network search algorithm, neXus (Network - cross(X)-species - Search), that discovers conserved, active subnetworks based on parallel differential expression studies in multiple species. Our approach leverages functional linkage networks, which provide more comprehensive coverage of functional relationships than physical interaction networks by combining heterogeneous types of genomic data. We applied our cross-species approach to identify conserved modules that are differentially active in stem cells relative to differentiated cells based on parallel gene expression studies and functional linkage networks from mouse and human. We find hundreds of conserved active subnetworks enriched for stem cell-associated functions such as cell cycle, DNA repair, and chromatin modification processes. Using a variation of this approach, we also find a number of species-specific networks, which likely reflect mechanisms of stem cell function that have diverged between mouse and human. We assess the statistical significance of the subnetworks by comparing them with subnetworks discovered on random permutations of the differential expression data. We also describe several case examples that illustrate the utility of comparative analysis of active subnetworks.
Microarrays are a powerful tool for discovering genes whose expression is associated with a particular biological process or phenotype. Differential expression analysis can often generate a list of several hundred or even thousands of significant genes. While these genes represent real expression differences, the large number of candidates can make the process of hypothesis generation for further experimental studies challenging. Use of complementary datasets such as protein-protein interactions can help filter such candidate lists to genes involved with the most relevant pathways. This approach has been applied successfully by many groups, but to date, no one has developed an approach for discovering active pathways or subnetworks that are conserved across multiple species. We propose an algorithm, neXus (Network – cross(X)-species – Search), for cross-species active subnetwork discovery given candidate gene lists from two species and weighted protein-protein interaction networks. We validate our approach on expression studies from human and mouse stem cells. We find many conserved active subnetworks relevant to stem cell biology, as well as other subnetworks that show species-specific behavior. We show that these networks are not likely to have been discovered by chance and discuss several specific cases that reveal potentially novel stem cell biology.
Developments in genomic and proteomic technologies in recent years have given us numerous methods for capturing high resolution snapshots of cellular processes. The end result of a genome-scale experiment is typically a long list of candidate genes that provide a basis for further, more detailed, follow up experiments. For example, gene expression microarrays are a popular approach for identifying differentially expressed genes between two cell types or experimental conditions, and this technology usually yields several hundred to a few thousand differentially expressed genes per comparison.
One powerful approach that has been used to aid in the interpretation of candidate gene lists is integrative analysis with complementary genome-scale data. For example, in a landmark study, Ideker
In separate studies, groups have compared and aligned the structure of protein-protein interaction networks across species
In this study, we describe a novel approach for identifying conserved active subnetworks in interaction networks across multiple species. Given differential expression measures representing analogous phenotypes in two different species and corresponding interaction networks (for example, protein-protein interaction networks), our approach identifies tightly connected network modules (dense subnetworks) that show a high degree of differential expression and are conserved in both networks. This is in contrast to previous approaches, which focused on using differential expression or other activity scores to identify dense subnetworks in protein-protein interaction networks for a single species
In addition to addressing the new question of conservation of network patterns across species, our approach presents a scalable solution to active subnetwork identification, which has typically been restricted to relatively sparse protein-protein interaction networks. The sparse coverage of current protein-protein interaction studies limits the ability to match patterns across species. Recent work in the area of genomic data integration helps to address this issue. Several approaches now exist that integrate interaction and other information to infer functional associations between genes, forming functional linkage networks
Given their more comprehensive coverage of a broad variety of gene relationships, functional linkage networks should allow for more sensitive discovery of networks that are differentially expressed under various conditions of interest. However, their broader coverage also brings several computational issues. Because functional linkage networks are orders of magnitude denser than protein-protein interaction networks, existing algorithms for the discovery of dense subnetworks do not easily scale to this problem. Using functional linkage networks from human and mouse as a basis, we applied our scalable cross-species network discovery approach to identify conserved subnetworks that are differentially active in stem cells relative to differentiated cells based on parallel gene expression studies in mouse and human. We show that these conserved patterns are not likely to have occurred by chance, and that they are enriched for known as well as novel stem cell and differentiation-related processes. Another useful application of our approach is to find functional modules that have diverged or been rewired across the two species, a question that has previously been approached using expression data alone
We developed an algorithm to find conserved active subnetworks across species (
(A) The flowchart describes the growth of a subnetwork from a candidate seed gene (red) in the functional linkage network. (B) Genes that are functionally related to the seed are defined as those whose path confidence from the seed gene is above a certain threshold (colored yellow in A); together they form the functional neighborhood of the seed. The aim of the approach is to integrate the expression data with functional linkage networks and discover active conserved subnetworks. (C) The candidate subnetwork initially contains the seed gene and is grown by iteratively adding genes from the functional neighborhood so as to maximize the average expression activity score of the genes in the subnetwork. At every iteration, the connectivity constraint must be satisfied before a candidate gene is added. The nodes in the growing subnetworks are genes, and the edge weights are derived from the functional linkage network of each species. Genes are colored green if they are up-regulated in stem cells relative to differentiated cells and red if they are down-regulated in stem cells relative to differentiated cells. The color intensity represents the normalized fold change in expression in either direction.
To test our subnetwork discovery method, we compiled a compendium of gene expression data for mouse and human pluripotent stem cells. Briefly, 249 mouse and 132 human expression profiles were obtained from several independent datasets from the Gene Expression Omnibus (GEO) database
It is important to note that the method for differential expression analysis (or other means of generating activity scores) is completely independent of the subnetwork discovery algorithm. Our large compendium of stem cell expression data for mouse and human provided an interesting setting for subnetwork discovery, but our approach could also be applied to activity scores derived from more standard, single-dataset differential expression studies, assuming comparable datasets are available for two different species (see
We applied our subnetwork discovery approach to the results of the stem cell differential expression analysis and functional linkage networks from human and mouse. Human and mouse functional linkage networks were obtained from previous work
Conserved active subnetworks between human and mouse were identified by varying the two parameters of the algorithm: the minimum average expression activity (normalized fold change) of the network, i.e. the network score cutoff, and the minimum clustering coefficient. This yielded between 1 and 255 networks, from the most conservative to the most lenient parameter settings, respectively. For example, at a network score cutoff of 0.15 (see
(A) The cross-species algorithm mines subnetworks in the functional linkage network with a high density of differentially expressed genes. The network score of a subnetwork reflects the average differential activity of all genes in the network. The number of subnetworks identified at a network score threshold is plotted (solid line) and is compared to the number of subnetworks identified after differential expression scores were randomly shuffled (dotted line). The parameters for average clustering coefficient are 0.1 for mouse and 0.2 for human. (B) The number of conserved subnetworks discovered is plotted for a range of connectedness parameters (minimum clustering coefficient). All clustering coefficients noted are relative to the background, single-gene average clustering coefficient, which is 0.08 for mouse and 0.35 for human.
To assess the statistical and biological significance of the networks, we performed a network randomization analysis. Specifically, the expression activity scores in both mouse and human were randomly shuffled five times with respect to the gene labels, and the algorithm was then applied to the shuffled expression profiles. Any conserved patterns of these randomized expression data on the functional linkage network should then represent false positives and not biologically relevant conservation. In all randomization experiments, the functional linkage network structure was retained and only gene activities were shuffled, so that we could specifically estimate the conserved expression patterns arising out of clustering of the active genes by random chance. Importantly, we found that while some subnetworks were discovered in various instances of the randomization experiment, far fewer subnetworks were discovered than for the original expression profiles (
We also evaluated the subnetworks in terms of their functional coverage and relevance. The function enrichment of the genes contained in each subnetwork was measured based on significant overlap with biological processes in the Gene Ontology
The 2D hierarchically clustered matrix of subnetwork functions highlights functional enrichments based on Gene Ontology annotations (biological process category) for the mouse counterparts of all conserved active subnetworks. A subnetwork column is colored green if the subnetwork contains genes predominantly up-regulated in stem cells, red if its genes are predominantly up-regulated in differentiated cells, and yellow if it contains a mix of genes, some more highly expressed in stem cells and some in differentiated cells. Enrichment was measured for all GO terms (Bonferroni-corrected p<0.05), and the enrichment patterns were clustered to reveal patterns of enrichment across the subnetworks. Enriched GO terms for individual subnetworks have been uploaded to the subnetworks website and can be browsed at
We compared conserved subnetworks discovered by our approach to gene sets obtained from a simple intersection of orthologs on the human and mouse differentially expressed gene lists. One might suggest that a reasonable approach to finding the core conserved modules underlying stem cell pluripotency is simply to analyze the most extreme differentially expressed genes in both species. We attempted this by comparing the top 600 differentially expressed genes from mouse and human, a number comparable to the total number of genes contained across our subnetworks. There was relatively low overlap between the gene sets: of the 600 genes, only 36 are up-regulated in both species, while 34 are down-regulated in both (
| | Mouse genes | Human genes | Intersection* |
| Differentially expressed genes | 8141 | 5353 | 3282 |
| Up-regulated in stem cells | 3955 | 3028 | 1367 |
| Down-regulated in stem cells | 4186 | 2325 | 986 |
| Number of genes covered by subnetworks | 607 | 607 | 601 |
| Subnetwork genes that are up-regulated | 214 | 181 | 153 |
| Subnetwork genes that are down-regulated | 220 | 214 | 129 |
*Orthology clusters that contain both the relevant mouse and human genes.
We were intrigued by the fact that our conserved subnetworks actually contained a significant fraction of genes (∼20%) that showed no evidence of differential expression. By its design (see
The subnetworks also sometimes contain mixed expression signatures (both up- and down-regulated genes) that are conserved across species, highlighting genes in the same pathway that act antagonistically or genes that exhibit different interactions at various stages of development. For example, one conserved network with mixed expression changes was centered on the important extracellular structural protein osteopontin (also known as secreted phosphoprotein 1, SPP1) (
To our knowledge, our method is the first attempt to interpret differential expression data by integrating with interaction networks across multiple species. Thus, we further assessed the advantages of simultaneous, cross-species network search as compared to active subnetwork discovery in a single species, which has been the focus of previous methods
The number of real subnetworks and random subnetworks at various network score cutoffs is plotted for MATISSE (A), Ingenuity (B), jActiveModules (C) and the single-species version of our algorithm (D). The network score is the metric each algorithm uses to rank its subnetworks. Random subnetworks were obtained by running each algorithm on expression data whose gene labels had been randomly shuffled. Each method uses a different form of the expression data: MATISSE uses expression profiles; jActiveModules uses significance values of the genes; Ingenuity uses focus genes, for which we took every differentially expressed gene whose log fold change was greater than 20% of the maximum log fold change (up-regulated genes) or less than 20% of the minimum log fold change (down-regulated genes); our method uses fold change scores from the SAM analysis. The functional linkage network was scaled down for all methods shown in (A–D) to allow a fair comparison. The cross-species algorithm run on the full network is also shown for completeness (E).
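The 20% focus-gene rule used for the Ingenuity comparison can be stated precisely in a few lines. The sketch below is an illustration only, not Ingenuity's own code: it assumes the input dictionary already contains only differentially expressed genes, maps each gene to its log fold change, and uses the hypothetical function name `select_focus_genes`.

```python
def select_focus_genes(log_fc, fraction=0.2):
    """Apply the 20%-of-extreme rule to a dict of gene -> log fold change.

    Assumes `log_fc` holds only differentially expressed genes.
    An up-regulated gene is kept if its log fold change exceeds `fraction`
    times the largest log fold change; a down-regulated gene is kept if it
    falls below `fraction` times the smallest (most negative) log fold change.
    """
    max_fc = max(log_fc.values())  # most up-regulated gene
    min_fc = min(log_fc.values())  # most down-regulated gene
    up = {g for g, v in log_fc.items() if v > 0 and v > fraction * max_fc}
    down = {g for g, v in log_fc.items() if v < 0 and v < fraction * min_fc}
    return up, down


up, down = select_focus_genes({"A": 5.0, "B": 0.5, "C": -4.0, "D": -1.0, "E": 2.0})
# up == {"A", "E"} (cutoff 1.0); down == {"C", "D"} (cutoff -0.8)
```

Because the cutoffs are relative to each list's extremes, a single outlier gene can raise the threshold for every other gene in the same direction.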
| First author | Year | # Nodes | # Edges | Weighted edges | # Subnetworks reported in the study | Average subnetwork size (# nodes) |
| Ideker | 2002 | 77 | 362 | No | 5 | 11.4 |
| Rajagopalan | 2004 | 9000 | 30000 | No | ∼100 | 34–50 |
| Cabusora | 2005 | 106 | 233 | No | 2 | 65 |
| Ulitsky | 2007 | 6230 | 89327 | No | 20 | 105.35 |
| Guo | 2007 | 6509 | 23157 | No | 1 | 2181 |
| Dittrich | 2008 | 2034 | 8399 | No | 1 | 46 |
| Ulitsky | 2009 | 6220 | 63989 | Yes | 14 | 33.6 |
| Our study - mouse | | 17868 | 2700000 | Yes | 116 | 11.7 |
| Our study - human | | 15806 | 6000000 | Yes | 127 | 16.6 |
| Our study - Cross species (neXus) | | | | Yes | 255 | 22 |
Although our main contribution in this work is the cross-species algorithm, we found that a single-species version of our approach performed favorably in comparison with existing approaches (
Perhaps the most striking result of our comparison was our finding that any single species approach, including our own, performed much worse than our cross-species subnetwork discovery algorithm. For example, in the single-species setting for mouse, we were able to find 164 subnetworks while discovering an average of 71 (standard deviation of 7.8) subnetworks in our randomization experiments under the same setting (mouse, clustering coefficient threshold = 0.1, network score cutoff = 0.3), suggesting an enrichment of approximately 2.5-fold (
The improvement in sensitivity and specificity by the cross-species approach is a particularly interesting result because it suggests that simultaneous cross-species network discovery can serve as an effective means of improving the signal-to-noise ratio in network discovery even if one is not necessarily interested in asking questions about conservation across species. More pessimistically, this result suggests that separating biologically relevant active subnetworks from random networks based on a single functional linkage network is a challenging problem.
The enhanced performance of the cross-species approach can be attributed to the fact that coordinated expression changes can be reasonably clustered in both species' functional linkage networks. Due to the small-world nature of functional linkage networks (or protein-protein interaction networks)
The difficulty in identifying subnetworks from a list of genes within a single species has important implications for how the statistical significance of such networks should be assessed. This problem often arises in practice during the interpretation of candidate gene lists. For example, analysis tools such as Ingenuity Pathway Analysis (Ingenuity® Systems,
Using the cross-species network discovery algorithm, we are able to find subnetworks reflecting conserved functional modules between mouse and human pluripotent stem cells. We found many of these subnetworks to be monochromatically active in stem cells or differentiated cells. This was not a prerequisite for network discovery, but it reflects the fact that most genes supporting a local process are regulated in the same direction. Monochromatic subnetworks up-regulated in stem cells were our primary focus because they point to candidate processes necessary for maintaining a pluripotent, self-renewing stem cell state. One of the most significant conserved subnetworks of this type captures the core pluripotency circuit in embryonic stem cells (
Subnetworks (A–D) are examples of interesting conserved subnetworks discovered by the cross-species network search algorithm on differentially expressed genes between stem cells and differentiated cells. Each subnetwork represents a subgraph of mouse (left column) and human (right column) functional linkage networks, respectively. Nodes are genes and they are colored green if the gene is up-regulated in stem cells when compared to differentiated cells and red if down-regulated in stem cells relative to differentiated cells. The intensity of green or red color of the genes represents the normalized fold change of the expression. The edge thickness in the subnetworks represents the edge confidence based on the functional linkage networks. The subnetwork (A) shows a conserved subnetwork which contains important stem cell transcription factors. The subnetwork (B) highlights cell cycle related pathway genes. The subnetworks (C, D) are mixed subnetworks, as they contain both up-regulated and down-regulated genes. The genes are functionally related but their mode of function is antagonistic in nature.
Another highly significant subnetwork discovered by our approach pertains to the control of cell cycle progression in ES cells (
Many conserved subnetworks also included genes that are up-regulated during the initiation of differentiation. This supports the idea that maintenance of the ES cell phenotype also requires the suppression of differentiation-associated gene expression. One interesting example of this phenomenon was highlighted in a third subnetwork discovered by our approach, which was centered on the protein ZIC3 (
This network in particular provides an illustrative example of how subnetwork discovery can provide novel testable experimental hypotheses. This hypothesis could be explored experimentally through RNAi knockdown of
Another interesting subnetwork found by our approach was centered around the seed gene SIRPA. The only gene in the whole subnetwork that is found to be up-regulated in mouse and human pluripotent stem cells is
While the hypotheses suggested by the discovered subnetworks ultimately require experimental follow-up, these examples illustrate that the networks capture many of the well-characterized processes supporting stem cell pluripotency and also implicate some novel players. In general, active subnetwork discovery can play an important role in interpreting differential expression or other genome-wide data. Active subnetworks, and in particular those that are conserved across species, indicate that a whole process or pathway is up- or down-regulated, which is more informative than a list of individually differentially expressed genes. A single highly differentially expressed gene is less compelling than an entire functional module with evidence of differential expression. Furthermore, because the underlying functional linkage networks are based on large collections of genomic data, our approach can potentially identify functional modules that are not yet characterized but that play a critical role under the conditions being studied.
We modified the cross-species network discovery algorithm to discover subnetworks that are markedly different in the expression patterns between the two species (see
(A) The number of species-specific subnetworks discovered is plotted versus the network score cutoff and compared with the number of subnetworks generated by applying the same approach after randomly shuffling gene labels in the expression data. Species-specific networks represent subnetworks with highly divergent patterns across species. (B) An example species-specific subnetwork that highlights differences in the expression of a BMP2 pathway-related subnetwork between human and mouse. The subnetwork nodes are genes, colored according to whether they are active in stem cells (green) or differentiated cells (red); the color intensity represents the degree of expression activity. The edge thickness represents the edge confidence in the functional linkage network.
Nevertheless, we find interesting subnetworks which highlight differences between gene expression in mouse and human stem cells. For example, one species-specific subnetwork (
To facilitate public access to active cross-species subnetworks identified by our approach, we developed a web-based interface for convenient browsing of conserved and species-specific stem cell expression signatures (
We have described a scalable approach for discovering conserved active subnetworks across species. Starting from candidate gene lists reflecting parallel differential expression studies in two different species, we are able to search for dense subnetworks with conserved patterns of differential expression. In contrast to previous active subnetwork discovery algorithms, our approach not only extends this idea across species, but also enables application of the approach to functional linkage networks as opposed to sparse protein-protein interaction networks. Functional linkage networks integrate information from a diverse collection of genomic and/or proteomic studies (including protein-protein interactions), and thus offer the potential for more sensitive discovery of active subnetworks, including those which involve previously uncharacterized genes.
We applied our approach to a differential expression study comparing pluripotent mouse and human stem cells with their differentiated cell types, producing several hundred subnetworks that reflect conserved changes between mouse and human. Network search across species produced specific hypotheses about conserved and divergent mechanisms of stem cell maintenance and, importantly, demonstrated that such an approach can be an effective means of filtering noise in the active subnetwork discovery problem. We found that identifying statistically significant active subnetworks independently within a single species may be a harder problem than previously appreciated, and we suggest the cross-species approach as one solution to this problem.
Despite the success of our approach, there are a number of promising directions for further improvement and broader application of the method. While the approach was successfully applied to relatively dense functional linkage networks for mouse and human, the problem is computationally challenging, and the algorithm cannot be applied in real time because it still requires several days to run. Strategies for improving the efficiency of conserved network discovery and more formal selection criteria for the parameters of our approach are both useful future directions. Furthermore, the approach can be readily extended to discover conserved subnetworks across more than two species, which will be another fruitful direction as functional genomic data accumulate for a broad variety of other model organisms. Finally, although our study focused on the interpretation of candidate gene lists derived from differential expression analyses, the algorithm is general and can be readily applied to interpret lists arising from other genomic screens, including, for example, genome-wide association studies.
249 mouse microarray data samples were obtained from 20 GEO datasets (
We used the mouse functional linkage network previously published in
The algorithm identifies functional modules enriched for active genes in both species under consideration. Conserved active modules are found based on two criteria: (1) a high degree of clustering in both species' functional linkage networks, and (2) a high average normalized differential expression fold-change (network score) sharing the same sign across species. Because the search space is exponential, a greedy heuristic is applied to expand subnetworks from candidate seed genes. Each candidate network is grown until it fails to meet one of the constraints. This algorithm is implemented in Python and the source code can be downloaded from the supplementary website (
# assuming global mouseDifferentialGenes, humanDifferentialGenes, mouseFN, humanFN
function subnetworks()
    for seed in candidate seed genes  # each gene is tried as a seed
        mouseGenesInConsideration = DepthFirstSearch++(seed, mouseFN, 0.3)  # mouse path-confidence threshold
        humanGenesInConsideration = DepthFirstSearch++(seed, humanFN, 0.8)  # human path-confidence threshold
        genesInConsideration = mouseGenesInConsideration ∩ humanGenesInConsideration  # matched via orthology
        growingSubnetwork = [seed]  # list with a single gene
        while growingSubnetwork can be grown  # i.e. some addition still satisfies the constraints
            addBestGene(growingSubnetwork, genesInConsideration)
        store growingSubnetwork
    return stored subnetworks

function DepthFirstSearch++(seed, functionalNetwork, threshold)
    for gene reached from seed in depth-first order in functionalNetwork
        if path confidence from seed to gene > threshold  # product of edge confidences on the path
            include gene
    return included genes

function addBestGene(growingSubnetwork, genesInConsideration)
    add to growingSubnetwork the gene in genesInConsideration \ growingSubnetwork that maximizes networkScore(growingSubnetwork + gene)

function networkScore(subnetwork)
    if clustering coefficient of the subgraph of subnetwork in mouseFN or in humanFN is not within constraints
        return 0
    return average of geneScore(gene) over all genes in subnetwork

function geneScore(gene)  # in the single-species version, this is simply foldchange[gene]
    return sign(mousefoldchange[gene] * humanfoldchange[gene]) * sqrt(abs(mousefoldchange[gene] * humanfoldchange[gene]))
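The per-gene score at the end of the pseudocode is a signed geometric mean of the two orthologs' normalized fold changes. The following minimal Python sketch mirrors that line; the function name `gene_activity` is hypothetical. A consequence worth noting is that a gene down-regulated in both species gets a positive score, while a gene changing in opposite directions gets a negative one.

```python
import math


def gene_activity(mouse_fc, human_fc):
    """Signed geometric mean of a mouse/human ortholog pair's fold changes.

    Positive when both orthologs change in the same direction (both up or
    both down), negative when they change in opposite directions.
    """
    product = mouse_fc * human_fc
    # sqrt(|m * h|) is the geometric mean of the magnitudes; the sign comes from m * h.
    return math.copysign(math.sqrt(abs(product)), product)


gene_activity(4.0, 1.0)    # 2.0, up in both species
gene_activity(-4.0, -1.0)  # 2.0, down in both species
gene_activity(4.0, -1.0)   # -2.0, divergent
```

This is why conserved subnetworks can be "mixed" (up in both species for some genes, down in both for others) and still reach a high average network score.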
The network score of a cross-species subnetwork is the average activity score (described below) of the genes in the two species' subnetworks, provided that two constraints hold: first, the subnetworks satisfy a connectedness constraint in their respective functional linkage networks; second, the network score of the subnetwork is above a threshold. Otherwise, the score of the subnetwork is zero. The first condition guarantees that the genes in the subnetwork are interconnected in each species' functional linkage network, which suggests that the corresponding set of genes represents a functional module. Enforcing this constraint in both species selects conserved modules. The second constraint guarantees that the subnetwork exhibits a high degree of differential expression, which reflects a coherent response to the phenotype or conditions under consideration.
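The two constraints can be combined into a single scoring function. The sketch below is illustrative, not the published implementation: it measures connectedness with the average weighted clustering coefficient described in the next paragraph, but because the exact weighted formula is not reproduced in this section, it assumes one simple generalization (sum of edge weights among a node's neighbors divided by the number of neighbor pairs). Gene identifiers are assumed to be ortholog-matched so that the same list can be evaluated in both networks; `avg_weighted_clustering` and `network_score` are hypothetical names.

```python
from itertools import combinations


def avg_weighted_clustering(genes, weight):
    """Average weighted clustering coefficient of the subgraph induced by `genes`.

    weight(u, v) returns the edge confidence between u and v (0.0 if absent).
    Assumed definition: C_k = (sum of edge weights among k's neighbors) /
    (number of neighbor pairs); genes with fewer than two neighbors score 0.
    """
    genes = list(genes)
    total = 0.0
    for k in genes:
        nbrs = [n for n in genes if n != k and weight(k, n) > 0]
        if len(nbrs) >= 2:
            pairs = list(combinations(nbrs, 2))
            total += sum(weight(u, v) for u, v in pairs) / len(pairs)
    return total / len(genes) if genes else 0.0


def network_score(genes, gene_score, mouse_weight, human_weight,
                  min_cc_mouse, min_cc_human, cutoff):
    """Average per-gene activity if both constraints hold, otherwise 0."""
    if not genes:
        return 0.0
    # Constraint 1: connectedness in BOTH species' functional linkage networks.
    if (avg_weighted_clustering(genes, mouse_weight) < min_cc_mouse or
            avg_weighted_clustering(genes, human_weight) < min_cc_human):
        return 0.0
    # Constraint 2: the average activity must exceed the network score cutoff.
    average = sum(gene_score[g] for g in genes) / len(genes)
    return average if average > cutoff else 0.0
```

With a fully connected triangle in both species and gene scores 0.4, 0.2 and 0.3, the network score is 0.3 at a cutoff of 0.15 and drops to 0 if either network lacks the closing edge.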
The connectedness of a subnetwork is quantified by its average weighted clustering coefficient, i.e. the average over its nodes of each node's clustering coefficient, which is the ratio of existing connections among the node's neighbors to the total number of possible pairs of neighbors. The clustering coefficient for node k is given by
Subnetworks are grown greedily to optimize the subnetwork score, starting from each gene as a seed. Genes are added from a pool of genes in functional proximity to the seed gene, defined as all genes whose path confidence from the seed gene (the product of all weighted edge confidences along the path) exceeds a minimum value. This pool is discovered using a modification of the depth-first search algorithm: nodes are explored from the seed gene in depth-first fashion, and a searched gene is selected if the confidence of its path from the seed gene exceeds a threshold (mouse >0.3, human >0.8). Subnetworks are grown iteratively by selecting, at each stage, the single gene from the functional neighborhood pool that maximizes the subnetwork activity score. For each gene in the pool, this score is calculated by adding that gene
For the cross-species network discovery approach, the networks are simultaneously grown in parallel. As described above, the activity score is based on the geometric mean of two or more orthologs' normalized differential expression scores, so selected orthologs are added to the respective subnetworks at each step.
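The neighborhood search and greedy growth described above can be sketched as follows. This is a minimal single-species illustration under stated assumptions, not the released neXus code: the network is assumed to be a dict of dicts holding edge confidences in (0, 1], the scoring function is passed in as a callable that returns 0 when a constraint fails, and `functional_neighborhood` and `grow_subnetwork` are hypothetical names. For the cross-species case, one would presumably restrict candidates to genes found in both species' neighborhoods (via orthology) and score with the signed geometric mean.

```python
def functional_neighborhood(seed, network, threshold):
    """Genes whose best path confidence from `seed` exceeds `threshold`.

    network: dict gene -> dict neighbor -> edge confidence in (0, 1]
    (list both directions for an undirected network).
    Path confidence is the product of edge confidences along the path.
    """
    best = {seed: 1.0}
    stack = [seed]
    while stack:
        gene = stack.pop()  # depth-first order
        for neighbor, conf in network.get(gene, {}).items():
            path_conf = best[gene] * conf
            # Confidences are <= 1, so products only shrink: prune weak paths,
            # and revisit a gene only if a stronger path to it was found.
            if path_conf > threshold and path_conf > best.get(neighbor, 0.0):
                best[neighbor] = path_conf
                stack.append(neighbor)
    best.pop(seed)
    return set(best)


def grow_subnetwork(seed, candidates, score):
    """Greedily add the candidate that maximizes score(subnetwork).

    score(list_of_genes) returns 0 when a constraint is violated, so growth
    stops as soon as no remaining candidate yields a positive score.
    """
    subnetwork = [seed]
    remaining = sorted(set(candidates) - {seed})
    while remaining:
        # max() keeps the first maximum, so ties follow sorted order.
        best = max(remaining, key=lambda g: score(subnetwork + [g]))
        if score(subnetwork + [best]) <= 0:
            break  # every possible addition violates a constraint
        subnetwork.append(best)
        remaining.remove(best)
    return subnetwork
```

One caveat: with a clustering-coefficient constraint, a two-gene subgraph has a coefficient of zero, so a real implementation needs some rule for the first additions. That rule is not described in this section, and the sketch leaves it to the supplied `score` function.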
All genes for both human and mouse were mapped to Inparanoid clusters
To estimate the significance of the obtained subnetworks, randomization experiments were carried out. For both species, the differential expression values were shuffled independently relative to the gene names to remove any connection between them. Fold change values were shuffled only among genes present in the functional linkage network, and the functional linkage network itself was kept unchanged. The network discovery algorithm was then run on the shuffled expression data to discover any conserved subnetworks. This entire process was repeated several times to establish a mean and standard deviation for the number of conserved subnetworks identified by chance, which were used to assign confidence values to the real subnetworks. Alternative randomization schemes provided similar results, and they are described in more detail in
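The randomization procedure can be sketched as a permutation loop. This is an illustration under assumptions: `discover` is a hypothetical callable standing in for the full subnetwork search (it takes the two fold-change dictionaries and returns a list of subnetworks), five rounds match the number of random instances mentioned elsewhere in the paper, and the random-to-real ratio corresponds to the noise-to-signal measure in the supplementary figures. Genes absent from a network are simply omitted from the shuffled dictionary, since the search never visits them.

```python
import random
import statistics


def shuffle_scores(fold_change, network_genes, rng):
    """Permute fold-change values among genes present in the network only."""
    genes = sorted(g for g in fold_change if g in network_genes)
    values = [fold_change[g] for g in genes]
    rng.shuffle(values)  # network structure is untouched; only labels move
    return dict(zip(genes, values))


def randomization_summary(discover, mouse_fc, human_fc, mouse_genes, human_genes,
                          n_rounds=5, seed=0):
    """Real count, mean/SD of counts on shuffled data, and random-to-real ratio."""
    rng = random.Random(seed)
    real = len(discover(mouse_fc, human_fc))
    counts = []
    for _ in range(n_rounds):
        # Each species is shuffled independently of the other.
        m = shuffle_scores(mouse_fc, mouse_genes, rng)
        h = shuffle_scores(human_fc, human_genes, rng)
        counts.append(len(discover(m, h)))
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts) if len(counts) > 1 else 0.0
    ratio = mean / real if real else float("inf")
    return real, mean, sd, ratio
```

A fixed `seed` makes a run reproducible; varying it gives independent random instances.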
Gene Ontology
neXus applied to a single-dataset differential expression analysis. neXus was applied to differential expression lists resulting from analysis of one mouse dataset (GSE3653) and one human dataset (GSE9940). For a clustering coefficient constraint of 0.1 on the mouse network and 0.2 on the human network, we plotted the number of distinct subnetworks generated for a range of network score cutoffs. Overlapping subnetworks were removed when their member genes overlapped more than 60% with larger subnetworks. The number of subnetworks obtained given randomized differential expression values for human and mouse across 5 different random instances is also plotted. We observe a similar enrichment over random subnetworks as in the analysis described in the
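The 60% redundancy filter can be sketched as follows. The legend does not say which subnetwork's size the overlap is measured against, so this sketch assumes it is the fraction of the smaller subnetwork's genes that also appear in a larger, already retained subnetwork; `remove_redundant` is a hypothetical name.

```python
def remove_redundant(subnetworks, max_overlap=0.6):
    """Drop subnetworks sharing more than `max_overlap` of their genes with a larger one.

    Overlap is the fraction of the smaller subnetwork's genes found in the
    larger one (an assumption). Subnetworks are visited from largest to
    smallest, so each is compared only against larger subnetworks already kept.
    """
    kept = []
    for genes in sorted((set(s) for s in subnetworks if s), key=len, reverse=True):
        if all(len(genes & larger) / len(genes) <= max_overlap for larger in kept):
            kept.append(genes)
    return kept


subs = [["a", "b", "c", "d", "e"], ["a", "b", "c", "d"], ["a", "f", "g"], ["h"]]
remove_redundant(subs)  # the 4-gene set is dropped: all of its genes lie in the 5-gene set
```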
(0.03 MB PDF)
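The 60% redundancy filter in the caption above could be implemented greedily, as sketched here. The sketch assumes overlap is measured as the fraction of the smaller subnetwork's genes that also occur in a larger, already kept subnetwork, and that subnetworks are processed from largest to smallest; both choices are assumptions. The function name is hypothetical.

```python
from typing import Iterable, List, Set


def remove_redundant(subnetworks: Iterable[Set[str]],
                     max_overlap: float = 0.6) -> List[Set[str]]:
    """Drop a subnetwork if more than `max_overlap` of its genes are already in a
    larger (or equal-sized) subnetwork that was kept."""
    kept: List[Set[str]] = []
    for sub in sorted(subnetworks, key=len, reverse=True):
        genes = set(sub)
        if not genes:
            continue  # skip empty subnetworks (avoids division by zero)
        if all(len(genes & other) / len(genes) <= max_overlap for other in kept):
            kept.append(genes)
    return kept
```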
Parameter sensitivity analysis on randomized expression data. The cross-species subnetwork discovery algorithm depends on the setting of two parameters: a network score cutoff and a clustering coefficient constraint. Based on five random instances in which the differential expression data were shuffled for both species, this figure shows how the number of random conserved subnetworks discovered varies with changes in both the clustering coefficient and network score parameters. This figure can be compared to the parameter sensitivity analysis of real discovered subnetworks (
(0.03 MB PDF)
Fraction of random to real subnetworks vs. network score cutoff. For a range of network score cutoffs (average normalized fold change), the cross-species subnetwork discovery approach was run on the real differential expression values as well as on several random instances, in which the differential expression data were shuffled with respect to the gene labels. At each parameter setting, the number of subnetworks obtained from the random instances was measured relative to the number of real subnetworks (noise-to-signal ratio). The parameters used for this experiment were clustering coefficient constraints of 0.1 and 0.2 for mouse and human, respectively, and network score cutoffs >0.15.
(0.03 MB PDF)
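The noise-to-signal ratio in the caption above reduces to simple arithmetic per cutoff. The sketch below assumes that the counts from the random instances are averaged before dividing by the real count, which the caption does not state explicitly; the function name is hypothetical.

```python
from typing import Dict, List


def noise_to_signal(real_counts: Dict[float, int],
                    random_counts: Dict[float, List[int]]) -> Dict[float, float]:
    """For each network score cutoff, the mean random subnetwork count divided
    by the real count (assumption: random instances are averaged)."""
    ratios = {}
    for cutoff, real in sorted(real_counts.items()):
        runs = random_counts[cutoff]
        mean_random = sum(runs) / len(runs)
        ratios[cutoff] = mean_random / real if real > 0 else float("nan")
    return ratios
```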
Analysis of ortholog overlap in differential expression lists vs. conserved subnetworks. To address whether the core conserved modules involved in stem cell pluripotency could be identified simply by comparing the most highly differentially expressed genes in both species, we compared the ortholog overlap among the top differentially expressed genes with that obtained from our subnetworks. Specifically, we selected a subset of the significantly differentially expressed genes (based on SAM) similar in size to the total number of genes that appear in the human and mouse subnetworks produced by our approach (∼600 genes). This gene list contained roughly half up-regulated and half down-regulated genes. We then measured the intersection (based on our orthology mapping) between the human and mouse gene lists, which yielded 36 up-regulated and 34 down-regulated genes in common. Although this overlap is highly statistically significant, it is much lower than the overlap between the mouse and human gene lists in the subnetworks produced by our approach (an overlap of 601 genes compared with 70). The subnetworks from our approach were obtained with clustering coefficient constraints of 0.1 on the mouse network and 0.2 on the human network and a network score cutoff of 0.15.
(0.04 MB PDF)
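The caption above reports that the overlap between the two differential expression lists is highly significant without naming the test. A hypergeometric upper-tail probability is one standard way to assess such an overlap; the sketch below assumes that choice and a universe of genes with a one-to-one ortholog mapping, neither of which is confirmed by the source. It uses exact integer arithmetic from `math.comb` until the final division.

```python
from math import comb


def hypergeom_upper_tail(k: int, universe: int, n_mouse: int, n_human: int) -> float:
    """P(overlap >= k) when n_human genes are drawn at random from a universe of
    `universe` orthologous genes, n_mouse of which are in the mouse list."""
    total = comb(universe, n_human)
    upper = min(n_mouse, n_human)
    # comb() returns 0 when the lower argument exceeds the upper one,
    # so impossible terms contribute nothing to the sum.
    hits = sum(comb(n_mouse, i) * comb(universe - n_mouse, n_human - i)
               for i in range(k, upper + 1))
    return hits / total  # exact integer arithmetic until this final division
```

Applying it to the reported overlap of 70 genes would also require the size of the ortholog universe, which is not given here.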
Example conserved active subnetworks. Subnetworks (a) and (b) are examples of conserved subnetworks discovered by the cross-species network search algorithm on genes differentially expressed between stem cells and differentiated cells. Each subnetwork represents a subgraph of the mouse (left column) and human (right column) functional linkage networks. Nodes are genes, colored green if they are up-regulated and red if they are down-regulated in stem cells relative to differentiated cells. The intensity of the green or red color represents the normalized fold change in expression. Edge thickness represents edge confidence in the functional linkage networks. Subnetwork (a) shows that TEP1 is not differentially regulated within a subnetwork enriched for transcription factor genes. Subnetwork (b) is an interesting case in which both up-regulated and down-regulated genes are found in the same subnetwork.
(0.04 MB PDF)
Cumulative size distribution of subnetworks generated by existing methods. All methods were run on the reduced mouse functional linkage network (50,000 highest-weight edges). For each method, the subnetworks were sorted by size, and each size was plotted against its rank in the sorted list. The greater the difference between the real and random curves, the greater the confidence we can have in the biological significance of the real subnetworks. To illustrate the utility of our cross-species approach, we also ran it (clustering coefficient constraints >0.1 and >0.2 for mouse and human, respectively, and network score >0.15) on the full functional linkage networks; this result is shown for comparison.
(0.04 MB PDF)
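The rank-size curves in the caption above, and the gap between a real and a random curve, can be computed as sketched here. Sorting from largest to smallest and treating missing ranks as size 0 are assumptions; the function names are hypothetical.

```python
from itertools import zip_longest
from typing import Iterable, List, Set, Tuple


def rank_size_curve(subnetworks: Iterable[Set[str]]) -> List[Tuple[int, int]]:
    """(rank, size) points with subnetworks sorted from largest to smallest."""
    sizes = sorted((len(s) for s in subnetworks), reverse=True)
    return list(enumerate(sizes, start=1))


def curve_gap(real: Iterable[Set[str]], shuffled: Iterable[Set[str]]) -> List[int]:
    """Size difference between the real and random curves at each rank;
    ranks present in only one curve count as size 0 in the other."""
    real_sizes = [size for _, size in rank_size_curve(real)]
    random_sizes = [size for _, size in rank_size_curve(shuffled)]
    return [r - q for r, q in zip_longest(real_sizes, random_sizes, fillvalue=0)]
```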
Evaluation of the single-species approach. The figures compare the number of real subnetworks with the average number of random subnetworks over five randomization experiments, when the single-species variant of the network search algorithm was applied to the human and mouse expression data and functional linkage networks. The number of subnetworks identified at increasingly stringent network score criteria is shown for the algorithm applied independently to (A) mouse (clustering coefficient criterion >0.2) and (B) human (clustering coefficient criterion >0.5).
(0.03 MB PDF)
Subnetwork evaluation based on alternative randomization schemes. In addition to the randomization scheme described in the
(0.03 MB PDF)
Overlap between human and mouse genes covered by MATISSE and our cross species algorithm.
(0.17 MB PDF)
Analysis of the considerable overlap between the subnetworks of the two species obtained through MATISSE and our cross-species algorithm.
(0.18 MB PDF)
Summary of Mus musculus microarray data.
(0.27 MB PDF)
Summary of Homo sapiens microarray data.
(0.27 MB PDF)
List of GO term enrichments for stem cell, differentiated cell, and mixed subnetworks.
(0.19 MB XLS)
This document contains the following supplementary notes: Note 1: Implications of using functional linkage vs. physical interaction networks for active subnetwork discovery; Note 2: neXus applied to single dataset differential expression study; Note 3: Independence of the datasets; Note 4: Comparison of the overlap of mouse and human subnetworks discovered through MATISSE and neXus; Note 5: Other randomizations.
(0.42 MB PDF)