PB and LS analyzed the data and wrote the paper. PB contributed reagents/materials/analysis tools.
The authors have declared that no competing interests exist.
Progress in uncovering the protein interaction networks of several species has led to questions of what underlying principles might govern their organization. Few studies have tried to determine the impact of protein interaction network evolution on the observed physiological differences between species. Using comparative genomics and structural information, we show here that eukaryotic species have rewired their interactomes at a fast rate of approximately 10⁻⁵ interactions changed per protein pair, per million years of divergence. For
To understand how the cell performs the required biological functions and reacts to changes in the environment, scientists have been studying how cellular components interact. In recent years, new experimental methods have immensely increased our ability to map out these connections. However, it is important to keep in mind that biological systems are constantly evolving to cope with environmental changes. What then is the impact of the genomic variability brought by point mutations, segmental duplications, etc., on these interaction networks? We have tried here to quantify the rate at which protein interactions have changed during the evolution of eukaryotic cells. We estimate that about 0.5% to 3% of the interactions can change every million years. Also, protein properties, such as binding specificity (defined as the number of binding surfaces or binding partners) and protein function, help determine the rate of interaction turnover. This work suggests that protein interactions are evolutionarily plastic, and that the conservation of a group of proteins across genomes does not necessarily mean that their interaction repertoire and functions are conserved. It also emphasizes the importance of studying biological systems in the context of evolutionary change.
Many partial protein interaction maps for several eukaryotic species have now been published [
These studies have not, however, taken into account the large evolutionary distance separating the species under study. In fact, the four eukaryotic species for which we currently have the most interaction data (
Even without the ability to compare interactomes directly, one could try to obtain estimates for the rate of change in interactomes by mining existing data with comparative genomics. Previously, Wagner [
We have attempted to evaluate the rate of change of interactions in the interactomes of several eukaryotic species (
To calculate the rate of interaction change, we have established for each protein, in all species studied here, an approximate age of origin, according to the presence or absence of an identifiable ortholog in several other reference species (see
Eukaryotic Species Had in the Recent Evolutionary Past a Fast Rate of Change of Interactions
Due to the low coverage of the current interactomes, it is quite possible that these values might change as new data is made available. To study the impact of coverage on the values mentioned above, we have mimicked the effect of lowering the coverage of the current datasets by randomly sampling the interactomes in two ways: randomly removing protein interactions or randomly removing proteins (and their interactions).
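The two sampling schemes described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names, the representation of an interactome as a set of protein pairs, and the toy network are all assumptions made for the example.

```python
import random


def sample_by_interaction_removal(interactions, keep_fraction, rng):
    """Keep a random fraction of the interactions (edges).

    Proteins left without any interaction drop out automatically,
    because the interactome is stored only as a set of pairs.
    """
    edges = sorted(interactions)  # sort for reproducible sampling
    n_keep = round(len(edges) * keep_fraction)
    return set(rng.sample(edges, n_keep))


def sample_by_protein_removal(interactions, keep_fraction, rng):
    """Keep a random fraction of the proteins (nodes) and only the
    interactions whose two partners were both kept."""
    proteins = sorted({p for pair in interactions for p in pair})
    n_keep = round(len(proteins) * keep_fraction)
    kept = set(rng.sample(proteins, n_keep))
    return {(a, b) for (a, b) in interactions if a in kept and b in kept}


def proteins_in(interactions):
    """Proteins that still have at least one interaction."""
    return {p for pair in interactions for p in pair}


# Hypothetical toy interactome, for illustration only.
toy = {("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")}
rng = random.Random(0)
half_edges = sample_by_interaction_removal(toy, 0.5, rng)
half_nodes = sample_by_protein_removal(toy, 0.5, rng)
```

Each sampled network would then be passed through the same rate calculation as the full interactome, so that the rate and the fraction of inherited interactions can be plotted against coverage.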
The percentage of inherited interactions increased with increasing coverage: approximately linearly in the case of random protein removal and nonlinearly in the case of random interaction removal (see
The rate of change of interactions, on the other hand, appears to be independent of network size. There was no significant change in the rate in the case of random removal of proteins. In the case of random removal of interactions, only when more than 80% of the interactions were removed was there a significant increase of the rate (see
To test the robustness of our method for variations in accuracy of the data, the human interactome was separated into three subsets, as defined in the Human Protein Reference Database: (1) yeast two-hybrid (including the two recent high-throughput studies [
Topological analysis of protein interaction networks has shown previously that the distribution of the number of interactions follows a power law, such that the frequency of proteins with n interactions falls off as n^−α (where α is the exponent of the power law) [
We asked if there was also a correlation between the number of partners of a protein and the rate of change of its interactions. Given that the rate of change was calculated as the number of changed interactions per protein pair per My, no bias was expected for proteins having different degrees of connectivity. Using the data for different species, we observed a linear correlation between the number of binding partners and the link turnover for all species studied (
We have binned proteins according to their average number of interactions and calculated for each bin the rate of change of interactions. There is a very strong correlation between the degree of connectivity and the interaction turnover.
The initial observations about the robustness of power law networks to random node removal [
It is known that interfaces of transient protein–protein interactions are less restricted in evolution than the binding surfaces of permanent complexes. Also, interacting residues of transient complexes are less likely to co-evolve than interacting residues of permanent complexes [
Domains Found to Contribute to the Fast Rate of Change of Interactions in at Least Three of the Four Species Studied
Many domain–peptide interactions involve a globular domain binding to a peptide that does not adopt a regular secondary structure and that is not part of the globular region of the target protein [
To test this hypothesis further, we analyzed a database containing structures of interacting protein domains [
We have grouped proteins containing domains with increasing observed structural interactions with other domain types and calculated for each bin the rate of change of interactions. Proteins containing domains known to interact with many other different domains have a higher rate of change of interactions than proteins containing domains with few known interactions.
For promiscuous proteins (containing domains capable of interacting with 15 or more other domains), we calculated the rate of change of interactions and compared this value with that obtained for proteins containing more specific domains (fewer than five structural interactions in iPfam) and the average for all proteins. We observed that peptide-binding domains and promiscuous domains have a higher rate of change of interactions (
Specificity of Protein Binding Is an Important Factor Determining the Rate of Change of Interactions
We have also studied domains with different binding specificities as defined by their number of observed physical interactions. Using the iPfam [
We conclude from these results that the specificity, as defined by the number of binding surfaces or physical interactions, of a binding domain is a strong determinant of the rate of change of interactions, with more promiscuous binding correlating with higher rates of evolution.
The results above suggest that specificity, as defined here, determines link dynamics, affecting the rate at which proteins might explore possible beneficial interactions and remove deleterious ones. Innovation explored in this way is grounds for natural selection to act upon during evolution. It is then plausible that proteins belonging to different functional classes might have different rates of change of interactions due to differential selection pressures. To study this, we have binned human proteins according to Gene Ontology biological processes [
We have binned proteins according to the biological processes, defined in Gene Ontology, and calculated for each bin the average number of interactions and average rate of change of interactions (see
Most groups of proteins had average rates of change that were not much different from those of groups of proteins with a similar connectivity. Within these groups we could distinguish between processes that have proteins with similar or lower-than-average rates (such as metabolism) and biological processes that have above-average rates (such as intracellular signaling, phosphorylation, regulation of cellular processes, and regulation of apoptosis). These results confirm the suggestion of Kunin et al. [
More interestingly, we found some biological processes (immune response, transport and localization, cell adhesion, and response to stress/stimulus) that showed higher link dynamics than one would expect from their average number of interactions. On
Biological Processes with Above-Average Rate of Change of Interactions
We redid this analysis removing GO annotations inferred electronically. Although roughly 50% of the annotations were lost, most of the results remain qualitatively the same (unpublished data). Importantly, we still see GO functions that have a rate significantly higher than expected from their average connectivity (organismal physiological process, defense response, immune response, and response to biotic stimuli). We hypothesize that the groups of proteins deviating from the linear preferential turnover have been under particularly strong positive selection for the change of their interactions.
In the seminal work of King and Wilson [
Extending on the work of Wagner [
Some caveats to our estimated rate should be noted. Namely, we have focused our attention on the evolution of protein interactions of single gene duplicates. The effects of single gene duplication could be considerably different from large segment or whole-genome duplication events [
Also, it has been proposed that duplicate genes pass through a period of relaxed selection after gene duplication [
Other studies also point to the importance of change of interactions after gene duplication. In studies of
In a recent review [
We reported that proteins involved in the immune response, responses to external stimuli, transport, establishment of localization, and organismal physiological processes show signs of such positive selection for new interactions. Interestingly, most of these biological processes are known to have an excess of proteins under positive selection as shown by sequence studies [
This study opens up interesting questions regarding the evolution of cellular functions. Some challenges faced by the cells require the interaction of several components to integrate information and provide a solution. One example would be the decision to divide or differentiate given a set of external conditions. It could be said that these challenges require a network solution as opposed to some metabolic problems, such as adapting an enzyme to do a required metabolic step.
In network challenges as defined above, selection forces would not restrain the exact binary interactions, but rather the functional complexes arising from them. It is plausible that the fast link dynamics are then advantageous to the cell, given that they allow for exploration of different network conformations from which innovation might arise.
If there is indeed a fast turnover of interactions that are material for selection to act upon, then we expect to see convergent network motifs that are optimal for solving particular cellular problems. An example of what might be an optimal network solution is the coupling of slow and fast positive feedbacks in cell decision processes [
If fast link dynamics are important for the cell to search for optimal solutions to network problems, then it is also likely that the rate of change itself might be under constraint and therefore under natural selection. Hence, during cellular evolution, the selection of different degrees of specificity is not only important for the functional role of the proteins but also has direct consequences for the evolvability of the whole cellular network.
Further work on protein interaction maps will help us understand to what extent evolvability constrains the differential usage of protein domains in cellular networks. As was the case for comparative genomics, the availability of more and complete interactomes for different species will vastly increase our understanding of how the cell's complexity arises from the interactions of its components and evolves to cope with changing environments.
For each protein of
We established putative orthologs between
We considered that proteins with no apparent ortholog in any of the reference species likely originated after the divergence of the most recently diverged reference species.
The interactomes used for
We considered that an interaction was inherited in the process of duplication when an interaction to a recently duplicated protein was also observed with its closest homolog. Removing these interactions we were left with protein interactions that were either gained in the copy we are considering or were inherited by duplication and subsequently lost in the homolog. Either of these cases represents an event of interaction change that occurred after the gene duplication event.
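The classification described above can be sketched as a simple set operation. This is an illustrative sketch under stated assumptions, not the original implementation: the function name, the dictionary-of-sets network representation, and the protein identifiers are hypothetical.

```python
def split_interactions(network, duplicate, closest_homolog):
    """Split the partners of a recently duplicated protein into
    inherited and changed interactions.

    `network` maps each protein to the set of its interaction partners
    (an assumed representation). An interaction counts as inherited
    when the same partner is also observed for the closest homolog;
    every other interaction of the duplicate counts as changed, that
    is, gained in this copy or lost in the homolog after duplication.
    """
    partners = network.get(duplicate, set())
    homolog_partners = network.get(closest_homolog, set())
    inherited = partners & homolog_partners
    changed = partners - inherited
    return inherited, changed


# Hypothetical example: P1b is a recent duplicate of P1a.
network = {
    "P1a": {"X", "Z"},
    "P1b": {"X", "Y"},
}
inherited, changed = split_interactions(network, "P1b", "P1a")
# inherited is {"X"}; changed is {"Y"}
```

Summing the sizes of the `changed` sets over all recently duplicated proteins, taking care not to count an interaction between two recent duplicates twice, gives the number of changed interactions that enters the rate calculation.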
The rate of change of interactions can be calculated by: rate = changed interactions/(possible protein pairs * divergence time). Designating the recently duplicated proteins as Pnew and proteins originated before the split with the most recently diverged reference species as Pold, then: changed interactions = changed interactions among Pnew + changed interactions between Pnew and Pold. Possible protein pairs = Pnew * Pold + (Pnew * (Pnew − 1)/2). Divergence time = divergence time of the most recently diverged reference species (see above).
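As a worked illustration of the formula above, the following sketch computes the rate from the four quantities it depends on. The function name and the numbers in the example are hypothetical and chosen only to make the arithmetic easy to follow; they are not values from this study.

```python
def rate_of_change(changed_interactions, n_new, n_old, divergence_my):
    """Rate of change of interactions per protein pair per million years.

    n_new: number of recently duplicated proteins (Pnew)
    n_old: number of proteins originated before the split (Pold)
    divergence_my: divergence time, in million years, of the most
        recently diverged reference species
    """
    # Pnew-Pold pairs plus all unordered pairs among Pnew.
    possible_pairs = n_new * n_old + n_new * (n_new - 1) / 2
    if possible_pairs <= 0 or divergence_my <= 0:
        raise ValueError("need at least one protein pair and a positive time")
    return changed_interactions / (possible_pairs * divergence_my)


# Hypothetical numbers: 10 new proteins, 100 old proteins,
# 209 changed interactions, 20 My of divergence.
# Possible pairs = 10 * 100 + 10 * 9 / 2 = 1045.
rate = rate_of_change(209, n_new=10, n_old=100, divergence_my=20)
# rate = 209 / (1045 * 20) = 0.01 changed interactions per pair per My
```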
To determine if the number of interaction partners of a protein correlates with the rate of change of interactions, we have binned all proteins in Pold according to the number of interactions to other proteins in Pold. We considered bins of proteins with i to i + 5 interactions, with i ranging from one to 20. For each of these bins, Pold(bin), the rate of change of interactions was considered to be: rate of change (bin) = changed interactions between Pnew and Pold(bin)/(Pold(bin) × Pnew × divergence time).
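The binning procedure above can be sketched as follows. This is a minimal illustration under assumptions: the function name and input dictionaries are hypothetical, and both bin edges (i and i + 5) are treated as inclusive, which the text does not specify.

```python
def binned_rates(degree_old, changed_with_new, n_new, divergence_my,
                 width=5, starts=range(1, 21)):
    """Rate of change of interactions for overlapping connectivity bins.

    degree_old: maps each protein in Pold to its number of interactions
        with other proteins in Pold
    changed_with_new: maps each protein in Pold to its number of changed
        interactions with proteins in Pnew
    n_new: number of recently duplicated proteins (Pnew)
    Returns a dict mapping the bin start i to the rate for proteins with
    i to i + width interactions; empty bins are skipped.
    """
    rates = {}
    for i in starts:
        members = [p for p, k in degree_old.items() if i <= k <= i + width]
        if not members:
            continue
        changed = sum(changed_with_new.get(p, 0) for p in members)
        rates[i] = changed / (len(members) * n_new * divergence_my)
    return rates


# Hypothetical toy data, for illustration only.
degree_old = {"a": 1, "b": 3, "c": 10}
changed_with_new = {"a": 1, "b": 2, "c": 6}
rates = binned_rates(degree_old, changed_with_new, n_new=10, divergence_my=10)
```

Because consecutive bins overlap, a protein contributes to several bins, which smooths the relationship between connectivity and rate at the cost of making neighboring points non-independent.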
We observed a very strong preferential turnover in all species such that proteins with a higher degree of connectivity have a higher rate of change. For proteins with k interactions, the rate of change, r, can be calculated by:
To study protein domains with different binding specificity, we binned proteins containing domains with increasing number of interactions with other domains (extracted from the iPfam database) and calculated for each bin the rate of change. We considered bins of proteins in Pold having protein domains with i to i + 10 iPfam interactions, with i ranging from one to 15. We calculated the rate of change (bin) as above.
We have used the iPfam database to search for plausible binding interfaces in all human interactions derived from the Human Protein Reference Database. We could assign a possible binding interface to ~20% of the human interactome. We then built two groups of proteins according to the number of binding interactions per domain. We selected a group of proteins that had three or more interactions through one domain (likely more promiscuous domains), and a second group of proteins that interacted with three or more partners via multiple domains (likely more selective domains). We then further subdivided the two groups into bins of proteins with i to i + 5 interactions, with i ranging from five to 15, and calculated the rate (bin) as above (see
To study the different biological processes, we have binned proteins in Pold according to the biological processes defined in Gene Ontology [
(A,B) Sampling of
(A,C,E,G) Sampling was done by randomly removing interactions. Any protein with no interaction is no longer considered as part of the interactome.
(B,D,F,H) Sampling was done by randomly removing proteins and their interactions. Any protein with no interaction is no longer considered as part of the interactome. Filled squares (▪), rate of change of interactions. Open squares (□), fraction of interactions conserved after duplication.
(A) We binned all proteins according to the number of interactions with proteins originated before the split with the reference species and calculated the rates of change of interactions with recently duplicated proteins.
(B) We mapped the most likely interacting domains in the human interactome using the database of interacting motifs. We selected two groups of proteins: blue circles (•), proteins having three or more interactions through two or more domains; red circles (•), proteins having three or more interactions through the same domain. We binned both groups according to the number of interactions occurring in the full interactome with proteins originated before the split with the reference species and calculated the rates of change of interactions with recently duplicated proteins.
To test for a possible bias of the experimental method used in determining protein interactions, we divided the interactions of the human dataset into three subsets, as defined in the Human Protein Reference Database: yeast two-hybrid, in vitro studies such as GST pull-down, and in vivo studies such as co-immunoprecipitation. The estimated rate of change of interactions calculated with the yeast two-hybrid method (including human high-throughput studies) was only marginally higher than those observed with the other two datasets (obtained exclusively from literature-derived protein interactions).
To increase the reliability of the rate of change for each domain, we have selected only domains that were represented in most species by at least 20 domains. Of all InterPro domains, 96 satisfy this condition. Of these 96 domains, eight have an above-average rate of change in at least three of the species studied. We can say that these eight domains consistently contribute to the fast rate of change in most species.
We are grateful to Martin Lercher, Ignacio Enrique Sanchez, Mark Isalan, Caroline Lemerle, and Silvia Santos for useful criticism and discussion. Pedro Beltrao is supported by a grant from Fundação para a Ciência e Tecnologia through the Graduate Programme in Areas of Basic and Applied Biology.
million years
ubiquitin-associated