<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD Journal Publishing DTD v2.0 20040830//EN" "http://dtd.nlm.nih.gov/publishing/2.0/journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="2.0" xml:lang="EN"><front><journal-meta><journal-id journal-id-type="publisher-id">plos</journal-id><journal-id journal-id-type="publisher">pcbi</journal-id><journal-id journal-id-type="flc">plcb</journal-id><journal-id journal-id-type="nlm-ta">PLoS Comput Biol</journal-id><journal-id journal-id-type="pmc">ploscomp</journal-id><journal-title>PLoS Computational Biology</journal-title><issn pub-type="ppub">1553-734X</issn><issn pub-type="epub">1553-7358</issn><publisher><publisher-name>Public Library of Science</publisher-name><publisher-loc>San Francisco, USA</publisher-loc></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.1371/journal.pcbi.0020079</article-id><article-id pub-id-type="publisher-id">05-PLCB-RA-0331R3</article-id><article-id pub-id-type="sici">plcb-02-07-07</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="Discipline"><subject>Biochemistry</subject><subject>Computational Biology</subject><subject>Molecular Biology</subject><subject>Computational Biology/Systems Biology</subject></subj-group><subj-group subj-group-type="System Taxonomy"><subject>Saccharomyces</subject><subject>Drosophila</subject><subject>Caenorhabditis</subject><subject>Homo (human)</subject></subj-group></article-categories><title-group><article-title>Protein–Protein Interactions More Conserved within Species than across Species</article-title><alt-title alt-title-type="running-head">PPIs more conserved within than across species</alt-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Mika</surname><given-names>Sven</given-names></name><xref ref-type="aff" rid="aff1">
            <sup>1</sup>
          </xref><xref ref-type="aff" rid="aff2">
            <sup>2</sup>
          </xref><xref ref-type="corresp" rid="cor1">
            <sup>*</sup>
          </xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Rost</surname><given-names>Burkhard</given-names></name><xref ref-type="aff" rid="aff2">
            <sup>2</sup>
          </xref><xref ref-type="aff" rid="aff3">
            <sup>3</sup>
          </xref><xref ref-type="aff" rid="aff4">
            <sup>4</sup>
          </xref></contrib></contrib-group><aff id="aff1">
				<label>1</label><addr-line> Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
			</addr-line></aff><aff id="aff2">
				<label>2</label><addr-line> Columbia University Center for Computational Biology and Bioinformatics, Irvine Cancer Center, New York, New York, United States of America
			</addr-line></aff><aff id="aff3">
				<label>3</label><addr-line> Institute of Physical Biochemistry, University Witten/Herdecke, Witten, Germany
			</addr-line></aff><aff id="aff4">
				<label>4</label><addr-line> NorthEast Structural Genomics Consortium, New York, New York, United States of America
			</addr-line></aff><contrib-group><contrib contrib-type="editor" xlink:type="simple"><name name-style="western"><surname>Rzhetsky</surname><given-names>Andrey</given-names></name><role>Editor</role><xref ref-type="aff" rid="edit1"/></contrib></contrib-group><aff id="edit1">Columbia University, United States of America</aff><author-notes><fn id="ack1" fn-type="con"><p>SM conceived and designed the experiments. SM performed the experiments. SM analyzed the data. BR contributed reagents/materials/analysis tools. SM and BR wrote the paper.</p></fn><corresp id="cor1">* To whom correspondence should be addressed. E-mail: <email xlink:type="simple">mika@rostlab.org</email></corresp></author-notes><pub-date pub-type="ppub"><month>7</month><year>2006</year></pub-date><pub-date pub-type="epub"><day>21</day><month>7</month><year>2006</year></pub-date><pub-date pub-type="epreprint"><day>18</day><month>5</month><year>2006</year></pub-date><volume>2</volume><issue>7</issue><elocation-id>e79</elocation-id><history><date date-type="received"><day>18</day><month>11</month><year>2005</year></date><date date-type="rev-recd"><day>18</day><month>5</month><year>2006</year></date></history><copyright-statement>Mika and Rost. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</copyright-statement><copyright-year>2006</copyright-year><abstract><p>Experimental high-throughput studies of protein–protein interactions are beginning to provide enough data for comprehensive computational studies. Today, about ten large data sets, each with thousands of interacting pairs, coarsely sample the interactions in fly, human, worm, and yeast. Another about 55,000 pairs of interacting proteins have been identified by more careful, detailed biochemical experiments. 
Most interactions are experimentally observed in prokaryotes and simple eukaryotes; very few interactions are observed in higher eukaryotes such as mammals. It is commonly assumed that pathways in mammals can be inferred through homology to model organisms, e.g., the experimental observation that two yeast proteins interact is used to infer that the two corresponding human proteins also interact. Two protein pairs for which the interaction is conserved are often described as interologs. The goal of this investigation was a large-scale comprehensive analysis of such inferences, i.e., of the evolutionary conservation of interologs. Here, we introduced a novel score for measuring the overlap between protein–protein interaction data sets. This measure appeared to reflect the overall quality of the data and was the basis for the two surprising results of our large-scale analysis. Firstly, homology-based inferences of physical protein–protein interactions appeared far less successful than expected. In fact, such inferences were accurate only for extremely high levels of sequence similarity. Secondly, and most surprisingly, the identification of interacting partners through sequence similarity was significantly more reliable for protein pairs within the same organism than for pairs between species. Our analysis underlined that the discrepancies between different datasets are large, even when using the same type of experiment on the same organism. This reality considerably constrains the power of homology-based transfer of interactions. In particular, the experimental probing of interactions in distant model organisms has to be undertaken with some caution. More comprehensive images of protein–protein networks will require the combination of many high-throughput methods, including <italic>in silico</italic> inferences and predictions. 
<ext-link ext-link-type="uri" xlink:href="http://www.rostlab.org/results/2006/ppi_homology/" xlink:type="simple">http://www.rostlab.org/results/2006/ppi_homology/</ext-link></p></abstract><abstract abstract-type="synopsis"><title>Synopsis</title><p>The IntAct database contains about ten large-scale data sets of protein–protein interactions. Each set contains thousands of experimentally observed pair interactions. Most pairs were observed in yeast (<named-content content-type="genus-species" xlink:type="simple">Saccharomyces cerevisiae</named-content>), fly (<named-content content-type="genus-species" xlink:type="simple">Drosophila melanogaster</named-content>), and worm (<named-content content-type="genus-species" xlink:type="simple">Caenorhabditis elegans</named-content>). These species are often regarded as model organisms in the sense that one can infer that two mouse proteins interact if one experimentally observes that the two corresponding proteins in worm interact. Here, the authors analyzed in detail how the sequence signals of physical protein–protein interactions are conserved. It is a common assumption that protein–protein interactions can easily be inferred through homology transfer from one model organism to another organism of interest. Here, the authors demonstrated that such homology transfers are only accurate at unexpectedly high levels of sequence identity. Even more surprisingly, homology transfers of protein–protein interactions are significantly more reliable for protein pairs from the same species than for protein pairs from different organisms. The observation that interactions were much more conserved within than across species was valid for all levels of sequence similarity, i.e. 
for very similar as well as for more diverged interologs.</p></abstract><counts><page-count count="12"/></counts><custom-meta-wrap><custom-meta><meta-name>citation</meta-name><meta-value>Mika S, Rost B (2006) Protein-protein interactions more conserved within species than across species. PLoS Comput Biol 2(7): e79. DOI: <ext-link ext-link-type="doi" xlink:href="http://dx.doi.org/10.1371/journal.pcbi.0020079" xlink:type="simple">10.1371/journal.pcbi.0020079</ext-link></meta-value></custom-meta></custom-meta-wrap></article-meta></front><body><sec id="s1"><title>Introduction</title><sec id="s1a"><title>Experiments Peek at Complete Protein–Protein Networks</title><p>The faster large-scale sequencing projects determine the alphabet of life, the higher the pressure to determine some of the actual processes that make life what it is. The understanding of functional relations among all proteins is essential to understanding how cells work. Recent breakthroughs in experimental high-throughput techniques have begun to peek at complete protein–protein interaction networks of entire organisms (<xref ref-type="supplementary-material" rid="pcbi-0020079-st001">Table S1</xref>). One central method is the yeast two-hybrid (Y2H) assay [<xref ref-type="bibr" rid="pcbi-0020079-b001">1</xref>], which is based on an ingeniously simple idea: first, separate two domains (activation and DNA-binding) of a transcription factor that activates a reporter gene, then fuse each of the two domains to a different protein (A and B) [<xref ref-type="bibr" rid="pcbi-0020079-b002">2</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b003">3</xref>]. If A and B interact, the two domains are brought together and thereby activate the reporter gene, whose expression is then detected. The difficulty of using Y2H is in mastering the details of the experimental setup. 
Other high-throughput methods to detect protein–protein interactions, such as phage-display assays [<xref ref-type="bibr" rid="pcbi-0020079-b004">4</xref>], tandem affinity purifications (TAP) [<xref ref-type="bibr" rid="pcbi-0020079-b005">5</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b006">6</xref>], co-immunoprecipitation, and affinity chromatography [<xref ref-type="bibr" rid="pcbi-0020079-b002">2</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b007">7</xref>–<xref ref-type="bibr" rid="pcbi-0020079-b009">9</xref>], are also commonly used. An important advantage of using Y2H over these other high-throughput techniques is the ability to measure physical interactions between proteins as opposed to pure functional associations. Also, Y2H experiments work with physiological conditions, i.e., conditions that resemble those in eukaryotic cells [<xref ref-type="bibr" rid="pcbi-0020079-b002">2</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b003">3</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b010">10</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b011">11</xref>]. Ito et al. [<xref ref-type="bibr" rid="pcbi-0020079-b012">12</xref>] and Uetz et al. [<xref ref-type="bibr" rid="pcbi-0020079-b013">13</xref>] first scanned large fractions of the yeast proteome for protein–protein interactions. Others added further interactions: Ho et al. [<xref ref-type="bibr" rid="pcbi-0020079-b014">14</xref>] used mass spectrometry and Gavin et al. [<xref ref-type="bibr" rid="pcbi-0020079-b015">15</xref>] used TAP. 
Protein networks in the fly (<named-content content-type="genus-species" xlink:type="simple">Drosophila melanogaster</named-content>) have been targeted through three different Y2H studies [<xref ref-type="bibr" rid="pcbi-0020079-b011">11</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b016">16</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b017">17</xref>], in the worm (<named-content content-type="genus-species" xlink:type="simple">Caenorhabditis elegans</named-content>) through one [<xref ref-type="bibr" rid="pcbi-0020079-b018">18</xref>], and a large subset of about 1,500 human protein network relations was detected through TAP [<xref ref-type="bibr" rid="pcbi-0020079-b019">19</xref>]. These data promise deeper insights into cellular processes.</p></sec><sec id="s1b"><title>Today's Data Are Incomplete and Not Fully Reliable</title><p>Y2H systems are not 100% accurate; for instance, they identify many putative interactions that cannot be confirmed by other studies. One reason for false positives (interactions incorrectly postulated) is that the two proteins A and B may activate the reporter gene directly without having to interact [<xref ref-type="bibr" rid="pcbi-0020079-b003">3</xref>]. The Margalit group has estimated the false positive rate in high-throughput Y2H assays to be about 50% [<xref ref-type="bibr" rid="pcbi-0020079-b020">20</xref>]; the Eisenberg group has arrived at the same estimate through measuring the reliability of interactions in the Database of Interacting Proteins [<xref ref-type="bibr" rid="pcbi-0020079-b021">21</xref>]. Y2H experiments also do not achieve complete coverage, i.e., they miss many interactions. Such false negatives (missed interactions) might result from the particular experimental setup (which may prevent the interaction between A and B) or from problems in the assembly of the two transcriptional domains (activation and DNA-binding) needed for Y2H. 
These problems do not prevent Y2H from evolving as one of the major experimental probes for interactions; they do, however, imply that today's data sets are neither complete nor fully accurate [<xref ref-type="bibr" rid="pcbi-0020079-b020">20</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b022">22</xref>]. One of the strong arguments in favor of large-scale Y2H experiments is that they are more systematic and much less driven by happenstance than hypothesis-driven, detailed experiments.</p></sec><sec id="s1c"><title>Known Interactions Are Expanded through Homology-Based Inference</title><p>Evolutionary connections help explain the rapid success of molecular biology: we can study a particular protein in a simple bacterium and learn about the function of the same protein in multicellular eukaryotes. This idea enables us to use model organisms to predict protein structure [<xref ref-type="bibr" rid="pcbi-0020079-b023">23</xref>–<xref ref-type="bibr" rid="pcbi-0020079-b025">25</xref>], subcellular localization [<xref ref-type="bibr" rid="pcbi-0020079-b026">26</xref>], enzymatic activity [<xref ref-type="bibr" rid="pcbi-0020079-b027">27</xref>–<xref ref-type="bibr" rid="pcbi-0020079-b029">29</xref>], and other aspects of protein function [<xref ref-type="bibr" rid="pcbi-0020079-b030">30</xref>–<xref ref-type="bibr" rid="pcbi-0020079-b034">34</xref>]. The same principle is frequently applied to the extension of interactions (<xref ref-type="fig" rid="pcbi-0020079-g001">Figure 1</xref>): Assume that two proteins A and B are experimentally observed to bind in organism o, and that alignment methods identify related protein pairs in organism o (A′-B′) and in organism p (A″-B″). Can we infer that the pairs A′-B′ and A″-B″ also interact with each other? 
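</p><p>The transfer rule illustrated in <xref ref-type="fig" rid="pcbi-0020079-g001">Figure 1</xref> can be written down as a short illustrative sketch. The Python below is not part of the original analysis: the function name, the protein labels, and all similarity scores are invented for illustration. It assumes that an interaction A-B is transferred to a candidate pair only when both A/A′ and B/B′ lie above the similarity threshold, so that neither the average similarity nor a single high-scoring pair is sufficient.</p><preformat>
```python
def transferable_pairs(known_ppi, candidates, similarity, threshold):
    """Return the candidate pairs to which a known interaction may be transferred.

    known_ppi:  tuple (A, B) observed to interact experimentally.
    candidates: list of (A2, B2) pairs, homologs of A and B respectively.
    similarity: dict mapping (protein, homolog) to a similarity score
                (hypothetical toy values, not measured data).
    threshold:  minimum score that BOTH homolog pairs must exceed.
    """
    a, b = known_ppi
    accepted = []
    for a2, b2 in candidates:
        sim_a = similarity[(a, a2)]
        sim_b = similarity[(b, b2)]
        # Both pairs must clear the threshold on their own.
        if sim_a > threshold and sim_b > threshold:
            accepted.append((a2, b2))
    return accepted


# Invented scores: A_o/B_o stand for homologs in the same organism o,
# A_p/B_p for homologs in another organism p.
toy_similarity = {
    ("A", "A_o"): 45, ("B", "B_o"): 52,
    ("A", "A_p"): 60, ("B", "B_p"): 25,
}
toy_candidates = [("A_o", "B_o"), ("A_p", "B_p")]
print(transferable_pairs(("A", "B"), toy_candidates, toy_similarity, 40))
# prints [('A_o', 'B_o')]
```
</preformat><p>With these invented values, the pair A_p/B_p is rejected even though its average score (42.5) exceeds the threshold, because B/B_p alone falls below it. The outcome reflects only the toy numbers chosen here and says nothing about real intraspecies or interspecies pairs.</p><p>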
The Vidal group [<xref ref-type="bibr" rid="pcbi-0020079-b010">10</xref>] has investigated how yeast interactions detected by Ito [<xref ref-type="bibr" rid="pcbi-0020079-b035">35</xref>] and Uetz [<xref ref-type="bibr" rid="pcbi-0020079-b013">13</xref>] map to interactions in worm. They concluded that at BLAST E-values &lt;10<sup>−10</sup>, only 16%–30% of the yeast interactions are transferable [<xref ref-type="bibr" rid="pcbi-0020079-b036">36</xref>]; similar results were reported by the Gerstein group [<xref ref-type="bibr" rid="pcbi-0020079-b037">37</xref>]. Although homology inference is common practice, no large-scale study has ever estimated levels of accuracy and coverage for physical interactions. A particular aspect of this question relates to paralogs and orthologs. Two proteins are often considered as paralogs when they originate from the same organism and differ in function. Paralogs are assumed to have arisen from gene duplication followed by the specialization and drifting away of one of the copies, while the other copy has maintained its original function. Orthologs, on the other hand, are described as two proteins with largely identical function and a common ancestor that reside in different organisms [<xref ref-type="bibr" rid="pcbi-0020079-b037">37</xref>–<xref ref-type="bibr" rid="pcbi-0020079-b039">39</xref>]. Applied to homology-based inference of interactions, a common assumption is that interactions are more conserved between orthologs than between paralogs [<xref ref-type="bibr" rid="pcbi-0020079-b040">40</xref>–<xref ref-type="bibr" rid="pcbi-0020079-b042">42</xref>], i.e., interactions are more conserved between than within organisms. 
If true, model organisms would be ideal for the study of interactions.</p><fig id="pcbi-0020079-g001" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.0020079.g001</object-id><label>Figure 1</label><caption><title>Concept of Homology Inference and Interologs</title><p>Interologs are two pairs of protein interactions that fulfill the following conditions: (A interacts with B) + (A is similar to A′) + (B is similar to B′) → (A′ interacts with B′). All quadruples (A, B, A′, B′) for which this relation is true are referred to as interologs [<xref ref-type="bibr" rid="pcbi-0020079-b037">37</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b079">79</xref>]. To illustrate our analysis, we have to extend this simple relation. Assume that a physical protein–protein interaction (PPI) between proteins A and B is observed in organism o. If A and B are both sequence similar (above a certain threshold) to two other proteins A′ and B′ in the same organism o, we should be able to infer the physical interaction between A′ and B′. Note that both pairs, A/A′ as well as B/B′, have to be above the particular similarity threshold for us to be able to make this inference. Thus, we neither use an average similarity of both pairs (A/A′ and B/B′) nor a minimum similarity for just one pair (A/A′ or B/B′). Now let us assume that we have another pair of proteins A″ and B″ in another organism p, and that both are as similar to A and B as are A′ and B′, respectively. 
One of our findings was that homology transfers A-B → A′-B′ were more reliable than transfers A-B → A″-B″.</p></caption><graphic xlink:href="info:doi/10.1371/journal.pcbi.0020079.g001" alt-version="no" mimetype="image" position="float" xlink:type="simple"/></fig></sec><sec id="s1d"><title>Focus on Transient Physical Interactions (PPIs)</title><p>One important difference between Y2H and TAP is that while Y2H aims at the detection of physically interacting proteins, TAP identifies large groups of proteins that are associated, for instance, through a common pathway [<xref ref-type="bibr" rid="pcbi-0020079-b043">43</xref>]. Most high-throughput techniques resemble TAP in the sense that they reveal association rather than physical interaction. To illustrate this difference, assume we hypothesized that co-expressed proteins interact physically, and we wanted to use this hypothesis to predict physical interactions directly from co-expression data. Assume further that six proteins are strung together in a linear pathway (1 binds 2, 2 binds 3, etc.), and that all six are co-expressed. Of the 15 [N*(N − 1)/2] possible interactions, only 5 (N − 1) are physical, i.e., only 33% of the co-expressed pairs interact physically. Since most pathways involve many more than six interactions, this example is likely to significantly underestimate the actual problem. In other words, even if all physically interacting proteins were co-expressed, predictions of interactions based on such association alone would still be more often wrong than right. This significantly constrains the way in which we can use association-type data to analyze physical interactions. 
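</p><p>The counting argument above can be checked with a few lines of code. The following Python sketch is illustrative only and not part of the original analysis; the function name is a hypothetical helper. It assumes, as in the text, a strictly linear pathway in which each protein binds only its immediate neighbor and all proteins are co-expressed.</p><preformat>
```python
from itertools import combinations


def linear_pathway_fraction(n_proteins):
    """Return (possible_pairs, physical_pairs, fraction) for a linear pathway.

    Toy model (an assumption for illustration): protein i binds only
    protein i + 1, and every protein in the pathway is co-expressed.
    """
    proteins = range(1, n_proteins + 1)
    # Every unordered pair of co-expressed proteins: N * (N - 1) / 2
    possible = list(combinations(proteins, 2))
    # Only direct neighbors in the chain are in physical contact: N - 1 pairs
    physical = [(a, b) for a, b in possible if b - a == 1]
    return len(possible), len(physical), len(physical) / len(possible)


possible, physical, fraction = linear_pathway_fraction(6)
print(possible, physical, round(100 * fraction))  # prints 15 5 33
```
</preformat><p>In this toy model the fraction of physical pairs equals 2/N, so it falls to 10% for a linear pathway of 20 proteins.</p><p>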
In order to emphasize our focus on physical interactions, we used the abbreviation PPI for transient physical protein–protein interactions (as opposed to functional associations as measured by TAP-like data, and as opposed to permanent physical interactions between, e.g., two different domains or two different chains of the same protein [<xref ref-type="bibr" rid="pcbi-0020079-b044">44</xref>]).</p></sec><sec id="s1e"><title>Coping with the Dilemma of Incomplete Data Sets</title><p>How can we evaluate accuracy and coverage of homology transfer (<xref ref-type="fig" rid="pcbi-0020079-g001">Figure 1</xref>) of interactions if the data are incomplete? An extreme stance is to simply not assess the performance at all. The rationale is simple: assume a method inferred that A″ and B″ in <xref ref-type="fig" rid="pcbi-0020079-g001">Figure 1</xref> interacted without any experimental evidence for this interaction. Maybe the inference was wrong; it may also simply have been a new <italic>in silico</italic> discovery not yet identified by experiments. If the set of all interactions were complete, the absence of an observation would imply noninteraction. Although there is currently no such complete set, we contend that the performance of homology transfer has to be estimated somehow to render a tool that is controllable in the context of genome annotation pipelines. Here, we took the opposite radical stance by treating all interactions that have not been observed as nonexistent. While this is obviously wrong, we assume that today's incompleteness is not systematic. If true, our results will simply underestimate the quantities that we measured, but will correctly capture relative values (such as that homology transfer is half as accurate at ~40% sequence identity as at ~60%, <xref ref-type="fig" rid="pcbi-0020079-g002">Figure 2</xref>). We also did not merge data sets that measure functional association (e.g., TAP) with those that measure physical interaction (e.g., Y2H). 
Instead, we regarded only physical interactions as positives.</p><fig id="pcbi-0020079-g002" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.0020079.g002</object-id><label>Figure 2</label><caption><title>Sequence Conservation of PPIs</title><p>The performance of homology transfer was evaluated with the data sets in Experiment 1 (<xref ref-type="table" rid="pcbi-0020079-t004">Table 4</xref>). Each panel plots the conservation (accuracy of homology transfer) using a different measure for sequence similarity: HVAL (<xref ref-type="disp-formula" rid="pcbi-0020079-e001">Equation 1</xref>), PIDE (percentage pairwise sequence identity), and the PSI-BLAST E-value. It is surprising that even at high similarity thresholds (PIDE &gt; 50; HVAL &gt; 30), accuracy remained low and never reached levels of 20%. This behavior was partially explained by our overlap analysis: for low overlap (Equations 2 and 3) between datasets, we expect a low accuracy. Numbers at HVAL = 40 (which equals a PIDE of 68 at an alignment length of 100 residues) were marked with red lines. HVAL = 40 is the point at which the overlap values (<xref ref-type="disp-formula" rid="pcbi-0020079-e003">Equation 3</xref>) for two identical datasets seem to indicate a zone of &gt; 70% data consistency (see <xref ref-type="table" rid="pcbi-0020079-t003">Table 3</xref>). Error bars for the three plots were calculated by bootstrapping over the PPIs in the source datasets (see Methods section).</p></caption><graphic xlink:href="info:doi/10.1371/journal.pcbi.0020079.g002" alt-version="no" mimetype="image" position="float" xlink:type="simple"/></fig><fig id="pcbi-0020079-g003" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.0020079.g003</object-id><label>Figure 3</label><caption><title>Performance of Homology Transfer</title><p>Plots were compiled for experiments 2–7 in <xref ref-type="table" rid="pcbi-0020079-t004">Table 4</xref>. 
Each of the upper three graphs stands for one particular organism o and shows two plots: (1) Use all known PPIs (large-scale and small-scale) of organism o to find Y2H large-scale detected PPIs in the same organism (but from a different experiment, blue line). (2) Use all PPIs (large-scale and small-scale) of all other organisms (not o) to find PPIs detected by Y2H in o (red line). Only organisms with available Y2H datasets in IntAct were chosen in order to be able to create complete interaction matrices for the target datasets (yeast, worm, and fruit fly). All error bars were calculated through bootstrapping over the source PPIs (100 times, Methods). Some lines end at certain thresholds because the counts for true positives and false positives were too low (&lt; 30 true or false positives) to calculate accuracy (Equation 4, see <xref ref-type="sec" rid="s4">Materials and Methods</xref>, often also referred to as specificity or precision). <xref ref-type="supplementary-material" rid="pcbi-0020079-sg001">Figure S1</xref> shows the correlation between the size of the error bars and the counts of true positives at each HSSP-value cutoff. The three bottom plots show ROC-like curves, in which accuracy is plotted versus coverage for the exact same data as in the three upper plots. 
The figures demonstrate that for all levels of similarity, the accuracy of intraspecies predictions of PPIs is significantly higher than that of predictions across two organisms.</p></caption><graphic xlink:href="info:doi/10.1371/journal.pcbi.0020079.g003" alt-version="no" mimetype="image" position="float" xlink:type="simple"/></fig><fig id="pcbi-0020079-g004" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.0020079.g004</object-id><label>Figure 4</label><caption><title>Interspecies Failure and Intraspecies Success of Homology Transfer</title><p>(A) Same family, different ancestors, different PPI: Two yeast peroxisomal proteins (<italic>PEX1</italic> and <italic>PEX2</italic>) are closely related through their common ancestor protein and their function as AAA ATPases to the two yeast <italic>26S protease regulatory subunits 6A</italic> and <italic>6B</italic>. In the fruit fly, gene duplication of a second ancestor protein (the <italic>NSF</italic> ancestor) led to two distinct <italic>NSF</italic> proteins (<italic>NSF1</italic> and <italic>2</italic>). Since the ancestors for the NSFs (<italic>NSF1</italic> and <italic>2</italic>) and for the <italic>26S protease subunits</italic> were two different proteins, we conclude that despite their common biochemical function as ATPases, the different cellular functions of NSFs and 26S protease subunits also led to a distinct behavior with respect to protein–protein interactions. Therefore, neither <italic>NSF1</italic> nor <italic>NSF2</italic> was observed to bind to the <italic>26S protease subunit 4</italic>.</p><p>(B) Same pathway, different functions, different binding: Evolutionary plasticity in the <italic>chk2</italic> family led to a diverse range of functions of these proteins while they stayed in the same pathway. For example, <italic>Rad53p</italic> in yeast is a main player in the cell cycle checkpoint during mitosis, whereas <italic>Mek1p</italic> acts in the same position during meiosis. 
Also, <italic>Drosophila chk2</italic> and human <italic>chk2</italic> act at times during the cell cycle that differ from those of <italic>Mek1p</italic> and <italic>Rad53p</italic>. No <italic>Drosophila Pp1</italic> homolog in yeast was found to interact with either <italic>Mek1p</italic> or <italic>Rad53p</italic>, even though <italic>Drosophila Pp1</italic> was shown to bind to <italic>Drosophila</italic> chk2.</p></caption><graphic xlink:href="info:doi/10.1371/journal.pcbi.0020079.g004" alt-version="no" mimetype="image" position="float" xlink:type="simple"/></fig><p>Here, we presented an analysis of PPIs in, to our knowledge, the largest data set investigated thus far. We defined and measured the overlap between different data sets, and analyzed the expected levels of accuracy and coverage for homology-based inference of PPIs depending on the level of sequence similarity. The most surprising finding originated from differentiating between intraspecies and interspecies inferences (o ≠ p in <xref ref-type="fig" rid="pcbi-0020079-g001">Figure 1</xref>), namely that PPIs are more conserved within than between organisms.</p></sec></sec><sec id="s2"><title>Results/Discussion</title><sec id="s2a"><title>Different Experiments Overlap Very Little</title><p>If we want to infer PPIs between organisms by homology, we first have to measure the overlap within organisms and then between organisms. We introduced such a measure (<xref ref-type="disp-formula" rid="pcbi-0020079-e002">Equation 2</xref> and <xref ref-type="disp-formula" rid="pcbi-0020079-e003">Equation 3</xref>, see <xref ref-type="sec" rid="s4">Materials and Methods</xref>) and applied it to assessing the overlap between datasets in IntAct [<xref ref-type="bibr" rid="pcbi-0020079-b045">45</xref>]. A large overlap value implies high agreement between two experimental sets of interactions. 
Our definition of overlap takes into account that two data sets may not have used the same proteins, thereby rendering a score that is, in principle, independent of the size of common subsets (see <xref ref-type="sec" rid="s4">Materials and Methods</xref> section). The scores are straightforward when comparing different datasets within the same organism (<xref ref-type="disp-formula" rid="pcbi-0020079-e002">Equation 2</xref>) because we only have to identify identical pairs of proteins. As noted before [<xref ref-type="bibr" rid="pcbi-0020079-b022">22</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b046">46</xref>–<xref ref-type="bibr" rid="pcbi-0020079-b049">49</xref>], the data sets overlap for at most about 30% of all PPIs in yeast (<italic>Saccharomyces cerevisiae</italic>) and much less for PPIs in fly (<italic>Drosophila melanogaster</italic>, <xref ref-type="table" rid="pcbi-0020079-t001">Table 1</xref>). Interspecies comparisons are trickier because we now have to identify the corresponding homologous pairs in the other organism. <xref ref-type="disp-formula" rid="pcbi-0020079-e003">Equation 3</xref> solves this problem by counting homologous instead of identical pairs of proteins; it is applicable to intraspecies and interspecies comparisons. A consequence of counting homologous rather than identical protein pairs is that the same data set no longer overlaps 100% with itself (<xref ref-type="table" rid="pcbi-0020079-t002">Table 2</xref>), because the interaction between A and B may be detected while that between the homologs A′ and B′ may not be. The application of <xref ref-type="disp-formula" rid="pcbi-0020079-e003">Equation 3</xref> to the intraspecies comparison for yeast and fly datasets yielded results similar to those from the application of <xref ref-type="disp-formula" rid="pcbi-0020079-e002">Equation 2</xref> to the same datasets (<xref ref-type="table" rid="pcbi-0020079-t001">Table 1</xref>). 
The overlap between different yeast datasets seems to be generally higher than that between different fly datasets. Finally, we merged datasets of different large-scale experiments for each organism and compared these pseudo-complete PPIs between organisms by using <xref ref-type="disp-formula" rid="pcbi-0020079-e003">Equation 3</xref> (<xref ref-type="table" rid="pcbi-0020079-t003">Table 3</xref>). As expected, the overlap between organisms increased as the threshold for what was considered homologous was raised (<xref ref-type="table" rid="pcbi-0020079-t003">Table 3</xref>; HSSP-value (HVAL)&gt;40 highest, HVAL&gt;0 lowest, <xref ref-type="disp-formula" rid="pcbi-0020079-e001">Equation 1</xref>; note that the HSSP value (homology-derived secondary structure of proteins) is an empirical measure of sequence similarity that accounts for the simple fact that high levels of sequence similarity are less meaningful for short than they are for long alignments). This increase in overlap came at the cost of finding fewer matches (<xref ref-type="table" rid="pcbi-0020079-t003">Table 3</xref>, empty cells). Conversely, the overlap was very low at levels of sequence similarity that mark the twilight zone of sequence-structure inference [<xref ref-type="bibr" rid="pcbi-0020079-b025">25</xref>], i.e., the line above which most pairs of proteins have largely similar structure (HVAL&gt;0, <xref ref-type="table" rid="pcbi-0020079-t003">Table 3</xref>). 
In other words, overall fold similarity does not suffice to infer similarity in interactions.</p><table-wrap id="pcbi-0020079-t001" content-type="1col" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.0020079.t001</object-id><label>Table 1</label><caption><p>Identity-Based Overlap (<xref ref-type="disp-formula" rid="pcbi-0020079-e002">Equation 2</xref>) between Original Experimental Y2H Datasets from Fly and Yeast</p></caption><graphic xlink:href="info:doi/10.1371/journal.pcbi.0020079.t001" alt-version="no" mimetype="image" position="float" xlink:type="simple"/><!-- <table frame="hsides" rules="none"><colgroup><col id="tb1col1" align="left" charoff="0" char=""/><col id="tb1col2" align="left" charoff="0" char=""/><col id="tb1col3" align="left" charoff="0" char=""/><col id="tb1col4" align="left" charoff="0" char=""/></colgroup><thead><tr><td align="left"><hr/>Datasets</td><td colspan="3"><hr/>Overlap<sup>a</sup></td></tr></thead><tbody><tr><td><named-content content-type="genus-species">Saccharomyces cerevisiae</named-content> (yeast)</td><td>Ito &lsqb;<xref ref-type="bibr" rid="pcbi-0020079-b034">34</xref>&rsqb;</td><td>Uetz &lsqb;<xref ref-type="bibr" rid="pcbi-0020079-b013">13</xref>&rsqb;</td><td></td></tr><tr><td>Ito &lsqb;<xref ref-type="bibr" rid="pcbi-0020079-b034">34</xref>&rsqb;</td><td><bold>100</bold></td><td>27.0</td><td></td></tr><tr><td>Uetz &lsqb;<xref ref-type="bibr" rid="pcbi-0020079-b013">13</xref>&rsqb;</td><td>27.0</td><td><bold>100</bold></td><td></td></tr><tr><td></td><td></td><td></td><td></td></tr><tr><td><named-content content-type="genus-species">Drosophilia melanogaster</named-content> (fly)</td><td>Giot &lsqb;<xref ref-type="bibr" rid="pcbi-0020079-b017">17</xref>&rsqb;</td><td>Stanyon &lsqb;<xref ref-type="bibr" rid="pcbi-0020079-b016">16</xref>&rsqb;</td><td>Formstecher &lsqb;<xref ref-type="bibr" rid="pcbi-0020079-b011">11</xref>&rsqb;</td></tr><tr><td>Giot &lsqb;<xref ref-type="bibr" 
rid="pcbi-0020079-b017">17</xref>&rsqb;</td><td valign="middle"><bold>100</bold></td><td>3.3</td><td>5.4</td></tr><tr><td>Stanyon &lsqb;<xref ref-type="bibr" rid="pcbi-0020079-b016">16</xref>&rsqb;</td><td valign="middle">3.3</td><td><bold>100</bold></td><td>4.3</td></tr><tr><td>Formstecher &lsqb;<xref ref-type="bibr" rid="pcbi-0020079-b011">11</xref>&rsqb;</td><td valign="middle">5.4</td><td>4.3</td><td><bold>100</bold></td></tr></tbody></table> --><!-- <table-wrap-foot><fn id="nt101"><p><sup>a</sup>&thinsp;Overlap values are measured between two experimental data sets that have been filtered to account for the different sets of proteins used (Methods). All values compiled according to <xref ref-type="disp-formula" rid="pcbi-0020079-e002">Equation 2</xref> in percentages.</p></fn></table-wrap-foot> --></table-wrap><table-wrap id="pcbi-0020079-t002" content-type="1col" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.0020079.t002</object-id><label>Table 2</label><caption><p>Homology-Based Overlap (<xref ref-type="disp-formula" rid="pcbi-0020079-e003">Equation 3</xref>) between Original Experimental Y2H Datasets from Fly and Yeast</p></caption><graphic xlink:href="info:doi/10.1371/journal.pcbi.0020079.t002" alt-version="no" mimetype="image" position="float" xlink:type="simple"/><!-- <table frame="hsides" rules="none"><colgroup><col id="tb2col1" align="left" charoff="0" char=""/><col id="tb2col2" align="left" charoff="0" char=""/><col id="tb2col3" align="left" charoff="0" char=""/><col id="tb2col4" align="left" charoff="0" char=""/></colgroup><thead><tr><td align="left"><hr/>Datasets</td><td colspan="3"><hr/>Overlap<sup>a</sup></td></tr></thead><tbody><tr><td><named-content content-type="genus-species">Saccharomyces cerevisiae</named-content> (yeast)</td><td>Ito &lsqb;<xref ref-type="bibr" rid="pcbi-0020079-b034">34</xref>&rsqb;</td><td>Uetz &lsqb;<xref ref-type="bibr" rid="pcbi-0020079-b013">13</xref>&rsqb;</td><td></td></tr><tr><td>Ito &lsqb;<xref 
ref-type="bibr" rid="pcbi-0020079-b034">34</xref>&rsqb;</td><td><bold>70.2</bold></td><td>37.7</td><td></td></tr><tr><td>Uetz &lsqb;<xref ref-type="bibr" rid="pcbi-0020079-b013">13</xref>&rsqb;</td><td>37.7</td><td><bold>84.8</bold></td><td></td></tr><tr><td></td><td></td><td></td><td></td></tr><tr><td><named-content content-type="genus-species">Drosophilia melanogaster</named-content> (fly)</td><td>Giot &lsqb;<xref ref-type="bibr" rid="pcbi-0020079-b017">17</xref>&rsqb;</td><td>Stanyon &lsqb;<xref ref-type="bibr" rid="pcbi-0020079-b016">16</xref>&rsqb;</td><td>Formstecher &lsqb;<xref ref-type="bibr" rid="pcbi-0020079-b011">11</xref>&rsqb;</td></tr><tr><td>Giot &lsqb;<xref ref-type="bibr" rid="pcbi-0020079-b017">17</xref>&rsqb;</td><td valign="middle"><bold>53.5</bold></td><td>4.3</td><td>4.2</td></tr><tr><td>Stanyon &lsqb;<xref ref-type="bibr" rid="pcbi-0020079-b016">16</xref>&rsqb;</td><td valign="middle">4.3</td><td><bold>76.6</bold></td><td>7.5</td></tr><tr><td>Formstecher &lsqb;<xref ref-type="bibr" rid="pcbi-0020079-b011">11</xref>&rsqb;</td><td valign="middle">4.2</td><td>7.5</td><td><bold>73.2</bold></td></tr></tbody></table> --><!-- <table-wrap-foot><fn id="nt201"><p><sup>a</sup>&thinsp;All values compiled according to <xref ref-type="disp-formula" rid="pcbi-0020079-e003">Equation 3</xref> in percentages; the minimal sequence similarity required to consider proteins from a different organism to be similar was HVAL &gt; 20 (<xref ref-type="disp-formula" rid="pcbi-0020079-e001">Equation 1</xref>) corresponding to 49&percnt; percentage sequence identity for 100 residue alignments. Overlap values for equal datasets can be smaller than 100&percnt; since homology rather than direct sequence matching is used (<xref ref-type="disp-formula" rid="pcbi-0020079-e003">Equation 3</xref>). 
Here, we used a very weak constraint of HVAL &gt; 20 (corresponding to about 50&percnt; sequence identity for alignments over 100 residues).</p></fn></table-wrap-foot> --></table-wrap></sec><sec id="s2b"><title>Automatic Homology Transfer of PPIs Is Very Limited</title><p>We generated a homology performance plot (see <xref ref-type="sec" rid="s4">Materials and Methods</xref> section) by comparing an unbiased, nonredundant data set (no two pairs of proteins in the set had significant sequence similarity; see <xref ref-type="sec" rid="s4">Materials and Methods</xref> section) against the redundant set with all PPIs (note that we removed identical pairs even in this set, <xref ref-type="table" rid="pcbi-0020079-t004">Table 4</xref>, Experiment 1). When using the observed PPI between two proteins (A-B), we applied the same sequence similarity threshold to identify both homologs (A/A′, B/B′) to infer the PPI between A′-B′. Pairs such as A-B′ or A′-B were not counted because those pairs could only be detected within the same organism and not across two species. Not surprisingly, the accuracy of homology transfer increased with sequence similarity (<xref ref-type="fig" rid="pcbi-0020079-g002">Figure 2</xref>). However, accuracy already dropped rapidly at very high levels of sequence similarity (e.g., at ~80% pairwise sequence identity, and at position-specific iterative basic local alignment search tool [PSI-BLAST] expectation values [E-values] &lt; 10<sup>−150</sup>). Closer inspection of the HSSP formula (<xref ref-type="disp-formula" rid="pcbi-0020079-e001">Equation 1</xref>) reveals that the curves for HSSP values and percentage sequence identity were very similar to each other. The problem with E-values largely originated from including short alignments, i.e., many of the proteins identified at very significant E-values (E &lt; 10<sup>−50</sup>) might have been aligned to only small fractions of the source protein. 
This is a known limitation of E-values that cannot easily be normalized away because PPI interfaces may be rather short (i.e., even alignments of 20 residues in very long proteins may correctly reflect binding similarity). Although the small overlap between experimental data sets (<xref ref-type="table" rid="pcbi-0020079-t003">Table 3</xref>) suggested that these estimates for accuracy at a given similarity threshold were most likely overly pessimistic, the overlap scores also showed that at HVAL &gt; 40, the consistency of the data was above 70% (<xref ref-type="table" rid="pcbi-0020079-t003">Table 3</xref>). Therefore, our estimates at such high thresholds might be approximately correct; if so, the accuracy of homology transfer for high similarity (HVAL &gt; 40, Percentage sequence IDEntity (PIDE) &gt; 70) was just over 10% (<xref ref-type="fig" rid="pcbi-0020079-g002">Figure 2</xref>). Clearly, our findings suggested that automatic homology-based inferences of PPIs have to be taken with extreme caution.</p></sec><sec id="s2c"><title>Homology Transfer Is Better within than between Organisms</title><p>Arguably [<xref ref-type="bibr" rid="pcbi-0020079-b040">40</xref>–<xref ref-type="bibr" rid="pcbi-0020079-b042">42</xref>], homology transfer is expected to be slightly better between organisms than within organisms. Instead, we observed the extreme opposite (<xref ref-type="fig" rid="pcbi-0020079-g003">Figure 3</xref>): at all levels of sequence similarity, and for all organisms with sufficient data, homology-inference was significantly more accurate for pairs of homologs from the same organism (intraspecies) than for pairs of homologs between different organisms (interspecies). 
In other words, if we experimentally observed the interaction between A and B in yeast, and if we found another pair of similar proteins A′ and B′ in yeast (not A-B′ or A′-B), as well as another pair A″ and B″ in fruit fly, then an interaction between A′ and B′ would be much more likely than one between A″ and B″. Consequently, yeast would be a rather poor model organism for the interaction network in fly.</p><p><xref ref-type="table" rid="pcbi-0020079-t004">Table 4</xref> and <xref ref-type="fig" rid="pcbi-0020079-g002">Figures 2</xref> and <xref ref-type="fig" rid="pcbi-0020079-g003">3</xref> clearly establish our main messages that intraspecies homology transfer is more accurate than interspecies transfer and that homology transfer is accurate only at unexpectedly high levels of sequence similarity. These results were stable with respect to different ways of processing the data for the experimental interactions. Changes that did not significantly influence the outcome included the following alternatives.</p></sec><sec id="s2d"><title>Results Were Stable with Respect to Details in Filtering Data</title><p>(1) Different sampling of intraspecies vs. interspecies: We allowed transfers of the type A-B to A′-B or A-B to A-B′ (see <xref ref-type="sec" rid="s4">Materials and Methods</xref> section). The performance became significantly better for intraspecies PPI transfers, thus further widening the gap between intraspecies and interspecies transfers (<xref ref-type="supplementary-material" rid="pcbi-0020079-sg002">Figure S2</xref>A). (2) Inclusion of transfers within the same data set: We included homology transfers within the same experimental dataset (see <xref ref-type="sec" rid="s4">Materials and Methods</xref> section). The effect was very similar to that observed for different sampling (see #1), i.e., the gap was widened between intraspecies and interspecies inferences (<xref ref-type="supplementary-material" rid="pcbi-0020079-sg002">Figure S2</xref>B). 
(3) We used TAP-like data (<xref ref-type="supplementary-material" rid="pcbi-0020079-st001">Table S1</xref>) as a constraint for the negatives. To illustrate this, assume that TAP pulled down a complex of six proteins. While we cannot infer that all 15 possible interactions are physical, all could be. Therefore, we ignored a false positive prediction (i.e., we did not count it) if we could find the interaction in those 15 TAP protein–protein pairs. The accuracy slightly increased for both yeast versus yeast (intraspecies) comparisons and nonyeast versus yeast (interspecies) comparisons (<xref ref-type="supplementary-material" rid="pcbi-0020079-sg002">Figure S2</xref>C). Note that yeast is the only organism with available TAP-like data. (4) We used a redundant dataset (instead of a nonredundant, bias-reduced set) from organism o (<xref ref-type="fig" rid="pcbi-0020079-g007">Figure 7</xref>) to hunt for interologs in organism p (<xref ref-type="fig" rid="pcbi-0020079-g007">Figure 7</xref>). The main message indicated by the results for this latter experiment stays the same as in our original procedure (see <xref ref-type="sec" rid="s4">Materials and Methods</xref> section): Intraspecies comparisons are more accurate than interspecies comparisons. Because there were more samples in the dataset for organism o (<xref ref-type="fig" rid="pcbi-0020079-g007">Figure 7</xref>) and thus higher counts, the errors slightly decreased (<xref ref-type="supplementary-material" rid="pcbi-0020079-sg002">Figure S2</xref>D).</p></sec><sec id="s2e"><title>Examples</title><p>In the following, we present a few representative examples that illustrate these points in more detail than is possible through averages over large data sets. 
Both examples show how homology transfer fails across species while it succeeds within an organism (Ao-Bo observed, A′o-B′o observed, A″m-B″m not observed).</p><sec id="s2e1"><title>Example 1: same family, different ancestors, different PPI.</title><p>The two peroxins <italic>PEX1</italic> and <italic>PEX6</italic> are known to functionally and physically interact in both human [<xref ref-type="bibr" rid="pcbi-0020079-b050">50</xref>] and yeast [<xref ref-type="bibr" rid="pcbi-0020079-b051">51</xref>–<xref ref-type="bibr" rid="pcbi-0020079-b053">53</xref>] (<xref ref-type="fig" rid="pcbi-0020079-g004">Figure 4</xref>A). A particular mutation in human <italic>PEX1</italic> disrupts the interaction with <italic>PEX6</italic> and appears directly linked to Zellweger syndrome, an autosomal recessive peroxisome biogenesis disorder in which the growth of the myelin sheath (the fatty cover of nerve cells in the brain) is strongly affected. Patients usually suffer from visual disturbances, high iron and copper blood levels, and enlarged livers [<xref ref-type="bibr" rid="pcbi-0020079-b053">53</xref>]. Both proteins <italic>PEX1</italic> and <italic>PEX6</italic> belong to the ATPases associated with various cellular activities (AAA) family and are involved in the import of proteins into the peroxisome [<xref ref-type="bibr" rid="pcbi-0020079-b052">52</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b053">53</xref>]. The complex of <italic>PEX1</italic> and <italic>PEX6</italic> is associated with the cytoplasmic side of the peroxisomal membrane [<xref ref-type="bibr" rid="pcbi-0020079-b051">51</xref>]. 
Searching for proteins that are sequence-similar to <italic>PEX1</italic> and <italic>PEX6</italic> within yeast at an HVAL &gt; 20 (<xref ref-type="disp-formula" rid="pcbi-0020079-e001">Equation 1</xref>, see <xref ref-type="sec" rid="s4">Materials and Methods</xref>) brought up two <italic>26S protease regulatory subunits, 6A</italic> and <italic>6B</italic> (proteins A′o and B′o); experts have also classified both these yeast proteins as AAA ATPases (<xref ref-type="fig" rid="pcbi-0020079-g004">Figure 4</xref>A). The interaction between these two yeast proteins was surprisingly found in all large-scale Y2H protein–protein interaction scans [<xref ref-type="bibr" rid="pcbi-0020079-b013">13</xref>–<xref ref-type="bibr" rid="pcbi-0020079-b015">15</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b035">35</xref>]. Using the same threshold (HVAL &gt; 20), the closest proteins in fly were the <italic>26S protease subunit 4</italic> and the <italic>NEM-sensitive fusion protein 2</italic> (<italic>NSF2</italic>) (<xref ref-type="fig" rid="pcbi-0020079-g004">Figure 4</xref>A). The latter, <italic>NSF2</italic>, is a special form of the <italic>NEM-sensitive fusion protein 1</italic> (<italic>NSF1</italic>) and is fly-specific in the sense that it does not exist in yeast, worm, or human [<xref ref-type="bibr" rid="pcbi-0020079-b054">54</xref>–<xref ref-type="bibr" rid="pcbi-0020079-b056">56</xref>]. An interaction between <italic>26S protease subunit 4</italic> and <italic>NSF2</italic> was not found in any of our <italic>Drosophila</italic> PPI datasets, nor has it been reported in the literature. <italic>NSF2</italic> is, among other things, responsible for exocytosis through vesicle fusion by disassembling the postfusion SNARE protein complexes [<xref ref-type="bibr" rid="pcbi-0020079-b054">54</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b057">57</xref>]. 
Like the other <italic>PEX1</italic> and <italic>PEX6</italic> relatives discussed so far, <italic>NSF2</italic> is also an ATPase [<xref ref-type="bibr" rid="pcbi-0020079-b054">54</xref>]. A detailed phylogenetic analysis of all proteins in the AAA family has suggested three major subfamilies, one with NSF homologs (<italic>NSF1</italic> and <italic>2</italic>), one with the <italic>26S protease subunits</italic>, and a third with <italic>p97/Cdc48p</italic> homologs [<xref ref-type="bibr" rid="pcbi-0020079-b056">56</xref>]. Most importantly these three subfamilies apparently did not arise from a common ancestor but rather, they evolved independently during speciation [<xref ref-type="bibr" rid="pcbi-0020079-b056">56</xref>].</p><table-wrap id="pcbi-0020079-t003" content-type="1col" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.0020079.t003</object-id><label>Table 3</label><caption><p>Homology-Based Overlap (<xref ref-type="disp-formula" rid="pcbi-0020079-e003">Equation 3</xref>) between Merged Datasets for Different Similarity Thresholds</p></caption><graphic xlink:href="info:doi/10.1371/journal.pcbi.0020079.t003" alt-version="no" mimetype="image" position="float" xlink:type="simple"/><!-- <table frame="hsides" rules="none"><colgroup><col id="tb3col1" align="left" charoff="0" char=""/><col id="tb3col2" align="left" charoff="0" char=""/><col id="tb3col3" align="left" charoff="0" char=""/><col id="tb3col4" align="left" charoff="0" char=""/></colgroup><thead><tr><td align="left">Datasets</td><td colspan="3"><hr/>Overlap</td></tr><tr><td><hr/></td><td><hr/>Yeast (<named-content content-type="genus-species">Saccharomyces cerevisiae</named-content>)</td><td><hr/>Fly (<named-content content-type="genus-species">Drosophilia melanogaster</named-content>)</td><td><hr/>Worm (<named-content content-type="genus-species">Caenorhabditis elegans</named-content>)</td></tr></thead><tbody><tr><td>HVAL &gt; 0</td><td 
valign="middle"></td><td></td><td></td></tr><tr><td>&emsp;Yeast</td><td valign="middle"><bold>11.3</bold></td><td>0.5</td><td>0.8</td></tr><tr><td>&emsp;Fly</td><td valign="middle">0.5</td><td><bold>1.5</bold></td><td>0.8</td></tr><tr><td>&emsp;Worm</td><td valign="middle">0.8</td><td>0.8</td><td><bold>7.9</bold></td></tr><tr><td></td><td valign="middle"></td><td></td><td></td></tr><tr><td>HVAL &gt; 20</td><td valign="middle"></td><td></td><td></td></tr><tr><td>&emsp;Yeast</td><td valign="middle"><bold>65.5</bold></td><td>9.2</td><td>13.2</td></tr><tr><td>&emsp;Fly</td><td valign="middle">9.2</td><td><bold>44.9</bold></td><td>5.1</td></tr><tr><td>&emsp;Worm</td><td valign="middle">13.2</td><td>5.1</td><td><bold>69.7</bold></td></tr><tr><td></td><td valign="middle"></td><td></td><td></td></tr><tr><td>HVAL &gt; 40</td><td valign="middle"></td><td></td><td></td></tr><tr><td>&emsp;Yeast</td><td valign="middle"><bold>82.6</bold></td><td>&mdash;</td><td>&mdash;</td></tr><tr><td>&emsp;Fly</td><td valign="middle">&mdash;</td><td><bold>75.5</bold></td><td>13.8</td></tr><tr><td>&emsp;Worm</td><td valign="middle">&mdash;</td><td>13.8</td><td><bold>88.8</bold></td></tr></tbody></table> --><!-- <table-wrap-foot><fn id="nt301"><p>A &mdash; in the table means that the overlap cannot be calculated due to the nonexistence of any shared homologous proteins between the two sets at the given HVAL (<xref ref-type="disp-formula" rid="pcbi-0020079-e001">Equation 1</xref>). 
Note that for proteins of ~100 residues HVAL &gt; 40 correspond to about 73&percnt; pairwise sequence identity, HVAL &gt; 20 to &gt; 53&percnt;, and HVAL &gt; 0 to &gt; 33&percnt;.</p></fn></table-wrap-foot> --></table-wrap><table-wrap id="pcbi-0020079-t004" content-type="1col" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.0020079.t004</object-id><label>Table 4</label><caption><p>Datasets Used for Homology Performance Plots</p></caption><graphic xlink:href="info:doi/10.1371/journal.pcbi.0020079.t004" alt-version="no" mimetype="image" position="float" xlink:type="simple"/><!-- <table frame="hsides" rules="none"><colgroup><col id="tb4col1" align="left" charoff="0" char=""/><col id="tb4col2" align="left" charoff="0" char=""/><col id="tb4col3" align="left" charoff="0" char=""/></colgroup><thead><tr><td rowspan="2" align="left"><hr/>Experiment (Figure)</td><td colspan="2"><hr/>Datasets</td></tr><tr><td><hr/>Organism o<sup>a</sup></td><td><hr/>Organism p<sup>b</sup></td></tr></thead><tbody><tr><td>1(2)</td><td>All</td><td>All</td></tr><tr><td></td><td></td><td></td></tr><tr><td>2(3)</td><td>All fly</td><td valign="middle">All fly</td></tr><tr><td>3(3)</td><td>All nonfly</td><td valign="middle">All fly</td></tr><tr><td></td><td></td><td></td></tr><tr><td>4(3)</td><td>All worm</td><td valign="middle">All worm</td></tr><tr><td>5(3)</td><td>All nonworm</td><td valign="middle">All worm</td></tr><tr><td></td><td></td><td></td></tr><tr><td>6(3)</td><td>All yeast</td><td valign="middle">All yeast</td></tr><tr><td>7(3)</td><td>All nonyeast</td><td valign="middle">All yeast</td></tr></tbody></table> --><!-- <table-wrap-foot><fn id="nt401"><p>Organisms o and p are equal for some experiments. Datasets of o have to be nonredundant and can be either small-scale or high-throughput Y2H datasets (no TAP-like data). Datasets of organism p are redundant and have to be Y2H generated in order to guarantee a complete interaction matrix. 
TAP-like interactions were not used as true positives. Every single graph in <xref ref-type="fig" rid="pcbi-0020079-g003">Figure 3</xref> shows the results of two experiments from <xref ref-type="table" rid="pcbi-0020079-t004">Table 4</xref> (grouped into organisms). Note that for all listed experiments, comparisons between identical datasets were omitted. For example, for experiment 6 in <xref ref-type="table" rid="pcbi-0020079-t004">Table 4</xref>, this means that interactions from <italic>yeast-Ito-2001</italic> (organism o) will not be compared to any other interactions from this dataset in organism p (which in this case is equal to organism o).</p></fn><fn id="nt402"><p><sup>a</sup>&thinsp;Nonredundant; No TAP-like data; PPIs</p></fn><fn id="nt403"><p><sup>b</sup>&thinsp;Redundant; High-Throughput; TAP, tandem affinity purification; PPI, protein&ndash;protein interaction</p></fn></table-wrap-foot> --></table-wrap><p>This particular example illustrated how yeast may generally be a rather poor model organism for more complex species such as fly, worm, or vertebrates. Proteins from these higher eukaryotes have to perform many different tasks in often highly specialized cell types (e.g., nerve cells). This might have led to an evolutionary pressure to build new protein-interaction networks from the available protein building blocks (e.g., ATPase function). Thus, by only slightly altering the existing sequences, new binding properties were added to these proteins, while others were lost. A similar argument could be used to explain a likely poor homology transfer between fly and human or worm and human.</p></sec><sec id="s2e2"><title>Example 2: same pathway, different functions, different binding properties.</title><p>The <italic>Drosophila Ser/Thr protein phosphatase 4</italic> (<italic>Pp4</italic>) and the <italic>cyclin dependent kinase 4</italic> (<italic>Cdk4</italic>) were found in our small-scale dataset for <italic>Drosophila</italic> PPIs. 
At HVAL&gt;20, we found two sequence-similar proteins in fly, namely <italic>Ser/Thr protein phosphatase alpha 2</italic> (<italic>Pp1</italic>) similar to <italic>Pp4</italic>, and <italic>chk2</italic> similar to <italic>Cdk4</italic>; both these fly proteins (<italic>Pp1</italic> and <italic>chk2</italic>) have been shown to interact [<xref ref-type="bibr" rid="pcbi-0020079-b016">16</xref>]. Fly <italic>chk2</italic> as well as its sequence relatives in yeast (<italic>Mek1p</italic> and <italic>Rad53p</italic>) and human are involved in cell-cycle checkpoints, which are signal transduction pathways that control the cell cycle and prevent the cell from further replication if DNA double-strand breaks occur, if the DNA is incompletely replicated, or in case of other DNA damage [<xref ref-type="bibr" rid="pcbi-0020079-b058">58</xref>–<xref ref-type="bibr" rid="pcbi-0020079-b060">60</xref>]. A checkpoint can halt an ongoing mitosis or meiosis or even terminate it and induce apoptosis. A phylogenetic analysis of the <italic>chk2</italic> family members found that fly <italic>chk2</italic> and its yeast and human homologs stem from the same ancestor (<xref ref-type="fig" rid="pcbi-0020079-g004">Figure 4</xref>B). Nevertheless, it is also known that this family of proteins has a rather strong evolutionary plasticity in terms of the particular tasks of its members [<xref ref-type="bibr" rid="pcbi-0020079-b060">60</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b061">61</xref>]. For example, in yeast, <italic>Mek1p</italic> only controls the meiotic pachytene checkpoint by making sure that only homologous chromosomes recombine with each other [<xref ref-type="bibr" rid="pcbi-0020079-b061">61</xref>], whereas yeast <italic>Rad53p</italic> controls mitotic cell replication and does not seem to be required for meiotic checkpoint control at all [<xref ref-type="bibr" rid="pcbi-0020079-b060">60</xref>]. 
Also, the timing within the cell cycle is different for yeast <italic>Rad53p</italic> and its <italic>Drosophila</italic> ortholog <italic>chk2</italic> [<xref ref-type="bibr" rid="pcbi-0020079-b060">60</xref>]. This plasticity in the <italic>chk2</italic> family might explain why many yeast proteins homologous to <italic>Drosophila Pp1</italic> were not found to interact with either <italic>Rad53p</italic> or <italic>Mek1p</italic>.</p></sec></sec><sec id="s2f"><title>Sequence-Based Homology Transfer Is Limited Although Binding Sites Are Partially Conserved in Three-Dimensional (3-D) Structure</title><p>Recently, the Sali group analyzed the conservation of protein–protein binding sites on homologous and structurally aligned protein surfaces. They found that the differences in the localization of binding sites between homologous proteins are significantly smaller than the differences expected at random [<xref ref-type="bibr" rid="pcbi-0020079-b062">62</xref>]. On the one hand, this result is similar to what we found for higher levels of similarity (<xref ref-type="fig" rid="pcbi-0020079-g003">Figure 3</xref>). On the other hand, at very low levels of similarity, the difference between the 3-D–based results and ours most likely lies in the additional constraints implicitly used by the Sali group, namely that the 3-D structures are known and that the alignment can focus on the residues in the binding site. Using only sequence information, we cannot do this because binding residues close in 3-D may be separated considerably in sequence, thereby diluting the pattern of conservation picked up by alignment methods. However, for most PPIs from IntAct, we can neither label the binding site, nor do we have 3-D structural information. Therefore, we are limited to measuring overall sequence similarity. 
If we were able to predict binding sites [<xref ref-type="bibr" rid="pcbi-0020079-b063">63</xref>–<xref ref-type="bibr" rid="pcbi-0020079-b066">66</xref>], we might improve homology transfer considerably.</p></sec></sec><sec id="s3"><title>Conclusions</title><p>As demonstrated again by our overlap measure, today's datasets of PPIs are still rather inconsistent (<xref ref-type="table" rid="pcbi-0020079-t001">Tables 1</xref>–<xref ref-type="table" rid="pcbi-0020079-t003">3</xref>). The discrepancies were significantly smaller between yeast than between fly datasets (<xref ref-type="table" rid="pcbi-0020079-t001">Tables 1</xref> and <xref ref-type="table" rid="pcbi-0020079-t002">2</xref>). This finding also explains the much higher accuracy for intrayeast as opposed to intrafly or intraworm transfer. Why yeast datasets appear more consistent than fly datasets remains a matter of speculation. One reason might be that measurements of protein–protein interactions are performed within yeast (Y2H) and are thus more precise for yeast proteins than for proteins of other species, since those might behave differently in the unfamiliar yeast cell. Although incomplete and not fully consistent, PPI datasets are finally large enough to warrant quantitative analyses. In particular, this enables a large-scale assessment of the performance of automated homology transfer for PPIs. Assuming that today's errors are largely nonsystematic, estimates for the performance of homology transfer will provide correct qualitative pictures, although the actual numbers will be overly pessimistic. In the extreme regime of comparing very similar pairs of proteins, we could establish that data sets appeared very consistent (<xref ref-type="fig" rid="pcbi-0020079-g002">Figure 2</xref>). Consequently, our estimates for the performance of homology transfer were likely to be relatively reliable in this regime. 
Nevertheless, even for very high similarity, automated homology transfer was often mistaken; it approached random when approaching the sequence-structure twilight zone, i.e., the region in which sequence similarity no longer implies 3-D similarity (<xref ref-type="fig" rid="pcbi-0020079-g003">Figure 3</xref>). Although many interactions observed in one organism were not observed in another, similar interactions in the same organism (at similar levels of sequence similarity) were often observed (<xref ref-type="fig" rid="pcbi-0020079-g003">Figure 3</xref>). Consequently, our results show that using homology to transfer a protein–protein interaction from one organism to another is more difficult and less accurate than a transfer within the same species. This implies that distant model organisms have limited value for unraveling protein networks. We showed that these results are stable even when making major changes to the ways in which we analyzed the experimental data. Whether we used high- or low-confidence data, whether we allowed for same-set PPI transfers or not, whether we reduced bias or not, whether or not we filtered the negatives by TAP-like data about putative physical interactions, whether or not we restricted our analysis to limited inferences per family, we always observed the same: PPIs are more conserved within than across species. This discrepancy between intraspecies and interspecies conservation of interologs was valid for all levels of sequence similarity. Finally, we tested the ability of homology transfers to predict another functional annotation and then compared the performances of interspecies versus intraspecies comparisons thereof. We chose subcellular localization as an easily extractable and available protein feature. 
By using a list of proteins annotated for subcellular localizations from UniProt [<xref ref-type="bibr" rid="pcbi-0020079-b067">67</xref>], we could show that there is no significant difference in performances for interspecies versus intraspecies homology transfers for this particular feature.</p></sec><sec id="s4"><title>Materials and Methods</title><sec id="s4a"><title>Data sets.</title><p>Several publicly available databases such as GRID [<xref ref-type="bibr" rid="pcbi-0020079-b068">68</xref>], BIND [<xref ref-type="bibr" rid="pcbi-0020079-b069">69</xref>], MINT [<xref ref-type="bibr" rid="pcbi-0020079-b070">70</xref>], and DIP [<xref ref-type="bibr" rid="pcbi-0020079-b071">71</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b072">72</xref>] gather information about interacting proteins in different organisms. For our analysis, we used the IntAct database [<xref ref-type="bibr" rid="pcbi-0020079-b045">45</xref>], a protein–protein interaction resource maintained at the European Bioinformatics Institute (EBI) in Cambridge (<ext-link ext-link-type="uri" xlink:href="http://www.ebi.ac.uk/intact/" xlink:type="simple">http://www.ebi.ac.uk/intact/</ext-link>). IntAct uses the PSI format (extensible markup language (XML)-tagged) to store data [<xref ref-type="bibr" rid="pcbi-0020079-b073">73</xref>]. It contains large-scale datasets for yeast [<xref ref-type="bibr" rid="pcbi-0020079-b012">12</xref>–<xref ref-type="bibr" rid="pcbi-0020079-b015">15</xref>], fly [<xref ref-type="bibr" rid="pcbi-0020079-b011">11</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b016">16</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b017">17</xref>], worm [<xref ref-type="bibr" rid="pcbi-0020079-b018">18</xref>], and human [<xref ref-type="bibr" rid="pcbi-0020079-b019">19</xref>], as well as about 30 so-called small-scale datasets, which are collections of results from many detailed experiments for different organisms. The largest small-scale dataset is that of human with about 38,000 interactions. 
Concerning the high-throughput datasets, IntAct carries detailed information about which proteins were used as baits and which proteins were used as preys, so that a complete interaction matrix can easily be reconstructed from these sets. <xref ref-type="supplementary-material" rid="pcbi-0020079-st001">Table S1</xref> contains all protein–protein interaction datasets deposited in IntAct at the moment along with links to these datasets (small-scale and large-scale). The Giot [<xref ref-type="bibr" rid="pcbi-0020079-b017">17</xref>], Ito [<xref ref-type="bibr" rid="pcbi-0020079-b035">35</xref>], and Li [<xref ref-type="bibr" rid="pcbi-0020079-b018">18</xref>] datasets contain some information about the level of confidence that was assigned to each interaction. For these three sets, we excluded everything from our analysis that either had a confidence-value of less than 0.4 (Giot: values range from 0 to 1) or those that were not in a so called “core” dataset of trusted interactions (Ito and Li divide their sets into core and full or core and noncore subsets, where core means a higher confidence in the measured interaction). Note that for the initial submission of this manuscript we had compiled all results for unfiltered data sets, i.e., we had included all experimental interactions; the results were qualitatively identical to those given here (data not shown).</p></sec><sec id="s4b"><title>True positives and false negatives: focus on Physical Interactions = PPIs.</title><p>Technically, we realized our goal of exclusively focusing on PPIs through the particular way of labeling positives and negatives. 
We labeled as positives (true PPIs) only those pairs that were identified by experiments that target the detection of physical interactions (only Y2H experiments).</p><p>We then also assumed that these data were complete for each organism, i.e., we labeled as negatives all pairs that were not detected by Y2H.</p></sec><sec id="s4c"><title>Measuring sequence similarity/homology.</title><p>The term homology usually implies an evolutionary relation in the sense of having a common ancestor. Strictly speaking, we cannot measure homology. Instead, alignment methods measure sequence similarity in some way or other. In our work the ranges of similarity were so high that the pairs of proteins were most likely homologous. We used BLAST and PSI-BLAST [<xref ref-type="bibr" rid="pcbi-0020079-b074">74</xref>] to align all protein sequences in IntAct against each other (standard procedure [<xref ref-type="bibr" rid="pcbi-0020079-b075">75</xref>]: 3 iterations at E &lt; 10<sup>−10</sup> against a filtered database of all proteins to build clean profiles, then one run with the frozen profile against the unfiltered database at E &lt; 10<sup>−3</sup>, then freeze the profile again and run against all IntAct proteins). Then we extracted the PSI-BLAST E-values for each alignment, as well as the percentage of sequence identity (PIDE) and the distance to the HSSP curve, i.e. the HSSP-value [<xref ref-type="bibr" rid="pcbi-0020079-b025">25</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b076">76</xref>,<xref ref-type="bibr" rid="pcbi-0020079-b077">77</xref>] (HVAL). The HVAL is defined as:
						<disp-formula id="pcbi-0020079-e001">
							<graphic xlink:href="info:doi/10.1371/journal.pcbi.0020079.e001" position="anchor" alt-version="no" mimetype="image" xlink:type="simple"/>
							<!-- <mml:math display='block'><mml:mrow><mml:mi>H</mml:mi><mml:mi>V</mml:mi><mml:mi>A</mml:mi><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>P</mml:mi><mml:mi>I</mml:mi><mml:mi>D</mml:mi><mml:mi>E</mml:mi><mml:mo>,</mml:mo><mml:mi>L</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo>&equals;</mml:mo><mml:mi>P</mml:mi><mml:mi>I</mml:mi><mml:mi>D</mml:mi><mml:mi>E</mml:mi><mml:mo>&minus;</mml:mo><mml:mrow><mml:mo stretchy='true'>&lcub;</mml:mo> <mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mn>100</mml:mn></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>for</mml:mtext><mml:mspace width="2pt"/><mml:mi>L</mml:mi><mml:mo>&le;</mml:mo><mml:mn>11</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mn>480</mml:mn><mml:mo>&sdot;</mml:mo><mml:msup><mml:mi>L</mml:mi><mml:mrow><mml:mo>&minus;</mml:mo><mml:mn>0.32</mml:mn><mml:mo>&sdot;</mml:mo><mml:mrow><mml:mo>&lcub;</mml:mo> <mml:mrow><mml:mn>1</mml:mn><mml:mo>&plus;</mml:mo><mml:mi>exp</mml:mi><mml:mo></mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mo>&minus;</mml:mo><mml:mi>L</mml:mi></mml:mrow><mml:mo>/</mml:mo><mml:mrow><mml:mn>1000</mml:mn></mml:mrow></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>&rcub;</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:mtd><mml:mspace width="-5pt"/><mml:mtd columnalign='left'><mml:mrow><mml:mtext>for</mml:mtext><mml:mspace width="2pt"/><mml:mi>L</mml:mi><mml:mo>&le;</mml:mo><mml:mn>450</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mn>19.5</mml:mn></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>for</mml:mtext><mml:mspace width="2pt"/><mml:mi>L</mml:mi><mml:mspace width="2.5pt"/><mml:mo>&gt;</mml:mo><mml:mspace 
width="1.5pt"/><mml:mn>450</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow> </mml:mrow></mml:mrow></mml:math> -->
						</disp-formula>where L was the number of residues aligned between two proteins, and PIDE the percentage of pairwise identical residues. HSSP values consider both pairwise sequence identity and alignment length: the higher the value, the more similar two proteins are. Values around 0 typically imply that two proteins have similar 3-D structures and correspond to about 20% pairwise sequence identity at alignment lengths above 250 residues.
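As a minimal sketch of Equation 1 (an illustration only, not the code used for this study), the HVAL can be computed as follows in Python. The function names hssp_curve and hval are hypothetical; the constants are those of Equation 1.

```python
import math


def hssp_curve(length):
    """Length-dependent identity level that Equation 1 subtracts from PIDE."""
    if length > 450:
        return 19.5
    if length > 11:
        exponent = -0.32 * (1.0 + math.exp(-length / 1000.0))
        return 480.0 * length ** exponent
    # Alignments of 11 residues or fewer: the curve sits at 100% identity.
    return 100.0


def hval(pide, length):
    """Distance of an alignment from the HSSP curve (HVAL, Equation 1).

    pide   : percentage of pairwise identical residues (0 to 100)
    length : number of aligned residues
    """
    return pide - hssp_curve(length)
```

For instance, an alignment of 500 residues at 30% identity gives hval(30.0, 500) = 10.5, i.e., well above the threshold HVAL = 0.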
					</p></sec><sec id="s4d"><title>Nonredundant data sets.</title><p>We removed bias from PPI datasets by the following procedure (<xref ref-type="fig" rid="pcbi-0020079-g005">Figure 5</xref>). (1) Move down a list L of PPIs starting with pair A-B. (2) Group all interactions in this list into clusters of similar PPIs. Consider two distinct PPIs as similar only if both partners of the first interaction are homologs to the respective protein in the second interaction. For instance, let A′ be a homolog of A, and B′ be a homolog of B. Then all interactions A′-B, A′-B′, and A-B′ will fall into the same group as the interaction A-B. Note that this also means that any interaction A-C will not end up in this group if C is not a homolog of B. Here, we used a very conservative criterion for homolog, namely HVAL &gt; 0 (<xref ref-type="disp-formula" rid="pcbi-0020079-e001">Equation 1</xref>). This threshold is conservative in the sense that it will also remove nonredundant pairs, i.e., many proteins that are actually not homologs. (3) Reduce each group formed in step 2 to one single representative PPI. (4) Continue working with the final unique (nonredundant) dataset.</p><fig id="pcbi-0020079-g005" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.0020079.g005</object-id><label>Figure 5</label><caption><title>Creating Sequence-Unique PPI sets</title><p>(1) Starting with a dataset of PPIs, we first cluster the data according to sequence similarity (apply a certain homology threshold) into sequence similar PPIs (2). Note here that the interactions A′-B′ and A′-C′ do not fall into the same cluster because B′ and C′ are unrelated. Thus, for two interactions (e.g., A-B and A′-B′) to be considered similar by our algorithm, both interacting proteins (A and B) have to be homologous to the two proteins of the other interaction (A has to be similar to A′ and B has to be similar to B′). 
(3) We randomly throw out all redundant interactions in each cluster so that only one PPI remains as a representative of each cluster. (4) Those representatives constitute the final unique dataset of PPIs.</p></caption><graphic xlink:href="info:doi/10.1371/journal.pcbi.0020079.g005" alt-version="no" mimetype="image" position="float" xlink:type="simple"/></fig></sec><sec id="s4e"><title>Identity- and homology-based overlap between datasets.</title><p>We defined two procedures resembling the Jaccard coefficient to measure the overlap between two different datasets of PPIs in IntAct. <xref ref-type="disp-formula" rid="pcbi-0020079-e002">Equation 2</xref> defines the first measure; for clarity we refer to this measure as the identity-based overlap. This measure can only be applied to two PPI sets from the same organism.
						<disp-formula id="pcbi-0020079-e002">
							<graphic xlink:href="info:doi/10.1371/journal.pcbi.0020079.e002" position="anchor" alt-version="no" mimetype="image" xlink:type="simple"/>
							<!-- <mml:math display='block'><mml:mrow><mml:mi>o</mml:mi><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>M</mml:mi><mml:mo>,</mml:mo><mml:mi>N</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&equals;</mml:mo><mml:mfrac><mml:mrow><mml:mi>P</mml:mi><mml:mi>P</mml:mi><mml:mi>I</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>N</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mi>P</mml:mi><mml:mi>I</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>N</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&plus;</mml:mo><mml:mi>P</mml:mi><mml:mi>P</mml:mi><mml:mi>I</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>M</mml:mi><mml:mi>x</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>N</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:math> -->
						</disp-formula>where <italic>PPI</italic>(<italic>MandN</italic>) is the number of PPIs that were detected in both sets (common PPIs) and <italic>PPI</italic>(<italic>MxorN</italic>) is the number of PPIs that were only detected in one of the two datasets (exclusive or). <xref ref-type="fig" rid="pcbi-0020079-g006">Figure 6</xref>A describes this procedure. Note that only those interactions contributed to the count of <italic>PPI</italic>(<italic>MxorN</italic>) that could possibly have been detected in both datasets. For example, if the PPI A-B is detected in dataset 1, but not in dataset 2, we only increase <italic>PPI</italic>(<italic>MxorN</italic>) by one, if A and B were both included in dataset 2. In other words, we completely ignored interactions A-B in one dataset, if either A, or B (or both) were not present in the other dataset. Given this definition (<xref ref-type="disp-formula" rid="pcbi-0020079-e002">Equation 2</xref>), an overlap value of 0.5 means that every second PPI of dataset 1 is not present in dataset 2. Inversely, every second PPI from dataset 2 cannot be found in dataset 1. Furthermore, applying <xref ref-type="disp-formula" rid="pcbi-0020079-e002">Equation 2</xref> to calculate the overlap of one dataset with itself always results in 1 (100% overlap).
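The identity-based overlap can be sketched as follows; this is a simplified illustration under stated assumptions, not the original implementation. It assumes that each dataset is given as a set of unordered protein pairs plus the set of all proteins tested, and it ignores the bait/prey roles. The function name identity_overlap and its arguments are hypothetical.

```python
def identity_overlap(ppis_m, proteins_m, ppis_n, proteins_n):
    """Identity-based overlap (Equation 2) of two PPI sets from one organism.

    ppis_*     : sets of frozenset({a, b}) pairs observed to interact
    proteins_* : sets of all proteins included in the respective dataset
    """

    def detectable(pair, proteins):
        # A pair could only have been detected if both partners were tested.
        return all(member in proteins for member in pair)

    common = ppis_m.intersection(ppis_n)
    # Pairs seen in only one dataset count only if the other dataset
    # contained both proteins; otherwise they are ignored entirely.
    only_m = [pair for pair in ppis_m - ppis_n if detectable(pair, proteins_n)]
    only_n = [pair for pair in ppis_n - ppis_m if detectable(pair, proteins_m)]

    n_and = len(common)
    n_xor = len(only_m) + len(only_n)
    total = n_and + n_xor
    return n_and / total if total else float("nan")
```

In this sketch, a pair whose partners were not both present in the other dataset contributes to neither the numerator nor the denominator, which mirrors the rule stated above.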
					</p><fig id="pcbi-0020079-g006" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.0020079.g006</object-id><label>Figure 6</label><caption><title>Ways of Calculating the Overlap between Two Y2H Datasets</title><p>(A) Identity-based overlap between Datasets 1 and 2 according to <xref ref-type="disp-formula" rid="pcbi-0020079-e002">Equation 2</xref>. Note that we can only calculate this score if both datasets are from the same organism. Starting with the observed interaction C-E in Dataset 1, we are trying to find the exact same interaction in Dataset 2. The following situations might occur: (a) C and E are also observed to interact in Dataset 2. (b) C and E are not observed to interact in Dataset 2. (c) It is impossible for C and E to be interacting in Dataset 2 due to either of these two reasons: (i) Either C or E are not part of Dataset 2 or (ii) C and E are either both used as preys or both used as baits in Dataset 2. Repeating the above procedure for all other observed interactions in Datasets 1 and 2, we finally calculate the identity-based overlap by dividing the number of common interactions found in Datasets 1 and 2 by the total number of expected interactions (observed and not-observed).</p><p>(B) The same procedure as described above is applied to the two Datasets 1 and 3, which are now allowed to be from different organisms. The only difference to <xref ref-type="disp-formula" rid="pcbi-0020079-e002">Equation 2</xref> (A) is the usage of homology for comparing two PPIs instead of a binary decision scheme (PPIs identical or not-identical). Thus, starting with the interaction D-E from Dataset 1, we try to find possible homologous interactions (not only the identical PPI) in Dataset 3. The only two options in this example are D-E and D′-E (Dataset 3), which in our example are both observed in Dataset 3. 
Iterating through all observed interactions of Datasets 1 and 3 and summing up the expected interactions and the overlapping homologous interactions, we can then calculate the homology-based overlap (<xref ref-type="disp-formula" rid="pcbi-0020079-e003">Equation 3</xref>). Note that any results from <xref ref-type="disp-formula" rid="pcbi-0020079-e002">Equation 2</xref> are not comparable to any results from <xref ref-type="disp-formula" rid="pcbi-0020079-e003">Equation 3</xref>.</p></caption><graphic xlink:href="info:doi/10.1371/journal.pcbi.0020079.g006" alt-version="no" mimetype="image" position="float" xlink:type="simple"/></fig><p>The second measure capturing an overlap between two interaction datasets was applicable to any two datasets, even if they were from different organisms. We referred to this measure as the homology-based overlap. It was defined as follows (<xref ref-type="fig" rid="pcbi-0020079-g006">Figure 6</xref>B):
						<disp-formula id="pcbi-0020079-e003">
							<graphic xlink:href="info:doi/10.1371/journal.pcbi.0020079.e003" position="anchor" alt-version="no" mimetype="image" xlink:type="simple"/>
							<!-- <mml:math display='block'><mml:mrow><mml:mi>o</mml:mi><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>M</mml:mi><mml:mo>,</mml:mo><mml:mi>N</mml:mi><mml:mo>,</mml:mo><mml:mi>h</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&equals;</mml:mo><mml:mfrac><mml:mrow><mml:mi>P</mml:mi><mml:mi>P</mml:mi><mml:mi>I</mml:mi><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>N</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>h</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mi>P</mml:mi><mml:mi>I</mml:mi><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>N</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>h</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo>&plus;</mml:mo><mml:mi>P</mml:mi><mml:mi>P</mml:mi><mml:mi>I</mml:mi><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>M</mml:mi><mml:mi>x</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>N</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>h</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mrow></mml:math> -->
						</disp-formula>where <italic>PPI</italic>(<italic>MandN</italic>)<sup>(<italic>h</italic>)</sup> is the number of homologous PPIs reported in both datasets considering a homology threshold of HVAL &gt; h. Assume again that A is homolog of A′ and B of B′. If the interaction A-B is in dataset 1 and the interaction A′-B′ is in dataset 2, the count for <italic>PPI</italic>(<italic>MandN</italic>)<sup>(<italic>h</italic>)</sup> will increase by one. The quantities <italic>PPI</italic>(<italic>MandN</italic>)<sup>(<italic>h</italic>)</sup> and <italic>PPI</italic>(<italic>MxorN</italic>)<sup>(<italic>h</italic>)</sup> are similar to those in <xref ref-type="disp-formula" rid="pcbi-0020079-e002">Equation 2</xref> with the simple caveat that we substituted identical pairs with homologous pairs, because there are no identical pairs between two different organisms. Unlike for <xref ref-type="disp-formula" rid="pcbi-0020079-e002">Equation 2</xref>, when using <xref ref-type="disp-formula" rid="pcbi-0020079-e003">Equation 3</xref> to measure the overlap between a dataset and itself, the result usually happens to be &lt; 1 (&lt; 100%). For an explanation consider the following example. Assume that our dataset contains the interaction A(bait)-B(prey) along with another protein A′ (bait, homologous to A) that is not found to interact with B. The absence of A′-B will increase the count of <italic>PPI</italic>(<italic>MxorN</italic>)<sup>(<italic>h</italic>)</sup> by one, thereby yielding a self overlap &lt;1. On the one hand, for very high levels of similarity (say A and A′ have 99% pairwise sequence identity), the reduction from 1 can be interpreted as a reflection of the limitation of experimental accuracy. On the other hand, for low levels of similarity, the reduction is related to the fact that PPIs are simply not conserved between distant relatives. 
Note that we also investigated overlap when replacing HVAL (<xref ref-type="disp-formula" rid="pcbi-0020079-e001">Equation 1</xref>) by PSI-BLAST E-values as a measure for sequence similarity. While the resulting numbers differed slightly, the trends that we reported remained the same (data not shown).
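The test for whether two PPIs count as homologous at a threshold h can be sketched as below. This is an assumption-laden illustration rather than the original code: the names pair_hval and ppis_homologous are hypothetical, precomputed HVALs are assumed to be available in a dictionary keyed by unordered protein pairs, and identical proteins are treated as trivially homologous.

```python
def pair_hval(hvals, a, b):
    """HVAL between proteins a and b; hvals maps frozenset({a, b}) to HVAL."""
    if a == b:
        # Assumption: a protein is always considered a homolog of itself.
        return float("inf")
    # Pairs without an alignment are treated as unrelated.
    return hvals.get(frozenset((a, b)), float("-inf"))


def ppis_homologous(ppi1, ppi2, hvals, h):
    """True if both partners of ppi1 have a homolog (HVAL > h) in ppi2.

    Each PPI is a 2-tuple of protein identifiers. The weaker of the two
    partner similarities decides, and both orientations are tried.
    """
    (a, b), (c, d) = ppi1, ppi2
    direct = min(pair_hval(hvals, a, c), pair_hval(hvals, b, d))
    swapped = min(pair_hval(hvals, a, d), pair_hval(hvals, b, c))
    return max(direct, swapped) > h
```

Counting the PPIs of one dataset that have at least one such homologous partner PPI in the other dataset is one possible way to obtain the numerator of Equation 3.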
					</p></sec><sec id="s4f"><title>Homology performance curves.</title><p>For given levels of sequence similarity, we monitored and plotted the accuracy of inferring PPIs through homology from one dataset to another. The procedure is described in <xref ref-type="fig" rid="pcbi-0020079-g007">Figure 7</xref>.</p><fig id="pcbi-0020079-g007" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.0020079.g007</object-id><label>Figure 7</label><caption><title>Evaluating Homology Inference of PPIs</title><p>Starting with the entirety of observed interactions in any organism o (Y2H plus small-scale experiments), we first reduce the sequence redundancy of this dataset as described in <xref ref-type="fig" rid="pcbi-0020079-g005">Figure 5</xref>. Then we try to find homologs in the organism p for each of the unique PPIs of organism o. Since we want to be able to conclude that every nondetected interaction in organism p does not actually exist in real life, we need to have a complete interaction matrix (baits × preys) for organism p. Thus, we are forced to exclude all small-scale data from the organism p dataset and remain with a merger of all (redundant) Y2H interactions for this organism. For each interaction A-B from organism o, we can face any of the following situations: (a) a homologous interaction A′-B′ can be found in organism p, (b) no homologous interaction can be found in p, or (c) it is impossible to detect an interaction of type A′-B′ in p because of one of the following two reasons: (i) either A′ or B′ is missing in the dataset for p or (ii) A′ and B′ are either both preys or both baits in the dataset for organism p. The latter case (c.ii) is illustrated by the interaction E-F in organism o, which cannot be detected in organism p only because E′ and F′ are both used as preys in the experiment. No counts for false positives are made for those cases. 
Adding the numbers of true positives (expected and observed PPIs), false positives (expected but not observed) and false negatives (observed interaction only in organism p) allows us to calculate accuracy and coverage for each homology threshold used to infer interactions (Equation 4). It is important to note that in the case where o = p, comparisons between two identical experimental PPI-sets are ignored (e.g. A-B in o′s set “<italic>yeast-Ito-2001”</italic> is not used to predict A′-B′ in p′s set “<italic>yeast-Ito-2001”</italic>; o = p = yeast).</p></caption><graphic xlink:href="info:doi/10.1371/journal.pcbi.0020079.g007" alt-version="no" mimetype="image" position="float" xlink:type="simple"/></fig><p>The resulting curves can be interpreted as the degree to which PPIs are evolutionarily conserved. In a more technical sense, the curves reflect the performance of homology transfer of PPIs (<xref ref-type="fig" rid="pcbi-0020079-g001">Figure 1</xref>). The HVAL (<xref ref-type="disp-formula" rid="pcbi-0020079-e001">Equation 1</xref>) determined the minimal similarity between A and A′, as well as between B and B′. Other ways of considering two pairs of interacting proteins as related, for instance the arithmetic or geometric average of both HVALs (A/A′ and B/B′), led to a slightly worse performance of our homology inferences, i.e. the curves were similar albeit lower overall (data not shown). Note that each large-scale Y2H data set (<xref ref-type="supplementary-material" rid="pcbi-0020079-st001">Table S1</xref>) should, by experimental design, contain a complete interaction matrix (preys × baits) that is, ideally, both fully correct and comprehensive for all the proteins tested in that experiment. Consider an interaction A-B from any dataset (small-scale or large-scale) of an organism o; if we find the homologs A′ and B′ in a large-scale dataset of another organism p, we can transfer the interaction property from A-B to A′-B′. 
In other words, by looking at the PPI between A and B (A-B), we simply predict that A′ and B′ also interact. Because we are looking at a complete interaction matrix for organism p, we can also say whether this prediction was right or wrong. In particular, the prediction is correct if we find the interaction A′-B′ in p; it is wrong if we do not find it in p although A′ and B′ are on different axes of the interaction matrix (A′ = prey, B′ = bait or vice versa). To compare the performance of homology transfers across two organisms (o ≠ p) to that of intraorganism transfers (o = p), we have to allow p and o to be the same. To keep both types of experiments (intraspecies versus interspecies) comparable, we applied the following restrictions to comparisons within the same species (o = p). Transfers from an interaction A-B to another PPI of the type A-B′ or A′-B (one protein identical, the other homologous) are not allowed, since these cases are only observable in intraspecies predictions but not in interspecies transfers. Additionally, for intraspecies predictions, we required that A-B and the predicted interaction (A′-B′) stem from different datasets (different Y2H experiments) in order to ignore possible homology-based assumptions about two PPIs within the same dataset. The concern here is that a research group that found an interaction (e.g., A-B) through a Y2H scan might work harder to also find an interaction A′-B′ (A′ = homolog of A, B′ = homolog of B) or A′-B than to find an unrelated interaction (e.g., M-N).</p></sec><sec id="s4g"><title>Accuracy and coverage.</title><p>We measured the accuracy (Acc) and coverage (Cov) for the inference (prediction) of interacting protein pairs by the standard formulas:
						<disp-formula id="pcbi-0020079-e004">
							<graphic xlink:href="info:doi/10.1371/journal.pcbi.0020079.e004" position="anchor" alt-version="no" mimetype="image" xlink:type="simple"/>
							<!-- <mml:math display='block'><mml:mrow><mml:mtext>Acc</mml:mtext><mml:mo>&equals;</mml:mo><mml:mfrac><mml:mrow><mml:mtext>TP</mml:mtext></mml:mrow><mml:mrow><mml:mtext>TP</mml:mtext><mml:mo>&plus;</mml:mo><mml:mtext>FP</mml:mtext></mml:mrow></mml:mfrac><mml:mtext>;&emsp;&emsp;Cov</mml:mtext><mml:mo>&equals;</mml:mo><mml:mfrac><mml:mrow><mml:mtext>TP</mml:mtext></mml:mrow><mml:mrow><mml:mtext>TP</mml:mtext><mml:mo>&plus;</mml:mo><mml:mtext>FN</mml:mtext></mml:mrow></mml:mfrac></mml:mrow></mml:math> -->
						</disp-formula>where TP are the true positives (i.e., physical interactions that are experimentally observed [e.g., by Y2H; note that TAP-like relations are not included here] and that are also correctly inferred by homology). FP are the false positives (i.e., the pairs inferred through homology but not observed by Y2H experiments). Finally, FN are the false negatives (i.e., the physical interactions that have been observed but were not inferred by homology). We monitored levels of accuracy and coverage as a function of the sequence similarity between the proteins of known and those of unknown annotations. There is a trade-off between these two: the more permissive (lower) the sequence similarity threshold, the more interactions will be inferred (higher coverage) at the expense of reduced accuracy; and the higher the threshold, the more inferences will be right (high accuracy) at the expense of few inferences (low coverage).
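The trade-off can be illustrated with a short sketch of Equation 4 evaluated over several similarity thresholds. This is a hypothetical illustration: the function names, the input layout (one HVAL and one observed/not-observed flag per inferred pair), and the example numbers are assumptions and not data from this study.

```python
def accuracy_coverage(tp, fp, fn):
    """Accuracy and coverage as defined in Equation 4."""
    acc = tp / (tp + fp) if tp + fp else float("nan")
    cov = tp / (tp + fn) if tp + fn else float("nan")
    return acc, cov


def performance_curve(inferences, n_observed, thresholds):
    """Accuracy and coverage for each HVAL threshold.

    inferences : list of (hval, observed) tuples, one per pair inferred by
                 homology; observed is True if Y2H detected the pair
    n_observed : total number of experimentally observed interactions
    thresholds : HVAL cutoffs; only inferences with HVAL > cutoff are kept
    """
    curve = []
    for h in thresholds:
        kept = [observed for value, observed in inferences if value > h]
        tp = sum(kept)          # inferred and observed
        fp = len(kept) - tp     # inferred but not observed
        fn = n_observed - tp    # observed but not inferred
        acc, cov = accuracy_coverage(tp, fp, fn)
        curve.append((h, acc, cov))
    return curve


# Made-up example: five inferred pairs, four observed interactions in total.
example = [(50, True), (30, True), (10, False), (5, True), (0.5, False)]
curve = performance_curve(example, 4, [0, 20])
```

With these made-up numbers, the threshold 0 yields an accuracy of 0.6 at a coverage of 0.75, whereas the stricter threshold 20 yields an accuracy of 1.0 at a coverage of 0.5.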
					</p></sec><sec id="s4h"><title>Error estimate.</title><p>The errors in the estimates of accuracy and coverage were determined by bootstrapping [<xref ref-type="bibr" rid="pcbi-0020079-b078">78</xref>] over the protein–protein interactions in the source datasets. In particular, we picked <italic>n</italic> interactions at random from the non-redundant source dataset and compiled the averages over a larger set with possibly many replicas of the same interaction. The levels of accuracy/coverage for different thresholds in sequence similarity were then calculated according to the procedure described above (<xref ref-type="fig" rid="pcbi-0020079-g007">Figure 7</xref>). For the bootstrapping, these two steps were repeated 100 times before the standard deviation (sigma) for all levels of accuracy was calculated.</p></sec></sec><sec id="s5"><title>Supporting Information</title><supplementary-material id="pcbi-0020079-st001" xlink:href="info:doi/10.1371/journal.pcbi.0020079.st001" mimetype="application/msword" position="float" xlink:type="simple"><label>Table S1</label><caption><title>Large-Scale Protein–Protein Interaction Datasets from IntAct</title><p>(74 KB DOC)</p></caption></supplementary-material><supplementary-material id="pcbi-0020079-sg001" xlink:href="info:doi/10.1371/journal.pcbi.0020079.sg001" mimetype="application/msword" position="float" xlink:type="simple"><label>Figure S1</label><caption><title>Number of true positive counts versus HVAL</title><p>Each curve shows the accuracy (red) as shown in <xref ref-type="fig" rid="pcbi-0020079-g003">Figure 3</xref> and the number of true positives counted at a certain HSSP-value cutoff (green).</p><p>(72 KB DOC)</p></caption></supplementary-material><supplementary-material id="pcbi-0020079-sg002" xlink:href="info:doi/10.1371/journal.pcbi.0020079.sg002" mimetype="application/msword" position="float" xlink:type="simple"><label>Figure S2</label><caption><title>Results Are Stable with Respect to Variations in the 
Experimental Setup</title><p>(A) Different sampling of intra- versus inter-species: we allowed transfers of the type A-B to A'-B or A-B to A-B' (see <xref ref-type="sec" rid="s4">Materials and Methods</xref> section). The performance became significantly better for intra-species PPI-transfers, thus further widening the gap between intra- and inter-species transfers.</p><p>(B) Inclusion of transfers within the same data set: we included homology transfers within the same experimental dataset (see <xref ref-type="sec" rid="s4">Materials and Methods</xref> section). The effect was very similar to that observed for different sampling (A), i.e. it widened the gap between intra- and inter-species inferences.</p><p>(C) Using TAP-like data (<xref ref-type="supplementary-material" rid="pcbi-0020079-st001">Table S1</xref>) as a constraint for the negatives. To illustrate this, assume that TAP pulled down a complex of six proteins. While we cannot infer that all 15 possible interactions are physical, all could be. Therefore, we ignored a false positive prediction (did not count it) if we could find the interaction among those 15 TAP protein-protein pairs. The accuracy slightly increased for both yeast versus yeast (intra-species) comparisons and non-yeast versus yeast (inter-species) comparisons. Note that yeast is the only organism with available TAP-like data.</p><p>(D) We used a redundant dataset (instead of a non-redundant, bias-reduced set) from organism o (<xref ref-type="fig" rid="pcbi-0020079-g007">Figure 7</xref>) to hunt for interologs in organism p (<xref ref-type="fig" rid="pcbi-0020079-g007">Figure 7</xref>). The main message of the results for this latter experiment (D) stays the same as in our original procedure (see <xref ref-type="sec" rid="s4">Materials and Methods</xref> section): intra-species comparisons are more accurate than inter-species comparisons. 
Due to more samples in the dataset for organism o (<xref ref-type="fig" rid="pcbi-0020079-g007">Figure 7</xref>) and thus higher counts, the errors slightly decreased.</p><p>(153 KB DOC)</p></caption></supplementary-material></sec></body><back><ack><p>Thanks to Jinfeng Liu, Hans-Erik Aronson, Kristen McFadden, and Paul Glick (all from Columbia University) for computer assistance. Thanks to the anonymous reviewers for their helpful criticism. Furthermore, thanks in particular to Amos Bairoch (Swiss Institute of Bioinformatics, Geneva, Switzerland), Rolf Apweiler (European Bioinformatics Institute, Hinxton, United Kingdom), Phil Bourne (San Diego University, San Diego, California, United States), David Eisenberg (University of California—Los Angeles, Los Angeles, California, United States), and their crews for maintaining excellent databases and to all experimentalists who enabled this work by publishing their PPI results in PubMed/MedLine.
			</p></ack><glossary><title>Abbreviations</title><def-list><def-item><term>AAA</term><def><p>ATPases associated with various cellular activities</p></def></def-item><def-item><term>HVAL</term><def><p>measure for sequence similarity</p></def></def-item><def-item><term>PIDE</term><def><p>percentage sequence identity</p></def></def-item><def-item><term>PPI</term><def><p>physical protein–protein interaction</p></def></def-item><def-item><term>PSI-BLAST</term><def><p>position-specific iterative basic local alignment search tool</p></def></def-item><def-item><term>TAP</term><def><p>tandem affinity purification</p></def></def-item><def-item><term>Y2H</term><def><p>yeast two-hybrid</p></def></def-item></def-list></glossary><ref-list><title>References</title><ref id="pcbi-0020079-b001"><label>1</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Fields</surname><given-names>S</given-names></name><name name-style="western"><surname>Song</surname><given-names>O</given-names></name></person-group>
					<year>1989</year>
					<article-title>A novel genetic system to detect protein–protein interactions.</article-title>
					<source>Nature</source>
					<volume>340</volume>
					<fpage>245</fpage>
					<lpage>246</lpage>
				</citation></ref><ref id="pcbi-0020079-b002"><label>2</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Causier</surname><given-names>B</given-names></name></person-group>
					<year>2004</year>
					<article-title>Studying the interactome with the yeast two-hybrid system and mass spectrometry.</article-title>
					<source>Mass Spectrom Rev</source>
					<volume>23</volume>
					<fpage>350</fpage>
					<lpage>367</lpage>
				</citation></ref><ref id="pcbi-0020079-b003"><label>3</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Legrain</surname><given-names>P</given-names></name><name name-style="western"><surname>Wojcik</surname><given-names>J</given-names></name><name name-style="western"><surname>Gauthier</surname><given-names>JM</given-names></name></person-group>
					<year>2001</year>
					<article-title>Protein–protein interaction maps: A lead towards cellular functions.</article-title>
					<source>Trends Genet</source>
					<volume>17</volume>
					<fpage>346</fpage>
					<lpage>352</lpage>
				</citation></ref><ref id="pcbi-0020079-b004"><label>4</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Willats</surname><given-names>WG</given-names></name></person-group>
					<year>2002</year>
					<article-title>Phage display: Practicalities and prospects.</article-title>
					<source>Plant Mol Biol</source>
					<volume>50</volume>
					<fpage>837</fpage>
					<lpage>854</lpage>
				</citation></ref><ref id="pcbi-0020079-b005"><label>5</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Puig</surname><given-names>O</given-names></name><name name-style="western"><surname>Caspary</surname><given-names>F</given-names></name><name name-style="western"><surname>Rigaut</surname><given-names>G</given-names></name><name name-style="western"><surname>Rutz</surname><given-names>B</given-names></name><name name-style="western"><surname>Bouveret</surname><given-names>E</given-names></name><etal/></person-group>
					<year>2001</year>
					<article-title>The tandem affinity purification (TAP) method: A general procedure of protein complex purification.</article-title>
					<source>Methods</source>
					<volume>24</volume>
					<fpage>218</fpage>
					<lpage>229</lpage>
				</citation></ref><ref id="pcbi-0020079-b006"><label>6</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Rigaut</surname><given-names>G</given-names></name><name name-style="western"><surname>Shevchenko</surname><given-names>A</given-names></name><name name-style="western"><surname>Rutz</surname><given-names>B</given-names></name><name name-style="western"><surname>Wilm</surname><given-names>M</given-names></name><name name-style="western"><surname>Mann</surname><given-names>M</given-names></name><etal/></person-group>
					<year>1999</year>
					<article-title>A generic protein purification method for protein complex characterization and proteome exploration.</article-title>
					<source>Nat Biotechnol</source>
					<volume>17</volume>
					<fpage>1030</fpage>
					<lpage>1032</lpage>
				</citation></ref><ref id="pcbi-0020079-b007"><label>7</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Aebersold</surname><given-names>R</given-names></name><name name-style="western"><surname>Mann</surname><given-names>M</given-names></name></person-group>
					<year>2003</year>
					<article-title>Mass spectrometry-based proteomics.</article-title>
					<source>Nature</source>
					<volume>422</volume>
					<fpage>198</fpage>
					<lpage>207</lpage>
				</citation></ref><ref id="pcbi-0020079-b008"><label>8</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Bauer</surname><given-names>A</given-names></name><name name-style="western"><surname>Kuster</surname><given-names>B</given-names></name></person-group>
					<year>2003</year>
					<article-title>Affinity purification-mass spectrometry. Powerful tools for the characterization of protein complexes.</article-title>
					<source>Eur J Biochem</source>
					<volume>270</volume>
					<fpage>570</fpage>
					<lpage>578</lpage>
				</citation></ref><ref id="pcbi-0020079-b009"><label>9</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Lin</surname><given-names>D</given-names></name><name name-style="western"><surname>Tabb</surname><given-names>DL</given-names></name><name name-style="western"><surname>Yates</surname><given-names>JR</given-names><suffix>III</suffix></name></person-group>
					<year>2003</year>
					<article-title>Large-scale protein identification using mass spectrometry.</article-title>
					<source>Biochim Biophys Acta</source>
					<volume>1646</volume>
					<fpage>1</fpage>
					<lpage>10</lpage>
				</citation></ref><ref id="pcbi-0020079-b010"><label>10</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Walhout</surname><given-names>AJ</given-names></name><name name-style="western"><surname>Vidal</surname><given-names>M</given-names></name></person-group>
					<year>2001</year>
					<article-title>Protein interaction maps for model organisms.</article-title>
					<source>Nat Rev Mol Cell Biol</source>
					<volume>2</volume>
					<fpage>55</fpage>
					<lpage>62</lpage>
				</citation></ref><ref id="pcbi-0020079-b011"><label>11</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Formstecher</surname><given-names>E</given-names></name><name name-style="western"><surname>Aresta</surname><given-names>S</given-names></name><name name-style="western"><surname>Collura</surname><given-names>V</given-names></name><name name-style="western"><surname>Hamburger</surname><given-names>A</given-names></name><name name-style="western"><surname>Meil</surname><given-names>A</given-names></name><etal/></person-group>
					<year>2005</year>
					<article-title>Protein interaction mapping: A <italic>Drosophila</italic> case study.</article-title>
					<source>Genome Res</source>
					<volume>15</volume>
					<fpage>376</fpage>
					<lpage>384</lpage>
				</citation></ref><ref id="pcbi-0020079-b012"><label>12</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Ito</surname><given-names>T</given-names></name><name name-style="western"><surname>Tashiro</surname><given-names>K</given-names></name><name name-style="western"><surname>Muta</surname><given-names>S</given-names></name><name name-style="western"><surname>Ozawa</surname><given-names>R</given-names></name><name name-style="western"><surname>Chiba</surname><given-names>T</given-names></name><etal/></person-group>
					<year>2000</year>
					<article-title>Toward a protein–protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins.</article-title>
					<source>Proc Natl Acad Sci U S A</source>
					<volume>97</volume>
					<fpage>1143</fpage>
					<lpage>1147</lpage>
				</citation></ref><ref id="pcbi-0020079-b013"><label>13</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Uetz</surname><given-names>P</given-names></name><name name-style="western"><surname>Giot</surname><given-names>L</given-names></name><name name-style="western"><surname>Cagney</surname><given-names>G</given-names></name><name name-style="western"><surname>Mansfield</surname><given-names>TA</given-names></name><name name-style="western"><surname>Judson</surname><given-names>RS</given-names></name><etal/></person-group>
					<year>2000</year>
					<article-title>A comprehensive analysis of protein–protein interactions in <named-content content-type="genus-species" xlink:type="simple">Saccharomyces cerevisiae</named-content>.</article-title>
					<source>Nature</source>
					<volume>403</volume>
					<fpage>623</fpage>
					<lpage>627</lpage>
				</citation></ref><ref id="pcbi-0020079-b014"><label>14</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Ho</surname><given-names>Y</given-names></name><name name-style="western"><surname>Gruhler</surname><given-names>A</given-names></name><name name-style="western"><surname>Heilbut</surname><given-names>A</given-names></name><name name-style="western"><surname>Bader</surname><given-names>GD</given-names></name><name name-style="western"><surname>Moore</surname><given-names>L</given-names></name><etal/></person-group>
					<year>2002</year>
					<article-title>Systematic identification of protein complexes in <named-content content-type="genus-species" xlink:type="simple">Saccharomyces cerevisiae</named-content> by mass spectrometry.</article-title>
					<source>Nature</source>
					<volume>415</volume>
					<fpage>180</fpage>
					<lpage>183</lpage>
				</citation></ref><ref id="pcbi-0020079-b015"><label>15</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Gavin</surname><given-names>AC</given-names></name><name name-style="western"><surname>Bosche</surname><given-names>M</given-names></name><name name-style="western"><surname>Krause</surname><given-names>R</given-names></name><name name-style="western"><surname>Grandi</surname><given-names>P</given-names></name><name name-style="western"><surname>Marzioch</surname><given-names>M</given-names></name><etal/></person-group>
					<year>2002</year>
					<article-title>Functional organization of the yeast proteome by systematic analysis of protein complexes.</article-title>
					<source>Nature</source>
					<volume>415</volume>
					<fpage>141</fpage>
					<lpage>147</lpage>
				</citation></ref><ref id="pcbi-0020079-b016"><label>16</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Stanyon</surname><given-names>CA</given-names></name><name name-style="western"><surname>Liu</surname><given-names>G</given-names></name><name name-style="western"><surname>Mangiola</surname><given-names>BA</given-names></name><name name-style="western"><surname>Patel</surname><given-names>N</given-names></name><name name-style="western"><surname>Giot</surname><given-names>L</given-names></name><etal/></person-group>
					<year>2004</year>
					<article-title>A <italic>Drosophila</italic> protein-interaction map centered on cell-cycle regulators.</article-title>
					<source>Genome Biol</source>
					<volume>5</volume>
					<fpage>R96</fpage>
				</citation></ref><ref id="pcbi-0020079-b017"><label>17</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Giot</surname><given-names>L</given-names></name><name name-style="western"><surname>Bader</surname><given-names>JS</given-names></name><name name-style="western"><surname>Brouwer</surname><given-names>C</given-names></name><name name-style="western"><surname>Chaudhuri</surname><given-names>A</given-names></name><name name-style="western"><surname>Kuang</surname><given-names>B</given-names></name><etal/></person-group>
					<year>2003</year>
					<article-title>A protein interaction map of <named-content content-type="genus-species" xlink:type="simple">Drosophila melanogaster</named-content>.</article-title>
					<source>Science</source>
					<volume>302</volume>
					<fpage>1727</fpage>
					<lpage>1736</lpage>
				</citation></ref><ref id="pcbi-0020079-b018"><label>18</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Li</surname><given-names>S</given-names></name><name name-style="western"><surname>Armstrong</surname><given-names>CM</given-names></name><name name-style="western"><surname>Bertin</surname><given-names>N</given-names></name><name name-style="western"><surname>Ge</surname><given-names>H</given-names></name><name name-style="western"><surname>Milstein</surname><given-names>S</given-names></name><etal/></person-group>
					<year>2004</year>
					<article-title>A map of the interactome network of the metazoan <named-content content-type="genus-species" xlink:type="simple">C. elegans</named-content>.</article-title>
					<source>Science</source>
					<volume>303</volume>
					<fpage>540</fpage>
					<lpage>543</lpage>
				</citation></ref><ref id="pcbi-0020079-b019"><label>19</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Bouwmeester</surname><given-names>T</given-names></name><name name-style="western"><surname>Bauch</surname><given-names>A</given-names></name><name name-style="western"><surname>Ruffner</surname><given-names>H</given-names></name><name name-style="western"><surname>Angrand</surname><given-names>PO</given-names></name><name name-style="western"><surname>Bergamini</surname><given-names>G</given-names></name><etal/></person-group>
					<year>2004</year>
					<article-title>A physical and functional map of the human TNF-α/NF-κB signal transduction pathway.</article-title>
					<source>Nat Cell Biol</source>
					<volume>6</volume>
					<fpage>97</fpage>
					<lpage>105</lpage>
				</citation></ref><ref id="pcbi-0020079-b020"><label>20</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Sprinzak</surname><given-names>E</given-names></name><name name-style="western"><surname>Sattath</surname><given-names>S</given-names></name><name name-style="western"><surname>Margalit</surname><given-names>H</given-names></name></person-group>
					<year>2003</year>
					<article-title>How reliable are experimental protein–protein interaction data?</article-title>
					<source>J Mol Biol</source>
					<volume>327</volume>
					<fpage>919</fpage>
					<lpage>923</lpage>
				</citation></ref><ref id="pcbi-0020079-b021"><label>21</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Deane</surname><given-names>CM</given-names></name><name name-style="western"><surname>Salwinski</surname><given-names>L</given-names></name><name name-style="western"><surname>Xenarios</surname><given-names>I</given-names></name><name name-style="western"><surname>Eisenberg</surname><given-names>D</given-names></name></person-group>
					<year>2002</year>
					<article-title>Protein interactions: Two methods for assessment of the reliability of high throughput observations.</article-title>
					<source>Mol Cell Proteomics</source>
					<volume>1</volume>
					<fpage>349</fpage>
					<lpage>356</lpage>
				</citation></ref><ref id="pcbi-0020079-b022"><label>22</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Sprinzak</surname><given-names>E</given-names></name><name name-style="western"><surname>Sattath</surname><given-names>S</given-names></name><name name-style="western"><surname>Margalit</surname><given-names>H</given-names></name></person-group>
					<year>2003</year>
					<article-title>How reliable are experimental protein–protein interaction data?</article-title>
					<source>J Mol Biol</source>
					<volume>327</volume>
					<fpage>919</fpage>
					<lpage>923</lpage>
				</citation></ref><ref id="pcbi-0020079-b023"><label>23</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Abagyan</surname><given-names>RA</given-names></name><name name-style="western"><surname>Batalov</surname><given-names>S</given-names></name></person-group>
					<year>1997</year>
					<article-title>Do aligned sequences share the same fold?</article-title>
					<source>J Mol Biol</source>
					<volume>273</volume>
					<fpage>355</fpage>
					<lpage>368</lpage>
				</citation></ref><ref id="pcbi-0020079-b024"><label>24</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Brenner</surname><given-names>SE</given-names></name><name name-style="western"><surname>Chothia</surname><given-names>C</given-names></name><name name-style="western"><surname>Hubbard</surname><given-names>TJP</given-names></name></person-group>
					<year>1998</year>
					<article-title>Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships.</article-title>
					<source>Proc Natl Acad Sci U S A</source>
					<volume>95</volume>
					<fpage>6073</fpage>
					<lpage>6078</lpage>
				</citation></ref><ref id="pcbi-0020079-b025"><label>25</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Rost</surname><given-names>B</given-names></name></person-group>
					<year>1999</year>
					<article-title>Twilight zone of protein sequence alignments.</article-title>
					<source>Protein Eng</source>
					<volume>12</volume>
					<fpage>85</fpage>
					<lpage>94</lpage>
				</citation></ref><ref id="pcbi-0020079-b026"><label>26</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Nair</surname><given-names>R</given-names></name><name name-style="western"><surname>Rost</surname><given-names>B</given-names></name></person-group>
					<year>2002</year>
					<article-title>Sequence conserved for sub-cellular localization.</article-title>
					<source>Protein Sci</source>
					<volume>11</volume>
					<fpage>2836</fpage>
					<lpage>2847</lpage>
				</citation></ref><ref id="pcbi-0020079-b027"><label>27</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Todd</surname><given-names>AE</given-names></name><name name-style="western"><surname>Orengo</surname><given-names>CA</given-names></name><name name-style="western"><surname>Thornton</surname><given-names>JM</given-names></name></person-group>
					<year>2001</year>
					<article-title>Evolution of function in protein superfamilies, from a structural perspective.</article-title>
					<source>J Mol Biol</source>
					<volume>307</volume>
					<fpage>1113</fpage>
					<lpage>1143</lpage>
				</citation></ref><ref id="pcbi-0020079-b028"><label>28</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Devos</surname><given-names>D</given-names></name><name name-style="western"><surname>Valencia</surname><given-names>A</given-names></name></person-group>
					<year>2001</year>
					<article-title>Intrinsic errors in genome annotation.</article-title>
					<source>Trends Genet</source>
					<volume>17</volume>
					<fpage>429</fpage>
					<lpage>431</lpage>
				</citation></ref><ref id="pcbi-0020079-b029"><label>29</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Rost</surname><given-names>B</given-names></name></person-group>
					<year>2002</year>
					<article-title>Enzyme function less conserved than anticipated.</article-title>
					<source>J Mol Biol</source>
					<volume>318</volume>
					<fpage>595</fpage>
					<lpage>608</lpage>
				</citation></ref><ref id="pcbi-0020079-b030"><label>30</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Pellegrini</surname><given-names>M</given-names></name><name name-style="western"><surname>Marcotte</surname><given-names>EM</given-names></name><name name-style="western"><surname>Thompson</surname><given-names>MJ</given-names></name><name name-style="western"><surname>Eisenberg</surname><given-names>D</given-names></name><name name-style="western"><surname>Yeates</surname><given-names>TO</given-names></name></person-group>
					<year>1999</year>
					<article-title>Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles.</article-title>
					<source>Proc Natl Acad Sci U S A</source>
					<volume>96</volume>
					<fpage>4285</fpage>
					<lpage>4288</lpage>
				</citation></ref><ref id="pcbi-0020079-b031"><label>31</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Pawlowski</surname><given-names>K</given-names></name><name name-style="western"><surname>Jaroszewski</surname><given-names>L</given-names></name><name name-style="western"><surname>Rychlewski</surname><given-names>L</given-names></name><name name-style="western"><surname>Godzik</surname><given-names>A</given-names></name></person-group>
					<year>2000</year>
					<article-title>Sensitive sequence comparison as protein function predictor.</article-title>
					<source>Pac Symp Biocomput</source>
					<volume>5</volume>
					<fpage>42</fpage>
					<lpage>53</lpage>
				</citation></ref><ref id="pcbi-0020079-b032"><label>32</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Thornton</surname><given-names>JM</given-names></name></person-group>
					<year>2001</year>
					<article-title>From genome to function.</article-title>
					<source>Science</source>
					<volume>292</volume>
					<fpage>2095</fpage>
					<lpage>2097</lpage>
				</citation></ref><ref id="pcbi-0020079-b033"><label>33</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Koonin</surname><given-names>EV</given-names></name><name name-style="western"><surname>Wolf</surname><given-names>YI</given-names></name><name name-style="western"><surname>Karev</surname><given-names>GP</given-names></name></person-group>
					<year>2002</year>
					<article-title>The structure of the protein universe and genome evolution.</article-title>
					<source>Nature</source>
					<volume>420</volume>
					<fpage>218</fpage>
					<lpage>223</lpage>
				</citation></ref><ref id="pcbi-0020079-b034"><label>34</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Rost</surname><given-names>B</given-names></name><name name-style="western"><surname>Liu</surname><given-names>J</given-names></name><name name-style="western"><surname>Nair</surname><given-names>R</given-names></name><name name-style="western"><surname>Wrzeszczynski</surname><given-names>KO</given-names></name><name name-style="western"><surname>Ofran</surname><given-names>Y</given-names></name></person-group>
					<year>2003</year>
					<article-title>Automatic prediction of protein function.</article-title>
					<source>Cell Mol Life Sci</source>
					<volume>60</volume>
					<fpage>2637</fpage>
					<lpage>2650</lpage>
				</citation></ref><ref id="pcbi-0020079-b035"><label>35</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Ito</surname><given-names>T</given-names></name><name name-style="western"><surname>Chiba</surname><given-names>T</given-names></name><name name-style="western"><surname>Ozawa</surname><given-names>R</given-names></name><name name-style="western"><surname>Yoshida</surname><given-names>M</given-names></name><name name-style="western"><surname>Hattori</surname><given-names>M</given-names></name><etal/></person-group>
					<year>2001</year>
					<article-title>A comprehensive two-hybrid analysis to explore the yeast protein interactome.</article-title>
					<source>Proc Natl Acad Sci U S A</source>
					<volume>98</volume>
					<fpage>4569</fpage>
					<lpage>4574</lpage>
				</citation></ref><ref id="pcbi-0020079-b036"><label>36</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Matthews</surname><given-names>L</given-names></name><name name-style="western"><surname>Vaglio</surname><given-names>P</given-names></name><name name-style="western"><surname>Reboul</surname><given-names>J</given-names></name><name name-style="western"><surname>Ge</surname><given-names>H</given-names></name><name name-style="western"><surname>Davis</surname><given-names>B</given-names></name><etal/></person-group>
					<year>2001</year>
					<article-title>Identification of potential interaction networks using sequence-based searches for conserved protein–protein interactions or “interologs.”</article-title>
					<source>Genome Res</source>
					<volume>11</volume>
					<fpage>2120</fpage>
					<lpage>2126</lpage>
				</citation></ref><ref id="pcbi-0020079-b037"><label>37</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Yu</surname><given-names>H</given-names></name><name name-style="western"><surname>Luscombe</surname><given-names>N</given-names></name><name name-style="western"><surname>Lu</surname><given-names>H</given-names></name><name name-style="western"><surname>Zhu</surname><given-names>X</given-names></name><name name-style="western"><surname>Xia</surname><given-names>Y</given-names></name><etal/></person-group>
					<year>2004</year>
					<article-title>Annotation transfer between genomes: Protein–protein interologs and protein-DNA regulogs.</article-title>
					<source>Genome Res</source>
					<volume>14</volume>
					<fpage>1107</fpage>
					<lpage>1118</lpage>
				</citation></ref><ref id="pcbi-0020079-b038"><label>38</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Chen</surname><given-names>R</given-names></name><name name-style="western"><surname>Jeong</surname><given-names>S</given-names></name></person-group>
					<year>2000</year>
					<article-title>Functional prediction: Identification of protein orthologs and paralogs.</article-title>
					<source>Protein Sci</source>
					<volume>9</volume>
					<fpage>2344</fpage>
					<lpage>2353</lpage>
				</citation></ref><ref id="pcbi-0020079-b039"><label>39</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Tatusov</surname><given-names>R</given-names></name><name name-style="western"><surname>Koonin</surname><given-names>E</given-names></name><name name-style="western"><surname>Lipman</surname><given-names>D</given-names></name></person-group>
					<year>1997</year>
					<article-title>A genomic perspective on protein families.</article-title>
					<source>Science</source>
					<volume>278</volume>
					<fpage>631</fpage>
					<lpage>637</lpage>
				</citation></ref><ref id="pcbi-0020079-b040"><label>40</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Tirosh</surname><given-names>I</given-names></name><name name-style="western"><surname>Barkai</surname><given-names>N</given-names></name></person-group>
					<year>2005</year>
					<article-title>Computational verification of protein–protein interactions by orthologous co-expression.</article-title>
					<source>BMC Bioinformatics</source>
					<volume>6</volume>
					<fpage>40</fpage>
				</citation></ref><ref id="pcbi-0020079-b041"><label>41</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Lehner</surname><given-names>B</given-names></name><name name-style="western"><surname>Fraser</surname><given-names>AG</given-names></name></person-group>
					<year>2004</year>
					<article-title>A first-draft human protein-interaction map.</article-title>
					<source>Genome Biol</source>
					<volume>5</volume>
					<fpage>R63</fpage>
				</citation></ref><ref id="pcbi-0020079-b042"><label>42</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Bhardwaj</surname><given-names>N</given-names></name><name name-style="western"><surname>Lu</surname><given-names>H</given-names></name></person-group>
					<year>2005</year>
					<article-title>Correlation between gene expression profiles and protein–protein interactions within and across genomes.</article-title>
					<source>Bioinformatics</source>
					<volume>21</volume>
					<fpage>2730</fpage>
					<lpage>2738</lpage>
				</citation></ref><ref id="pcbi-0020079-b043"><label>43</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Bowers</surname><given-names>PM</given-names></name><name name-style="western"><surname>Cokus</surname><given-names>SJ</given-names></name><name name-style="western"><surname>Eisenberg</surname><given-names>D</given-names></name><name name-style="western"><surname>Yeates</surname><given-names>TO</given-names></name></person-group>
					<year>2004</year>
					<article-title>Use of logic relationships to decipher protein network organization.</article-title>
					<source>Science</source>
					<volume>306</volume>
					<fpage>2246</fpage>
					<lpage>2249</lpage>
				</citation></ref><ref id="pcbi-0020079-b044"><label>44</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Ofran</surname><given-names>Y</given-names></name><name name-style="western"><surname>Rost</surname><given-names>B</given-names></name></person-group>
					<year>2003</year>
					<article-title>Analysing six types of protein–protein interfaces.</article-title>
					<source>J Mol Biol</source>
					<volume>325</volume>
					<fpage>377</fpage>
					<lpage>387</lpage>
				</citation></ref><ref id="pcbi-0020079-b045"><label>45</label><citation citation-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Hermjakob</surname><given-names>H</given-names></name><name name-style="western"><surname>Montecchi-Palazzi</surname><given-names>L</given-names></name><name name-style="western"><surname>Lewington</surname><given-names>C</given-names></name><name name-style="western"><surname>Mudali</surname><given-names>S</given-names></name><name name-style="western"><surname>Kerrien</surname><given-names>S</given-names></name><etal/></person-group>
					<year>2004</year>
					<article-title>IntAct: An open source molecular interaction database.</article-title>
					<source>Nucleic Acids Res</source>
		