The authors have declared that no competing interests exist.
Conceived and designed the experiments: MHS TJSL YK HK MAAN. Analyzed the data: MHS TJSL NM JES YM JFF CLJ AJE GN CPI. Wrote the paper: MHS TJSL NM AJE GN MAAN.
Interactions of proteins regulate signaling, catalysis, gene expression and many other cellular functions. Therefore, characterizing the entire human interactome is a key effort in current proteomics research. This challenge is complicated by the dynamic nature of protein-protein interactions (PPIs), which are conditional on the cellular context: both interacting proteins must be expressed in the same cell and localized in the same organelle to meet. Additionally, interactions underlie a delicate control of signaling pathways, e.g. by post-translational modifications of the protein partners; hence, many diseases are caused by the perturbation of these mechanisms. Despite the high degree of cell-state specificity of PPIs, many interactions are measured under artificial conditions (e.g. yeast cells are transfected with human genes in yeast two-hybrid assays), and even when an interaction is detected in a physiological context, this information is missing from the common PPI databases. To overcome these problems, we developed a method that assigns context information to PPIs inferred from various attributes of the interacting proteins: gene expression, functional and disease annotations, and inferred pathways. We demonstrate that context consistency correlates with the experimental reliability of PPIs, which allows us to generate high-confidence tissue- and function-specific subnetworks. We illustrate how these context-filtered networks are enriched in bona fide pathways and disease proteins to demonstrate the ability of context filters to highlight meaningful interactions with respect to various biological questions. We use this approach to study the lung-specific pathways used by the influenza virus, pointing to IRAK1, BHLHE40 and TOLLIP as potential regulators of influenza virus pathogenicity, and to study the signaling pathways that play a role in Alzheimer's disease, identifying a pathway involving the altered phosphorylation of the Tau protein.
Finally, we provide the annotated human PPI network via a web frontend that allows the construction of context-specific networks in several ways.
Protein-protein interactions (PPIs) participate in virtually all biological processes. However, the PPI map is not static: which pairs of proteins interact depends on the type of cell, the subcellular localization and the modifications of the participating proteins, among many other factors. Therefore, it is important to understand the specific conditions under which a PPI happens. Unfortunately, experimental methods often do not provide this information or, even worse, measure PPIs under artificial conditions not found in biological systems. We developed a method to infer this missing information from properties of the interacting proteins, such as the cell types in which the proteins are found, the functions they fulfill and whether they are known to play a role in disease. We show that PPIs for which we can infer the conditions under which they happen have a higher experimental reliability. Also, our inference agrees well with known pathways and disease proteins. Since diseases usually affect specific cell types, we study PPI networks of influenza proteins in lung tissues and of Alzheimer's disease proteins in neural tissues. In both cases, we can highlight interesting interactions potentially playing a role in disease progression.
The advent of high-throughput techniques to measure and perturb molecular species in a systematic way has enabled researchers to assess the different layers of cellular metabolism under different experimental conditions. Protein-protein interaction (PPI) networks created by a variety of methods including yeast-two-hybrid (Y2H), mass-spectrometry (MS) and computational predictions
Several attempts have been made to investigate the tissue-specific binding behavior of single proteins and the spatio-temporal dynamics of PPI networks
In addition, many proteins have multiple functions, carried out in cooperation with distinct sets of interacting partners. Networks of interacting proteins with coherent function have been termed context networks
There is a lack of studies that systematically test the potential of adding context information to PPI networks for recovering meaningful PPI subsets and, although there are a few approaches that allow expression or functional information to be added to PPI data
Here, we introduce an approach to add context to PPI networks using annotations and relations between the interacting partners and demonstrate that context-specific PPI networks are enriched in high-confidence interactions. We use this approach to investigate how the proteins of the human influenza virus interfere with the immune response of the host cell in a tissue-specific manner, finding novel potential regulators of influenza virus pathogenicity, and to study the brain-specific signaling pathways that play a role in Alzheimer's disease, identifying a pathway involving the altered phosphorylation of the Tau protein. Thereby, we illustrate how the addition of context to PPI networks can guide researchers in the discovery of meaningful interactions and pathways, which would otherwise be obscured by the vast amount of PPI data that is irrelevant to a specific question and partly erroneous.
Our approach to add context-specific information to human PPI data was implemented in the HIPPIE database
Individual proteins were associated with tissues, subcellular locations and biological processes in the following manner. First, proteins were associated with tissues (based on their gene expression profiles retrieved from BioGPS
We associated an interaction with a tissue when both interactors are expressed in the same tissue (e.g. “lung”). Given a term of a functional ontology, we associated an interaction with this function when both interactors are annotated with either the given functional term or with children of it in the hierarchy of the ontology. For example, the GO term “transport” would be associated with an interaction between a protein annotated as involved in “vacuolar transport” and another protein annotated as involved in “nucleocytoplasmic transport”. Functional terms considered were either GO terms or MeSH terms. We excluded the rather unspecific top-level terms ‘biological process’, ‘cellular component’ and ‘cell’. Additionally, we ignored categories that are associated with fewer than 20 interactions.
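The association rules above can be illustrated with a minimal sketch. It is not the HIPPIE implementation: the toy ontology, the protein names and the helper functions (`with_ancestors`, `interaction_contexts`, `drop_rare_categories`) are hypothetical and serve only to show the logic of propagating annotations up the hierarchy, intersecting them for the two interactors, and discarding rare categories.

```python
from collections import defaultdict

# Hypothetical toy ontology: child term -> parent terms (not real GO identifiers).
PARENTS = {
    "vacuolar transport": {"transport"},
    "nucleocytoplasmic transport": {"transport"},
    "transport": {"biological process"},
}

# Top-level terms excluded as too unspecific (taken from the text).
EXCLUDED = {"biological process", "cellular component", "cell"}


def with_ancestors(terms):
    """Return the given terms plus all of their ancestors in the toy ontology."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS.get(term, ()))
    return seen


def interaction_contexts(protein_a, protein_b, annotations, expression):
    """Return (shared tissues, shared functional terms) for one interaction.

    A term is shared when each interactor carries the term itself or one of
    its descendants, which is the same as intersecting the ancestor closures.
    """
    tissues = expression.get(protein_a, set()) & expression.get(protein_b, set())
    terms_a = with_ancestors(annotations.get(protein_a, set()))
    terms_b = with_ancestors(annotations.get(protein_b, set()))
    return tissues, (terms_a & terms_b) - EXCLUDED


def drop_rare_categories(contexts_by_ppi, min_interactions=20):
    """Remove categories associated with fewer than min_interactions PPIs."""
    counts = defaultdict(int)
    for categories in contexts_by_ppi.values():
        for category in categories:
            counts[category] += 1
    return {
        ppi: {c for c in categories if counts[c] >= min_interactions}
        for ppi, categories in contexts_by_ppi.items()
    }


# Toy example mirroring the "transport" case in the text (P1 and P2 are made up).
annotations = {"P1": {"vacuolar transport"}, "P2": {"nucleocytoplasmic transport"}}
expression = {"P1": {"lung", "liver"}, "P2": {"lung"}}
print(interaction_contexts("P1", "P2", annotations, expression))
```

In this toy case the interaction is associated with the tissue "lung" and the function "transport", although neither protein is annotated with "transport" directly.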
Our approach includes a method to infer directed PPIs. This inference of interaction (edge) directionality needs sets of proteins predefined as sinks and sources. As default sources and sinks, we connected all proteins annotated with the GO terms ‘receptor’ and ‘sequence-specific DNA binding transcription factor activity’, respectively, in the UniProtKB
For the evaluation of the influenza virus host factor network generation we performed pathway enrichment analysis with ConsensusPathDB (run on August 30, 2012;
We retrieved the preprocessed microarray data described in
To generate a list of PPIs related to Alzheimer's and protein phosphorylation, first, we used the webserver MedlineRanker
We inferred context information for all interactions in the human PPI database HIPPIE
By assuming that a large fraction of signaling events transmits information from proteins sensing environmental changes to effector proteins altering the cellular state, we computed shortest paths from membrane-bound receptors to transcription factors (TF) through the network. From the predicted information flow we assigned edge directionality to interactions on these paths (see
Overall, we were able to associate context to more than 97,000 of the 101,131 interactions of the current version of HIPPIE. Interactions for which we inferred or collected annotations had significantly better experimental evidence (
As expected, we observed that more specific context categories were associated to interactions with higher experimental reliability: while the confidence scores of interactions with rather unspecific and ubiquitous terms resemble the overall confidence score distribution, interactions with highly specific terms usually have a higher than average confidence score (
To demonstrate that our automated context association approach allows identification of relevant interactions, we tested whether networks of interactions carrying our inferred MeSH-based disease annotation are enriched in well-known disease proteins. To this end, we repeatedly generated disease-context networks around a set of canonical disease proteins. As a canonical disease protein specification, we retrieved the manually curated UniProt Knowledgebase disease protein annotation. For each of the canonical disease proteins, we generated two types of networks: (a) disease networks consisting only of interaction partners of the disease protein that we had associated with the equivalent MeSH disease term and (b) unfiltered PPI networks consisting of all interaction partners of the disease protein from HIPPIE. We did this for all disease proteins where the disease was associated with at least two disease proteins in UniProt and at least two interactions that we had associated with this disease. To quantify the enrichment of disease proteins in these networks we repeatedly calculated the F1 score, the harmonic mean of precision and recall (F1 = 2*precision*recall/(precision+recall)). A one-sided Mann-Whitney test comparing the distribution of F1 scores between the disease networks and the unfiltered networks indicated that the F1 scores for the disease networks were significantly larger (p<0.05), indicating an enrichment of disease proteins in the disease-filtered networks (without losing sensitivity by removing disease proteins in the filtering step). The mean precision on the filtered networks was 0.47 and on the unfiltered networks 0.21. The mean recall for the filtered networks was 0.14 and for the unfiltered networks 0.15. This illustrates that in exchange for a small decrease in recall the precision can be more than doubled by applying the MeSH disease filter.
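The evaluation measure can be made concrete with a small sketch. The function `precision_recall_f1` and the protein identifiers below are hypothetical and do not reproduce the actual evaluation. In particular, how the seed disease protein itself is treated is an assumption here: it is simply not part of either set.

```python
def precision_recall_f1(network_proteins, disease_proteins):
    """Precision, recall and F1 of a network with respect to known disease proteins."""
    network = set(network_proteins)
    disease = set(disease_proteins)
    true_positives = len(network & disease)
    precision = true_positives / len(network) if network else 0.0
    recall = true_positives / len(disease) if disease else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: four known disease proteins, two of them among the partners.
disease = {"D1", "D2", "D3", "D4"}
unfiltered = {"D1", "D2", "X1", "X2", "X3", "X4", "X5", "X6"}
filtered = {"D1", "D2", "X1", "X2"}  # partners that passed a disease-term filter

print(precision_recall_f1(unfiltered, disease))
print(precision_recall_f1(filtered, disease))
```

In this toy case the filter removes only non-disease partners, so recall is unchanged while precision and F1 increase. Comparing the two F1 distributions over many seed proteins would additionally require a rank test such as the one-sided Mann-Whitney test named in the text, which is not shown here.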
We then investigated the potential of edge directionality inference based on the shortest paths between membrane-bound receptors and TFs through the PPI network to recover known pathways. We retrieved pathway annotations (extracted from WikiPathways download March 29, 2012) and computed the shortest paths through HIPPIE between all pairs of receptors and TFs within the same pathway (excluding only pairs that directly interact or could not be connected by any path). We counted the number of proteins of each pathway found on the shortest paths. We found for 3163 of the 5063 pairs that this approach correctly identified proteins of the selected pathway. The mean precision (the fraction of proteins on the paths that indeed belonged to the correct pathway) over all combinations of receptors with transcription factors was 0.20. The mean recall (the fraction of the pathway that was recovered by considering the paths between one receptor and one transcription factor) was 0.02.
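The shortest-path procedure can be sketched as follows. This is an illustration under stated assumptions rather than the published implementation: the network is treated as unweighted, all shortest paths between one receptor and one TF are considered, edges are oriented from receptor to TF, and the path endpoints are counted when computing precision and recall. All function and protein names are hypothetical.

```python
from collections import deque


def build_graph(interactions):
    """Undirected adjacency sets from a list of (protein, protein) pairs."""
    graph = {}
    for a, b in interactions:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph, start):
    """Number of edges on a shortest path from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def directed_shortest_path_edges(graph, source, sink):
    """Edges lying on any shortest source-sink path, oriented source -> sink."""
    from_source = bfs_distances(graph, source)
    from_sink = bfs_distances(graph, sink)
    if sink not in from_source:
        return set()
    total = from_source[sink]
    return {
        (u, v)
        for u in graph
        for v in graph[u]
        if u in from_source
        and v in from_sink
        and from_source[u] + 1 + from_sink[v] == total
    }


def path_precision_recall(edges, pathway):
    """Overlap between the proteins on the paths and a reference pathway."""
    on_paths = {protein for edge in edges for protein in edge}
    if not on_paths or not pathway:
        return 0.0, 0.0
    shared = len(on_paths & set(pathway))
    return shared / len(on_paths), shared / len(pathway)


# Toy network: receptor R reaches transcription factor TF via A or via B.
graph = build_graph([("R", "A"), ("R", "B"), ("A", "TF"), ("B", "TF"), ("A", "C")])
edges = directed_shortest_path_edges(graph, "R", "TF")
print(sorted(edges))
print(path_precision_recall(edges, {"R", "A", "TF", "Y", "Z"}))
```

The dead-end interaction between A and C receives no direction because it lies on no shortest path between the receptor and the TF.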
To assess whether the agreement between shortest paths and canonical pathways was larger than expected by chance, we generated a background distribution by repeatedly computing the shortest paths between a receptor and a TF from different pathways and computing the overlap between the proteins on the shortest paths and either the TF- or the receptor-containing pathway. We found that the overlap distribution was significantly higher when the receptor and the TF were members of the same pathway (p<0.001; Mann-Whitney test), supporting the potential of shortest paths to recover the signal flow between TFs and receptors when functionally related pairs of receptors and transcription factors are chosen.
We wondered if we could further increase the overlap between the shortest paths and the canonical pathways by filtering the networks for tissue expression. To associate pathways with tissues, we determined for each pathway which tissues were enriched among the genes of the pathway (Supplementary
We repeated the computation of shortest paths linking receptors to transcription factors in tissue-specific networks for combinations of pathways and tissues listed in Supplementary
To further investigate whether the described context associations can help to extract pathway information from networks, we determined the frequency of protein pairs being members of the same pathway (as defined by WikiPathways) among tissue-specific PPIs (both proteins were required to be co-expressed in at least one tissue) and compared this frequency to that of PPIs between proteins that are not expressed in the same tissue. We observed that interacting protein pairs that are expressed in the same tissue are indeed more likely to be in the same pathway than interacting protein pairs that are expressed in disjoint sets of tissues (p<0.001). This, again, demonstrates that the annotations have captured properties related to pathways and suggests that the filtering helps reveal pathway information.
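The comparison described above amounts to splitting the interactions by co-expression and computing, for each group, the fraction of pairs that share a pathway. The following sketch shows only this bookkeeping on made-up data; the function names are hypothetical and the significance test reported in the text is not reproduced.

```python
def split_by_coexpression(ppis, tissues_of):
    """Split PPIs into pairs sharing at least one tissue and pairs sharing none."""
    coexpressed, disjoint = [], []
    for a, b in ppis:
        if tissues_of.get(a, set()) & tissues_of.get(b, set()):
            coexpressed.append((a, b))
        else:
            disjoint.append((a, b))
    return coexpressed, disjoint


def same_pathway_fraction(ppis, pathways_of):
    """Fraction of PPIs whose two partners share at least one pathway."""
    if not ppis:
        return 0.0
    hits = sum(
        1 for a, b in ppis if pathways_of.get(a, set()) & pathways_of.get(b, set())
    )
    return hits / len(ppis)


# Made-up annotations for four proteins.
ppis = [("A", "B"), ("A", "C"), ("C", "D"), ("B", "D")]
tissues_of = {"A": {"lung"}, "B": {"lung", "brain"}, "C": {"liver"}, "D": {"brain"}}
pathways_of = {"A": {"p1"}, "B": {"p1"}, "C": {"p2"}, "D": {"p3"}}

coexpressed, disjoint = split_by_coexpression(ppis, tissues_of)
print(same_pathway_fraction(coexpressed, pathways_of))
print(same_pathway_fraction(disjoint, pathways_of))
```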
In the next sections we use the context-associated PPI network to obtain novel insights into the mechanisms of human disease: we perform a targeted study of the PPI network surrounding the human proteins that interact with influenza virus proteins to find potential regulators of viral pathogenicity, and we explore the question of whether and how altered protein phosphorylation might be a cause of Alzheimer's disease.
We analyzed PPI data of human proteins that interact with influenza virus proteins. Influenza viruses infect bronchial epithelial tissue and many cell types in the lung, sometimes resulting in viral pneumonia
Next, we identified known pathways enriched in the BET- and lung-specific PPI subnetworks, and found both similarities and differences in the cellular functions of each (see
Cells respond to influenza infection by producing cytokines and chemokines
A recent study demonstrated that signaling through the IL-1 receptor has a protective effect in mice infected with the pandemic 1918 influenza virus
Next, we aimed to predict more specific novel interference mechanisms by constructing directed and tissue-specific protein networks linking the viral proteins with proteins whose corresponding transcript was up-regulated after influenza virus infection. We selected steadily up-regulated transcripts from a microarray experiment measuring gene expression changes over time in a lung epithelial cell line infected with a 2009 pandemic H1N1 virus
We constructed BET- and lung-specific networks connecting the viral proteins with the 228 up-regulated factors by shortest paths. From the shortest paths we assigned directions to the edges on these paths. The directed networks consisted of 577 (BET) and 1056 (lung) PPIs. To examine if these networks might reveal relevant information on how viral proteins interfere with the cellular immune response, we tested for enrichment of known pathways in the directed networks. We found that the directed networks were strongly enriched in immune response-related pathways (especially cytokine-related) even after excluding the 228 up-regulated transcripts, indicating that enrichment was independent of the high fraction of immune response factors in the transcriptomics data (Supplementary
To mine the directed networks for interactions that are involved in interference mechanisms of the viral proteins with the cellular immune response, we concentrated, again, on layer one and two host factor proteins on the shortest paths. From the list of curated pathways enriched in both the BET and the lung directed networks (Supplementary
Close inspection of these comprehensive cytokine-related networks in both BET and lung revealed several points of potential viral protein-mediated interference with inflammatory pathways (
As in BET, lung-specific cytokine-related networks revealed that influenza virus proteins interface with TOLLIP (
Assuming no prior expert knowledge on a given topic, we applied a systematic protocol which can, in principle, be used to interrogate the PPI network about the involvement of protein interactions in a complex biological question according to current knowledge. In general, altered states of protein phosphorylation affect the PPI network and can lead to pathogenesis. Our goal in this example was to investigate the possible role of protein phosphorylation in Alzheimer's disease (AD), the most common form of dementia. AD is a degenerative disease manifesting in the brain, and its cause has been hypothesized to be the formation of protein aggregates leading to neuron death, in particular related to the abnormal phosphorylation of the microtubule-associated protein tau
First, we need to input a list of proteins related to the topic. Using a literature mining protocol (see
The flowchart illustrates the input terms and options used to generate a topic-focused PPI subnetwork. Eight genes were selected as a result of an unbiased literature mining query for proteins related to Alzheimer's disease (AD) and phosphorylation (see main text for details). The PPI network of first neighbours of these genes in HIPPIE was generated. Then, filters were applied to focus on a PPI subnetwork of proteins expressed in the brain and related to cell death, and thus relevant to AD.
The initial PPI network contained 727 interactions (
A PPI network was generated as explained in
Within the resulting network, we highlighted the following path (
The incorporation of tissue-specific expression information to create PPI subnetworks is a useful method to elucidate biological processes that cannot be observed when using the complete PPI network. Here we have shown an approach for the inference of associated context for PPIs based on the annotations of the interacting partners, which enhances the relevance of the annotated interactions. Interactions between proteins expressed in the same location (e.g. lung) or at the same time or developmental stage (e.g. embryo development) can then be selected. Directed pathways can be inferred and highlighted in the filtered network according to sets of sources and sinks corresponding to receptors and transcription factors. Using this approach we were able to identify novel, tissue-specific interactions between influenza virus proteins and cellular inflammatory signaling pathways that may regulate pathogenesis associated with infection, and to describe a brain-specific protein phosphorylation pathway relevant for Alzheimer's disease.
Several methods exist to create subnetworks of the human interactome based on context criteria. For example, POINeT
In summary, we have presented and made available an approach to associate context to PPI networks, which provides novel biological insight into mechanisms of disease. The continuing generation of PPI data and further incorporation into databases, and an increasing quality of annotations attached to genes and proteins will result in further improvements of our methodology.
Network of first and second layer host factors (
(ZIP)
Directed BET and lung specific networks connecting first layer viral interactors with upregulated host proteins in Cytoscape format. In the directed network, sources and sinks are color encoded (viral are red and upregulated proteins brown). Cytokine-related proteins are shown as circles.
(ZIP)
Tissues more than two-fold enriched among proteins in pathways.
(XLS)
Pathways enriched in first and second layer influenza host factor networks.
(XLS)
Pathways enriched among directed networks connecting viral proteins with gene products upregulated upon influenza infection.
(XLS)
Comprehensive BET and lung PPI networks connecting viral proteins with cytokine-related second layer proteins on shortest paths between viral proteins and gene products upregulated upon influenza infection.
(XLS)