BAS and ARP analyzed the data and wrote the paper.
Benjamin A. Shoemaker and Anna R. Panchenko are with the Computational Biology Branch of the National Center for Biotechnology Information in Bethesda, Maryland, United States of America.
The authors have declared that no competing interests exist.
Proteins interact with each other in a highly specific manner, and protein interactions play a key role in many cellular processes; in particular, the distortion of protein interfaces may lead to the development of many diseases. To understand the mechanisms of protein recognition at the molecular level and to unravel the global picture of protein interactions in the cell, different experimental techniques have been developed. Some methods characterize individual protein interactions, while others have been adapted to screen for interactions on a genome-wide scale. In this review we describe different experimental techniques for identifying protein interactions, together with various databases that attempt to classify the large array of experimental data. We discuss the main promises and pitfalls of the different methods and present several approaches to verify and validate the diverse experimental data produced by high-throughput techniques.
It is now becoming clear that protein interactions determine the outcome of most cellular processes [
In many cellular processes, proteins recognize specific targets and bind them in a highly regular manner. The specificity of interactions in these cases is determined by the structural and physicochemical properties of the two interacting proteins. As a result, there should be a certain degree of conservation in the interaction patterns of similar proteins and domains. Indeed, it has been found that close homologs almost always interact in the same way, and that protein–protein interactions place certain evolutionary constraints on protein sequence and structural divergence [
In this review and its companion review in the April issue [
Protein interactions can be analyzed by different genetic, biochemical, and physical methods, which are listed in
Different Experimental Methods Measuring Protein Interactions
(A) Y2H detects interactions between proteins X and Y, where X is fused to the BD, which binds to the upstream activating sequence (UAS) of a promoter, and Y is fused to the AD.
(B) MS identifies polypeptide sequences.
(C) TAP purifies protein complexes and removes contaminating molecules.
(D) Gene coexpression analysis produces a correlation matrix in which dark areas indicate high correlation between the expression levels of the corresponding genes.
(E) Protein microarrays (protein chips) can detect interactions between actual proteins rather than genes: target proteins immobilized on a solid support are probed with a fluorescently labeled protein.
(F) The synthetic lethality method detects genetic interactions in which two individually nonlethal mutations result in lethality when combined (a− b−).
The development of the Y2H technique has considerably accelerated the screening of protein interactions in vivo. Y2H is based on the fact that many eukaryotic transcription activators have at least two distinct domains, one that directs binding to a promoter DNA sequence (BD) and another that activates transcription (AD) (
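The logic of the Y2H readout can be sketched as a toy model. This is an illustrative sketch only: the protein names, the set of "true" interactions, and the autoactivating bait are hypothetical, and real screens read out reporter growth or color rather than a Boolean value. It shows that the reporter is switched on only when the AD-fused prey is recruited to the promoter by binding the BD-fused bait, and that a bait able to activate transcription on its own scores positive against every prey.

```python
from itertools import product

def reporter_active(bait, prey, interacts, autoactivators=frozenset()):
    """Return True if the reporter gene would be transcribed in this toy model.

    The bait X is fused to the BD, which binds the UAS of the reporter
    promoter; the prey Y is fused to the AD. Transcription needs the AD at
    the promoter, which happens only if X binds Y. A (hypothetical)
    autoactivating bait turns the reporter on by itself: a false positive.
    """
    if bait in autoactivators:
        return True
    # Interactions are stored as unordered pairs, so (X, Y) matches (Y, X).
    return frozenset((bait, prey)) in interacts

def screen(baits, preys, interacts, autoactivators=frozenset()):
    """Test every bait against every prey; return the pairs scored positive."""
    return [(b, p) for b, p in product(baits, preys)
            if reporter_active(b, p, interacts, autoactivators)]

# Hypothetical example: X3 is an autoactivating bait.
true_pairs = {frozenset(("X1", "Y2")), frozenset(("X2", "Y1"))}
print(screen(["X1", "X2", "X3"], ["Y1", "Y2"], true_pairs, autoactivators={"X3"}))
# [('X1', 'Y2'), ('X2', 'Y1'), ('X3', 'Y1'), ('X3', 'Y2')]
```

The two hits involving X3 reflect a property of the bait rather than real interactions, which illustrates why Y2H hits need independent confirmation.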
For screening entire genomes, the Y2H method has been extended into two main approaches [
In the
In the
The small overlap between Y2H experiments can be explained by several factors, among them differences in protein interaction sampling, the Y2H bias towards nonspecific interactions [
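How small the overlap is can be quantified by treating each screen as a set of unordered protein pairs. The sketch below uses hypothetical interaction lists and reports the number of shared pairs, the Jaccard index (shared pairs divided by all distinct pairs), and the fraction of each screen found in the other; these are common choices, not necessarily the measures used in the studies cited here.

```python
def as_undirected(pairs):
    """Collapse (A, B) and (B, A) into one key, since bait/prey orientation
    should not affect whether two screens agree."""
    return {frozenset(p) for p in pairs}

def overlap(screen_a, screen_b):
    """Summarize the agreement between two interaction screens."""
    a, b = as_undirected(screen_a), as_undirected(screen_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
        "frac_of_a": len(shared) / len(a) if a else 0.0,
        "frac_of_b": len(shared) / len(b) if b else 0.0,
    }

# Hypothetical screens; ("B", "A") matches ("A", "B").
screen_1 = [("A", "B"), ("C", "D"), ("E", "F")]
screen_2 = [("B", "A"), ("G", "H")]
print(overlap(screen_1, screen_2))  # shared=1, jaccard=0.25, frac_of_b=0.5
```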
MS is a powerful method for studying macromolecular interactions in vitro. Its principle is to produce ions that are detected according to their mass-to-charge ratios, which allows polypeptide sequences to be identified [
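As an illustration of the mass-to-charge principle, the sketch below computes the theoretical m/z of an unmodified peptide ion carrying z protons: the neutral monoisotopic mass M is the sum of the residue masses plus one water molecule, and m/z = (M + z·m_p)/z, where m_p is the proton mass. Comparing measured m/z values with such theoretical values is one ingredient of peptide identification. The sketch ignores post-translational modifications, isotope peaks, and fragmentation; the residue masses are standard monoisotopic values rounded to five decimals.

```python
# Standard monoisotopic residue masses in daltons (rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O completing the free N- and C-termini
PROTON = 1.00728   # mass of one proton

def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide (one-letter codes)."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER

def mz(sequence, charge):
    """Theoretical m/z of the [M + zH]z+ ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(sequence) + charge * PROTON) / charge

print(round(mz("GA", 1), 5))  # 147.07641
print(mz("GA", 2))            # the doubly charged ion appears at about half the m/z
```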
A TAP tag consists of two IgG binding domains of
Several large-scale studies of protein complexes have been performed using TAP–MS and Y2H methods [
Because the function of a protein complex depends on the functionality of all its subunits, the subunits should be present in stoichiometric amounts, and the expression levels of the genes encoding them should therefore be related. Gene expression profiles can be obtained, for example, from cell cycle experiments or by measuring the expression level of a gene under different conditions. Expression profile similarity can be calculated as a correlation coefficient between the relative expression levels of two genes/proteins, as the normalized difference between their absolute expression levels, or by other methods [
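For the first of these measures, a minimal sketch: the Pearson correlation coefficient between two expression profiles measured over the same conditions, plus a helper that flags gene pairs whose correlation reaches a cutoff. The gene names, expression values, and the cutoff of 0.8 are hypothetical assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant profile has undefined correlation")
    return cov / (sx * sy)

def coexpressed_pairs(profiles, cutoff=0.8):
    """Gene pairs whose profiles correlate at or above the (hypothetical) cutoff."""
    return [(g1, g2) for g1, g2 in combinations(sorted(profiles), 2)
            if pearson(profiles[g1], profiles[g2]) >= cutoff]

# Hypothetical relative expression levels across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
print(coexpressed_pairs(profiles))  # [('geneA', 'geneB')]
```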
It is not well understood how genetic variation influences phenotype and how genes interact with each other to produce different phenotypes in different strains of the same species [
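The synthetic lethality criterion from panel F of the figure can be expressed as a simple filter over viability calls. In this sketch the gene names and viability data are hypothetical, and viability is reduced to a yes/no call, whereas real screens can measure growth more quantitatively.

```python
def synthetic_lethal_pairs(single_viable, double_viable):
    """Return gene pairs that are synthetic lethal.

    single_viable: dict gene -> True if the single mutant is viable.
    double_viable: dict frozenset({g1, g2}) -> True if the double mutant is viable.
    A pair qualifies when both single mutants are viable but the double
    mutant is not (a- b- lethal).
    """
    hits = []
    for pair, viable in double_viable.items():
        g1, g2 = sorted(pair)
        if single_viable[g1] and single_viable[g2] and not viable:
            hits.append((g1, g2))
    return sorted(hits)

# Hypothetical viability calls.
single = {"a": True, "b": True, "c": False, "d": True}
double = {
    frozenset({"a", "b"}): False,  # both singles viable, double lethal -> hit
    frozenset({"a", "d"}): True,   # double viable -> no interaction detected
    frozenset({"b", "c"}): False,  # c alone is lethal -> not synthetic
}
print(synthetic_lethal_pairs(single, double))  # [('a', 'b')]
```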
The most detailed information about protein interaction interfaces at the atomic level can be provided by X-ray crystallography and NMR spectroscopy, but the number of solved protein complexes remains low [
The rapid development of experimental techniques for detecting protein interactions has enabled the construction and systematic analysis of interaction networks [
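A minimal sketch of network construction from a list of binary interactions, with hypothetical protein names: proteins become nodes, interactions become undirected edges, and two simple properties are computed, the degree (number of distinct interaction partners) of each protein and the connected components of the network. Adjacency sets are one simple representation; dedicated graph libraries provide the same operations for larger datasets.

```python
from collections import defaultdict, deque

def build_network(pairs):
    """Undirected adjacency sets from (protein, protein) pairs; duplicate
    and reversed pairs collapse automatically."""
    adj = defaultdict(set)
    for a, b in pairs:
        if a == b:
            adj.setdefault(a, set())  # self-interaction: keep the node, add no edge
            continue
        adj[a].add(b)
        adj[b].add(a)
    return adj

def degrees(adj):
    """Number of distinct interaction partners per protein."""
    return {node: len(partners) for node, partners in adj.items()}

def components(adj):
    """Connected components found by breadth-first search."""
    seen, result = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        result.append(sorted(comp))
    return result

pairs = [("A", "B"), ("B", "C"), ("B", "A"), ("D", "E")]
net = build_network(pairs)
print(degrees(net))     # {'A': 1, 'B': 2, 'C': 1, 'D': 1, 'E': 1}
print(components(net))  # [['A', 'B', 'C'], ['D', 'E']]
```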
Validation of protein interaction data is difficult; except for small datasets on protein interactions provided by the Protein Data Bank (PDB) [
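One simple validation scheme compares a high-throughput dataset with a smaller, more reliable reference set, such as interactions observed in solved structures. The sketch below is an illustration under stated assumptions, with hypothetical pairs: it scores only predicted pairs whose two proteins both appear in the reference, and because the reference is incomplete, a pair missing from it is not necessarily a false positive, so the resulting precision is an estimate rather than a true error rate.

```python
def validate(predicted, reference):
    """Compare predicted interactions with a reference set of interactions."""
    pred = {frozenset(p) for p in predicted}
    ref = {frozenset(p) for p in reference}
    # Proteins the reference covers; pairs involving other proteins cannot be judged.
    covered = {protein for pair in ref for protein in pair}
    testable = {p for p in pred if p <= covered}
    confirmed = testable & ref
    return {
        "testable": len(testable),
        "confirmed": len(confirmed),
        "precision": len(confirmed) / len(testable) if testable else float("nan"),
        "recall": len(confirmed) / len(ref) if ref else float("nan"),
    }

# Hypothetical data: ("X", "Y") is not testable against this reference.
predicted = [("A", "B"), ("A", "C"), ("B", "C"), ("X", "Y")]
reference = [("B", "A"), ("C", "D")]
print(validate(predicted, reference))
```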
A large variety of databases exists for studying binary protein interactions and the higher-order interactions in protein complexes. A summary of some available databases is given in
Databases Available for Searching and/or Downloading Data Related to Protein Interactions
URLs and Primary Citations for Protein Interaction–Related Databases
Despite the diversity of interaction data, the datasets contained in the databases overlap considerably, which makes it difficult to recommend a single resource for a particular type of information. In one effort to deal with this redundancy, the International Molecular Exchange Consortium (IMEx) has been formed, in which databases agree to share their data in a consistent and timely fashion (
The Database of Interacting Proteins (DIP) contains experimentally determined protein interactions and includes a core subset of interactions that have passed a quality assessment [
The Biomolecular Interaction Network Database (BIND) includes high-throughput experimental datasets and protein complexes from PDB [
MPact is a resource for accessing MIPS, which contains a manually curated yeast protein interaction dataset [
PIBASE is a database of domain interactions from the protein structure data [
3did allows one to explore the details of domain interactions from protein structure data (yeast interactions are also included) [
The Conserved Binding Mode (CBM) database is a collection of domain interactions from the structure data where domains are defined by the Conserved Domain Database [
The Domain Interaction Map (DIMA) database is a domain interaction map derived from phylogenetic profiling of Pfam domains [
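Phylogenetic profiling rests on the idea that domains which are consistently present or absent together across genomes are more likely to interact. A minimal sketch with hypothetical domain names and genomes: each domain's profile is its presence/absence vector across genomes, and pairs whose profiles differ in at most a given number of genomes (Hamming distance) are reported. This is not necessarily the scoring used by DIMA; other scores, such as mutual information, are also used for comparing profiles.

```python
from itertools import combinations

def build_profiles(genomes):
    """Map each domain to its presence (1) / absence (0) vector across genomes.

    genomes: list of sets, each holding the domains found in one genome.
    """
    domains = sorted(set().union(*genomes))
    return {d: tuple(int(d in g) for g in genomes) for d in domains}

def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))

def linked_domains(profiles, max_distance=0):
    """Domain pairs whose profiles differ in at most max_distance genomes."""
    return [(d1, d2) for d1, d2 in combinations(sorted(profiles), 2)
            if hamming(profiles[d1], profiles[d2]) <= max_distance]

# Hypothetical domain content of four genomes.
genomes = [
    {"domA", "domB", "domC"},
    {"domA", "domB"},
    {"domC"},
    {"domA", "domB", "domC"},
]
profiles = build_profiles(genomes)
print(profiles["domA"])          # (1, 1, 0, 1)
print(linked_domains(profiles))  # [('domA', 'domB')]
```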
In this paper we have reviewed a wide spectrum of experimental techniques for identifying and characterizing protein interactions; each technique can contribute a piece to the puzzle of protein recognition mechanisms [
The authors thank Lewis Geer for helpful discussions and Robert Yates for graphic design of the figures. This work was supported by the Intramural Research Program of the National Library of Medicine at the National Institutes of Health of the US Department of Health and Human Services.
domain that activates transcription
domain that directs binding to a promoter DNA sequence
Biomolecular Interaction Network Database
Conserved Binding Mode database
Database of Interacting Proteins
mass spectrometry
tandem affinity purification
yeast two-hybrid