Conceived and designed the experiments: AC CLEZ MLD VR JNH AZ. Performed the experiments: AC CLEZ MLD VR AZ. Analyzed the data: AC CLEZ MLD VR JNH AZ. Contributed reagents/materials/analysis tools: JNH AZ. Wrote the paper: CLEZ AZ. Acquired funding: CLEZ JNH AZ.
The authors have declared that no competing interests exist.
Here we introduce a quantitative structure-driven computational domain-fusion
method, which we used to predict the structures of proteins believed to be
involved in regulation of the subtilin pathway in
Because proteins so frequently function in coordination with other proteins, identification and characterization of the interactions among proteins are essential for understanding how proteins work. Computational methods for identification of protein-protein interactions have been limited by their dependence on sequence similarity between proteins. However, methods that leverage structure information can overcome this limitation of sequence-based methods; the three-dimensional information provided by structure enables identification of related proteins even when their sequences are dissimilar. In this work we present a quantitative method for identification of protein interacting partners, and we demonstrate its use in modeling the structure of a hypothetical complex between two proteins that function in a bacterial signaling system. This quantitative approach provides a tool for generating hypotheses regarding protein function, which can then be tested empirically, and a basis for high-throughput prediction of protein-protein interactions that could be applied on a whole-genome scale.
Because proteins so frequently function in coordination with other proteins,
identification and characterization of protein-protein complexes are essential
aspects of protein sequence annotation and function determination
We describe the application of a quantitative structure-based comparison method to
the identification of putative protein-protein interactions, and show that this
approach increases sensitivity in detecting putative interactions at low
(<20%) levels of sequence identity, based on the general principle
that protein structure is more highly conserved in evolution than protein sequence
To explore this approach, we selected as the subject of our study a protein-protein
interaction that is representative of a common class of biological control systems,
known as the two-component signal transduction system
SpaK (gi: 6226707, Uniprot P33113) and SpaR (gi: 417799, Uniprot P33112) protein
sequences were input to the AS2TS protein structure modeling system (
As no suitable template for the N-terminal domain (218 residues) of SpaK was
identified, this domain was not modeled. Based on match length (227 residues),
e-value (4e-57), and sequence identity (28%), PDB entry 2c2a_A, a
sensor histidine kinase from
Modeled region: residues 219–459. The 218-residue N-terminal
membrane-spanning region (residues 1–218) was not modeled. A:
Model of the oligomeric state (homodimer). In each modeled monomer, residues are
colored consecutively in the N-to-C-terminal direction, from blue at the
N-terminus to red at the C-terminus. Blue-cyan (residues 219–300): central
four-helix bundle formed by two helices from each monomer;
green-red (residues 301–459): C-terminal ATPase-c domain. The
labels H247 and G392 mark the two residues that were changed
by site-directed mutagenesis to construct mutants for the
phosphorylation studies (see
Similarly, SpaR was modeled as two separate domains, comprising residues SpaR_d1:
1–117 and SpaR_d2: 118–220. The N-terminal domain was
initially modeled based on the structural template 1mvo_A (crystal structure of
the PhoP receiver domain from
Modeling of the N-terminal domain was based on PDB template 1mvo_A, and
the C-terminal domain was based on PDB template 2gwr_A. The conformation
between domains was modeled based on 2gwr (response regulator protein
MTRA from
The LGA software (
Template - SpaK/R domain | N1 | N2 | N | RMSD (Å) | Seq_ID (%) | LGA_S (%) |
1f51_A - SpaK_d2 | 181 | 159 | 104 | 2.56 | 6.73 | 42.32 |
1f51_E - SpaR_d1 | 119 | 117 | 116 | 1.41 | 25 | 93.11 |
2ftk_A - SpaK_d2 | 181 | 159 | 106 | 2.58 | 6.6 | 42.86 |
2ftk_E - SpaR_d1 | 119 | 117 | 116 | 1.11 | 24.14 | 95.71 |
1th8_A - SpaK_d2 | 132 | 159 | 95 | 2.34 | 17.89 | 42.99 |
1th8_B - SpaR_d1 | 115 | 117 | 76 | 2.71 | 7.89 | 39.3 |
1thn_A - SpaK_d2 | 136 | 159 | 99 | 2.23 | 17.17 | 45.15 |
1thn_B - SpaR_d1 | 114 | 117 | 75 | 2.75 | 6.67 | 38.68 |
1tid_A - SpaK_d2 | 136 | 159 | 98 | 2.23 | 17.35 | 44.47 |
1tid_B - SpaR_d1 | 119 | 117 | 76 | 2.88 | 6.58 | 38.52 |
1til_A - SpaK_d2 | 141 | 159 | 101 | 2.19 | 16.83 | 45.47 |
1til_B - SpaR_d1 | 117 | 117 | 71 | 2.96 | 4.23 | 37.11 |
The domains from the structure models of SpaK and SpaR were compared with all structures in the PDB. Listed are the domain-fusion templates for which at least one domain from each of SpaK and SpaR had a structure similarity of LGA_S ≥ 35%.
The residue ranges in modeled SpaK domains are: SpaK_d1: 219–300 and SpaK_d2: 301–459, and the residue ranges in modeled SpaR domains are: SpaR_d1: 1–117 and SpaR_d2: 118–220.
N1 denotes the number of residues in the structural domain-fusion template.
N2 denotes the number of residues in the corresponding domain from SpaK or SpaR.
N denotes the number of superimposed C-alpha atom pairs that lie within a distance cutoff of 5.0 Å.
RMSD is the root mean square deviation of N corresponding C-alpha atom pairs from the calculated structural alignment.
Seq_ID denotes the sequence identity in % between the domain-fusion template and the corresponding SpaK or SpaR domain calculated from the structural alignment.
LGA_S is a measure of the level of structure similarity
Domains from the structural models of SpaK and SpaR were compared with all structures in the PDB. Listed are the domain-fusion templates for which at least one domain of the SpaK or SpaR model had a level of structure similarity LGA_S above 37%. LGA_S scores are reported for alignments between each modeled domain of SpaK or SpaR and a domain of the domain-fusion template.
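The template-selection rule stated in the table notes can be expressed as a short filter. The Python sketch below assumes the LGA_S values have already been computed (here they are transcribed from the table) and groups template chains by PDB entry; the function name `domain_fusion_templates` is hypothetical, and the code is not part of LGA.

```python
from collections import defaultdict

# (template chain, modeled domain, LGA_S in %) transcribed from the table above.
TABLE_ROWS = [
    ("1f51_A", "SpaK_d2", 42.32), ("1f51_E", "SpaR_d1", 93.11),
    ("2ftk_A", "SpaK_d2", 42.86), ("2ftk_E", "SpaR_d1", 95.71),
    ("1th8_A", "SpaK_d2", 42.99), ("1th8_B", "SpaR_d1", 39.30),
    ("1thn_A", "SpaK_d2", 45.15), ("1thn_B", "SpaR_d1", 38.68),
    ("1tid_A", "SpaK_d2", 44.47), ("1tid_B", "SpaR_d1", 38.52),
    ("1til_A", "SpaK_d2", 45.47), ("1til_B", "SpaR_d1", 37.11),
]


def domain_fusion_templates(rows, threshold=35.0):
    """Return PDB IDs in which at least one chain matches a SpaK domain and
    at least one chain matches a SpaR domain with LGA_S >= threshold."""
    best = defaultdict(lambda: {"SpaK": 0.0, "SpaR": 0.0})
    for chain, domain, lga_s in rows:
        pdb_id = chain.split("_")[0]    # "1f51_A" -> "1f51"
        protein = domain.split("_")[0]  # "SpaK_d2" -> "SpaK"
        best[pdb_id][protein] = max(best[pdb_id][protein], lga_s)
    return sorted(
        pdb_id
        for pdb_id, scores in best.items()
        if scores["SpaK"] >= threshold and scores["SpaR"] >= threshold
    )


if __name__ == "__main__":
    print(domain_fusion_templates(TABLE_ROWS))
    print(domain_fusion_templates(TABLE_ROWS, threshold=40.0))
```

With the 35% cutoff all six PDB entries in the table are retained; raising the cutoff to 40% keeps only 1f51 and 2ftk, whose SpaR_d1 matches are much stronger than those of the other four entries.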
The
Mutant SpaK proteins were prepared by Ana-Gen Technologies (Palo Alto, CA) using
the Stratagene QuikChange Mutagenesis Kit. Synthetic forward and corresponding
reverse complement oligonucleotide primers were prepared for each of two
mutations introduced into SpaK (altered nucleotides are indicated in bold type):
at position H247 the histidine was changed to glutamine using forward primer
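For each mutation, the reverse primer is the reverse complement of the forward primer, and the forward primer differs from the wild-type template only at the altered nucleotides. The Python sketch below shows both checks. The DNA fragments are made up for illustration and are not the primers used in this study (real mutagenic primers are considerably longer); the helper names are hypothetical.

```python
# Maps each base to its Watson-Crick complement; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatch_positions(template: str, primer: str) -> list[int]:
    """0-based positions where an equal-length primer differs from the template."""
    if len(template) != len(primer):
        raise ValueError("template and primer must have the same length")
    return [i for i, (t, p) in enumerate(zip(template.upper(), primer.upper())) if t != p]


if __name__ == "__main__":
    # HYPOTHETICAL fragments: codons GCT CAC GAA ATC, with CAC (His) -> CAG (Gln).
    wild_type = "GCTCACGAAATC"
    forward = "GCTCAGGAAATC"
    print(reverse_complement(forward))             # GATTTCCTGAGC
    print(mismatch_positions(wild_type, forward))  # [5]
```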
Phosphorylation reactions were performed with histidine-tagged wild-type and mutant SpaK proteins, each in the absence and presence of histidine-tagged SpaR. After addition of 32P-labeled ATP, reaction mixtures were incubated for 20 minutes at room temperature; the reactions were then stopped by addition of 5× phosphorylation sample buffer and electrophoresed on a 12.5% SDS-polyacrylamide gel. The gel was stained with Coomassie blue, dried, and autoradiographed using Kodak X-OMAT AR film.
Phosphorimage analysis was performed to quantify incorporation and turnover of
phosphate in assays of 6xHis-SpaK phosphorylation. Four protein samples were
incubated with 32P-labeled ATP; three of these were then subjected to a cold chase
with 4 mM, 10 mM, or 50 mM unlabeled ATP, using
reaction conditions described previously
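The turnover measurement amounts to comparing each chased band with the unchased band. A minimal Python sketch of that normalization, assuming band intensities have already been read from the phosphorimage and that a single background value is subtracted from every band; the function name is illustrative and the intensities in the usage example are hypothetical, not measurements from this study.

```python
def fraction_retained(chased: float, unchased: float, background: float = 0.0) -> float:
    """Background-corrected intensity of a chased band as a fraction of the
    unchased (32P-ATP only) band."""
    reference = unchased - background
    if reference <= 0:
        raise ValueError("unchased signal must exceed background")
    # Clamp at zero so a band fainter than background does not give a negative fraction.
    return max(chased - background, 0.0) / reference


if __name__ == "__main__":
    # HYPOTHETICAL intensities (arbitrary units), not data from this study.
    unchased, background = 1100.0, 100.0
    for label, signal in [("4 mM", 850.0), ("10 mM", 600.0), ("50 mM", 350.0)]:
        print(label, round(fraction_retained(signal, unchased, background), 2))
```

A fraction below 1 after the chase would indicate that label originally bound to SpaK was lost during the chase period.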
Thin-layer chromatography was performed using Polygram Cell 300 PEI cellulose
plates as described previously
The AS2TS protein structure modeling system
A: Model based on chains A and E of PDB template 2ftk: Spo0B, a phosphotransferase, in complex with the beryllofluoride-bound response regulator Spo0F. Blue, red: monomers of SpaK; green: SpaR. B: Close-up view of interacting residues (SpaK: H247; SpaR: D8, D9, D51; shown as sticks) believed to mediate transfer of a phosphate group from SpaK to SpaR.
Inspection of the constructed SpaK/SpaR complex (
A
| 2ftk_A Res | 2ftk_A ResName | SpaK Res | SpaK ResName | Distance | RMSD(3) |
| R | 29_A | A | 246_A | 0.508 | 0.14 |
|  | 30_A | H | 247_A | 0.565 | 0.236 |
| D | 31_A | E | 248_A | 0.644 | 0.203 |
B
| 2ftk_E Res | 2ftk_E ResName | SpaR Res | SpaR ResName | Distance | RMSD(3) |
| V | 1209_E | V | 7_E | 0.366 | 0.14 |
|  | 1210_E | D | 8_E | 0.433 | 0.191 |
|  | 1211_E | D | 9_E | 0.684 | 0.223 |
| Q | 1212_E | E | 10_E | 0.797 | 0.244 |
| L | 1253_E | L | 50_E | 0.277 | 0.221 |
|  | 1254_E | D | 51_E | 0.561 | 0.205 |
| M | 1255_E | V | 52_E | 0.731 | 0.178 |
| M | 1281_E | L | 77_E | 0.602 | 0.285 |
|  | 1282_E |  | 78_E | 0.78 | 0.398 |
| A | 1283_E | A | 79_E | 0.927 | 0.781 |
| T | 1300_E | D | 96_E | 1.276 | 0.417 |
|  | 1301_E |  | 97_E | 0.737 | 0.367 |
| F | 1302_E | I | 98_E | 0.799 | 0.199 |
| A | 1303_E | T | 99_E | 0.832 | 0.27 |
|  | 1304_E |  | 100_E | 0.474 | 0.413 |
| P | 1305_E | P | 101_E | 0.366 | 0.509 |
Res: residue (one-letter amino acid code).
ResName: residue name in PDB or model file.
Distance: distance (Å) between C-alpha atoms under global superposition.
RMSD(3): root mean square deviation (Å) of the main-chain atoms (N, CA, C, O) over three residues, the current residue and its immediate neighbors along the peptide chain, after local superposition.
X – aspartic acid (ASP) modified to aspartate beryllium trifluoride (BFD).
2ftk_A corresponds to Spo0B, and 2ftk_E corresponds to Spo0F. Letters in bold represent corresponding functional residues. Residues within one position of the functional residues are included to show the sequence-structure context in which the highlighted residues are located. A) Residue-residue correspondences between the histidine phosphorylation site and neighboring residues of 2ftk chain A and those of SpaK. B) Residue-residue correspondences between the regions of 2ftk chain E containing six functional residues and the corresponding regions of SpaR.
In most histidine kinases the extracellular sensing domains are variable in
sequence, reflecting the wide range of environmental signals to which they
respond. Conversely, the cytoplasmic portions typically have a conserved
catalytic core comprising a set of characteristic sequence motifs known as the
H, N, G1, F and G2 boxes
To examine sequence similarity in a structural context between SpaK and other
histidine kinases in the five “box” regions, we used LGA to
globally align the SpaK homology model with all histidine kinase structures in the PDB
that contain these structural motifs. Structures with corresponding
“box” regions included 2ftk_A, 1tid_A, 1b3q_A, and 2ch4_A.
In
H box
| Template Res | Template ResName | SpaK Res | SpaK ResName | Distance | RMSD(3) |
| S | 28_A | L | 245_A | 0.411 | 0.076 |
| R | 29_A | A | 246_A | 0.532 | 0.071 |
|  | 30_A | H | 247_A | 0.597 | 0.149 |
| D | 31_A | E | 248_A | 0.668 | 0.119 |
| W | 32_A | I | 249_A | 0.949 | 0.064 |
| M | 33_A | K | 250_A | 1.52 | 0.329 |
| N | 34_A | I | 251_A | 1.505 | 0.044 |
| K | 35_A | P | 252_A | 1.523 | 0.207 |
| L | 36_A | I | 253_A | 1.299 | 0.106 |
| Q | 37_A | T | 254_A | 1.22 | 0.265 |
N box
| Template Res | Template ResName | SpaK Res | SpaK ResName | Distance | RMSD(3) |
| L | 403_A | L | 356_A | 0.48 | 0.172 |
| L | 404_A | L | 357_A | 0.67 | 0.163 |
| H | 405_A | N | 358_A | 0.716 | 0.183 |
| L | 406_A | I | 359_A | 0.512 | 0.159 |
| L | 407_A | L | 360_A | 0.334 | 0.271 |
| R | 408_A | T | 361_A | 0.564 | 0.289 |
|  | 409_A |  | 362_A | 0.558 | 0.277 |
| A | 410_A | A | 363_A | 0.623 | 0.202 |
| I | 411_A | V | 364_A | 0.615 | 0.33 |
G1 box
| Template Res | Template ResName | SpaK Res | SpaK ResName | Distance | RMSD(3) |
| E | 446_A | F | 387_A | 0.898 | 0.169 |
| V | 447_A | V | 388_A | 0.354 | 0.13 |
| E | 448_A | K | 389_A | 0.134 | 0.18 |
|  | 449_A |  | 390_A | 0.803 | 0.202 |
| D | 450_A | T | 391_A | 0.595 | 0.323 |
|  | 451_A | G | 392_A | 1.041 | 0.321 |
| R | 452_A | N | 393_A | 0.862 | 0.322 |
|  | 453_A |  | 394_A | 0.758 | 0.62 |
| I | 454_A | F | 395_A | 0.989 | 0.982 |
| D | 455_A | S | 396_A | 2.154 | 0.845 |
F box
| Template Res | Template ResName | SpaK Res | SpaK ResName | Distance | RMSD(3) |
| L | 483_A | L | 400_A | 0.819 | 2.499 |
| N | 484_A | K | 401_A | 1.193 | 0.703 |
| F | 485_A | K | 402_A | 1.008 | 0.233 |
| L | 486_A | A | 403_A | 0.84 | 0.306 |
|  | 487_A | T | 404_A | 0.987 | 0.45 |
| V | 488_A | E | 405_A | 1.894 | 0.474 |
| P | 489_A | L | 406_A | 2.433 | 0.365 |
| G | 490_A |  | 407_A | 2.514 | 0.611 |
|  | 491_A | Y | 408_A | 2.078 | 0.773 |
G2 box
| Template Res | Template ResName | SpaK Res | SpaK ResName | Distance | RMSD(3) |
| S | 501_A | G | 418_A | 3.312 | 1.066 |
| G | 502_A | H | 419_A | 0.966 | 1.007 |
| R | 503_A | Y | 420_A | 2.398 | 1.666 |
|  | 504_A |  | 421_A | 1.198 | 1.07 |
| V | 505_A | M | 422_A | 3.453 | 1.131 |
|  | 506_A |  | 423_A | 0.755 | 1.293 |
| M | 507_A | L | 424_A | 1.089 | 0.793 |
Comparisons are made in presumed functional “box”
motifs, the highly conserved sequences termed H, N, G1, F, and G2
boxes, characteristic of histidine kinases
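The per-box correspondences can also be summarized numerically. This Python sketch uses the C-alpha distances transcribed from the box-region table and counts how many aligned residues in each box fall within a cutoff; the 1.0 Å cutoff, the summary statistics, and the helper name `summarize_box` are illustrative assumptions, not part of the original analysis.

```python
# C-alpha distances (Å, under global superposition) per "box", transcribed from the table above.
BOX_DISTANCES = {
    "H": [0.411, 0.532, 0.597, 0.668, 0.949, 1.52, 1.505, 1.523, 1.299, 1.22],
    "N": [0.48, 0.67, 0.716, 0.512, 0.334, 0.564, 0.558, 0.623, 0.615],
    "G1": [0.898, 0.354, 0.134, 0.803, 0.595, 1.041, 0.862, 0.758, 0.989, 2.154],
    "F": [0.819, 1.193, 1.008, 0.84, 0.987, 1.894, 2.433, 2.514, 2.078],
    "G2": [3.312, 0.966, 2.398, 1.198, 3.453, 0.755, 1.089],
}


def summarize_box(distances, cutoff=1.0):
    """Return (aligned residues, residues within cutoff, mean C-alpha distance)."""
    within = sum(d <= cutoff for d in distances)
    return len(distances), within, sum(distances) / len(distances)


if __name__ == "__main__":
    for box, dists in BOX_DISTANCES.items():
        n, within, mean = summarize_box(dists)
        print(f"{box:>2}: {within}/{n} within 1.0 Å, mean {mean:.2f} Å")
```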
To determine whether SpaK undergoes autophosphorylation and subsequently transfers
a phosphate moiety to SpaR, each protein was tested individually and in
combination in the presence of radiolabeled ATP (
A, B: SDS-PAGE of 6xHis-SpaK and 6xHis-SpaR in isolation or in combination and at various mass ratios, in the presence of ATP. A: Coomassie blue staining. B: Autoradiography; lane a: molecular weight markers. C: Phosphorimage analysis of SpaK incubated with [γ-32P]ATP (lane 1), followed by addition of 4 mM (lane 2), 10 mM (lane 3), or 50 mM (lane 4) unlabeled (cold) ATP. D: PEI cellulose thin-layer chromatography of 6xHis-SpaK in isolation, or in combination with 6xHis-SpaR, with and without EDTA.
Quantification of radio-labeled phosphate-bound 6xHis-SpaK was performed to
determine whether SpaK might exhibit phosphatase activity (
Thin-layer chromatography was performed to further examine the possibility that
either SpaK or SpaR may exhibit phosphatase activity (
Based on amino acid sequence alignment with other histidine kinases, the highly
conserved histidine H247 was presumed to be the autophosphorylation site, and
glycine G392, near the C-terminal end of SpaK, was determined to correspond to the
conserved DXG motif of the nucleotide-binding domain in related histidine kinases (
A, B: Polyacrylamide gel electrophoresis of 6xHis-SpaR and wild-type or mutant 6xHis-SpaK in isolation or in combination, in the presence of ATP. Lanes 1, 7: molecular weight markers. A: Coomassie blue staining. B: Autoradiography. Mutant 1: H247Q; Mutant 2: G392A.
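The mutant names use the usual point-substitution notation: wild-type residue, 1-based position, replacement residue. A minimal Python sketch that applies such a substitution while checking the stated wild-type residue; the SpaK sequence is not reproduced here, so the example uses a placeholder sequence that only puts H and G at positions 247 and 392, and the function name is hypothetical.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement, e.g. "H247Q".
_SUBSTITUTION = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])")


def apply_substitution(sequence: str, mutation: str) -> str:
    """Return `sequence` with one amino-acid substitution applied, after checking
    that the stated wild-type residue is actually present at that position."""
    match = _SUBSTITUTION.fullmatch(mutation)
    if match is None:
        raise ValueError(f"not a point substitution: {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise IndexError(f"position {position} outside sequence of length {len(sequence)}")
    if sequence[position - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {sequence[position - 1]}")
    return sequence[:position - 1] + replacement + sequence[position:]


if __name__ == "__main__":
    # PLACEHOLDER sequence, not SpaK: only H247 and G392 are placed as in the text.
    placeholder = "A" * 246 + "H" + "A" * 144 + "G" + "A" * 67
    mutant1 = apply_substitution(placeholder, "H247Q")
    mutant2 = apply_substitution(placeholder, "G392A")
    print(mutant1[246], mutant2[391])  # Q A
```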
In this work we demonstrated a quantitative approach for modeling protein-protein
complexes using homology modeling followed by structure-based searches for
multi-domain template proteins. In a search for templates upon which to base the
model of a putative SpaK/SpaR complex, we used LGA, which applies two scoring
schemes: GDT (global distance test) and LCS (longest continuous segment). Based on a
previous study involving structure alignments between weakly homologous proteins
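The quantities behind these scores, and behind N and RMSD in the table notes, can be illustrated from a list of per-residue C-alpha deviations. LGA itself optimizes the superposition while computing GDT and LCS, whereas this Python sketch scores a single, fixed superposition, so it is an approximation rather than LGA's algorithm. The 1, 2, 4 and 8 Å cutoffs are the conventional GDT_TS ones; the function names, the LCS cutoff, and the example deviations are assumptions for illustration.

```python
import math


def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS-style score (%) for ONE fixed superposition: the mean, over the
    cutoffs, of the percentage of residues whose C-alpha deviation <= cutoff."""
    n = len(distances)
    return sum(100.0 * sum(d <= c for d in distances) / n for c in cutoffs) / len(cutoffs)


def pairs_within(distances, cutoff=5.0):
    """N (C-alpha pairs within cutoff) and the RMSD computed over those N pairs."""
    close = [d for d in distances if d <= cutoff]
    rmsd = math.sqrt(sum(d * d for d in close) / len(close)) if close else float("nan")
    return len(close), rmsd


def longest_continuous_segment(distances, rmsd_cutoff):
    """Length of the longest run of consecutive residues whose RMSD, under the
    same fixed superposition, does not exceed rmsd_cutoff."""
    best = 0
    for start in range(len(distances)):
        squares = 0.0
        for end in range(start, len(distances)):
            squares += distances[end] ** 2
            # Segment RMSD is not monotonic in segment length, so keep scanning.
            if math.sqrt(squares / (end - start + 1)) <= rmsd_cutoff:
                best = max(best, end - start + 1)
    return best


if __name__ == "__main__":
    deviations = [0.5, 1.5, 3.0, 6.0]  # hypothetical per-residue C-alpha deviations (Å)
    print(gdt_ts(deviations))  # 62.5
    print(pairs_within(deviations))
    print(longest_continuous_segment(deviations, rmsd_cutoff=2.0))
```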
Although our approach can be used to identify domain-fusion protein structures that
imply a possible functional association between two proteins of interest, it does
not in itself provide sufficient information for modeling a physical interaction
between the proteins. Protein domains that share more than
30–40% sequence identity with a
“domain-fusion” template are likely to assume a similar
orientation
Our modeling effort supported the hypothesis that SpaK and SpaR may function as a
histidine kinase sensor and a response regulator, respectively, in a two-component
system. Based on homology modeling and domain-fusion analysis, residues
corresponding to those believed to function in phosphorylation and subsequent
transfer of a phosphate moiety from sensor to response regulator in other
two-component systems were identified (
Phosphorylation studies of SpaK and SpaR showed that SpaK auto-phosphorylates and
subsequently trans-phosphorylates SpaR (
In modeling the interaction between SpaK and SpaR we identified 6 suitable
domain-fusion templates (
Although structure modeling and phosphorylation experiments strongly suggest
functional and physical interactions between SpaK and SpaR, we cannot be
entirely certain that our modeled quaternary structure is correct with respect to
domain composition, conformation, or orientation. Because homology modeling is, by
definition, data driven, the methodology depends on existing structural data in the
PDB, and it is possible that none of the domain-fusion templates detected by our
approach truly represents the physical interaction between SpaK and SpaR. Given the
low sequence identities between SpaK and SpaR and the identified domain-fusion
templates, template identification alone does not establish with any degree of
certainty that the interaction pose modeled here is correct
Although many two-component signal transduction systems have been identified by
sequence homology, we wish to point out that a purely sequence-based approach would
not have yielded the structural domain-fusion templates that were identified in this
study. The strength of our approach is in its ability to identify putative
domain-fusion templates based on structure homology searches in cases where sequence
identities between the proteins of interest and the putative domain-fusion templates
are low. Sequence identities of candidate domain-fusion templates to domains of SpaK
and SpaR ranged from 4% to 25%, but no template had a sequence identity greater
than 8% to both proteins simultaneously (
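The identity range and the "to both simultaneously" comparison can be checked directly against the Seq_ID column of the domain-fusion template table. In this Python sketch the joint identity of a template is taken as the lower of its identities to SpaK_d2 and SpaR_d1; that definition and the helper name `identity_summary` are assumptions for illustration.

```python
# Seq_ID (%) of each PDB entry's chains to SpaK_d2 and SpaR_d1,
# transcribed from the table of domain-fusion templates.
SEQ_ID = {
    "1f51": (6.73, 25.0),
    "2ftk": (6.6, 24.14),
    "1th8": (17.89, 7.89),
    "1thn": (17.17, 6.67),
    "1tid": (17.35, 6.58),
    "1til": (16.83, 4.23),
}


def identity_summary(seq_id):
    """Return (lowest identity, highest identity, best joint identity), where the
    joint identity of a template is the lower of its SpaK and SpaR identities."""
    values = [v for pair in seq_id.values() for v in pair]
    joint = [min(pair) for pair in seq_id.values()]
    return min(values), max(values), max(joint)


if __name__ == "__main__":
    print(identity_summary(SEQ_ID))
    # Every value lies well below the 30-40% range discussed for conserved interaction geometry.
    print(all(v < 30.0 for pair in SEQ_ID.values() for v in pair))
```

For the tabulated values the lowest and highest identities are 4.23% and 25%, and the best joint identity belongs to 1th8 (7.89%).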
Construction of vectors for expression of SpaK and SpaR proteins. A) Expression vector pQE-31-spaK. B) Expression vector pQE-31-spaR.
(0.38 MB TIF)
Candidate templates for homology modeling of SpaK monomer.
(0.06 MB PDF)
Candidate templates for homology modeling of SpaR.
(0.06 MB PDF)