Supporting online material
TOC for Supporting Online Material
TOC for Supporting Online Material
Synopsis for Supporting Online Material
Figures for Supporting Online Material
  Fig. S1
  Fig. S2
  Fig. S3
  Fig. S4
Tables
  Table S1
  Table S2
References for Supporting Online Material
Synopsis for Supporting Online Material
Many methods that are optimized to predict natively unstructured regions in proteins are trained and tested on residues that are missing from X-ray structures. Residues in these regions have been shown to be similar in amino acid composition to flexible structured loops (1). Therefore, methods using this approach cannot always distinguish between structured and unstructured loops.
Here, we show one example in which the secondary structure prediction by PSIPRED (2) (Fig. S1) is highly correlated with the DISOPRED2 (3) output (Fig. 5A in main text); the locations of the peaks of the disorder prediction are correlated with the locations of the predicted loops. NORSnet, however, is optimized to distinguish natively unstructured loops from structured loops (see Fig. 5B).
Furthermore, NORSnet captured the unstructured region in DFF45 at its stringent cutoff despite the enrichment of this region in predicted secondary structure elements (Fig. S2).
Because disorder predictors are based on different concepts, they often predict different proteins to have unstructured regions (see Figs. 3, 4 and 7). In Fig. S3 we show that both IUPred and NORSnet predict hub proteins to be rich in unstructured regions. Interestingly, each of the two methods reliably predicted different hubs to be unstructured.
Figures for Supporting Online Material
Fig. S1
Fig. S1: PSIPRED prediction for Kappa-casein precursor. The protein is predicted to have several long loops (residues 24-42, 89-125 and 130-171). Note that the locations of the loops correlate with high scores predicted by NORSnet and DISOPRED2, both of which use this information.
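The long loops quoted in the caption can be read off a three-state secondary structure string. The following is a minimal sketch, assuming the prediction is available as a string of H (helix), E (strand) and C (coil/loop) characters, one per residue; the function name `long_loops` and the `min_len` threshold are hypothetical and are not taken from the original analysis.

```python
from typing import List, Tuple


def long_loops(ss: str, min_len: int = 10) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges of 'C' runs of at least min_len residues.

    `min_len` is an illustrative threshold, not a value from the source.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # 1-based start of the current coil run, if any
    for i, state in enumerate(ss, start=1):
        if state == "C":
            if start is None:
                start = i
        else:
            # The run covers residues start..i-1, i.e. i - start residues.
            if start is not None and i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(ss) - start + 1 >= min_len:
        segments.append((start, len(ss)))
    return segments


if __name__ == "__main__":
    # Toy string, not the real Kappa-casein prediction.
    print(long_loops("HHHCCCCCEEECC", min_len=3))
```

The resulting residue ranges could then be compared with the positions of high disorder scores, as the caption does by eye.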
Fig. S2
Fig. S2: Secondary structure prediction for the N-terminal domain of DFF45. Although the N-terminal domain of DFF45 is unstructured, PSIPRED predicts secondary structure elements within that region.
Fig. S3
Fig. S3: Unstructured regions are over-represented in protein-protein hubs of worm. As in Fig. 7, we ran IUPred on worm proteins that are involved in protein-protein interactions. The NORSnet data are identical to those presented in Fig. 7. The number of proteins predicted to be either unstructured or well-structured is plotted against the number of interacting partners for two different reliability thresholds of the two methods: A+B were compiled for thresholds at which both methods maintained 100% accuracy on the NESG data (Fig. 4), while C+D were compiled for 100% accuracy on DisProt (Fig. 3). A+C show the number of proteins predicted in each bin of interaction partners, while B+D show the normalized ratios to zoom into the difference between unstructured and structured proteins in each bin. These ratios were compiled as Ratio(bin)={#unstructured(bin)/#structured(bin)} / {#unstructured(1)/#structured(1)}. As all ratios are above 1, proteins with more than one interaction partner have more unstructured regions than proteins with one partner. For the thresholds at which both methods achieved 100% accuracy on the DisProt dataset, both IUPred and NORSnet identified unstructured regions in 98 proteins that interact with seven or more partners. IUPred predicted 37 proteins with unstructured regions that NORSnet did not identify, and NORSnet predicted 17 proteins with unstructured regions that IUPred had missed.
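The normalized ratio defined in the caption can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `normalized_ratios`, the dictionary layout, and the counts in the usage example are hypothetical and are not data from the study. Bin 1 (proteins with one partner) serves as the reference, as in the formula above.

```python
from typing import Dict, Tuple


def normalized_ratios(counts: Dict[int, Tuple[int, int]]) -> Dict[int, float]:
    """Compute Ratio(bin) = (unstructured/structured in bin) / (unstructured/structured in bin 1).

    `counts` maps the number of interaction partners (bin) to a pair
    (n_unstructured, n_structured). Bins without structured proteins are
    skipped because their ratio is undefined.
    """
    ref_unstructured, ref_structured = counts[1]
    reference = ref_unstructured / ref_structured  # ratio for one-partner proteins
    return {
        b: (unstructured / structured) / reference
        for b, (unstructured, structured) in counts.items()
        if structured > 0
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    toy = {1: (10, 40), 2: (12, 24), 3: (9, 12)}
    print(normalized_ratios(toy))
```

By construction the ratio for bin 1 is 1, so values above 1 in the other bins indicate a relative enrichment of predicted unstructured proteins.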
Fig. S4
Fig. S4: NORSnet captures domain boundaries. The domain boundaries of 524 multi-domain proteins were marked by the procedure described in Liu and Rost (4). Because NORSnet is optimized to identify unstructured stretches that are longer than 30 residues (and SCOP domain boundaries are often shorter), we used the raw NORSnet score rather than the filtered output. NORSnet did considerably better than random (in red) and yielded an area under the ROC curve (AUC) of 0.672 (in blue). Moreover, according to our gold standard set, termini residues are never defined as domain borders. In "NORSnet no term" (in green), we treated the NORSnet outputs for the 60 termini residues in each protein as negatives, assessing only the NORSnet predictions for the middle of the chain. This variant was more accurate in distinguishing domain boundaries from other residues (AUC=0.715).
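The effect of treating terminal residues as negatives on the AUC can be illustrated with a small sketch. It is not the evaluation code used for Fig. S4: the AUC is computed here from the rank-sum (Mann-Whitney) statistic, the function names are hypothetical, and `n_term` is assumed to be the number of residues masked at each end, whereas the caption only states that 60 termini residues per protein were treated as negatives.

```python
from typing import List, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve from the rank-sum statistic; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over all residues that share the same score.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both boundary and non-boundary residues are required")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mask_termini(scores: Sequence[float], n_term: int) -> List[float]:
    """Give the first and last n_term residues the lowest possible score (predicted negative)."""
    n = len(scores)
    return [
        float("-inf") if (i < n_term or i >= n - n_term) else s
        for i, s in enumerate(scores)
    ]


if __name__ == "__main__":
    # Toy per-residue scores and boundary labels (1 = domain boundary), for illustration only.
    scores = [0.9, 0.1, 0.8, 0.2, 0.7]
    labels = [0, 0, 1, 0, 0]
    print(roc_auc(scores, labels))
    print(roc_auc(mask_termini(scores, n_term=1), labels))
```

In the toy example, a high-scoring terminal residue that is not a boundary lowers the AUC; masking the termini removes that false positive, which mirrors the improvement reported in the caption.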
Tables
Table S1
Number   NESG ID a and sequence length   Disorder signal b
1        AR2242107                       Largely
2        BhR21117                        Partly
3        CvR16205                        Partly
4        FR254163                        Largely
5        HR150679                        Largely
6        HR153862                        Largely
7        HR1821157                       Partly
8        HR1974120                       Largely
9        HR2078170                       Largely
10       HR2130173                       Largely
11       HR22487                         Largely
12       HR2299113                       Largely
13       HR36115                         Partly
14       HR876                           Largely
15       HR919208                        Largely
16       HR922154                        Largely
17       HR997189                        Largely
18       KR12231                         Largely
19       LmR11103                        Partly
20       MaR51125                        Partly
21       MhR2275                         Largely
22       MhR41206                        Partly
23       MrR47128                        Partly
24       PsR5176                         Largely
25       SR128193                        Partly
26       SmR362                          Largely
27       SpR562                          Largely
28       WR46193                         Partly
29       XR550                           Largely
30       YR8155                          Largely
Table S1: Dataset of unstructured proteins from the NorthEast Structural Genomics Consortium (NESG)
a NESG ID refers to the identifier assigned by the NESG consortium.
b Disorder signal refers to the level of evidence from NMR experiments that a protein is unstructured. "Largely" marks largely unstructured proteins, e.g., (i) their HSQC spectrum has high signal-to-noise and very low dispersion and (ii) their HetNOE data are clearly negative; "Partly" marks partly unstructured proteins, which have some local structure but overall meet the same criteria. In total, 20 proteins were identified as largely unstructured and 10 as partly unstructured.
Table S2
1j0w_A  1uw1_A  1s5l_I  1y7y_A  1r8o_B  1wpb_A  1t6a_A  1zeq_X
1nng_A  1v74_A  1s5l_J  1y96_A  1rfx_A  1wv8_A  1t6s_A  1zhh_B
1nxh_A  1v74_B  1s5l_L  1y9l_A  1rh5_B  1wz3_A  1t71_A  1zhq_A
1ocs_A  1vjq_A  1s5l_M  1ycy_A  1rh5_C  1x0p_A  1t98_A  1zlh_B
1ogk_A  1vk0_A  1s5l_T  1yfu_A  1rhz_A  1x6i_A  1t9f_A  1zoy_D
1oj5_A  1vk5_A  1s5l_U  1ygt_A  1rk8_C  1x7v_A  1tlu_A  1zpy_A
1ojh_A  1vrq_D  1s5l_X  1yhn_B  1rli_A  1x9z_A  1ttw_A  1zrl_A
1p57_A  1w0h_A  1s5l_Z  1yle_A  1rlj_A  1xg8_A  1txy_A  1zv1_A
1pc6_A  1w2c_A  1s68_A  1ylm_A  1ro5_A  1xiz_A  1u14_A  1zxu_A
1pd3_A  1w53_A  1s7b_A  1yln_A  1roc_A  1xk5_A  1u4h_A  1zz6_A
1q7l_B  1w8x_N  1s7h_A  1ylq_A  1rpu_A  1xl3_A  1u5k_A  2a13_A
1q7s_A  1w8x_P  1s7i_A  1ylx_A  1rr7_A  1xl3_C  1u5t_C  2a1j_A
1q8b_A  1w94_A  1sbz_A  1yn5_A  1ryl_A  1xpj_A  1u7i_A  2a1x_A
1q8d_A  1wdu_A  1sfu_A  1z0j_B  1rzn_A  1xu1_R  1u84_A  2a65_A
1q9j_A  1whz_A  1sjw_A  1z0p_A  1s0y_B  1xwr_A  1ud0_A  2a6q_A
1qw2_A  1wk2_A  1sr4_A  1z1a_A  1s1h_J  1xxo_A  1ufi_A  2a6q_E
1qz8_A  1wlf_A  1sr4_C  1z21_A  1s1h_O  1y0u_A  1umh_A  2amy_A
1r0d_A  1wlq_C  1ssz_A  1z2n_X  1s1i_S  1y12_A  1urq_A  2bem_A
1r2m_A  1wlz_A  1swx_A  1z3i_X  1s1i_W  1y5y_A  1usd_A  2bho_A
1r4v_A  1wmi_A  1sz9_A  1z67_A  1s4k_A  1y66_A  1ut4_A  2bjn_A
1r5p_A  1wmm_A  1t0f_C  1zc3_B  1s5l_H  1y7m_A  1utx_A  2blf_B
1r8g_A  1wnh_A  1t0q_C  1zcd_A  2bw3_B
Table S2: PDB identifiers that were used as negative set in Fig. 3A
References for Supporting Online Material
1. Radivojac, P., Obradovic, Z., Smith, D.K., Zhu, G., Vucetic, S., Brown, C.J., Lawson, J.D. and Dunker, A.K. (2004) Protein flexibility and intrinsic disorder. Protein Science, 13, 71-80.
2. McGuffin, L.J., Bryson, K. and Jones, D.T. (2000) The PSIPRED protein structure prediction server. Bioinformatics, 16, 404-405.
3. Ward, J.J., Sodhi, J.S., McGuffin, L.J., Buxton, B.F. and Jones, D.T. (2004) Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. Journal of Molecular Biology, 337, 635-645.
4. Liu, J. and Rost, B. (2004) Sequence-based prediction of protein domains. Nucleic Acids Res, 32, 3522-3530.
Supp. material p. 1 Schlessinger, Liu & Rost