------------------------------------------------------------
Dataset S2: Compatibility class multiple sequence alignments
------------------------------------------------------------

To accompany:
The Origins of Specificity in Polyketide Synthase Protein Interactions
Mukund Thattai, Yoram Burak, Boris I. Shraiman

Alignments are shown in CLUSTAL format. Each row shows an interacting head-tail pair, labeled by the PKS pathway it belongs to and the interface number within that pathway. For example, the row labeled ampho_001 represents the interface between the head of ampho_001 and the tail of ampho_002, the 1st and 2nd proteins of the amphotericin PKS pathway.

We first list matching head-tail pairs from each compatibility class. We then list mismatched pairs and pairs that involve unclustered domains (designated H0 or T0). In that final section, the class assignment of the pair (for example H0-T2) follows the row label.

Head and tail alignments are separated by a spacer: -->-- if the corresponding genes are adjacently transcribed, and ----- otherwise. We see that 72/90 H1-T1 pairs, 22/22 H2-T2 pairs, and 15/16 H3-T3 pairs are adjacently transcribed.

At the end of each line, we give each head and tail domain a log-likelihood score: the lower this number, the more similar the domain is to others in its class. The score is -log10(p), where p is the probability of the given sequence arising from a position-specific weight matrix generated by considering all other sequences in the same class. Note that sequences with gaps tend to score poorly by this measure. Unclustered domains belong to no class and have no score, shown as --.- in the score column. To check if a domain might be misclassified, compare its score with those of others in the same class.
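The scoring rule described above can be illustrated with a short sketch. This is not the authors' implementation: the alphabet (20 amino acids plus the gap character), the uniform pseudocount, and the function names `weight_matrix` and `leave_one_out_score` are all assumptions made here for illustration, so the numbers it produces are not expected to match the scores listed in this file.

```python
import math
from collections import Counter

# Assumed alphabet: the 20 standard amino acids plus the gap character.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def weight_matrix(seqs, pseudocount=1.0):
    """Per-column symbol probabilities for equal-length aligned sequences."""
    # The pseudocount (an assumption) keeps unseen symbols from getting p = 0.
    total = len(seqs) + pseudocount * len(ALPHABET)
    matrix = []
    for column in zip(*seqs):
        counts = Counter(column)
        matrix.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return matrix


def leave_one_out_score(seqs, index, pseudocount=1.0):
    """-log10(p) of seqs[index] under a matrix built from all other sequences."""
    others = seqs[:index] + seqs[index + 1:]
    matrix = weight_matrix(others, pseudocount)
    return -sum(math.log10(col[ch]) for col, ch in zip(matrix, seqs[index]))


# Toy class of three short aligned "domains" (hypothetical data, not from this file).
toy = ["LDEL", "LDEL", "IK-L"]
scores = [leave_one_out_score(toy, i) for i in range(len(toy))]
```

In the toy class, the outlier `IK-L` receives a higher score than the two identical sequences, which is the sense in which a high score flags a domain that may not belong with the rest of its class.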
-----------
CLASS H1-T1
-----------
          <------HEAD------->     <----------TAIL----------->
ampho_001 IESVSVDRLLDIIDEEFEI-->--QEKVVDYLRRVTNDLRRARRRIGELES 14.6 13.7
ampho_003 LDGASDEDMFALLDDELGL-----EQKLRDYLKLATADLRRARRRVGELES 9.9 11.6
ampho_005 IEAASAEEIFAFIDNELGR-->--ESKLVDYLKWVTADLHQTRRRLQEAES 8.2 10.5
ansam_002 LRDAGVAELFEFIDTRLGR-----HEKLVDYLRWVTADLHETRRKLADLES 13.2 15.5
ansam_003 LDHISDDELFRLADSRLGP-->--EKKLRDYLKRALAENERVQQRLRALET 18.7 17.9
ascom_002 SADATADEIFDLIDREL-------QDKTVEYLRWATAELQKTRAEL----A 10.2 28.3
aureo_001 RSAASLEEVFDLLDDQLGK-----DAKTLEYLKRLTAELLETRERLRTAEA 15.2 16.2
aureo_002 LDGATDDEVLDFISNELGI-->--DDKIRYFLKKVTADLHETRGRLKELED 10.5 15.7
averm_001 FTSATEAEIFKFIDNDLGL-->--EAKLLEYLKRVTADLDRTRRRLYEVVE 15.5 15.7
averm_003 LAETSDEEMFALIDREVGF-----EEKLRDYLKRVTADLLNVRRRLQQIES 15.4 14.7
borre_001 FAAATDDEMFELLEKRFGI-->--EDKLRHLLKRVSAELDDTQRRVREMEE 12.1 16.6
borre_002 LKSASRTEVLDFLTNELGI-->--DEELLDYLKRTASNLQEARQRVHELEE 16.6 19.1
borre_003 LENATADEIYALIDNELGI-->------------MTADLRSARARLQELES 11.7 26.5
borre_004 LDDLSDDDLFDFIDAKFGR-->--EQKLRTYLRRVTADLADVTERLQRAED 11.3 18.9
borre_005 LSTATDDEIFALVDSELGE-->--QDKLRDYLRKTLADLRTTKQRLRDTER 10.5 17.0
chalc_001 LDSASDDDLFAFIEDQL---->--DDKIRDYLKRVVAELHSTRQRLNALEH 10.5 16.1
chalc_002 LAASSDDELFDLFDSDFRS-->--EEKLRDYLRRAMTDLHEAREQIRRTES 15.6 13.9
chalc_003 IDSADDDEIFAFLDESFGD-->--EQKLRDYLKRATTELHKATERLKEVEQ 10.9 12.5
conca_002 IGSASADELFDLLDNNFGM-->--EQKLLEYLRRATADLGQARQRLREEED 12.8 11.4
conca_003 MDTASDDELFEFIDNELGM-->--DDKLRDYLKRVAADLHRTRRRLQEVEG 11.1 10.5
conca_004 IESASHDEIFDLIDNQLGI-->--EQKLRDYLNRVTIDLQQTRRRLRDVEE 9.8 15.4
eco02_001 IESASADDLFDIINNEFGK-->--EEKLLENLKWMTNELRRARRRLHEVEA 11.2 15.7
eco02_003 IEEASDDELFALIDKKFGQ-->--EAKLREYLKKVTTDLDEAYGRLREIES 11.5 15.4
eco02_004 LQEASDEELFAFINKGLGR-->--EETLRDYLKLVTADLHQTRQRLRDVEA 11.5 10.9
eco02_005 IQSATADEIFDLLDDELGL-->--EEKYLDYLKRATTDLREARRRLREVEE 9.3 10.1
eco02_006 LETATDDELFDLLDNELGA-->--EDKLRDYLKRATADLRQARRRLREVED 7.6 8.1
eco02_007 LQEATPDELFEFIEKEFGI-->--QDKILGYLKRVTADLHQTRQRLREVEA 13.2 11.2
eco02_008 LETATDDDLFDFIGKEFGI-->--EEKLRYFLKRVTADLHETRRRLQEVES 9.3 10.6
eryta_001 LLDADLDELVAALGRELGD-->--QDRTAALLRRATTELRAARRRVQQLES 18.0 19.9
eryta_002 DDDTSDDELFSLLDARLGA-->--EARLRQYLKRTLGELESVTAELDEVTA 14.6 22.4
eryth_001 LGEAGVDELLEALGRELDG-->--SEKVAEYLRRATLDLRAARQRIRELES 15.9 15.4
eryth_002 SEDASDDELFSMLDQRFGG-->--EEKLRRYLKRTVTELDSVTARLREVEH 12.4 16.0
fr008_001 LDDASDEELFRLMDGGAP------EDRLRAYLKRAVTDLQEARQQLSDAEN 15.6 18.4
fr008_002 LDTTTDDEMFDLIDNELGL-->--EAKLRDYLKKVTTDLRRTRQRLETVEA 8.7 12.4
fr008_003 FASASDEDMFDMLDNELGL-----EQKLRDYLKRASADLKRSRQRVTELEA 10.7 16.0
fr008_005 FESASTDEVFAFIDNELGR-->--DKKLVDYLKWVTADLHKTRRRLEEAEA 8.7 11.8
gelda_001 LDTANADDVFAFIDQEFGV-->--DEKLLNYLKRVTADLHQTRERLRKAEA 10.7 12.8
herbi_001 LDTANADDVFAFIDQEFGV-->--DEKLLNYLKRVTADLHQTRERLRKAEA 10.7 12.8
hygro_002 LEDASQSELLSFIDKEFGR-->--SKTLTDYLKWVTADLYRTRERLAEVES 13.2 19.5
hygro_003 LDAASAREIFDFLDGKAG--->--EDEFLHYLKKAAADLRDARQQIQELEA 14.1 19.0
lanka_001 VDTVDDEELFALLDQRFGE-----EDKLRRYLRRTAGDLEAVSARLRETEY 13.4 18.2
megal_001 LGEAGVDELLDALERELDA-->--NDKVAEYLRRATLDLRAARKRLRELQS 15.4 19.1
megal_002 SDDASDDELFSMLDRRLGG-->--EDRLRRYLKRTVAELDSVTGRLDEVEY 11.7 18.3
monen_001 LKAASADQIFDFIDNELGV-->--EEQLVEYLRRVTTELHDTRRRLVQEED 9.5 16.0
monen_002 LETATAEQVLDFIDNELGV-->--EEKLVDYLKRVSADLHATRQRLREAEE 10.1 9.4
monen_003 LDDVSDDEMFEFIDREL---->--EAKLRQYLKRVTVDLGQARRRLREVEE 10.0 10.7
monen_004 LQVATTDQVLDFIDKELGV-->--EEELVDYLKRVAAELHDTRQRLREVED 14.1 10.7
monen_005 LESASDDELFELIDRELPS-->--EDKLRHYLKRVTADLGQTRQRLRDVEE 9.3 10.0
monen_006 LSTASADDMFALIDREWGT-----EEKLLEYLKRVTADLRQTERRLQDVES 12.0 12.4
monen_007 LEAASADDIFDLISSEFGK-----EEKLLDHLKWVTAELRQARQRLHDKES 10.4 15.0
mycin_001 LSGASDEDLLAFIDEQI---->--ANRMREYLQRMTVELHGARQRLRHLEQ 15.0 25.6
mycin_002 LRSASDDELFQLLDSDFRT-->--EEKLREYLRRALADLHEARAQVHRLTE 11.6 15.8
mycin_003 IDAADDDAIFAYLDDRLSD-->--EQKLREYLKRSTRELHQTTERLRELEE 16.4 15.5
nanch_001 LASATADEILDFIDNELGV-->--EEKLVEYLRRVTTELHDARTRLRELEE 8.2 10.5
nanch_002 LEAATTDQVLDFIDKELGV-->--EERLVDYLKRVAADLHDTRARLREVED 10.7 10.4
nanch_003 LESASDDEMFALIDQQLGS-->--EDKLRQYLKRVTVDLGEARARLRKAEQ 9.1 13.3
nanch_004 LTLATADEVLAFIDNELGT-->--EERLVDYLKRVATDLHDTRRRLREVEE 11.8 10.0
nanch_005 LESASDDEMFALIDQQLGS-->--EEKLRQYLKRVTLDLGQAKQRLREAEE 9.1 12.0
nanch_006 LETASADEMFALIDREFGE-----DKKLLDYLKRVTADLREAQRRLKDVEY 8.4 13.2
nanch_007 LETATAEDIFDLIATEFGK-----EAKLLDHLKWVTAELRDTRRRLREAES 13.4 11.7
nidda_002 LRSASDDELFDLYDSEF---->--EEKLREYLRRAMGDLHSARERLAELES 11.7 12.4
nidda_003 LDAAADDEIFAYIDERFGT-->--AAKLRSYLRRVTAELHRATEQLRAVEE 12.8 14.3
nysta_001 IDTVSVDRLLDIIDEEFET-->--QEKIVDYLRRVTSDLRRARRRIGELES 15.0 13.9
nysta_003 IEEASDDDMFALLDDELGL-----EQKLRDYLKLATADLRRTRRRVHKLES 9.3 12.6
nysta_005 IEAASAEEVLAFIDHELGR-->--EKKLVDYLKWVTKDLHQTRQRLQEVEA 11.7 12.6
olean_001 VEDATIDELFEVLDNELGN-->--DEKIVEYLKRATVDLRKARHRIWELED 17.4 19.0
olean_002 AAWTSDDDLFAFLDKRLET-----AEKLREYLWRATTELKEVSDRLRETEE 17.2 20.2
phosl_001 LASASAAELMQFIDTELGD-->--DSQLVAYLRKVTTDLQKTRLRLRDAET 13.6 20.2
phosl_003 LADADRAEVLAFIDRELGL-->--EEQLLDYLKRVTADLHDTRVRFQAAEE 11.8 16.5
phosl_005 LADASDDDMLDALGREFGI-->-----MRHFLTELTAELRQAKKRISAYEA 11.6 33.1
pikro_001 FMNASAEELFGLLDQDPST-->--EEKYLDYLRRATADLHEARGRLRELEA 21.8 10.8
pikro_002 LDEASDDDLFSFIDKELGD-->--EDKLRDYLKRVTAELQQNTRRLREIEG 8.7 14.2
pimar_001 LVSASDEELFRLMDAE--------EEKLREYLKRAIADLHETRQQLDETEA 16.0 12.0
pimar_002 FESASDDEVFDLLDNELGL-----EQKLRDYLKRASADLRRSRQRVGELEA 7.7 12.7
pimar_004 LEAASTDEIFAFIDNELGR-->--DKKLVDYLKWVTNDLHQTRQRLREVES 7.7 10.7
rapam_002 LSSASASEILDFIDREFGD-----QDKVVEYLRWATAELHTTRAKLEALAA 10.4 19.4
rifaB_003 LDTASDEELFALVDGL----->--EGQLRDYLKRAIADARDARTRLREVEE 15.3 18.0
rifaB_004 LEAASADEVLDFIDEELGL-->--DEKLLKYLKRVTAELHSLR---KQGAR 7.9 32.4
rifam_003 LDTASDEELFALVDGL----->--EGQLRDYLKRAIADARDARTRLREVEE 15.3 18.0
rifam_004 LEAASADEVLDFIDEELGL-->--DEKLLKYLKRVTAELHSLR---KQGAR 7.9 32.4
sorap_001 FKSATKEELFAAFDEAFGG-->--DEKLVSYLQQAMNELQRAHQRLRAVEE 19.1 19.3
spino_002 LRSATDDELFQLLDNDLEL-->--EEKLREYLRRALVDLHQARERLHEAES 10.9 11.3
spino_003 LDSATHDEIFEFIDNELDL-->--EEKLFGYLKKVTADLHQTRQRLLAAES 10.3 16.6
spino_004 LTAATDDEIFDLIDRKFRR-->--EEKLREYLKRVVVELEEAHERLHELER 11.2 14.7
tylac_001 LDSANDDDLFAFIEEQL---->------------MTAELVATRKRLGALEE 11.9 31.4
tylac_002 LSTASDDELFELLDSGFTP-->--EDKLRAYLRRAMADLHESRERLRATEA 14.1 13.2
tylac_003 LEAADDHEIFAFLDERF---->--EQQLRAYLKRATTELHRTSEQLREERA 13.2 18.0
vicen_001 LESATADELFDILDGELST-----EKKLLDYLKRATTDLREARRRLREMEE 10.4 10.5
vicen_002 LDDASDDEIFDFIDSTFGK-->--EAKLREYLKRVTTDLHETNERLREVEG 10.9 12.7
vicen_003 LETATADDIFAFIDKDLGL-->--EEKLLDYLKRVTADLHATRQRLREAES 9.0 8.0

-----------
CLASS H2-T2
-----------
          <------HEAD------->     <----------TAIL----------->
ampho_002 LDDMDAEALLRLAAENSAN-->--TNKYVEALRSSLKEIERLRKQNEQLVA 14.7 12.4
ampho_004 VDSMTVADLVRAALNGQSD-->--AENVVAALRAAVKETERLRRRNRTIVA 17.5 16.0
ansam_001 FDDMAADELVRRALGGE---->--REQLLDALRASLRENERLRATG----- 20.2 29.8
ascom_001 TTAEPDDELFDDMDADALI-->--ENDLIEALRTSVKDNAQLRRENTALRA 34.1 18.1
chalc_004 VDGLDAEALVDLVLNQSD--->--QEKVLEALRTSVKDAERLRKRNRELLA 14.3 14.7
conca_001 IDAMDVAELVRMAREGIES-->--DTTVVEALRDALKETARLRRENRQLQA 16.1 16.0
conca_005 TSGIEEMDVDGLVRLALGS-->--TDKVVEALRASLLENERLRGENTRLRD 22.8 13.6
fr008_004 IDAMDVDGLVQAALNGNPD-->--SEKVVEALRASLKETERLRRQNRDLAA 16.8 10.0
gelda_002 LAELDADELVSRAMRGTTF-->--REDLVKALRTSLMDAERLKRENDRLIA 19.5 14.3
herbi_002 LAELDADELVSRAMRGTTF-->--REDLVKALRTSLMDAERLRRENDRLIA 19.5 13.0
hygro_001 IDAMDTEALIRHVMDGTGA-->--NEQLVEALRVSAKENARLRRENASLQE 18.1 18.5
mycin_004 PDDLDAEALISLAMRQSDR-->--PDRLLEALRSALKEGDRLRSQLRQMTE 17.1 25.0
nidda_004 IDALSPAELIRLAKTGAGQ-->--IDEVLDALRTSVKETERLRRRNQELVA 18.6 14.0
nysta_002 LDDMDAEALLRLATENSAN-->--PDKYVEALRSSLKEIERLRRQNEQLVA 14.7 11.7
nysta_004 VDSMTVADLVRAALNGQSD-->--ENNVVAALRAAVKETDRLRRQNRMLVA 17.5 14.9
pikro_003 EASIDDLDAEALIRMALGP-->--NEQLVDALRASLKENEELRKESRRRAD 20.9 17.6
pimar_003 IDAMDIADLVQAAFDGNSP-->--SETVVAALRASLKEAENLRRQNRKLVA 17.4 14.5
rapam_001 EAFVDEMDADALIKHVLEE-->--EDQLLDALRKSVKENARLRKANTSLRA 25.5 15.9
rifaB_001 AELIDALDISGLVQRALGQ-->--NEQIVDALRASLKENVRLQQENSALAA 24.1 21.6
rifam_001 AELIDALDISGLVQRALGQ-->--NEQIVDALRASLKENVRLQQENSALAA 24.1 21.6
spino_001 ATSIDAMDVAGLVEAALGE-->--YEEVVEALRASLKENERLRRGRDRFSA 19.2 19.1
tylac_004 LDDLDGDALVRLALGEPGE-->--AERLTEALRTSLKEAERLRRQNRELRA 16.9 12.4

-----------
CLASS H3-T3
-----------
          <------HEAD------->     <----------TAIL----------->
curac_008 FQEMSEDEMANILARKLES-->--TIDYKALMENAFLQIETLQSKLEAFEN 20.0 31.3
curac_009 IEDISEEEFEALAAQQLEK-->--NKEQLSLSKQMFLALKQAEAKLEMMEL 12.3 20.7
curac_011 ITELSEIELEASVLQEIEA-->--KEQSLSALQRALIALKDARSKLEKYET 19.0 30.0
cysto_003 LEQLPQDELGALLDQKLAA-----QNEHNARLARALVAMEKMQARLEASER 8.1 17.8
epoth_003 IEEMSQDDLTQLIAAKFKA-->--QQNPLKQAAIIIQRLEERLAGLAQAEL 22.6 37.3
jamai_005 TKELSEEQLEELINQELNL-->--NLEQLSPLQKSFLVIERLKSKVDTLEK 16.4 28.5
jamai_006 IENISEEEFEALAAQQLEK-->--NKDQLSLSKQMFLALKQAETKLEMMEL 13.8 21.9
melit_003 LEQLPQDELGALLDQKLAD-->--QNEHSARLARALVALEKMQARLEASER 10.4 15.3
myxal_001 IDDMSEEEVERLFAQRVAQ-->--EKIAQYSPKRLALLAMDLKSRLDAVEG 14.3 34.3
myxal_003 VKQLSDEEAEALLAEKLAA-->--ESNELSPAKRMLVALEKMQTRLNAVEG 10.6 25.1
myxot_003 LEQLPQNELGSLLDEKLAL-->--QHEHSARLARALVALEKMQAKLEASER 10.4 16.5
stigm_003 LDVLSRNEVESRLDERLAA-->--QGEDLGRLKRALTALEKMQAKLDAAER 14.7 22.4
stigm_004 LDALSDDEMAARLAEKLAA-->--SQDHRSLLKQSFLEIERLQAKVDSLER 11.4 23.9
stigm_006 IQQLSDDEVAASINEELAA-->--PVEQLSPLQQALLALREMRAKLDASER 11.3 18.1
stigm_007 LEGLPEEELIALFDREMAA-->--PIDYPARLRRALRVVRELQAELKSGRR 14.9 30.8
stigm_008 LDELSEDQVASLLAERLNA-->--TKDNQDLLKRALVTIDKLQAKVDALER 10.3 25.3

-------------------------
Unclustered or mismatched
-------------------------
                <------HEAD------->     <----------TAIL----------->
averm_002 H0-T2 AEDSSSSRNRTHHTHEGET-----SEKLVDALRASLKANQTLRARNEQLAA --.- 16.6
curac_007 H0-T3 IKQSSNQELESSIDQILES-->--EKQQIKQLSPLQRAALALKKLETKLNN --.- 43.6
curac_010 H0-T3 INQLSEDEMDLAVSQAVSQ-->--QTTQLSNQQLLLLKIQQATAKLHEIET --.- 36.3
curac_012 H0-T0 VDHAIAAELQEIKNLLKEG-->--TTQQDVSSQEVLQVLQEMRSRLEAVNK --.- --.-
cysto_004 H0-T3 AQRLSETELHSLIHSLSGD-->--IRGLSPEKRVNLAKMLLRSAGEVAPES --.- 27.9
eco02_002 H0-T2 LNGLDSLDGPSGNDNDSNR-->--NEKVVEALRASLKETERLRRRNQELTD --.- 10.3
epoth_004 H1-T0 VASLDEDGLFALIDESLAR-->--EGQLLERLREVTLALRKTLNERDTLEL 20.8 --.-
epoth_005 H3-T0 LRGMTDEQKDALLAEKLAQ-->--ATTNAGKLEHALLLMDKLAKKNASLEQ 18.8 --.-
melit_004 H0-T3 AQRLSETELHSLAQSLSGD-->--VRGLSPEKRVNLAKMLLRSAGESVPES --.- 27.5
mycol_001 H0-T1 MFRTPTSEISPTLEGGRGV-->--EENLRVYLKQVITDLHQMQARLRKIEK --.- 26.5
myxal_002 H3-T0 VTELSDEEVERLIAQKLS--->-------------MLLALDLHAESEALKR 14.6 --.-
myxal_005 H0-T3 VDESAADDKEVFV-------->--PAEQPSALQRAALLVEKMQARLDAVER --.- 19.6
myxot_004 H0-T3 AGRLSESELRDLAHSFPGD-->--VRGLSPEKRVSLAKMLLRSAGEAVPEQ --.- 26.8
nanch_008 H1-T2 LQEASADEVLQFIDSHLGR-----DAKVVEALRTSLLETERLRKENDRLRA 13.2 13.0
nanch_010 H0-T1 VEHPTSTLLGKYLAETYEA-----QEQLVDYLKRVATDLHDTQQRLREVEA --.- 12.1
nidda_001 H3-T1 LEEQPDEPLGAPLDERLDE-->--NDKIRSYLKRATAELHRTKERLAELES 20.2 14.6
rifaB_002 H0-T2 ERSIADLGVDDLVQLAFGD-->--YEKVVEALRKSLEEVGTLKKRNRQLAD --.- 20.5
rifam_002 H0-T2 ERSIADLGVDDLVQLAFGD-->--YEKVVEALRKSLEEVGTLKKRNRQLAD --.- 20.5
stigm_001 H0-T3 FQELSNEDLLGMLSDDKED-->--SDMMKQLARMIRDLPPDRRAFLADLLR --.- 42.3
stigm_002 H0-T3 VDALSGTALAELFDEQLSA-->----------------MQKMSTRLDALQR --.- 46.1
stigm_005 H0-T3 LEGLSDDESGRLASGTAQR-->-----MSSFLERVAELSPEKRAALAELLR --.- 36.0
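
The row layout described in the header (label, optional class tag, 19-column head, 5-column spacer, 27-column tail, two scores) is regular enough to read mechanically. The following is a minimal sketch under that assumption; the column widths are read off the rows in this file, and the names `TOKEN`, `parse_rows`, and the dictionary keys are hypothetical, not part of the dataset. Because it matches on the row pattern rather than on line breaks, it is indifferent to how the rows are wrapped.

```python
import re

# One token is either a class banner ("CLASS H1-T1") or a data row.
# Widths (19 / 5 / 27 columns) are an assumption read off this file's rows.
TOKEN = re.compile(
    r"CLASS (?P<section>H\d-T\d)"
    r"|(?P<label>[A-Za-z0-9]+_\d{3})\s+(?:(?P<tag>H\d-T\d)\s+)?"
    r"(?P<head>[A-Z-]{19})(?P<spacer>-->--|-----)(?P<tail>[A-Z-]{27})\s+"
    r"(?P<hscore>\d+\.\d+|--\.-)\s+(?P<tscore>\d+\.\d+|--\.-)"
)


def _score(text):
    """Convert a score field to float; '--.-' (unclustered domain) becomes None."""
    return None if text == "--.-" else float(text)


def parse_rows(text):
    """Return one dict per head-tail row, tracking the current class banner."""
    rows, section = [], None
    for m in TOKEN.finditer(text):
        if m.group("section"):
            section = m.group("section")
            continue
        rows.append({
            "label": m.group("label"),
            # Rows in the final section carry their own tag; others inherit the banner.
            "classes": m.group("tag") or section,
            "head": m.group("head"),
            "adjacent": m.group("spacer") == "-->--",
            "tail": m.group("tail"),
            "head_score": _score(m.group("hscore")),
            "tail_score": _score(m.group("tscore")),
        })
    return rows


# Three rows copied from this file, joined on one line for brevity.
sample = (
    "CLASS H1-T1 "
    "ampho_001 IESVSVDRLLDIIDEEFEI-->--QEKVVDYLRRVTNDLRRARRRIGELES 14.6 13.7 "
    "ampho_003 LDGASDEDMFALLDDELGL-----EQKLRDYLKLATADLRRARRRVGELES 9.9 11.6 "
    "averm_002 H0-T2 AEDSSSSRNRTHHTHEGET-----SEKLVDALRASLKANQTLRARNEQLAA --.- 16.6"
)
rows = parse_rows(sample)
```

Applied to the full file, tallying the `adjacent` flag per value of `classes` should reproduce the adjacency counts quoted in the header (72/90, 22/22, and 15/16).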