Supplementary Figure 1. Electron micrograph of microcrystals of YadA-M. The needle-shaped crystals are 5-10 µm long and remain stable in ddH2O for long periods. Please note that electron microscopy was used exclusively to assess the quality of the microcrystalline material.

Supplementary Figure 2: Alignment of the TAA membrane anchor domains of Haemophilus Hia (PDB 2GR7), Yersinia YadA, and other members of the family selected from the nr database filtered at 70% sequence identity. The alignment was produced using the MPI toolkit (http://toolkit.tuebingen.mpg.de). * marks the hydrophobic residues in the transmembrane β-strands; a and d mark the core positions of the coiled coil as discussed in supplement S5. The ASSA region, named after the core residues of the region in the YadA sequence, is marked in red. It comprises residues G35-L42. The highly conserved residues L45 and G72 are labeled in green. Note also the conservation of the C-terminal aromatic residue, which is part of the BamA recognition motif of TAAs (1).

>2GR7 VNNLEGKVNKVGKRADAGTASALAASQLPQATMPGKSMVAIAGSSYQ-GQNGLAIGVSRISDNGKVIIRLSGTTNSQG--KTGVAAGVGYQW
>YADA FRQLDNRLDKLDTRVDKGLASSAALNSLFQPYGVGKVNFTAGVGGYR-SSQALAIGSGYRVN-ENVALKAGVAYAGSS--DVMYNASFNIEW
> a d a d a a d * * * * * * * * * * * * * * * * * * *
>gi 46313782 IGQVYNSFNDLKKDMYGGVASAMAVAGLPQPTGAGRSMVSAATSNYH-GQQGFAAGYSYVTESNRWVVKASVTGNTRS--DFGAVVGAGYQF
>gi 23467016 NNELRTQLNTTDRNLRAGIAGALAAAGLPMSSVPGKSMFAASAGSYK-GQSAVALGYSRVSDNGKITLRLQGTRSSTG--DVGGSVGVGYQW
>gi 96988 FRQLDNRLDKLDTRVDKGLASSAALNSLFQPYGVGKVNFTAGVGGYR-SSQALAIGSGYRVN-ESVALKAGVAYAGSS--DVMYNASFNIEW
>gi 16565696 FRQLRDQINKNRKRSDAGIAGAMAMTAI--PMIDGKQSFGMAASNYR-DEQAIAAGIIFRTS-ENTVVRLNTSWDTQH--GTGVATGMSIGW
>gi 45516184 ISNLSNRIDGAQRDANAGTASAMALAGLPQSVLPGKGMVALAGSTYS-GQSALALGVSKLDS-GRWVFKGGVTSNTRR--NVGATVGAGFHW
>gi 46132846 NSYTDQRVDALSREAHAGTAAAMAMAGLPQATIPGKSMIALGGATYR-GQSGLAIGASVMSPGGRWVYKLTGSTARN---TYGASLAAGFHW
>gi 17549839 IGMVRQGISQVARGAYSGIAAATALTMIPDVD-QGKSIAIGIGSATYKGYQAVALGASARIS-HNLKAKMGVGYSSE---GTTVGMGASYQW
>gi 17986489 VDGLQGQINSARKEARAGAANAAALSGLRYDNRPGKVSIATGVGGFK-GSTALAAGIGYTSK--NENARYNVSVAYNEA-GTSWNAGASFTL
>gi 22997603 KQYTDGVVGSLRRDTDGGVAAAIATANLPQAYIPGRGMTSVGVSSYR-GQSAIAVGVSSVSESGRWVFKFSGSANTRS--QVGIGAGVGYQW
>gi 23467579 LVDVNKRVDTLDKNTKAGIASAVALGMLPQSTAPGKSLVSLGVGHHR-GQSATAIGVSSMSSNGKWVVKGGMSYDTQR--HATFGGSVGFFF
>gi 7228558 AQNLNNRIDNVDGNARAGIAQAIATAGLVQAYLPGKSMMAIGGGTYR-GEAGYAIGYSSISDGGNWIIKGTASGNSRG--HFGASASVGYQW
>gi 46156455 NNALRTQIHHADRRLRAGIAGANAAAALASVSMPGKSMVAIAAAGHD-GESALAIGYSRISDNGKVMLKLQGNSNSQG--KVSGAVSVGYQW
>gi 15677822 IDSLDKNVANLRKETRQGLAEQAALSGLFQPYNVGRFNVTAAVGGYK-SESAVAIGTGFRFT-ENFAAKAGVAVGTSSGSSAAYHVGVNYEW
>gi 24114871 MVEMDNKLSKTESKLSGGIASAMAMTGLPQAYTPGASMASIGGGTYN-GESAVALGVSMVSANGRWVYKLQGSTNSQG--EYSAALGAGIQW
>gi 33151932 MEQNTHNINKLSKELQTGLANQSALSMLVQPNGVGKTSVSAAVGGYR-DKTALAIGVGSRIT-DRFTAKAGVAFNTYNG-GMSYGASVGYEF
>gi 22988498 INAVQNGVNQVAKNAYAGIAAATALTMIPDVD-QGKTIAVGVGGGSYKGSQAVALGISARIT-QNLKMKAGAGTSSQ---GTTVGLGASYQW
>gi 46143244 VGHVNQRINKVNKELRAGIAGANAAAGLPQAYIPGKSMMAVAAGTYK-NESALAVGYSRSSDNGKVILKLQGNANTRG--DLGGSVGVGYQW
>gi 23467834 NNELRTQMNNNDRNMRAGVAQAVAQANLPINILPGKSTLSLATGNYM-GTQAFAVGYSRVSDNGKLSVKFSLGHGDK---KTSVGAGVGYSW
>gi 13472521 LSQLNSDLGGIRDEARQAAAIGLAAASLRYDDRPGKLSVAAGGGFWR-DSSALAFGAGYTSE--DGRIRGNVSGTAAGG-HVGVGAGISFTL
>gi 46156748 FNQLENRFDAFSKESRAGIAGSNAAAALPTISIPGKSVLSVSAGTYK-GQSAVALGYSRVSDNGKVLLKLHGNSNSVG--DFGGGVGIGWAW
>gi 23500987 FGKLNEDIVATRIEARQAAAIGLAAASLRYDDRPGKISAAIGGGFWR-GEGAVALGLGHTSE--DQRMRSNLSAATSGG-NWGMGAGFSYTF
>gi 32028865 LHATNQRLEEVNKDAKAGIAAAMAFKEV--PFVPGKWSYAAGAAHYS-SESAVSLNLGRTSA-GKYAISGGISSDSRG--RVGFRVGISGVF
>gi 23115364 VNDRFEDLDRRIRRNGAMSAAMSQMSANSAYAKPGRGRLAVGAGFQD-GESGLAIGYGRRIN-ENVSVSIGAAFSGS---ESSGGVGFGVDL
>gi 18568377 VNAFDGRITALDSKVENGMAAQAALSGLFQPYSVGKFNATAALGGYG-SKSAVAIGAGYRVN-PNLAFKAGAAINTSGNKKGSYNIGVNYEF
>gi 22127163 YSELKQDLRKQNSVLSAGIASAMSMASLTQPYTSGSSMTTIGAASYR-GQSALSLGVSSISDSGRWVSKLQASSNTQG--DFGIGVGVGYQW
>gi 23467645 LNNLEHKFDMSNKNLRAGIAGANAAAGLASVSMPGKSMLAISAAGYD-GENAVAVGYSRMSDNGKVMLKLQGNSNSRG--KVGGSVSVGYQW
>gi 15964211 FAQLSGEIGQVRSEARQAAAIGLAAASLRFDNEPGKLSVALGGGFWR-SEGALAFGAGYTSEDGRVRANLTGAAAGG---NVGVGAGLSITL
>gi 23466952 NNELRTQLHSVNRESRSGIAGANAAAALPMIAMPGKSALAVSAGAYK-GQSAVALGYSRMSDNGKIMLKLHGNSTSTG--DFGGGVGIGWAW
>gi 46143665 VANIDNRVSKLDKRVRGIGANAAAASSLPQVYIPGKSMVALAGGAYS-GASAVAVGYSRASDNGKVILKVNGTANSAG--HYSGGVGVGYQW
>gi 46316503 VGAIQQGVNDLARNAYSGIAIAGALAGMPQVD-PGKVISVGAGFGNYGGYTAIAVGGSARI-AQNTVIKLGVGTVNGS--RMMVNGGIGHSW
>gi 15602579 INKLGDHINKVDKDLRAGIAGATAVAFLQRPNEAGKSIVSLGVGSYR-SESAIAVGYARNSDNNKISIKLGGGMNSRG--DVNFGGSIGYQW
>gi 23466874 VNRLDNVISTNNRTLQAGIAGANAAAALPTVTMPGKSTIALSAGTYK-GRNAVAIGYSRLSDNGKITLKLQGNSNSAG--DFGGGVGVGWTW
>gi 7532792 LDSQQRQINENHKEMKRAAAQSAALTGLFQPYSVGKFNASAAVGGYS-DEQALAVGVGYRFN-EQTAAKAGVAFSDG---DASWNVGVNFEF
>gi 27380558 LAALNGRVDNLTRESRGGVALALAASSLQFDPRPGKISVSGGFGNFQ-GQSGLAVGLGYSYS-DAMRFNAAFTAAQQG--AIGVRAGASWTL
>gi 32028660 NNELRTQLNNTDRTLRAGIAGSNAAAGLASVSMPGKSMLAISAAGYG-GENAVAIGYSRMSDNGKIMLQLQGNRNSRG--KAAGSVSIGYQW
>gi 15603435 YNILNNRINKVDKDLRAGIAGANAAAGLPQAYIPGKSMVAVAAGTYK-GQNAIALGMSRISDNGKVIIKLTGNTNSRG--DFGASIGAGYQW
>gi 22982300 NSYTDDQIRSARRDSYGGTASAMAMAGLPQAVLPGHGMVAMAGGTYA-GQSAFAIGVSQLSETGKWVYKLQGTTDSRG--QFGASIGAGMHW
>gi 32029187 VNRLDNAISTTNRRLQAGIAGALATGGLPITVMPGKSMLAASAGSYK-GQSAVALGYSRMSDNGKIMLRLQGTSTSTG--DVGGSVGVGYQW
>gi 19568164 IDRLDSRVNELDKEVKNGLASQAALSGLFQPYNVGSLNLSAAVGGYK-SKTALAVGSGYRFN-QNVAAKAGVAVSTNGG-SATYNVGLNFEW
>gi 22000944 TNELDHRIHQNENKANAGISSAMAMASMPQAYIPGRSMVTGGIATHN-GQGAVAVGLSKLSDNGQWVFKINGSADTQG--HVGAAVGAGFHF
>gi 16121668 FGGFDKDINQKQKQLNAGIAATMAAAVIPQKS-GSKVSIGVGLAGYS-DQGAGSVGAIWHVN-QRITMNTTMTYDTQR--GVSLLTGLSIGI
>gi 23466989 FNHLDNKIEIFNKDLRAGVAGAHAAAALPTVTMPGKSSLALSAGTYK-GNNAVALGYSRLSDNGKIMLKLHGSRNSAG--DFGGGVGVGWTW
>gi 16121667 YNQLSDKVNKNFNKTNAGISGAMAMSGIPQKFGYEKSFGMAIGAYR--GQSALAVGGDWNIN-HKTITRVNVSADTEG--GVGVAAGFAFGI
>gi 22983404 MGNMSNSINNVDRNAAKGIASASALNIVTPYL-PGRTTLNAGVANYR-GYQSVGLGVSRWNE--KGTINYNLGVSTSGGNSTIVRAGIGIVL
>gi 16766976 MGEMNSKIKGVENKMSGGIASAMAMAGLPQAYAPGANMTSIAGGTFN-GESAVAIGVSMVSESGGWVYKLQGTSNSQG--DYSAAIGAGFQW
>gi 46317988 INSVRDEMSKYRKDADAGTASAIAMANMPQAVLPGEKVVALGGGTYG-GQSAMAVGLSFATT--KWLVKGSVTTAVSGHGSFGAGAGVGYRW
>gi 46315938 AGQLQQGINDTARKAYSGVAAATALTMIPDVD-KDKVLSVGVGVGSYQGYSAVALGATARIT--NIKMRAGASLGGS---GTAIGMGASMQW
>gi 22988648 MQQFQGGLSDMARNAYSGTASALALTAIPEVD-SSKNLAIGVGTAGYKGYQAVAVGLSARVT-QSLKVKLGAGISSA---TTAVTAGAAYQW
>gi 21230133 IEDRLRRQNRRLDRQGAMGSAMLNMSASVAGI-ASQNRIGAGVGFQN-GESALSVGYQRAIS-PRATVTIGGALSGD---DSSIGVGAGFGW
>gi 22996732 VNGQMRRQDRRISRQGAMGAAMLNMATSAAGI-HTQNRVGAGVGFQN-GQAALSLGYQRAIS-DRSTVTIGGAFSSS---DSSVGIGAGFGW
>gi 23115151 MEWKLRKQDQRIDRMGAMTAAMVQMSASASGL-RTQNRVAVGAGFQG-GEQALSIGYQRAIS-DRATFTVGGAFSDS---ESSAGVGLGFGW
>gi 38638179 FSELNDRVNRNESRANAGIAGAMAMSAIPYLNNYVDNSFGMATSTFR-GETAIASGYQRQIN-PYVNVRLSSSWDTSN--GVGVAAGVALGW
>gi 46312900 INSLGSQLQQTDQMAKQGIAAVGAMASIPQLDRDANFGMGVGTSTFL-GQKAMAVNMQARIT-ENLKASINGGFSGG---QKVIGAGMLYQW
>gi 8572547 GQHFNNRISAVERQTAGGIANAIAIATLPSPSRAGEHHVLFGSGYHN-GQAAVSLGAAGLSDTGKSTYKIGLSWSDAG--GLSGGVGGSYRW
>gi 15800223 FSSLKNEVDDNRKEANAGTASAIAIASQPQVKTGDVMMVSAGAGTFN-GESAVSVGTSFNAG-THTVLKAGISADTQS--DFGAGVGVGYSF
>gi 46192873 DAVNVGQLNDGLREVSAGVAMSMAMAQLPAPLDGSNHSFGVAVGGFD-GQEALALGGTAIVN-NNVTLRGALSHAGG---KTGAGVGVGWSF
>gi 33152901 QQIDQRILHQFRKEMHMNTANTAAMSSLNFGN-GYGVSVGAAIGGHK-GQYSLALGTAYTDYQTQVNVKIALPVKQPKPSNITYGVGFVYNF
>gi 46322712 AHADAAADPADRFDGARGIAATAGMASIPHMDRDSSFAMGGGTATFQ-GRKAMAVGVQARIT-ENLKATVNVGFAGS---QRVVGAGMLYQW
>gi 46314378 LTQMQQQIQQTDSMAREGIAATAAMASIPHMDRDSNFAMGVGTATFQ-GQKAMAVGVQARVT-ENLKATLNGGFAGS---QRVVGAGMLYQW
>gi 42631179 QVDTRLNRTDLRINRLGASAAALASLKPAQLGEDDKFALSLGVGSYK-NAQAMAMGAVFKPA-ENVLLNVAGSFSDS---EKTFGAGVSWKF
>gi 27380649 GSLQSEITANQQEARRGIVAAVSAAPVLMPSA-RGRTTVAVNAGYYR-GQSGVGIGISHRLD-WTTPTVLFGGYSNGGG-EHIGRAGMAVEF

1. U. Lehr et al., Molecular Microbiology 78, 932 (Nov, 2010).

| Experiment | Mixing times | Number of restraints | Total |
|---|---|---|---|
| CHHC | 35/50/80/150/200/300/500 µs | 6/4/14/16/2/15/43 | 100 |
| NHHC | 35/50/100/200 µs | 32/12/28/39 | 111 |
| DARR | 200/300/500 ms | 266/163/87 | 516 |
| PDSD | 15/100 ms | 55/182 | 237 |
| TEDOR | 2.24/6/12 ms | 48/60/2 | 110 |
| PAR | 2.25/6/15 ms | 35/106/31 | 172 |
| CPPI-DARR | 300 ms | 71 | 71 |
| CPPI-DARR-DD | 300 ms | 21 | 21 |

Supplementary Table 1: Number of restraints used by ISD in the final structure calculation. The restraints include 1192 non-redundant distance bounds that were extracted from manually assigned cross-peaks and an additional set of 146 restraints that were assigned automatically.
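As a quick consistency check on Supplementary Table 1, the per-spectrum counts can be summed in a few lines. The sketch below only re-enters the numbers from the table; the names `restraints`, `totals` and `grand_total` are ours and purely illustrative.

```python
# Restraint counts per mixing time, copied from Supplementary Table 1.
restraints = {
    "CHHC": [6, 4, 14, 16, 2, 15, 43],
    "NHHC": [32, 12, 28, 39],
    "DARR": [266, 163, 87],
    "PDSD": [55, 182],
    "TEDOR": [48, 60, 2],
    "PAR": [35, 106, 31],
    "CPPI-DARR": [71],
    "CPPI-DARR-DD": [21],
}

# Row totals (last column of the table) and the overall number of restraints.
totals = {name: sum(counts) for name, counts in restraints.items()}
grand_total = sum(totals.values())

for name, total in totals.items():
    print(f"{name:13s} {total:4d}")
print(f"{'all spectra':13s} {grand_total:4d}")
```

The grand total equals the 1192 manually derived plus 146 automatically assigned restraints quoted in the table caption.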

Supplementary Note 1: The ssNMR dataset

We recorded each solid-state MAS NMR experiment with several mixing times to obtain structural restraints for YadA-M. This produced a large dataset, although some spectra proved redundant. To establish the inter-strand contacts, 2D CHHC spectra were recorded with seven different 1H-1H mixing times (35, 50, 80, 150, 200, 300 and 500 µs). Short mixing times (35, 50 and 80 µs) ensure predominant, if not exclusive, spin diffusion between alpha protons on adjacent anti-parallel beta strands. For YadA-M, the 50, 80, 150 and 500 µs CHHC spectra were the most useful of the CHHC datasets; in these we could assign 1, 4, 7 and 10 inter-strand cross peaks, respectively. We also recorded 2D NHHC spectra (35, 50, 100 and 200 µs) to establish long-range inter-strand contacts. These spectra proved more useful for confirming the sequential assignment, because peaks between the amide proton of residue i and the alpha proton of residue i-1 dominated over medium- and long-range peaks. Nevertheless, the 2D NHHC spectrum at 200 µs gave a handful of inter-strand cross peaks and proved useful when analyzed in parallel with the CHHC spectra. In our experience, 2D TEDOR spectra recorded with 2.5, 6 and 12 ms mixing were much better resolved and more informative than NHHC. 2D 13C-13C correlation spectra with 200, 300 and 500 ms DARR mixing and 2.5, 6 and 15 ms PAR mixing were recorded to obtain long-range restraints. The TEDOR 12 ms, DARR 500 ms and PAR 15 ms spectra proved redundant because most of the cross peaks had already relaxed during these mixing times. Of the methyl-filtered DARR spectra (CPPI-DARR 300 ms and CPPI-DARR-DD 300 ms), we found CPPI-DARR 300 ms the most useful. The CPPI-DARR-DD 300 ms spectrum with double methyl filter shows peaks only in the methyl region, which is highly crowded and difficult to interpret. Of the 2D PDSD spectra (15 and 100 ms), the 15 ms spectrum was useful only for residue-specific and sequential assignments.

In conclusion, only 13 datasets were sufficient to derive the distance restraints necessary for a successful structure calculation (2D CHHC 50, 80, 150, 500 µs; 2D NHHC 200 µs; 2D TEDOR 2.5, 6 ms; 2D DARR 200, 300 ms; 2D PAR 2.5, 6 ms; 2D CPPI-DARR 300 ms; 2D PDSD 100 ms). These datasets are marked with an asterisk in Tables A and B below.

The tables below provide an overview of the 24 spectra of the dataset, including the number of picked peaks and the number of assignments that provided structure-defining restraints. A brief explanation of the columns in Table A follows.

Experiment type: Type of pulse sequence used to exchange magnetization between different nuclei.
Mixing time: Time allowed for magnetization exchange; as a rule of thumb, short mixing times mainly provide intra-residue or sequential exchange, while extended mixing times are required for long-range transfers.
Proton frequency/MHz: Proton Larmor frequency of the experiment.
MAS/kHz: Magic-angle spinning frequency of the experiment.
Total picked peaks: Total number of both manually assigned and ambiguous peaks for each experiment; ambiguity can be in one or both dimensions.
Sequential assignments: Backbone-backbone, backbone-side chain and side chain-side chain correlations between residues (i) and (i+/-1).
Short range assignments: Transfer between nuclei of residues that are separated by more than two, but less than three, amino acid residues in the primary sequence. These restraints are particularly helpful for defining the secondary structure of alpha-helices.
Medium range assignments: Transfer between nuclei of residues that are separated by more than three, but less than five, amino acid residues in the primary sequence. These restraints are particularly helpful in defining the beta-turns connecting the beta-strands.
Long range assignments: Transfer between nuclei of residues that are separated by more than five amino acid residues in the primary sequence. These restraints are very important for defining the exact geometry of the molecule. Long-range restraints were used to define the register of the β-sheet and helped to position the N-terminal α-helix relative to the β-sheet in YadA-M.
Intermolecular assignments: Distance restraints between protomers of the symmetrical trimer. Restraints between strand β-1 of protomer A and β-4 of protomer B are required to define the correct shear number of the β-barrel. The β-strands are tilted with respect to the N-terminal α-helices; hence the position of the N-terminal α-helix relative to the β-sheet is defined by several intermolecular restraints between the α-helix of protomer A and the β-sheet of protomer B.
Asterisk: Non-redundant, essential dataset.

Table A: List of spectra with the number of different distance restraints used in the YadA-M structure calculation

| Experiment type | Mixing time | Nuclei involved | Proton frequency/MHz | MAS/kHz | Total picked peaks | Total manual assignments | Sequential assignments | Short range assignments | Medium range assignments | Long range assignments | Intermolecular assignments |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DARR* | 200 ms | 13C-13C | 900 | 13 | 887 | 669 | 230 | 32 | 6 | 58 | 24 |
| DARR* | 300 ms | 13C-13C | 900 | 13 | 979 | 736 | 242 | 78 | 6 | 60 | 24 |
| DARR | 500 ms | 13C-13C | 900 | 12 | 624 | 403 | 116 | 37 | 2 | 20 | 7 |
| CPPI-DARR-DD | 300 ms | 13C-13C | 900 | 12 | 129 | 75 | 9 | 1 | 0 | 6 | 0 |
| CPPI-DARR* | 300 ms | 13C-13C | 900 | 12 | 453 | 319 | 79 | 23 | 2 | 13 | 11 |
| PDSD | 15 ms | 13C-13C | 900 | 13 | 302 | 266 | 43 | 1 | 1 | 2 | 1 |
| PDSD* | 100 ms | 13C-13C | 900 | 13 | 653 | 536 | 165 | 24 | 0 | 3 | 4 |
| PAR* | 2.25 ms | 13C-13C | 850 | 13.33 | 278 | 246 | 19 | 0 | 0 | 6 | 1 |
| PAR* | 6 ms | 13C-13C | 850 | 13.33 | 477 | 362 | 68 | 15 | 4 | 23 | 4 |
| PAR | 15 ms | 13C-13C | 850 | 13.33 | 290 | 225 | 47 | 5 | 0 | 13 | 2 |
| TEDOR* | 2.2 ms | 15N-13C | 850 | 13.33 | 220 | 190 | 62 | 2 | 0 | 1 | 2 |
| TEDOR* | 6 ms | 15N-13C | 850 | 13.33 | 236 | 212 | 81 | 11 | 0 | 2 | 1 |
| TEDOR | 12 ms | 15N-13C | 850 | 13.33 | 48 | 43 | 10 | 0 | 0 | 1 | 0 |
| CHHC | 35 µs | 13C-13C | 700 | 10 | 28 | 20 | 0 | 0 | 0 | 1 | 1 |
| CHHC* | 50 µs | 13C-13C | 700 | 10 | 37 | 33 | 2 | 0 | 0 | 1 | 1 |
| CHHC* | 80 µs | 13C-13C | 700 | 10 | 66 | 56 | 6 | 0 | 0 | 4 | 1 |
| CHHC* | 150 µs | 13C-13C | 700 | 10 | 116 | 84 | 10 | 2 | 0 | 4 | 4 |
| CHHC | 200 µs | 13C-13C | 900 | 13 | 69 | 61 | 2 | 1 | 1 | 3 | 1 |
| CHHC | 300 µs | 13C-13C | 900 | 13 | 97 | 68 | 10 | 3 | 0 | 4 | 1 |
| CHHC* | 500 µs | 13C-13C | 900 | 13 | 177 | 103 | 19 | 11 | 2 | 7 | 4 |
| NHHC | 35 µs | 15N-13C | 700 | 10 | 68 | 44 | 22 | 0 | 0 | 0 | 0 |
| NHHC | 50 µs | 15N-13C | 700 | 10 | 59 | 45 | 18 | 0 | 0 | 0 | 0 |
| NHHC | 100 µs | 15N-13C | 700 | 10 | 124 | 70 | 25 | 0 | 0 | 0 | 0 |
| NHHC* | 200 µs | 15N-13C | 700 | 10 | 124 | 84 | 33 | 0 | 0 | 5 | 4 |

Table B: List of parameter values used to record the NMR experiments

| Experiment type | Mixing time | Nuclei involved | Proton frequency/MHz | MAS/kHz | No. of scans | Dwell time (µs) | Direct acquisition (ms) | Indirect acquisition (ms) | Increment delay (µs) | No. of slices in F1 (TD) | Transient delay (d1) (s) | Approximate experiment time (h) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DARR* | 200 ms | 13C-13C | 900 | 13 | 32 | 10 | 17.98 | 9.60 | 12.0 | 800 | 2.79 | 22 |
| DARR* | 300 ms | 13C-13C | 900 | 13 | 32 | 10 | 17.98 | 9.60 | 12.0 | 800 | 2.79 | 22 |
| DARR | 500 ms | 13C-13C | 900 | 12 | 64 | 10 | 26.0 | 7.0 | 10.0 | 704 | 3.0 | 40 |
| CPPI-DARR-DD | 300 ms | 13C-13C | 900 | 12 | 96 | 12.30 | 19.98 | 6.60 | 12.30 | 544 | 3.0 | 45 |
| CPPI-DARR* | 300 ms | 13C-13C | 900 | 12 | 96 | 12.30 | 24.5 | 6.60 | 12.30 | 544 | 3.0 | 45 |
| PDSD | 15 ms | 13C-13C | 900 | 13 | 64 | 8.0 | 15.0 | 4.75 | 8.0 | 592 | 3.0 | 32 |
| PDSD* | 100 ms | 13C-13C | 900 | 13 | 64 | 8.0 | 15.0 | 4.75 | 8.0 | 592 | 3.0 | 32 |
| PAR* | 2.25 ms | 13C-13C | 850 | 13.33 | 32 | 6.133 | 19.99 | 16.87 | 37.50 | 450 | 2.79 | 11 |
| PAR* | 6 ms | 13C-13C | 850 | 13.33 | 64 | 6.133 | 19.99 | 13.20 | 37.50 | 352 | 2.79 | 18 |
| PAR | 15 ms | 13C-13C | 850 | 13.33 | 160 | 6.133 | 19.99 | 13.20 | 37.50 | 352 | 2.79 | 44 |
| TEDOR* | 2.2 ms | 15N-13C | 850 | 13.33 | 16 | 6.133 | 19.99 | 9.37 | 37.50 | 250 | 4.0 | 5 |
| TEDOR* | 6 ms | 15N-13C | 850 | 13.33 | 64 | 6.133 | 19.99 | 11.62 | 37.50 | 310 | 4.0 | 22 |
| TEDOR | 12 ms | 15N-13C | 850 | 13.33 | 64 | 6.133 | 19.99 | 9.37 | 37.50 | 250 | 4.0 | 18 |
| CHHC | 35 µs | 13C-13C | 700 | 10 | 96 | 11.0 | 11.99 | 4.55 | 10.93 | 416 | 3.0 | 34 |
| CHHC* | 50 µs | 13C-13C | 700 | 10 | 96 | 11.0 | 11.99 | 7.16 | 10.93 | 656 | 3.0 | 53 |
| CHHC* | 80 µs | 13C-13C | 700 | 10 | 96 | 11.0 | 11.99 | 7.16 | 10.93 | 656 | 3.0 | 53 |
| CHHC* | 150 µs | 13C-13C | 700 | 10 | 96 | 11.0 | 11.99 | 7.16 | 10.93 | 656 | 3.0 | 53 |
| CHHC | 200 µs | 13C-13C | 900 | 13 | 80 | 10 | 17.98 | 8.96 | 28.0 | 320 | 2.79 | 20 |
| CHHC | 300 µs | 13C-13C | 900 | 13 | 80 | 10 | 17.98 | 11.64 | 28.0 | 416 | 2.79 | 26 |
| CHHC* | 500 µs | 13C-13C | 900 | 13 | 80 | 10 | 17.98 | 11.64 | 28.0 | 416 | 2.79 | 26 |
| NHHC | 35 µs | 15N-13C | 700 | 10 | 512 | 8.0 | 11.98 | 4.22 | 88.11 | 48 | 2.79 | 19 |
| NHHC | 50 µs | 15N-13C | 700 | 10 | 512 | 8.0 | 11.98 | 4.22 | 88.11 | 48 | 2.79 | 19 |
| NHHC | 100 µs | 15N-13C | 700 | 10 | 512 | 8.0 | 11.98 | 4.22 | 88.11 | 48 | 2.79 | 19 |
| NHHC* | 200 µs | 15N-13C | 700 | 10 | 512 | 8.0 | 11.98 | 4.22 | 88.11 | 48 | 2.79 | 19 |
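Several columns of Table B are related to each other, which gives a simple sanity check on the listed values. The sketch below assumes that the indirect acquisition time is the increment delay multiplied by the number of F1 slices, and that the experiment time is roughly scans × slices × (transient delay + per-scan overhead). Both relations are our reading of the table, not a statement from the original text, and the function names are hypothetical.

```python
def indirect_acquisition_ms(increment_us: float, n_slices: int) -> float:
    """Indirect (t1) acquisition time in ms, assuming t1max = increment delay * TD."""
    return increment_us * n_slices / 1000.0


def experiment_time_h(scans: int, n_slices: int, transient_delay_s: float,
                      per_scan_overhead_s: float = 0.0) -> float:
    """Rough experiment time in hours.

    per_scan_overhead_s can hold the mixing and direct acquisition times;
    it is an assumed correction, not a column of Table B.
    """
    return scans * n_slices * (transient_delay_s + per_scan_overhead_s) / 3600.0


# DARR 200 ms row: 12.0 us increment, 800 slices, 32 scans, d1 = 2.79 s.
t1_ms = indirect_acquisition_ms(12.0, 800)
# Add the 200 ms mixing time and the 17.98 ms direct acquisition to every scan.
hours = experiment_time_h(32, 800, 2.79, per_scan_overhead_s=0.200 + 0.01798)

print(f"indirect acquisition: {t1_ms:.2f} ms")
print(f"estimated duration:   {hours:.1f} h")
```

For this row the estimate reproduces the listed 9.60 ms indirect acquisition and comes out at about 21 h, close to the 22 h given in the table; other rows agree only approximately.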

Supplementary Note 2: Structure calculation

Restraints from experimental data

Secondary structure and dihedral angle restraints. Secondary structure was predicted from the chemical shifts using TALOS+ (1) (see Figure A(A)). We kept predictions of canonical secondary structure with a confidence value greater than 50%. Based on the chemical shifts, we found an N-terminal helix (16-35, 40-42) and four strands spanning residues 54-63 (strand 1), 67-76 (strand 2), 81-90 (strand 3), and 95-104 (strand 4). Dihedral angles predicted from chemical shifts using TALOS+ were filtered to include only residues with more than 50% confidence of being in canonical secondary structure. This filtering resulted in 60 phi/psi restraints, of which 56 were "Good" predictions and 4 had to be taken with caution ("Warn" category).

Distance restraints. Distance restraints were derived from 24 different 2D solid-state MAS NMR spectra comprising 2033 manually assigned cross-peaks. All spectra were combined to yield a non-redundant list of 1192 distance restraints. Duplicated restraints were reduced to the one with the smallest upper bound. An overview of all distance restraints is provided in Table A. Upper bounds for the spectra can be found in Table B.

Initial classification of distance restraints. YadA-M is a homo-trimer, which complicates the structure calculation because restraints can refer to intra- or inter-monomer contacts. It is very unlikely that cross-peaks between strands 1 and 4 are intra-monomer restraints, because this would require a tiny barrel composed of only four strands. Likewise, restraints involving residues from the first three strands are very likely to be intra-monomer restraints. Using this rationale, we classified all non-redundant distance restraints into 525 intra-monomer, 48 inter-monomer, and 619 ambiguous restraints. Figure A(B) shows a contact map representation of the distance restraints. The contact map clearly shows the anti-parallel pairing of neighboring strands.

Structure calculation with ISD

We used an iterative structure calculation protocol. We first determined an approximate structure of the monomer, which we then assembled into a trimer.
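The rule that duplicated restraints are reduced to the one with the smallest upper bound can be sketched as follows. This is a minimal illustration, not the code used for the calculation: the atom labels, the tuple layout and the upper-bound values are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# A restraint is (atom_a, atom_b, upper_bound_in_angstrom).
# The "residue.atom" label format is an assumption made for this sketch.
Restraint = Tuple[str, str, float]


def merge_duplicates(restraints: List[Restraint]) -> List[Restraint]:
    """Keep one restraint per unordered atom pair: the one with the smallest upper bound."""
    best: Dict[FrozenSet[str], Restraint] = {}
    for atom_a, atom_b, upper in restraints:
        key = frozenset((atom_a, atom_b))  # (a, b) and (b, a) are the same contact
        if key not in best or upper < best[key][2]:
            best[key] = (atom_a, atom_b, upper)
    return list(best.values())


# Hypothetical cross-peaks; the same contact is seen in two spectra.
peaks = [
    ("F56.CA", "I71.CB", 7.5),
    ("I71.CB", "F56.CA", 5.5),
    ("L45.CA", "G72.CA", 6.0),
]
unique = merge_duplicates(peaks)
print(unique)
```

Using an unordered key means that the order in which the two atoms were picked in a given spectrum does not matter.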

During each round of structure calculation, the distance restraints were disambiguated on the basis of the previous structure ensemble and additional hydrogen bonds were inferred. In this way, we determined the structure of YadA-M de novo without imposing any knowledge that was not derived from the NMR data. An overview of all structure calculations is shown in Table C. A flow chart of the entire structure calculation of YadA is shown in Figure B.

Calculation of the first monomer ensemble (simulation 1)

In the initial calculation, we neglected the 48 restraints classified as inter-monomer and used only the 525 intra-monomer and the 619 ambiguous restraints. Because the data included restraints that cannot be explained by a single monomer, we used a soft, error-tolerant restraint potential. Cross-peaks from solid-state NMR spectra are difficult to quantify and only provide upper distance bounds. Our model for distance data evaluates the probability of observing a pair of lower and upper bounds given the correct, but unknown, experimental distance. Because the correct distance is unknown, we treat the distances as nuisance parameters, which we integrate out using Monte Carlo sampling. Instead of the more restrictive log-normal distribution (2), we used an outlier-insensitive variant, the log-Laplace distribution, to relate the unknown experimental distance to the calculated distances. The effective restraint potential is shown in Figure C and is similar to the error-tolerant linear restraint potential introduced by Kuszewski et al. (3). In addition, we used the 120 angular restraints predicted with TALOS+. The angular restraints were modeled with a von Mises distribution with separate adaptive weights for phi and psi restraints.
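To illustrate why a log-Laplace error model tolerates outliers better than a log-normal one, the sketch below compares the two negative log-likelihoods for a single distance, together with a von Mises penalty for a dihedral angle. It is a simplified illustration under stated assumptions: the experimental distance is treated as a known value instead of being integrated out between the bounds, normalization constants are dropped, and the parameter values (`sigma`, `scale`, `kappa`, the 5 Å reference distance) are hypothetical.

```python
import math


def lognormal_penalty(d_calc: float, d_exp: float, sigma: float) -> float:
    """Negative log-likelihood (up to a constant) of a log-normal error model."""
    return 0.5 * (math.log(d_calc / d_exp) / sigma) ** 2


def loglaplace_penalty(d_calc: float, d_exp: float, scale: float) -> float:
    """Negative log-likelihood (up to a constant) of a log-Laplace error model."""
    return abs(math.log(d_calc / d_exp)) / scale


def von_mises_penalty(angle_deg: float, target_deg: float, kappa: float) -> float:
    """Negative log-density (up to a constant) of a von Mises distribution."""
    return -kappa * math.cos(math.radians(angle_deg - target_deg))


d_exp = 5.0  # hypothetical reference distance in angstrom
for d_calc in (5.0, 7.5, 10.0, 20.0):
    normal = lognormal_penalty(d_calc, d_exp, sigma=0.2)
    laplace = loglaplace_penalty(d_calc, d_exp, scale=0.2)
    print(f"d_calc = {d_calc:5.1f}  log-normal = {normal:6.2f}  log-Laplace = {laplace:6.2f}")
```

The log-normal penalty grows quadratically in the log-ratio of the distances, whereas the log-Laplace penalty grows only linearly, so a strongly violated restraint (for example an inter-monomer contact forced onto a monomer) pulls less on the structure.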
The conformational prior involved the standard soft-repulsive non-bonded force field (PROLSQ potential form with parallhdg5.3 parameters (4,5) and interactions involving hydrogens switched off) as well as an additional phi/psi and hydrogen bonding potential for backbone hydrogen bonds. Structure calculation was carried out with replica exchange Monte Carlo (6). 11980 replica transitions were simulated, of which the last 4980 structures made up the final ensemble. During conformational sampling, the weight of the distance restraints was shared between intra-monomer and ambiguous restraints and estimated using Gibbs sampling (7). As expected, the estimated weight was quite low (3.76 ± 0.141) because the data contained inter-monomer restraints that could not be explained with a single monomer, which led to an elevated noise level. We observed 79.7 ± 6.4 distance restraint violations (> 0.5 Å) within the monomer ensemble (intra-monomer: 21.0 ± 3.9, ambiguous: 58.0 ± 5.3). The force constants of the phi and psi restraints were estimated and clipped to values of 20 (corresponding to a circular variance of 13°). The ensemble was quite heterogeneous (see Figure D), but clearly showed a four-stranded beta-sheet and an N-terminal helix.

Inference of secondary structure and hydrogen bonds

We used the ensemble from simulation 1 to infer the secondary structure as well as the register of the beta-strands. Secondary structure was assigned with DSSP (8); backbone hydrogen bonds were detected using HBPLUS (9). Both secondary structure and hydrogen bonds were calculated for all 4980 structures. Figure E shows a contact map of the backbone hydrogen bonding pattern and the frequency of occurrence of canonical secondary structure. We assigned the secondary structure according to the maximum a posteriori estimate derived from the posterior probability, with additional smoothing in the helix region, resulting in the following secondary structure: 13-42 helix, 54-62 strand 1, 65-74 strand 2, 81-89 strand 3, and 93-100 strand 4. All observed hydrogen bonds were filtered for those that were consistent with the inferred secondary structure. We kept hydrogen bonds between helical residues that followed the canonical (i + 4, i) pattern. For the hydrogen bonds between residues in strands, we inferred the register of neighboring strands by selecting the subset of restraints that was consistent with an anti-parallel pattern (i, j), (i + 2, j − 2), etc. In total, we obtained 46 hydrogen bonds (converted to 92 hydrogen bonding restraints, one acceptor-donor and one acceptor-hydrogen restraint for each bond), of which 26 were helical bonds and 20 were bonds between the anti-parallel strands 1-2, 2-3 and 3-4.
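The register filter for neighboring strands can be sketched as follows. In the anti-parallel pattern (i, j), (i + 2, j − 2), ... all pairs share the same sum i + j and the same parity of i, so one simple way to select a consistent subset is to keep the most frequently observed (sum, parity) class. This is our simplified reading of the selection step, not the authors' code, and the residue pairs in the example are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[int, int]  # (residue in strand a, residue in strand b)


def antiparallel_register(pairs: List[Pair]) -> List[Pair]:
    """Return the pairs that follow the dominant pattern (i, j), (i + 2, j - 2), ...

    Pairs may be listed repeatedly (e.g. once per ensemble member); the
    multiplicity then acts as a vote for the corresponding register.
    """
    if not pairs:
        return []
    votes = Counter((i + j, i % 2) for i, j in pairs)
    (register_sum, parity), _ = votes.most_common(1)[0]
    return sorted(
        p for p in set(pairs)
        if p[0] + p[1] == register_sum and p[0] % 2 == parity
    )


# Hypothetical hydrogen-bonded residue pairs between strand 1 and strand 2.
observed = [(55, 73), (57, 71), (59, 69), (61, 67), (58, 71), (56, 70)]
in_register = antiparallel_register(observed)
print(in_register)
```

Pairs such as (58, 71) and (56, 70) in the example do not share the dominant sum and are dropped as out of register.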
Structure calculation of the monomer using the inferred hydrogen bonds (simulations 2 and 3)

The second simulation used the 46 inferred hydrogen bonds in addition to the distance and angular restraints, and a force field with a more realistic non-bonded energy function adapted from the Rosetta software (10) (Lennard-Jones potential with linear asymptotics). No other aspect of the Rosetta software was used in the structure calculation; all calculations were done with ISD. We have shown previously that incorporation of attractive van der Waals contributions improves structure calculation from NMR data (11). The use of realistic van der Waals energies rather than purely repulsive potentials has been proposed previously by Kuszewski et al. (12) and is standard practice in NMR structure calculation. Hydrogen bond restraints were implemented as distance restraints between donor and acceptor (2.9 Å) and between amide hydrogen and acceptor (1.9 Å); the force constant was fixed to 100 kcal/mol. Using a replica-exchange simulation, we calculated a structure ensemble under all experimental restraints and the additional hydrogen bonds; the simulation started from an extended structure, and the total number of replica transitions was 4500. After 3000 replica transitions the calculation had converged, yielding an ensemble of 1500 structures. The estimated weight of the distance restraints was slightly higher (3.85 ± 0.14) than in the previous simulation. This is consistent with a decrease in the number of violations: 64.0 ± 6.3 (intra-monomer: 16.0 ± 3.6, ambiguous: 47.0 ± 4.6). We calculated an average structure and a local B factor indicating the variability within the ensemble (shown in Figure F(A)). The ensemble is better defined than the first monomer ensemble. We also used the mean structure to complete the intra-monomer hydrogen bonds (Figure F(B)). To proceed further, we calculated an average structure that was as close as possible to the ensemble average but also obeyed the covalent restraints and showed good non-bonded interactions. The average structure was obtained with a short replica simulation (620 transitions, simulation 3) in which we applied B-factor weighted positional restraints. The resulting structure was used as the starting structure to assemble the trimer.
Assembly of the trimer (simulation 4)

In general, symmetry poses challenges to NMR structure determination (as opposed to X-ray structure determination). It increases the ambiguity of assignments because intra- and intermolecular interactions (within the protomer as opposed to between protomers) cannot easily be distinguished. Thus, we assume that solving the structure of any monomeric protein of similar size would be easier than solving the structure of YadA.

To assemble the trimer, we imposed exact C3 point symmetry and used all available distance restraints with a strong force constant fixed to a value of 50. Exact rotational symmetry was implemented in ISD using a single monomer structure parameterized in dihedral angles that also experienced the forces and interactions with the image structures generated by the symmetry operator. This is similar in spirit to the approach of Bardiaux et al. (13). Ambiguous distance restraints were calculated as r^-6 averages over all possible intra- and inter-monomer distances. In addition to the distance restraints, we used B-factor weighted restraints to the mean structure (as we did in the calculation of the monomer average structure). The force field was based on a Lennard-Jones potential for non-bonded interactions between non-hydrogen atoms. The simulation started from the final structures generated in simulation 3. A total of 3520 replica transitions were sampled; the simulation converged after 2500 transitions. The final ensemble showed a significantly reduced number of violations (> 0.5 Å): 19.0 ± 2.3 (intra-monomer: 7.0 ± 1.1, inter-monomer: 1.0 ± 1.3, ambiguous: 10.0 ± 1.8). This was not only the effect of formerly violated ambiguous restraints being satisfied in the trimer, but also partly due to the high force constant. The assembled trimer showed a beta-barrel structure with the trimeric helix bundle passing through the pore (Figure G). From the ensemble, we derived additional intra-monomer hydrogen bonds that were in register with the hydrogen bonds obtained from the mean structure. Moreover, the ensemble determined the register of strand 1 and strand 4 uniquely. The final set of hydrogen bonds is shown in Figure H and comprises 61 intra-monomer and 10 inter-monomer hydrogen bonds (between strands 1 and 4).
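The combination of exact C3 symmetry and r^-6 averaging for ambiguous restraints can be illustrated with a small sketch. It assumes that the symmetry axis coincides with the z axis and uses the ARIA-style summed form (Σ d^-6)^(-1/6); a true mean would additionally divide the sum by the number of contributions. The function names and coordinates are hypothetical and are not taken from ISD.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def rotate_z(p: Vec, angle_deg: float) -> Vec:
    """Rotate a point about the z axis (assumed to be the C3 symmetry axis)."""
    a = math.radians(angle_deg)
    x, y, z = p
    return (x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a), z)


def c3_images(p: Vec) -> List[Vec]:
    """Position of an atom in protomer A and its images in the two other protomers."""
    return [rotate_z(p, 120.0 * k) for k in range(3)]


def r6_effective_distance(distances: Sequence[float]) -> float:
    """Effective distance of an ambiguous restraint, (sum of d^-6)^(-1/6)."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


def ambiguous_distance(atom_i: Vec, atom_j: Vec) -> float:
    """Effective distance from atom i (protomer A) to atom j in all three protomers."""
    return r6_effective_distance([math.dist(atom_i, image) for image in c3_images(atom_j)])


# Hypothetical coordinates in angstrom.
atom_i = (10.0, 0.0, 0.0)
atom_j = (12.0, 0.0, 5.0)
individual = [math.dist(atom_i, image) for image in c3_images(atom_j)]
effective = ambiguous_distance(atom_i, atom_j)
print([round(d, 2) for d in individual], round(effective, 2))
```

Because of the steep r^-6 weighting, the effective distance is dominated by the shortest of the intra- and inter-monomer distances and never exceeds it, which is what lets a single restraint be satisfied by whichever realization is close enough.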
Tests with alternative oligomerization states

Although the stoichiometry of trimeric autotransporters has long been known from biochemical data, we also tested whether the NMR data themselves are sufficient to determine the stoichiometry that is most compatible with the distance restraints. To do so, we calculated three assemblies: a dimer, a trimer and a tetramer using C2, C3, and C4 symmetry, respectively. In each calculation, we fixed the structure of the monomer according to the result of simulation 3 such that the only free parameter was the symmetry axis (as described in the previous section).

The three alternative assemblies differ substantially in the number of restraint violations:

Dimer: 53.0 ± 4.7 (total), 10.0 ± 2.1 (intra), 22.0 ± 2.4 (inter), 19.0 ± 3.6 (ambiguous)
Trimer: 23.0 ± 8.0 (total), 7.0 ± 2.0 (intra), 2.0 ± 1.4 (inter), 13.0 ± 5.6 (ambiguous)
Tetramer: 37.0 ± 3.3 (total), 7.0 ± 1.5 (intra), 8.0 ± 1.2 (inter), 21.0 ± 3.2 (ambiguous)

(total: overall number of restraint violations; intra: number of intra-protomer violations; inter: number of inter-protomer violations; ambiguous: number of violated ambiguous distance restraints). The violation statistics clearly favor a trimeric assembly over a dimeric or tetrameric assembly. Thus the NMR data contain sufficient information to determine the stoichiometry of the YadA-M assembly. We also carried out structure calculations with alternative oligomerization states allowing for full flexibility of the protomer chain, using the methods described in (13). For reasons of computational efficiency these calculations were done with the ARIA/CNS software (14,15), which converges much faster than a full-blown ISD simulation. For every calculation, the initial data set consisted of 2033 manually assigned ssNMR peaks, supplemented with hydrogen bond and TALOS restraints. ARIA was used to iteratively resolve the inter-/intra-monomer ambiguity and calculate structure ensembles. The starting structure was always a protomer chain with randomized backbone torsion angles. These calculations clearly show that a trimer arrangement explains the data best and produces ensembles showing the best validation criteria (Table F).

Full trimer calculation (simulation 5)

The next replica simulation used all distance and angular restraints as well as the 71 hydrogen bonds inferred from the previous simulations. The weights of the distances, hydrogen bonds and dihedral angles were estimated. To enhance the conformational sampling, the prior distribution involved a purely repulsive force field.
We removed 40 distance restraints between atoms that were separated by only one or two covalent bonds (intra-monomer: 24, ambiguous: 16 restraints). We ran a replica simulation with 2400 transitions, which converged after 1000 transitions and resulted in an ensemble comprising 1400 structures. The distance restraint violations increased because of the lower force constant (10.98 ± 1.13) to 39.0 ± 6.1 (intra-monomer: 15.0 ± 2.7, inter-monomer: 1.0 ± 0.8, ambiguous: 22.0 ± 5.0). Cluster analysis (16) of the 1400 monomer structures revealed that the structure ensemble comprised three main conformers that were populated by 54, 32 and 14% of all structures; the ensemble RMSDs of the conformers were 0.74, 0.60, and 1.0 Å, respectively (see Figure I). The largest structural differences (local RMSD > 2 Å) along the backbone were found in the region spanning residues 35 to 49. Conformer 1 had a straight helix, whereas the helix in conformers 2 and 3 had a kink. Conformer 2 was an intermediate structure between conformers 1 and 3. Conformers 2 and 3 were closer to the mean structure from simulation 2 (Cα RMSD 1.6 Å without the first 12 N-terminal residues) than conformer 1 (RMSD 1.9 Å). Clusters 1 and 3 showed the same number of violations, 36.6 ± 4.3, whereas cluster 2 violated the distance restraints more, with 45.1 ± 4.8 violations on average. This was reflected in a slightly elevated estimated weight of the distance data for clusters 1 and 3 (11.4 kcal/mol) and a lower distance weight for cluster 2 (10.2 kcal/mol). In addition to being the most populated conformer, cluster 1 also showed the best average dihedral angle energy and the lowest average non-bonded energy (Table D).

Final trimer calculation (simulation 6)

We selected the first conformer for further analysis. To improve the structure, we used the ensemble of the first conformer to disambiguate distance restraints and to identify consistently violated restraints. Moreover, we collected from the peak lists new restraints that were consistent with the structure. For each distance restraint, we evaluated all possible realizations $r_{AA}$, $r_{AB}$, $r_{BA}$ and counted the number of times each realization was smaller than the upper bound within a tolerance of 1 Å.
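The counting step, together with the probabilities and the 0.9 thresholds defined in the text that follows, can be sketched as below. This is an illustration under assumptions rather than the code used for the calculation: "smaller than the upper bound within a tolerance of 1 Å" is read as d ≤ upper bound + 1 Å, and the function names, dictionary layout and example distances are hypothetical.

```python
from typing import Dict, List

REALIZATIONS = ("AA", "AB", "BA")


def consistency_frequencies(ensemble: List[Dict[str, float]], upper_bound: float,
                            tolerance: float = 1.0) -> Dict[str, float]:
    """Fraction of ensemble members in which each realization satisfies the bound."""
    n = len(ensemble)
    return {
        key: sum(member[key] <= upper_bound + tolerance for member in ensemble) / n
        for key in REALIZATIONS
    }


def classify(p: Dict[str, float], threshold: float = 0.9) -> str:
    """Classify a restraint from the frequencies p_AA, p_AB and p_BA."""
    aa, ab, ba = p["AA"], p["AB"], p["BA"]
    pr_aa = aa * (1 - ab) * (1 - ba)
    pr_ab = (1 - aa) * ab * (1 - ba)
    pr_ba = (1 - aa) * (1 - ab) * ba
    pr_wrong = (1 - aa) * (1 - ab) * (1 - ba)
    if pr_aa > threshold:
        return "intra-monomer"
    if pr_ab > threshold or pr_ba > threshold:
        return "inter-monomer"
    if pr_wrong > threshold:
        return "removed"
    return "ambiguous"


# Hypothetical distances (angstrom) for one restraint in a four-member ensemble.
ensemble = [
    {"AA": 4.8, "AB": 12.1, "BA": 15.0},
    {"AA": 5.2, "AB": 11.7, "BA": 14.6},
    {"AA": 4.5, "AB": 12.4, "BA": 15.3},
    {"AA": 5.0, "AB": 11.9, "BA": 14.9},
]
p = consistency_frequencies(ensemble, upper_bound=6.0)
print(p, classify(p))
```

A restraint is assigned uniquely only when exactly one realization is satisfied in almost all ensemble members; everything in between stays ambiguous.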
This resulted in frequencies, $p_{AA}$ etc., of the upper bound being consistent with the respective distance. We then constructed the following probabilities.

Probabilities for unique assignments:

$$\Pr(AA) = p_{AA}\,(1 - p_{AB})\,(1 - p_{BA})$$
$$\Pr(AB) = (1 - p_{AA})\,p_{AB}\,(1 - p_{BA})$$
$$\Pr(BA) = (1 - p_{AA})\,(1 - p_{AB})\,p_{BA}$$

Probabilities for ambiguous assignments:

$$\Pr(AA \text{ or } AB) = p_{AA}\,p_{AB}\,(1 - p_{BA})$$
$$\Pr(AA \text{ or } BA) = p_{AA}\,(1 - p_{AB})\,p_{BA}$$
$$\Pr(AB \text{ or } BA) = (1 - p_{AA})\,p_{AB}\,p_{BA}$$
$$\Pr(AA \text{ or } AB \text{ or } BA) = p_{AA}\,p_{AB}\,p_{BA}$$

Probability that a peak is wrong:

$$\Pr(\text{wrong}) = (1 - p_{AA})\,(1 - p_{AB})\,(1 - p_{BA})$$

We classified a restraint as intra-monomer if Pr(AA) > 0.9. We classified a restraint as inter-monomer if Pr(AB) > 0.9 or Pr(BA) > 0.9. We removed the restraint if Pr(wrong) > 0.9. In all other cases, we treated the restraint as completely ambiguous. Using this scheme, we obtained the final list of restraints shown in Figure J and Table E.

Using this list of restraints, we calculated the final structure ensemble of YadA-M. The final structure calculation comprises a first replica simulation with 4970 transitions based on the Lennard-Jones force field without hydrogen interactions. In a subsequent simulation (1330 transitions), the hydrogen interactions were also switched on, and an additional potential for side-chain dihedral angles, taken from the CHARMM force field, was added. The YadA-M structure ensemble was selected from the last 250 conformational samples. The estimated weight of the distance restraints was 11.1 ± 0.3.

Validation of ISD restraints and ensemble

The restraint and structure statistics are listed in Table E.

Crystal contacts and docking

Strong cross-peaks were observed between hydrophobic residues that belong to strand β2 (I71) and to strand β1 (F56) or β4 (F101), and that point towards the outside of the barrel. These correlations could not be accommodated within the YadA trimeric structure, and were considered as possible inter-molecular contacts between neighboring trimers in the crystal. On the basis of these ambiguous contacts, we used HADDOCK (17) to calculate a model of the trimer-trimer interface. The most favorable arrangement obtained is illustrated in Figure K. Two neighboring trimers are arranged in a tilted up-and-down disposition, where the main axes of the trimers make an angle of ~130°. The interface mostly involves exposed hydrophobic residues on the membrane-facing side of the barrel. Due to the internal symmetry of the trimer, each YadA molecule in the crystal would thus be surrounded by three other trimers. The reconstruction of ten adjacent YadA molecules in the putative lattice produced no clashes. Remarkably, this arrangement of molecules is similar to the one observed in the Hia 1022-1098 crystal, with a 112° angle between trimer axes (18). Additionally, some of the cross-peaks rejected during the trimer calculation could later be assigned as crystal contacts (Figure 2F in the main paper).
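The disambiguation scheme used for the final trimer calculation (simulation 6) can be illustrated with a short Python sketch. It is not the code used in this work; the function names, the reading of "within a tolerance of 1 Å" as d < U + 1 Å, and the example frequencies are assumptions made for illustration only.

```python
from itertools import product


def consistency_frequency(distances, upper_bound, tolerance=1.0):
    """Fraction of ensemble members in which a realization satisfies the bound.

    Assumption: a distance d counts as consistent if d < upper_bound + tolerance.
    """
    hits = sum(1 for d in distances if d < upper_bound + tolerance)
    return hits / len(distances)


def assignment_probabilities(p_aa, p_ab, p_ba):
    """Probability of every combination of consistent realizations.

    Keys are tuples of the realizations taken as consistent, for example
    ("AA",), ("AA", "AB") or () for a wrong peak.
    """
    p = {"AA": p_aa, "AB": p_ab, "BA": p_ba}
    probs = {}
    for flags in product((False, True), repeat=3):
        consistent = tuple(k for k, f in zip(p, flags) if f)
        prob = 1.0
        for k, f in zip(p, flags):
            prob *= p[k] if f else 1.0 - p[k]
        probs[consistent] = prob
    return probs


def classify_restraint(p_aa, p_ab, p_ba, threshold=0.9):
    """Apply the 0.9 thresholds described in the text."""
    probs = assignment_probabilities(p_aa, p_ab, p_ba)
    if probs[("AA",)] > threshold:
        return "intra-monomer"
    if probs[("AB",)] > threshold or probs[("BA",)] > threshold:
        return "inter-monomer"
    if probs[()] > threshold:
        return "removed"
    return "ambiguous"


# Hypothetical frequencies, chosen only to exercise each branch.
print(classify_restraint(0.98, 0.01, 0.02))  # intra-monomer
print(classify_restraint(0.90, 0.90, 0.10))  # ambiguous
```

Because the eight probabilities enumerate all combinations of three independent events, they sum to one, which gives a simple sanity check on the frequencies.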

Figure A: Secondary structure topology and distance restraints. (A) Secondary structure prediction confidence. (B) Contact map showing the distance restraints obtained from 24 solid-state MAS NMR spectra. Colors indicate if a restraint is treated as intra-monomer (black), inter-monomer (green), or ambiguous (red) restraint. (C) Comparison of upper bound statistics between a solution data set of ubiquitin (1D3Z) and the solid-state restraints of YadA-M.

restraint type                      full set   non-redundant set
long-range (|i − j| > 5)            340        219
medium-range (4 ≤ |i − j| ≤ 5)      25         19
short-range (2 ≤ |i − j| ≤ 3)       290        219
sequential                          1378       735
intra-residual                      0          0

Table A: Number of distance restraints derived from manually assigned cross-peaks from 24 solid-state MAS NMR spectra.

experiment      mixing times                    upper bounds [Å]
CHHC            35/50/80/150/200/300/500 µs     3.75/4.0/4.5/5.75/6.0/6.5/7.0
NHHC            35/50/100/200 µs                3.5/4.0/5.0/6.0
DARR            200/300/500 ms                  7.5/7.8/7.8
PDSD900         15/100 ms                       4.5/6.25
TEDOR           2.24/6/12 ms                    5.5/6.5/7.5
PAR             2.25/6/15 ms                    5.0/6.5/8.0
CPPI-DARR       300 ms                          8.00
CPPI-DARR-DD    300 ms                          7.80

Table B: 2D experiments that were used to define distance restraints in terms of upper bounds. Mixing times and upper bounds are listed in corresponding order.
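As a minimal sketch of how the pairing in Table B could be encoded, the following Python lookup maps an experiment and mixing time to its upper bound. The dictionary layout and the function name `upper_bound` are illustrative assumptions; only the numbers are taken from the table.

```python
# Upper bounds in Å keyed by experiment and mixing time (values from Table B).
# The data structure itself is an assumption made for illustration.
UPPER_BOUNDS = {
    "CHHC": ("µs", {35: 3.75, 50: 4.0, 80: 4.5, 150: 5.75,
                    200: 6.0, 300: 6.5, 500: 7.0}),
    "NHHC": ("µs", {35: 3.5, 50: 4.0, 100: 5.0, 200: 6.0}),
    "DARR": ("ms", {200: 7.5, 300: 7.8, 500: 7.8}),
    "PDSD900": ("ms", {15: 4.5, 100: 6.25}),
    "TEDOR": ("ms", {2.24: 5.5, 6: 6.5, 12: 7.5}),
    "PAR": ("ms", {2.25: 5.0, 6: 6.5, 15: 8.0}),
    "CPPI-DARR": ("ms", {300: 8.0}),
    "CPPI-DARR-DD": ("ms", {300: 7.8}),
}


def upper_bound(experiment, mixing_time):
    """Return the upper bound (Å) for a cross-peak from the given spectrum.

    The mixing time is expressed in the unit stored for that experiment.
    Raises KeyError if the experiment or mixing time is not listed.
    """
    _unit, bounds = UPPER_BOUNDS[experiment]
    return bounds[mixing_time]


print(upper_bound("CHHC", 150))     # 5.75
print(upper_bound("PDSD900", 100))  # 6.25
```

Raising an error for unlisted combinations, rather than interpolating, keeps the sketch from inventing bounds that the table does not give.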

No.   structure   distances   phi/psi   hbonds   B-weighted positions
1     monomer     +           +         −        −
2     monomer     +           +         +        −
3     monomer     −           −         −        +
4     trimer      +           −         −        +
5     trimer      +           +         +        −
6     trimer      +           +         +        −

Table C: Overview of structure calculations. + / − indicates inclusion / omission of restraints. In the monomer calculations, 1144 distance restraints are available; in the trimer calculations, we have 1192 distance restraints. The predicted phi/psi angles comprise 120 restraints. In the monomer calculations, 46 intra-monomer hydrogen bonds are used (see Fig. D); in the trimer calculations, the number of hydrogen bonds is 61 (see Fig. G). The B-factor-weighted positional restraints are derived from the average monomer structure (Fig. E) obtained from simulation 2.

Figure B: Flow chart of the structure calculation with ISD. Stages shown in the chart: distance restraints (intra + ambiguous) and torsion angle restraints; 1st monomer ensemble (simulation 1); intra-monomer H bonds; 2nd monomer ensemble (simulation 2); average monomer (simulation 3); symmetry; 1st trimer assembly with C2, C3 (simulation 4) and C4 symmetry; inter-monomer H bonds; 2nd trimer assembly (simulation 5); cluster analysis (clusters 1, 2 and 3); disambiguation; final trimer assembly (simulation 6).

Figure C: Effective restraint potential resulting from an upper bound U = 4.5 Å and a lower bound L = 1.8 Å (indicated as red dashed lines) at a weight k = 4.0. The restraint potential is the negative probability of observing U and L viewed as a function of the inter-atomic distance d.

Figure D: Structure ensemble obtained with simulation 1. (A), (C): Front and back views of the superimposed structure ensemble where color changes from blue (N-terminus) to red (C-terminus). (B), (D): Front and back views of the average structure obtained with robust superposition (7); color indicates local variance (blue: rigid, red: highly flexible).

Figure E: Secondary structure inferred from the monomer ensemble. (A) Posterior probability of secondary structure (red: helix, blue: beta-strand). (B) Observed main-chain hydrogen bonds indicated by black dots. (C) Filtered observed hydrogen bonds that are consistent with the inferred secondary structure and converted into distance restraints.

Figure F: (A) Mean structure obtained from simulation 2. The B-factor coloring indicates the local variability in the structure ensemble ranging from rigid (blue) to highly flexible (red). (B) Main-chain hydrogen bonds inferred from the mean structure.

Figure G: Structure ensemble obtained with simulation 4 (first assembly of the trimer). The structure ensemble is shown in chainbows coloring from the side (A), top (B) and bottom (C).

Figure H: Hydrogen bonds inferred from the trimer ensemble. (A) Observed hydrogen bonds in simulation 4. (B) Filtered observed hydrogen bonds that are consistent with the inferred secondary structure and converted into distance restraints.

Figure I: Structure ensembles from simulation 5. (A), (B), (C): conformers with population weights 54% (blue), 32% (red), and 14% (green). (D): mean structures with the same colors. The first 12 N-terminal residues are not shown because they are disordered.

statistic                                 conformer 1     conformer 2     conformer 3
population size                           54%             32%             14%
heterogeneity (13-105, monomer) [Å]       0.74            0.60            1.0
total violations (> 0.5 Å)                36.0 ± 4.5      45.0 ± 4.8      36.0 ± 4.3
intra-monomer violations (> 0.5 Å)        15.0 ± 2.8      15.0 ± 2.1      12.0 ± 2.0
inter-monomer violations (> 0.5 Å)        1.0 ± 0.9       1.0 ± 0.8       1.0 ± 0.6
ambiguous violations (> 0.5 Å)            19.0 ± 2.7      28.0 ± 3.2      23.0 ± 4.0
circular variance phi [10⁻³]              46.2 ± 7.4      51.9 ± 9.9      57.8 ± 12.9
circular variance psi [10⁻³]              46.8 ± 11.1     46.5 ± 6.8      57.1 ± 11.3
1st generation packing quality            −3.51 ± 0.24    −2.77 ± 0.24    −3.35 ± 0.22
2nd generation packing quality            −3.37 ± 0.23    −3.14 ± 0.23    −3.61 ± 0.23
Ramachandran plot appearance              −3.51 ± 0.60    −3.02 ± 0.37    −4.14 ± 0.31
Backbone conformation                     0.33 ± 0.29     0.24 ± 0.39     −0.09 ± 0.38

Table D: Conformer statistics from simulation 5.

Figure J: (A) Contact map showing the distance restraints used in the final refinement. (B) Colors indicate if a restraint is treated as intra-monomer (black), inter-monomer (green), or ambiguous (red).

Restraints and structure statistics

Number of restraints (per monomer)
  Distance restraints
    Intra-monomer                             1064
      Intra-residual (|i − j| = 0)            16
      Sequential (|i − j| = 1)                631
      Medium-range (2 ≤ |i − j| ≤ 5)          222
      Long-range (|i − j| > 5)                195
    Inter-monomer                             81
    Ambiguous                                 193
    Total                                     1338
  Dihedral angle restraints (φ/ψ)             120 (60/60)
  Hydrogen bond restraints (intra/inter)      71 (61/10)

Restraints statistics
  Number of distance violations > 0.5 Å       9 ± 2
  Number of distance violations > 0.3 Å       32 ± 2
  RMS of distance violations                  0.118 ± 0.004 Å

Structural quality
  Ramachandran statistics (a)
    Most favoured regions                     77.58 ± 1.38
    Allowed regions                           16.38 ± 2.09
    Generously allowed regions                3.31 ± 1.40
    Disallowed regions                        2.70 ± 0.91
  WHATIF Z-scores
    Backbone conformation                     0.47 ± 0.44
    2nd generation packing quality            −2.88 ± 0.18
    Ramachandran plot appearance              −3.24 ± 0.55
    χ1/χ2 rotamer normality                   −3.80 ± 0.61

Structural precision (residues 12-105)
  Backbone atoms (monomer)                    0.74 ± 0.26 Å
  Heavy atoms (monomer)                       1.38 ± 0.58 Å
  Backbone atoms (trimer)                     0.84 ± 0.32 Å
  Heavy atoms (trimer)                        1.45 ± 0.61 Å

Table E: Restraints and structure statistics of the final YadA-M ensemble. (a) Calculated with PROCHECK.

Figure K: Model of the organization of two YadA trimers (red and blue) in the crystal obtained by HADDOCK. The view in B is rotated by 90°.

Oligomer (symmetry)                              Dimer (C2)   Trimer (C3)   Tetramer (C4)

Coordinates precision (residues 12-105) (a)
  Backbone atoms, monomer (Å)                    3.35         0.98          1.12
  Backbone atoms, oligomer (Å)                   7.24         1.03          1.24

Restraints statistics
  Number of restraint violations (per monomer)
    NMR distances (> 0.1 Å)                      31.5         2.65          3.80
    Dihedral angles (> 5°)                       10.35        0.45          1.15

Structure quality
  Ramachandran most favored regions (%) (b)      70.0         81.5          79.4
  Backbone conformation (Z-score) (c)            −2.15        0.20          −0.30
  2nd generation packing quality (Z-score)       −4.66        −4.05         −4.40
  χ1/χ2 rotamer normality (Z-score)              −4.68        −2.32         −3.12
  Clashscore (d)                                 35.40        5.61          4.18

Inter-monomer interfaces (e)
  ΔiG (kcal/mol)                                 -            −14.91        −12.30
  ΔiG P-value                                    -            0.47          0.64

Table F: Structure ensemble statistics for the different oligomerization states calculated with ARIA. All reported values are the average value over the 20 lowest energy conformers. Best values are highlighted in bold. (a) Average root mean square deviation (RMSD) of the ensemble atomic coordinates with respect to the average structure. (b) Determined by PROCHECK (19). (c) Determined by WHAT-IF (20). (d) Number of inter-atomic clashes per 1000 atoms, determined by MolProbity (21). (e) Interface analysis performed with PISA (Protein interfaces, surfaces and assemblies, http://www.ebi.ac.uk/pdbe/prot_int/pistart.html) (22). ΔiG = solvation free energy gain upon formation of the interface. Negative ΔiG corresponds to hydrophobic interfaces, or positive protein affinity. The ΔiG P-value indicates the P-value of the observed ΔiG. P > 0.5 means that the interface is less hydrophobic than expected, suggesting that it is an artifact. P < 0.5 indicates interfaces with an unexpectedly large hydrophobicity, and thus a high specificity.

References

1. Y. Shen and A. Bax. Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology. J. Biomol. NMR 38, 289-302 (2007).
2. W. Rieping, M. Habeck, and M. Nilges. Modeling errors in NOE data with a lognormal distribution improves the quality of NMR structures. J. Am. Chem. Soc. 127, 16026-16027 (2005).
3. J. Kuszewski, C. D. Schwieters, D. S. Garrett, R. A. Byrd, N. Tjandra, and G. M. Clore. Completely automated, highly error-tolerant macromolecular structure determination from multidimensional nuclear Overhauser enhancement spectra and chemical shift assignments. J. Am. Chem. Soc. 126, 6258-6273 (2004).
4. J. P. Linge and M. Nilges. Influence of non-bonded parameters on the quality of NMR structures: a new force-field for NMR structure calculation. J. Biomol. NMR 13, 51-59 (1999).
5. J. P. Linge, M. Habeck, W. Rieping, and M. Nilges. ARIA: automated NOE assignment and NMR structure calculation. Bioinformatics 19, 315-316 (2003).
6. M. Habeck, M. Nilges, and W. Rieping. Replica-exchange Monte Carlo scheme for Bayesian data analysis. Phys. Rev. Lett. 94, 018105 (2005).
7. M. Habeck, W. Rieping, and M. Nilges. Weighting of experimental evidence in macromolecular structure determination. Proc. Natl. Acad. Sci. USA 103, 1756-1761 (2006).
8. W. Kabsch and C. Sander. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577-2637 (1983).
9. I. K. McDonald and J. M. Thornton. Satisfying hydrogen bonding potential in proteins. J. Mol. Biol. 238, 777-793 (1994).
10. B. Kuhlman, G. Dantas, G. C. Ireton, G. Varani, B. L. Stoddard, and D. Baker. Design of a novel globular protein fold with atomic-level accuracy. Science 302, 1364-1368 (2003).
11. M. Habeck. Statistical mechanics analysis of sparse data. J. Struct. Biol. 173, 541-548 (2011).
12. J. Kuszewski, A. M. Gronenborn, and G. M. Clore. Improving the quality of NMR and crystallographic protein structures by means of a conformational database potential derived from structure databases. Protein Science 5, 1067-1080 (1996).
13. B. Bardiaux, B. J. van Rossum, M. Nilges, and H. Oschkinat. Efficient modeling of symmetric protein aggregates from NMR data. Angewandte Chemie 51, 6916-6919 (2012).
14. A. T. Brunger. Version 1.2 of the Crystallography and NMR system. Nat. Protoc. 2, 2728-2733 (2007).
15. W. Rieping et al. ARIA2: automated NOE assignment and data integration in NMR structure calculation. Bioinformatics 23, 381-382 (2007).
16. M. Hirsch and M. Habeck. Mixture models for protein structure ensembles. Bioinformatics 24, 2184-2192 (2008).
17. C. Dominguez, R. Boelens, and A. M. Bonvin. HADDOCK: a protein-protein docking approach based on biochemical or biophysical information. J. Am. Chem. Soc. 125, 1731-1737 (2003).
18. G. Meng, N. K. Surana, J. W. St Geme 3rd, and G. Waksman. Structure of the outer membrane translocator domain of the Haemophilus influenzae Hia trimeric autotransporter. EMBO J. 25, 2297-2304 (2006).
19. R. A. Laskowski, J. A. Rullmann, M. W. MacArthur, R. Kaptein, and J. M. Thornton. AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR. J. Biomol. NMR 8, 477-486 (1996).
20. G. Vriend. WHAT IF: a molecular modeling and drug design program. J. Mol. Graph. 8, 52-56 (1990).
21. I. W. Davis et al. MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res. 35, W375-W383 (2007).
22. E. Krissinel and K. Henrick. Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. 372, 774-797 (2007).

Supplementary Note 3: Coiled-coil analysis

Three-dimensional structures were analyzed with the Socket server (http://coiledcoils.chm.bris.ac.uk/socket/server.html) using default parameter settings (1).

Solid-state MAS NMR structure of YadA-M

The helix bundle of YadA-M shows classical coiled-coil packing up to the center of the barrel (see Figure A). The side chains of residues F18, L21, L25, L28 and V32 pack into the core of the coiled coil (a, d, a, d and a layers according to Socket). After residue G35, which should be part of the next d layer, the helix stretches and transforms into an almost extended structure. Instead of G35, the next residue, L36, shows knobs-into-holes packing (see Figures B and D). Thus L25, L28, V32 and L36 appear to define the core layers of an 11-residue (hendecad) repeat. Although the hydrophobic residue L42 perfectly matches the d position in the canonical heptad repeat, it is no longer part of the coiled-coil core but forms contacts to the barrel.

Crystal structure of Hia

A parallel three-stranded coiled coil is also detected in the N-terminal helix bundle of the Hia structure (PDB ID code 2GR7) (see Figure A). The N-terminal helix shows a heptad repeat with hydrophobic residues in a and d layers (residues V1010, L1013, V1017, V1020, A1024, corresponding to YadA-M residues F18, L21, L25, L28, V32). Socket detects a non-canonical interrupt: unlike G35 in YadA-M, the corresponding glycine in Hia, G1027, is still assigned to the d position, but A1031, which should be in a according to a heptad repeat, is also assigned to a d layer by Socket. Therefore, the canonical pattern starts to break at A1031 (a residue of the ASSA region) at the latest, if not earlier at G1027. As for YadA-M, residues V1017, V1020, A1024 and T1028 appear to form a hendecad repeat (see Figures C and D).
Hydrophobicity analysis

To verify whether the results found in both anchor domain structures hold for the entire family, we analyzed the hydrophobicity of all TAA members that are part of the alignment (see Figure E, also shown as Figure 3B in the main text). For each sequence, we calculated the hydrophobicity using the Kyte-Doolittle scale. At the a and d positions F18, L21, L25, L28 and V32 we observe strong conservation of hydrophobicity across all TAAs, which indicates that all members show a canonical coiled-coil structure for these residues. Interestingly, and in accord with the YadA-M structure, the putative d position G35 shows no preference for hydrophobicity; instead, its sequential neighbor L36, a residue that exhibits core packing in the YadA-M structure, does. This again supports a 7-11 pattern in which the heptad defined by core positions F18-L21-L25 is followed by a hendecad defined by L25-L28-V32-L36. Positions G35 and S39, which would occupy d and a layers according to an uninterrupted heptad repeat, exhibit no conserved hydrophobicity. This indicates that the stretching of the helix by switching from a 7-repeat to an 11-repeat seen in the structures of YadA-M and Hia, in which positions G35 and S39 form contacts to the barrel, is conserved across all TAAs.

Comparison with the stalk domain of YadA

We compared the coiled coil of YadA-M with recent crystal structures of the stalk domain of YadA (2). Alvarez et al. characterized the structure of various constructs of YadA stalk fragments that include the ASSA region. Mutations were introduced to force the hairpin region into a heptad packing and to counteract its low helix propensity. Figure F shows the crystal structures of these constructs superimposed onto the coiled coil of the solid-state NMR structure of YadA-M. Not all residues of the constructs could be resolved by X-ray crystallography; often the polypeptide chain cannot be built beyond the hairpin region. We observe that the coiled coil of YadA-M superimposes well onto the constructs, which themselves differ considerably only in the ASSA region. The structural heterogeneity observed in the YadA stalk fragments reflects our finding that the heptad repeat breaks at G35 and that core residues in the ASSA region and beyond (e.g. L42) need to establish contacts to the barrel in order to adopt a stable conformation.
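The per-position hydrophobicity analysis described above can be sketched in Python as follows. This is an illustration only: the function name `column_hydrophobicity`, the plain averaging over sequences, and the choice to skip gaps are assumptions, and the two sequences are just the first 25 aligned residues of Hia (2GR7) and YadA from Supplementary Figure 2 rather than the full alignment.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_hydrophobicity(alignment):
    """Mean Kyte-Doolittle value per alignment column.

    Gaps ('-') and unknown characters are skipped (an assumption);
    a column without any scored residue yields None.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for column in zip(*alignment):
        values = [KYTE_DOOLITTLE[aa] for aa in column if aa in KYTE_DOOLITTLE]
        profile.append(sum(values) / len(values) if values else None)
    return profile


# Excerpt of the alignment; YadA numbering starts at F18 in this window.
hia = "VNNLEGKVNKVGKRADAGTASALAA"
yada = "FRQLDNRLDKLDTRVDKGLASSAAL"
profile = column_hydrophobicity([hia, yada])
first_residue = 18
for offset, value in enumerate(profile):
    print(f"{yada[offset]}{first_residue + offset}: {value:+.2f}")
```

In this two-sequence excerpt the column at G35 is occupied by glycine in both proteins, whereas the column at L36 pairs leucine with threonine; the conservation statement in the text rests on the full family alignment, not on this small example.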

Figure A: Socket analysis of the structure of YadA-M (top) and of Haemophilus Hia 2GR7 (bottom).

Figure B: Coiled-coil layers in YadA-M structure. G35 should be a d layer, but rather its neighbor, L36, shows core packing.

Figure C: Coiled-coil layers in Hia structure (2GR7). According to a heptad repeat, G1027 should form the next d layer.

Figure D: Switch from 7 to 11 repeat in YadA-M and 2GR7 right before the ASSA region (highlighted in orange).

Figure E: Conservation of hydrophobicity (Kyte-Doolittle scale) across the entire TAA family.

Figure F: Comparison of the solid-state MAS NMR structure of the membrane anchor domain of YadA with crystal structures of the stalk. (A) crystal structures of the stalk domain (PDB ID codes 3H7X, 3H7Z, 3LT6, 3LT7) with the ASSA region highlighted in orange; (B) same as (A) but with the coiled coil of YadA-M shown as red ribbon; (C) full length structure of YadA-M superimposed onto the stalk domain structures (barrel shown in surface representation); (D,E) close-ups of the hairpin region.

References

1. J. Walshaw and D. N. Woolfson. SOCKET: a program for identifying and analysing coiled-coil motifs within protein structures. J. Mol. Biol. 307, 1427-1450 (2001).
2. B. Alvarez, M. Gruber, A. Ursinus, S. Dunin-Horkawicz, and A. Lupas. A transition from strong right-handed to canonical left-handed supercoiling in a conserved coiled-coil segment of trimeric autotransporter adhesins. J. Struct. Biol. 170, 236-245 (2010).

Supplementary Note 4: Side chain interactions

Contacts between the stretch of small helical residues and beta barrel residues

Contacts between the stretch of small helical residues and the ring of small barrel residues are corroborated by NMR correlations between A37-A68 and A41-G61. The cross peaks between these conserved residues were not used as constraints in the structure calculations. Rather, the calculated structure showed that these residues are close in space, which led us to search, in retrospect, for cross peaks that could be attributed to these close contacts. We indeed found some such cross peaks, which are illustrated in Figure A.

Figure A: Regions extracted from a 2D DARR experiment recorded with 300 ms mixing on YadA-M (strips 1-4, top). Long-range transfers involving conserved small residues in the α-helix and β-sheet, and between G61 (β1) and A68 (β2), are highlighted with red boxes. The conserved residues A37, A41, A68 and G61 are coloured in cyan in the ribbon model of YadA-M (bottom).

The top panel shows 2D contour plots of aliphatic and carbonyl regions extracted from a 2D 13C-13C DARR experiment recorded with 300 ms mixing on YadA-M. Transfers involving conserved small residues in the α-helix and β-sheet are highlighted with red boxes. Mutual transfers between residues A68 and G61 in adjacent β-strands are also highlighted with red boxes. Strips 1 and 3 show the Cα and C′ regions with correlations for alanines in the β-sheets. Strip 2 shows sequential and long-range correlations involving the Cβ of alanines; strip 4 shows a correlation between the Cα of G61 and the C′ of A41. The conserved residues A37, A41, A68 and G61 are colour coded in cyan in the ribbon model of YadA-M, which is shown in the lower part of Figure A. For clarity, the α-helix has been left out in the figure on the right, to obtain a better view of the position of A37 and A41 relative to G61 and A68 in β-strands 1 and 2, respectively.

Non-covalent S···O interactions between M96 and S38, S39

Methionine residues are good monitors for sample integrity. The methyl carbon chemical shift is extremely sensitive to oxidation of the methionine sulphur atom. We did not observe any change in the methionine chemical shifts over a period of several years. From the solid-state MAS NMR structure of YadA-M, we found that the sulphur atom of M96 in strand β4 is in close vicinity to the oxygen atoms of the serines S38 and S39 (in the ASSA region) in the helix (S···O distances ~3.5 Å, see Figure B). The divalent electrophilic sulphur of M96 could in principle form a non-covalent interaction with the nucleophilic side-chain hydroxyl oxygens of S38 and S39 (see below). Such an interaction would explain why no degradation of our sample was observed by monitoring the (single) methionine, since the pseudo-oxidation of the sulphur by the serine oxygens protects it from oxidation.

Figure B: Intramolecular S···O interaction between the sulphur of M96 in the β-sheet and side-chain oxygens of S38 and S39 in the α-helix. Distances between O and S are about 3.5 Å. The long side chain of M96 in the pore lumen can bridge the gap between the sheet and the helix, and can form a non-covalent S···O interaction.

X-ray crystallographic studies have shown that sulfoxide complexes show intramolecular, non-covalent S···O close interactions. Moreover, intermolecular S···O interactions were reported between sulfoxides and amides in solution [1]. Similar intra- and intermolecular interactions involving the divalent sulphur of methionine residues were characterized in biomolecules [2]. In these interactions, the sulphur of methionine interacts with either backbone carbonyl or side-chain carboxylate oxygen atoms. The divalent sulphur of methionine acts as an electrophile while the electron-rich oxygen behaves as a nucleophile. Such non-covalent interactions can occur when the separation between the sulphur and the oxygen is less than 4 Å. Studies also suggest a potential stabilizing role of methionine residues in the protein core; it has been suggested that S···O interactions supply a mechanism that reduces the number of unsatisfied hydrogen bonds [2].

Interaction between conserved G72 and L45

Contacts between the highly conserved residues L45 in the α-helix and G72 facing the pore lumen were observed in 2D DARR spectra recorded with longer mixing times (Figure C).

Figure C: Interaction between L45 and G72: 2D contour plots from 2D-DARR 500 ms (right side) and 2D-CPPI-DARR 300 ms (bottom left) showing long-range hydrophobic contacts between the highly conserved residues L45 and G72.

References

1. Nagao, Y., et al., Highly stereoselective asymmetric Pummerer reactions that incorporate intermolecular and intramolecular nonbonded S...O interactions. Journal of the American Chemical Society, 2006. 128(30): p. 9722-9.
2. Pal, D. and P. Chakrabarti, Non-hydrogen bond interactions involving the methionine sulfur atom. J Biomol Struct Dyn, 2001. 19(1): p. 115-28.

Supplementary Note 5: Helix propensity and disorder in YadA-M

Secondary structures were predicted with TALOS+ (1) from solid-state NMR backbone chemical shifts. Regions predicted with low confidence mainly correspond to loops or turns (between the N-terminal helix and the β1-strand, and between individual β-strands) but also, to a certain extent, to the Ala37-Ala40 region (ASSA). In particular, TALOS+ is not able to predict regular secondary structure with great confidence for Ala37, Ser38 and Ser39 (Figure A). To get more quantitative estimates of the flexibility of YadA-M residues, we used the Random Coil Index (RCI) program (2). RCI provides empirical correlations between secondary backbone chemical shifts from multiple nuclei and flexibility. It has been demonstrated that high RCI values correlate with higher flexibility (2). In YadA-M, high RCI values are predicted for the same ASSA region (Ala37, Ser38 and Ser39). As expected, residues belonging to more rigid secondary structure elements show lower random coil index values (Figure A). In addition, low helical propensities are reported by SSP (3) for the same region. In the ISD structure ensemble, this region appears well defined as α-helical, with little flexibility (Figures C and D).

Experimental backbone chemical shifts are in good agreement with shifts predicted from the trimeric ISD structure ensemble (Table A). Root mean square deviations (RMSD) for Cα and Cβ atoms are 1.7 ppm and 1.5 ppm, respectively. The largest discrepancies are observed for residues Ser38, Ser39 and Leu42 (Figure B). For serines 38 and 39, two scenarios are possible: (i) an electronic influence of Met96, whose sulfur is in close proximity to the Oγ of both serines, an effect not taken into account by the shift prediction method, or (ii) an inconsistency between the backbone conformation of the two consecutive serine residues in the ISD ensemble (α-helical) and the experimental secondary shifts, which suggest random coil.
Investigation of known protein structures that also exhibit Ser-Oγ and Met-Sδ in close proximity revealed no systematic difference between experimental and predicted Cα/Cβ chemical shifts for serines (Table B). The largest observed shift difference, excluding terminal residues, corresponds to 1.8 times the overall RMSD for Cα, while in the case of the YadA structure, it is 2.2 and 2.7 times for Ser38 and Ser39, respectively. Furthermore, it is very unlikely that the weak hydrogen bonds potentially formed between Ser and Met side chains could have a large impact on the Cα chemical shifts.

To consider the second scenario, ISD structures were minimized in the presence of chemical shift restraints for Cα and Cβ atoms. This type of restraint is intended to reduce the discrepancy between experimental and theoretical chemical shifts by optimizing the molecular conformation. As expected, in this new ensemble (ISD CS refined) the agreement between experimental and predicted shifts is improved (Table A), in particular for Ser38 and Ser39 (Figure B). Yet, secondary structures are deteriorated with respect to the original ISD ensemble (Figure C). In particular, β-strands are less well delimited. In addition, the α-helical preference of the ASSA region (37-40) is less pronounced and residues 42-44 appear, to a certain extent, as 3₁₀ helix. On the other hand, the ASSA region displays higher atomic fluctuations in the CS refined ensemble (Figure D), as suggested by RCI and TALOS+. A later refinement with hydrogen bond restraints (ISD CS refined + HB) restores the quality of secondary structure elements, while yielding a similar (or better) agreement of chemical shifts (Table A and Figure C).

There are three serine pairs in YadA-M (37-ASSA-40, 64-RSSQ-67 and 91-GSSD-94). Of these, the last two pairs are found in the loops that connect β-strands 1-2 and 3-4 and form polar residues at the outside-facing edge of the β-barrel (see Figure E). The first pair (Ser38-Ser39) is part of the ASSA region in the α-helix and shows an intensity drop of Cα-Cβ cross peaks (Figure F). Local dynamics of the amino acid residues can be deduced from relative peak intensities in different spectra, as shown in the literature (4). A comparison of amino acid residues in YadA-M on the basis of their relative cross peak intensities (NCα-Cα/NC′-Cα for glycines and NCα-Cβ/NC′-Cβ for all other residues) reveals that cross peak intensities are consistent and are lower for residues in loop regions than for those in the β-sheet and α-helix (see Figure G). The exception in the α-helix is the ASSA region (cf. the cross peak intensities of Gly35, Ser38 and Ser39). The residues in the periphery of β-strands also show weaker peak intensities.
Gly72 and Ser73 show high peak intensities, reflecting a higher degree of rigidity.

Method

Secondary structures and Random Coil Index values were determined from backbone 13C and 15N solid-state NMR chemical shifts with TALOS+ (1) and RCI (2), respectively. Secondary structure propensities were estimated from Cα/Cβ chemical shifts with the SSP program (3). Theoretical backbone chemical shifts for the trimeric YadA-M structure were predicted with SHIFTX2 (5). Shift predictions were averaged over the 20 structures of the ensemble. RMSD, correlation coefficient and average bound violations, $\langle\, |\omega_{exp} - \omega_{theo}| - [\sigma_{exp} + \sigma_{theo}] \,\rangle$, between experimental and theoretical shifts were also determined.

Protein structures available in the Protein Data Bank (PDB) (6) for which experimental backbone chemical shifts are available in the BMRB (7) were scanned for serine Oγ and methionine Sδ atoms in close proximity (cutoff distance 4.3 Å). For the matching entries, experimental and theoretical Cα/Cβ chemical shifts of the involved serine residues were compared.

Conformers of the ISD structure ensemble were energy minimized with the program CNS (8) in the presence of carbon chemical shift restraints (9). With this term, observed and expected Cα/Cβ secondary chemical shifts are harmonically restrained via a grid of phi and psi backbone angles. In addition, atomic coordinates of heavy atoms were confined close to their initial values through harmonic restraints. A similar minimization was performed with additional hydrogen bond restraints. These restraints were applied with a log-harmonic potential, consistent with the ISD formalism, where the optimal force constant is automatically estimated (10).
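The three agreement measures named in the Method (RMSD, correlation coefficient and average bound violation) can be written down compactly in Python. This is a sketch under stated assumptions, not the analysis code used here: the absolute value in the violation term is reconstructed from the garbled formula, clipping negative values to zero is an additional assumption, and the example shifts and uncertainties are invented.

```python
import math


def shift_rmsd(exp, theo):
    """Root mean square deviation between experimental and predicted shifts."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(exp, theo)) / len(exp))


def pearson(exp, theo):
    """Pearson correlation coefficient between the two shift lists."""
    n = len(exp)
    mean_e = sum(exp) / n
    mean_t = sum(theo) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(exp, theo))
    var_e = sum((e - mean_e) ** 2 for e in exp)
    var_t = sum((t - mean_t) ** 2 for t in theo)
    return cov / math.sqrt(var_e * var_t)


def mean_bound_violation(exp, theo, sigma_exp, sigma_theo):
    """Average of |w_exp - w_theo| - (s_exp + s_theo) over all residues.

    Assumption: deviations that stay inside the combined uncertainty
    count as zero rather than as negative violations.
    """
    total = 0.0
    for e, t, se, st in zip(exp, theo, sigma_exp, sigma_theo):
        total += max(0.0, abs(e - t) - (se + st))
    return total / len(exp)


# Hypothetical Cα shifts (ppm) for three residues.
exp = [58.0, 60.0, 62.0]
theo = [57.0, 60.5, 64.0]
print(round(shift_rmsd(exp, theo), 3))
print(round(mean_bound_violation(exp, theo, [0.3] * 3, [0.5] * 3), 3))
```

Dividing a residue's shift difference by the RMSD from `shift_rmsd` gives the kind of ratio quoted for Ser38 and Ser39 in the text.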

Table A: Statistics for experimental and predicted backbone chemical shifts.

Measure                   Atom   ISD     ISD CS refined   ISD CS refined + HB
RMSD (ppm)                Cα     1.65    1.41             1.41
                          Cβ     1.55    1.21             1.24
                          C      1.59    1.56             1.55
                          N      4.19    4.01             3.95
Correlation coefficient   Cα     0.950   0.964            0.964
                          Cβ     0.994   0.996            0.996
                          C      0.818   0.826            0.828
                          N      0.729   0.722            0.734
<Violation> (ppm)         Cα     0.69    0.51             0.52
                          Cβ     0.64    0.47             0.52
                          C      0.74    0.71             0.71
                          N      2.08    1.97             1.94

Table B: Comparisons between experimental and predicted Cα and Cβ chemical shifts of serine residues in the presence of potential Oγ-Sδ hydrogen bonds with methionine side chains.

PDB entry   Serine    Methionine   Distance (Å)   Δ(ωexp-ωtheo)/RMSD (Cα)   Δ(ωexp-ωtheo)/RMSD (Cβ)
1w7d        103 OG    101 SD       3.04           0.45                      0.12
2npl        48 OG     45 SD        3.20           1.07                      0.01
2aga        39 OG     74 SD        3.31           0.03                      0.05
1rwu        57 OG     20 SD        3.73           1.35                      0.93
2v1n        20 OG     16 SD        3.76           1.77                      0.44
1w7d        102 OG    109 SD       3.81           0.76                      1.26
1z1z        101 OG    102 SD       3.82           1.21                      0.50
1z7r        3 OG      1 SD         4.08           3.36                      1.94
1qts        728 OG    737 SD       4.09           0.25                      0.17
1xo8        46 OG     87 SD        4.14           0.20                      0.62
2e34        75 OG     134 SD       4.18           0.31                      0.61
1z1z        90 OG     102 SD       4.23           0.70                      0.51
1ls8        1 OG      86 SD        4.27           1.65                      1.63
2i85        112 OG    134 SD       4.28           1.07                      1.19
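The proximity criterion behind Table B (serine Oγ within 4.3 Å of a methionine Sδ) can be sketched in a few lines of Python. This is an illustrative example only, not the procedure used to compile the table: it reads fixed-column ATOM records from PDB-format text, ignores alternate locations and multiple models, and does not cross-reference the BMRB. The function names `parse_atoms` and `ser_met_contacts` are hypothetical.

```python
import math


def parse_atoms(pdb_text, res_name, atom_name):
    """Collect (chain, residue number, xyz) for one atom type of one residue type.

    Uses the fixed column layout of PDB ATOM records.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[17:20].strip() == res_name and line[12:16].strip() == atom_name:
            chain = line[21]
            res_seq = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((chain, res_seq, xyz))
    return atoms


def ser_met_contacts(pdb_text, cutoff=4.3):
    """Return Ser OG / Met SD pairs closer than `cutoff` (in Å), nearest first."""
    serines = parse_atoms(pdb_text, "SER", "OG")
    methionines = parse_atoms(pdb_text, "MET", "SD")
    hits = []
    for s_chain, s_num, s_xyz in serines:
        for m_chain, m_num, m_xyz in methionines:
            distance = math.dist(s_xyz, m_xyz)  # Euclidean distance (Python 3.8+)
            if distance <= cutoff:
                hits.append((s_chain, s_num, m_chain, m_num, round(distance, 2)))
    return sorted(hits, key=lambda hit: hit[-1])
```

Each returned tuple holds the chain and residue number of the serine, the chain and residue number of the methionine, and their Oγ-Sδ distance, which corresponds to the Serine, Methionine and Distance columns of Table B.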

Figure A: Secondary structure propensities and multiple nuclei secondary chemical shifts predicted from backbone ¹³C and ¹⁵N chemical shifts. (A) Secondary structure prediction confidence reported by TALOS+, (B) Random Coil Index reported by RCI and (C) secondary structure propensity calculated by SSP (H: helix, E: strand, L: loop). All panels are plotted along the YadA sequence (residues E12-W105).
Figure B: Comparison of experimental (red) and predicted ¹³C Cα (A) and Cβ (B) chemical shifts (CS, in ppm) along the YadA-M sequence. Chemical shifts predicted from the ISD structure (ISD) and from the ISD structure minimized against experimental shifts (ISD ref) are plotted in blue and black, respectively.

Figure C: Frequency (%) of secondary structure elements (α helix, 3₁₀ helix, β strand, turn) predicted by DSSP (11) in the ISD ensemble (A), ISD CS refined ensemble (B) and ISD CS refined + HB ensemble (C) along the YadA-M sequence.
Figure D: Root Mean Square Fluctuation (RMSF, in Å) of backbone (A) and heavy atoms (B) of the ISD (black) and ISD CS refined (red) structure ensembles, plotted along the YadA-M sequence.
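For readers unfamiliar with the quantity plotted in Figure D, the RMSF of an atom is the square root of its mean squared displacement from its average position over the ensemble. The short Python sketch below illustrates this definition under the assumption that all conformers have already been superimposed on a common frame. The source does not describe the superposition or any per-residue averaging used for the figure, and the function name `rmsf` is hypothetical.

```python
import math


def rmsf(ensemble):
    """Per-atom root mean square fluctuation over an ensemble.

    ensemble : list of conformers, each a list of (x, y, z) tuples with the
               same atom order; conformers are assumed to be superimposed.
    Returns a list with one RMSF value per atom, in the input length unit.
    """
    n_conf = len(ensemble)
    n_atoms = len(ensemble[0])
    values = []
    for i in range(n_atoms):
        # Average position of atom i over all conformers.
        mean = [sum(conf[i][k] for conf in ensemble) / n_conf for k in range(3)]
        # Mean squared displacement of atom i from its average position.
        msd = sum(
            sum((conf[i][k] - mean[k]) ** 2 for k in range(3))
            for conf in ensemble
        ) / n_conf
        values.append(math.sqrt(msd))
    return values
```

Applied to the selected backbone or heavy atoms of each conformer, this yields one fluctuation value per atom, which can then be grouped by residue for plotting along the sequence.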

Figure E: Contour plot of a 2D ¹³C-¹³C DARR spectrum showing the serine region (right panel). The assignment of the ten observable serines is indicated. The six serines that occur in pairs in YadA-M are highlighted in red in the spectrum and are colour-coded in the ribbon model shown at the left (the ribbon model is shown in two different orientations; the N-terminal coiled-coil region has been left out for clarity).

Figure F: Integrated intensities of alpha-beta cross peaks for residues K27-A41 in the α-helix, plotted against residue number. A gradual intensity drop is observed in the region around A37, which corroborates an increased flexibility for this stretch of the α-helix. Note that G35 does not have a beta carbon and has been left out of the plot.