Introduction

Bacteroides are abundant and crucial members of the modern human gut microbiota. A key evolved feature of these bacteria is the ability of each strain to produce numerous (eight or more) distinct capsular polysaccharides (CPS)1,2 that are tightly regulated so that only one CPS is typically produced per bacterial cell. This bet-hedging strategy generates Bacteroides populations with great surface variability that protect from phage3,4,5 and mediate immune modulation, biofilm formation, antibiotic resistance, and inflammation6,7,8,9,10,11.

CPS diversity is achieved by regulating both transcription initiation and elongation of CPS biosynthesis operons. Bacteroides fragilis (Bfr) has eight distinct CPS operons, producing PSA–PSH. All but PSC use invertible promoters and all encode upxY (YX) and upxZ (ZX) paralogs as the first genes in each operon12,13. The fraction of each promoter oriented ON versus OFF varies with environmental conditions14. CPS promoter inversions are stochastic and multiple CPS promoters are oriented ON in most cells simultaneously15,16,17. Bacteroides prioritize expression of one promoter-ON CPS operon over others by regulating RNA polymerase (RNAP) elongation via the operon-specific YX elongation activator and ZX inhibitor of non-cognate YX. ZX inhibits a subset of non-cognate YX possibly via direct binding (e.g., ZA from PSA may inhibit YE from PSE). Bfr YX paralogs must distinguish among eight target CPS loci to enable operon-specific regulation, but how this discrimination is accomplished is unknown.

YX family proteins are specialized (i.e., locus-specific) paralogs of NusG/Spt5, the only universal transcription factor found in archaea, eukaryotes, and bacteria18. NusG-family regulators bind RNAPs during transcript elongation and modulate RNAP activity through interactions with the RNAP and the surface-exposed ntDNA strand19,20,21. Globally acting Escherichia coli NusG and its single specialized paralog RfaH increase elongation rate and decrease pausing22,23,24,25. In contrast, Bacillus subtilis, Mycobacterium tuberculosis, and Thermus thermophilus NusGs enhance both pausing and intrinsic termination26,27,28,29,30. Pausing during transcript elongation is a universal regulatory feature of RNAPs that allows site-specific recruitment of transcription factors (TFs)31 and guides RNA synthesis.

Among the known NusGSP families, RfaH of Proteobacteria is the best understood. RfaH targets operons that contain a DNA element called ops (operon polarity suppressor) in their leader regions (DNA between the transcription start site and the translation start codon of the first gene). RNAP pauses at the 12-nucleotide ops, allowing RfaH to associate via sequence-specific interactions with a non-template strand DNA hairpin (ntDNAhp) exposed by the paused RNAP22,23,32,33. Other NusGSP include LoaP in Firmicutes21, TaA in Myxococcota34, and plasmid-encoded ActX in Proteobacteria35.

The CPS operon leader regions are required for Y-mediated regulation12, consistent with sequence-specific YX recruitment to RNAP paused in this region (Fig. 1a). In principle, YX could recognize ntDNA (like RfaH), nascent RNA (like LoaP), or both to discriminate among multiple, similar CPS operon targets. We used both in vivo and in vitro analyses to identify pauses in CPS operon leader regions, establish that these pause sites function as recruitment sites for Y, and discover NusGSP–DNA interactions and mechanisms that mediate Y–CPS operon specificity. We found that Z directly binds noncognate Ys to block Y action and that differential YX–ZX affinities enable CPS hierarchical control of transcript elongation. These results define mechanisms that explain the exquisite specificity of multiple NusGSP and that allow Bacteroides to program CPS diversity in the highly dynamic human gut environment.

Fig. 1: Bacteroides fragilis RNAP pauses in CPS operon leader regions in vivo and in vitro at candidate YX recruitment sites called opsX.

a Representative CPS operon diagram highlighting YX and ZX, the first two genes in B. fragilis PSX operons. Horizontal triangles mark the inverted repeats recognized by Mpi recombinase for promoter inversion17. Proposed roles for CPS diversity in B. fragilis subpopulations (colored coats) are listed3,6,8,9,10,99. The schematics illustrate the proposed roles of YX activation and ZX inhibition of noncognate YX in generating subpopulation CPS diversity12,13. YX is recruited to RNAP paused at cognate but not non-cognate opsX sites that encode a pause hairpin (PH). ZX directly binds YX from heterologous operons and inhibits its recruitment. In vivo (NET-seq) and in vitro (PIVoT) methods for identifying RNAP pause sites (opsX) are illustrated. b Transcriptional pauses in CPS leader regions identified in this study are shown in comparison to the RfaH ops pause and the E. coli consensus elemental pause sequences37. T template strand, NT non-template strand. Fully conserved nucleotides are capitalized; largely conserved nucleotides are lowercase. Asterisks (*) indicate operons were only probed in vivo. c PIVoT assay of PSE promoter-distal leader regions. Assays included 1 µM NusA or 150 nM YE added concomitantly with NTPs as indicated. RNAs from a reaction time course were separated by 8% Urea-PAGE. d NusA and YX synergistic activities at cognate opsX sites. YX association manifests as pause inhibition or pause enhancement (aqua bars), or capture (blue bars). Fold changes in pausing or capture (if applicable) after addition of factors are shown relative to baseline (no factors control). Data are presented as mean values ± SD from n = 3 independent experiments.

Results

Bacteroides fragilis RNAP pauses in CPS operon leader regions in vivo and in vitro at candidate YX-recruitment sites (opsX)

Specific YX recruitment sites likely exist in CPS leader regions because these leader sequences are variable and are required for YX activity12. Since EcoRfaH is recruited to RNAP at leader region ops pause sites, we first asked if BfrRNAP pauses in the leader regions of CPS operons. To identify candidate YX-recruiting pause sites directly in vivo, we used nascent elongating transcript sequencing (NET-seq) (Fig. 1a, b and Supplementary Fig. 1a). NET-seq allows genome-scale identification of precise nascent RNA 3′ ends, which are enriched at pause sites36,37.

NET-seq revealed single prominent pause sites in most CPS operon leader regions (Fig. 1a, b and Supplementary Fig. 1b)37. Seven CPS leader pauses exhibited an obvious consensus sequence that resembles strong E. coli pauses (Fig. 1b) as well as apparent nascent RNA pause hairpins (PHs) that resemble those known to enhance pausing allosterically in concert with NusA in other bacteria (e.g., the so-called type-1 E. coli his and B. subtilis trp leader region pauses; Supplementary Fig. 1c)37,38,39,40. Pausing in the PSC leader region (the only Bfr CPS operon with a non-invertible, constitutively ON promoter)41 occurred at multiple sites; weak pausing occurred at a site resembling the other seven in sequence and location (Fig. 1b and Supplementary Fig. 1b). We designated the CPS leader pause sites opsX (‘X’ designates the CPS operon) based on analogy to the RfaH ops site.
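
The idea that NET-seq 3′ ends pile up at pause sites can be illustrated with a minimal sketch. It flags positions whose 3′-end count is strongly enriched over the local background. The function name, the window size, and the fold and read-depth thresholds are illustrative assumptions, not the pause-calling criteria used in this study, and the toy profile contains invented numbers.

```python
from statistics import median

def call_pauses(counts, window=25, min_fold=10.0, min_reads=20):
    """Return (position, fold) for 3'-end pileups enriched over local background.

    counts: nascent-RNA 3'-end read counts, one per template position.
    window, min_fold, min_reads: hypothetical tuning parameters.
    """
    pauses = []
    for i, c in enumerate(counts):
        if c < min_reads:
            continue  # too few reads to call a pause at this position
        lo, hi = max(0, i - window), min(len(counts), i + window + 1)
        neighbours = counts[lo:i] + counts[i + 1:hi]
        if not neighbours:
            continue
        # A pseudocount of 1 avoids division by zero in sparse regions.
        background = median(neighbours) + 1.0
        fold = c / background
        if fold >= min_fold:
            pauses.append((i, fold))
    return pauses

# Toy leader-region profile (invented numbers, not data from the study).
toy = [2, 3, 1, 4, 2, 150, 3, 2, 1, 5, 2, 3]
print(call_pauses(toy, window=5))
```

In this toy profile only the single tall pileup passes both thresholds, which mirrors the "single prominent pause site" pattern described for most leader regions. A leader with heterogeneous pausing, as described for PSC, would instead yield several weaker candidates or none, depending on the thresholds chosen.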

To test whether the opsX pause recruits YX, we generated recombinant Bacteroides fragilis RNAP (rBfrRNAP) and assayed CPS leader regions using promoter-less in vitro transcription (PIVoT) (Fig. 1a and Supplementary Fig. 2a, b)42,43. PIVoT bypasses the need for σA-dependent initiation. We first asked if rBfrRNAP recognizes the consensus elemental pause signal defined for EcoRNAP (Fig. 1b)37. Signals resembling this consensus direct pausing by a wide variety of RNAPs from bacteria to human37,44,45. Bacterial pause sequences are reported to differ in some species46,47 and have not been tested for Bacteroidota. We found that rBfrRNAP pauses strongly at the consensus sequence but not at the anti-consensus sequence (Supplementary Fig. 2c), suggesting its pause signals resemble those of EcoRNAP and most other tested RNAPs.

We next assayed pausing in six of the eight CPS leader regions (PSA, B, C, E, F, and H). Strikingly, the PSA, B, E, F, and H leader segments encoded prominent pause sites that corresponded exactly to the sites found by NET-seq (Supplementary Figs. 2d, 3). Pausing was less prominent but detectable at opsC, consistent with the heterogeneous pausing observed by NET-seq. We conclude that CPS operon leader regions encode strong pause sites for RNAP with similar but not identical sequences, as might be expected for YX recruitment sites that must distinguish among YX paralogs.

To ask if the CPS leader pauses function as targets for YX recruitment and test whether they are modulated by regulators like NusA and YX, we purified recombinant BfrNusA and YX for these six CPS operons (YA, YB, YC, YE, YF, and YH; Methods) and tested their effects on pausing using PIVoT. In Eco and Bsu, NusA stimulates pausing in part via contacts to PHs37,39,43,48,49,50. All six CPS leader pauses were greatly enhanced by NusA (Fig. 1c and Supplementary Fig. 4). Intriguingly, YA,B,E inhibited the cognate leader pause, whereas YC,F,H enhanced the cognate leader pause (Fig. 1c, d and Supplementary Fig. 4). YE additionally trapped a fraction of RNAP just downstream from the pause site, as seen previously with EcoRfaH (‘capture’ in Fig. 1c, d and Supplementary Fig. 4). Thus, YX association with paused elongation complexes (PECs) may manifest as either pro-pausing or anti-pausing activity.
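
Fold changes of the kind reported as mean ± SD from n = 3 experiments can be sketched as follows. The sketch assumes that each replicate with a factor is paired with its own no-factor baseline and that the SD is the sample standard deviation; both are assumptions made for illustration, and the function name and example numbers are hypothetical.

```python
from statistics import mean, stdev

def fold_change_summary(with_factor, baseline):
    """Mean and sample SD of per-replicate fold changes (factor / no factor).

    with_factor, baseline: paired measurements (e.g., percent of complexes
    paused or captured) from the same independent experiments.
    """
    if len(with_factor) != len(baseline):
        raise ValueError("replicates must be paired")
    folds = [w / b for w, b in zip(with_factor, baseline)]
    sd = stdev(folds) if len(folds) > 1 else float("nan")
    return mean(folds), sd

# Invented example: percent paused with and without a factor, n = 3.
m, sd = fold_change_summary([30.0, 40.0, 50.0], [10.0, 10.0, 10.0])
print(f"fold change = {m:.1f} +/- {sd:.1f}")
```

Under this convention a value above 1 corresponds to pause enhancement (pro-pausing) and a value below 1 to pause inhibition (anti-pausing).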

Importantly, the effects of YX are likely to be specific to the NET-seq identified leader pauses, consistent with opsX sites functioning as specific YX-recruitment sites. YX only modulated pausing at cognate opsX but not non-cognate opsX or other positions (Supplementary Fig. 4). We conclude that the NET-seq-identified leader pauses are bona fide target sites for YX association with BfrRNAP. Notably, opsA,B,E encode putative ntDNAhps at [−11 to +1] that resemble the ops ntDNAhp known to recruit EcoRfaH (5′-GCGAGC stems; Fig. 1b and Supplementary Fig. 1c). The Bfr opsX ntDNAhp sequences differ, consistent with specific recruitment of cognate YX. However, opsF,H are identical in the ntDNAhp region, suggesting that some other element contributes to specificity.

RNAP capture by YX–opsX interaction, which is evident by accumulation of RNAs a few nucleotides longer than the primary pause RNA for opsE but not opsA or opsB (Supplementary Fig. 4), suggests some but not all opsX sites exhibit pause cycling31,33,51. Pause cycling occurs when the ntDNA is captured by a regulator that also contacts RNAP (e.g., Ecoσ70 or RfaH), anchoring the PEC and hindering extension beyond 2–3 nt52,53. Trapped PECs can be rescued by RNA cleavage factors GreA,B33, creating a cycle that repeats until ntDNA contacts rearrange to allow normal elongation51.

Importantly, even in the presence of globally acting BfrNusG, YF still enhances opsF pausing (Supplementary Fig. 5a). Thus, YX appears to outcompete BfrNusG even though both NusG and its specialized paralog YX use the same primary binding site on RNAP (Supplementary Fig. 5b).

ZX inhibits YX at opsX through direct ZX–YX interaction

We next sought to test whether YX binding requires sequence upstream of the putative ntDNAhp region using in vitro binding, in silico interaction, and in vivo gene expression assays. YE is predicted to be inhibited by ZA but not by ZE or ZC in a strain with only the PSA, PSE, and PSC promoters oriented ON (expression hierarchy PSA > E > C)13,17. We call this strain [AE]ON for simplicity because the PSC promoter is constitutive13. To test our prediction, we measured ZA–YE and ZE–YE binding constants by biolayer interferometry (BLI) (Fig. 2a, b). ZA but not ZE bound tightly to YE (KD ~ 0.9 nM vs ~88 nM). We conclude that ZX acts through direct YX binding.
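
For a 1:1 binding model, the equilibrium dissociation constant follows from the rate constants as KD = koff/kon. The short sketch below applies this relation to the on and off rates listed in the Fig. 2a legend, reading the exponents there as 10^6 and 10^-4 for YE–ZA and 10^5 and 10^-3 for YE–ZE (an interpretation of the legend, chosen because it agrees with the KD values quoted in the text). The function name is hypothetical.

```python
def kd_nM(kon_per_M_s, koff_per_s):
    """Equilibrium dissociation constant in nM for 1:1 binding: KD = koff / kon."""
    return koff_per_s / kon_per_M_s * 1e9

# Rate constants as read from the Fig. 2a legend (exponent signs are an
# interpretation of the legend, see the paragraph above).
kd_za = kd_nM(1.1e6, 9.0e-4)   # YE-ZA
kd_ze = kd_nM(1.1e5, 9.6e-3)   # YE-ZE

print(f"YE-ZA KD ~ {kd_za:.2f} nM")
print(f"YE-ZE KD ~ {kd_ze:.1f} nM")
print(f"ZA binds roughly {kd_ze / kd_za:.0f}-fold more tightly than ZE")
```

The ratios of the mean rate constants give values close to, but not identical with, the reported ~0.9 nM and ~88 nM, which is expected because the reported KDs are averages of three independent global fits rather than ratios of averaged rates.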

Fig. 2: opsX pause sites are recruitment sites in vivo that enable YX-locus specificity, CPS hierarchical control, and can be re-wired to bypass direct inhibition by ZX.

a ZX directly binds YX as revealed by biolayer interferometry (BLI100) over a range of ZX concentrations yielding biotin-YE–ZX on and off rates: YE–ZA kon = 1.1 × 10⁶ ± 2.9 × 10⁵ M⁻¹ s⁻¹, koff = 9.0 × 10⁻⁴ ± 2.4 × 10⁻⁴ s⁻¹; YE–ZE kon = 1.1 × 10⁵ ± 2.5 × 10⁴ M⁻¹ s⁻¹, koff = 9.6 × 10⁻³ ± 7.3 × 10⁻⁴ s⁻¹. Assays were performed in triplicate and globally fit to a 1:1 binding model (see Methods). Reported KDs are averages from three independent global fits and errors represent standard deviations. b YE–ZA and YE–ZE binding curves. The control group used buffer in place of ZX and was subtracted from all binding curves shown (see Methods). Data are presented as mean values ± SD from n = 3 independent experiments. c AlphaFold354 model of YE–ZA and steric clash evident when YE–ZA is aligned to an RfaH-bound PEC23. d Fold changes in YE capture as a function of cognate ZE or non-cognate ZA concentration in PIVoT assays (Methods). Subsets of NusA, YE, variable [ZX], and NTPs were added to initiate pause assays (50 nM YE, 15–480 nM ZX, 1 µM NusA, final). The control group used buffer in place of ZX and was subtracted from all binding curves shown (see Methods). Data are presented as mean values ± SD from n = 3 independent experiments. e Model for ZX inhibition of YX recruitment. f Strain backgrounds used in opsX replacement experiments are depicted (∆mpi M44 in each strain ensured only the PSA, PSC, and PSE promoters are oriented ON). In WT, promoter orientations are variable in single cells, but some cells express PSE. PSE is an insertion mutant that abrogates PSE expression. In promoter-locked [AE]ON strains, only YA-activated genes are expressed because of cross-operon inhibition of YE and YC by ZA. Strains in which partial [−10:−1]E or full [−38:−1]E segments of opsE were replaced with their opsA counterparts ([−10:−1]A and [−43:−1]A) were assayed for their ability to rescue PSE expression by Western blot.

To understand how ZA might interact with YE, we predicted their association using AlphaFold 354 (Fig. 2c and Supplementary Fig. 6). The ZA–YE complex, which was predicted with high confidence, placed ZA on the RNAP-binding interface of YE. When modeled into an EcoRNAP-RfaH-ops-PEC (PDB 8PHK)33 by alignment of the YE NGN domain with the RfaH NGN, ZA clashed with two major PEC features: (i) the RNAP clamp helices (CH), which provide the primary RNAP binding site for all NusG-family regulators (Fig. 2c, orange); and (ii) the proximal upstream DNA duplex (usDNA). Thus, ZX likely inhibits YX by preventing its recruitment to RNAP at opsX pause sites.

We next used PIVoT to test whether ZA or ZE blocked YE inhibition of pausing at the candidate opsE pause site as predicted by the AlphaFold model. ZE blocked YE action only at high concentrations (KI approximating the KD measured by BLI; Fig. 2d). In contrast, ZA inhibited YE at all tested concentrations. We conclude that differential YX–ZX affinities enable CPS hierarchical control of transcript elongation (Fig. 2e).
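
How a difference in affinity could translate into differential sequestration of YE can be illustrated with a simple equilibrium calculation. The sketch below assumes 1:1 YE–ZX binding at equilibrium in solution, uses the KD values measured by BLI (~0.9 nM for ZA, ~88 nM for ZE) and the concentrations stated for the PIVoT assays (50 nM YE, 15–480 nM ZX), and ignores competition from RNAP and opsX. It is therefore not expected to reproduce the PIVoT titrations quantitatively; the function name is hypothetical.

```python
import math

def free_fraction(y_total_nM, z_total_nM, kd_nM):
    """Fraction of Y not bound by Z at equilibrium for 1:1 binding.

    Solves C^2 - (Yt + Zt + KD) * C + Yt * Zt = 0 for the complex C and
    takes the physically valid (smaller) root.
    """
    s = y_total_nM + z_total_nM + kd_nM
    complex_nM = (s - math.sqrt(s * s - 4.0 * y_total_nM * z_total_nM)) / 2.0
    return 1.0 - complex_nM / y_total_nM

# Concentration range taken from the assay description; KDs from BLI.
for z in (15, 30, 60, 120, 240, 480):
    za = free_fraction(50, z, 0.9)
    ze = free_fraction(50, z, 88)
    print(f"[ZX] = {z:>3} nM   free YE with ZA: {za:.2f}   with ZE: {ze:.2f}")
```

The quadratic form is used instead of the simpler 1/(1 + [Z]/KD) expression because YE and ZX are present at comparable concentrations, so depletion of free ZX by binding cannot be neglected.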

YX targets extended opsX sites in vivo

Using these insights into ZX–YX interaction, we tested whether opsX pause sites function as YX recruitment sites in vivo and which sequences govern cognate YX function. Using a constitutive [AE]ON strain17, we replaced opsE segments with the corresponding opsA segments. We predicted that the opsE→opsA swapped strain should activate PSE expression because YA should bind opsA in PSE. To ask if the PH-encoding region of opsX is required for YX recruitment, we also constructed a hybrid opsE–A strain in which only the ntDNAhp region corresponding to the RfaH ops but not the PH-encoding region of opsE was replaced with opsA sequence (Fig. 2f). Using antibodies confirmed to detect PSE in a WT strain but not in a PSE mutant, we tested for PSE expression in [AE]ON and derivative strains: ∆ZA, hybrid opsE–A, and full opsE→A (Fig. 2f). PSE was (i) not expressed in [AE]ON; (ii) expressed in ∆ZA; (iii) not expressed in the hybrid opsE–A strain; and (iv) expressed in the full opsE→A swapped strain.

To confirm that the upstream PH-encoding region is required for YX action, we also tested YA and YE effects similarly using PIVoT (Supplementary Fig. 7a, b). Neither YA nor YE modulated pausing or PEC capture at WT levels unless the full cognate opsX including the upstream PH-encoding region was present. Thus, both in vivo and in vitro, the cognate upstream PH-encoding region is required for full YX activity.

We conclude that opsX comprises both the ntDNAhp region and the upstream PH-encoding region. These regions are necessary and sufficient to program YX recruitment and enhancement of CPS-operon transcription. The inactivity of YX at hybrid sites establishes that the ~40 bp Bacteroides CPS opsX sequences differ fundamentally from the RfaH ops that requires only a 12-bp ntDNAhp sequence. Additional recognition of the upstream PH-encoding region likely aids YX discrimination among target sites. However, determining whether these upstream sequences contact YX as a nascent RNA hairpin, as proposed for LoaP55, or as duplex DNA required further experimentation.

YX–opsX pairs can be divided into distinct classes

To ask if the variability in opsX sequences could be related to variability in YX paralogs, we compared their apparent evolutionary relationships to sequence and structural alignments of YX, RfaH, and NusGs (Fig. 3a and Supplementary Fig. 8). Strikingly, both YX protein and opsX DNA sequences clustered into two distinct classes with two outliers (anti-pausing Class-1, PSA,B,E; pro-pausing Class-2, PSD,F,H; Outliers PSG,C) (Fig. 3b and Supplementary Fig. 8). We use the opsX pause site defined as position −1 as a reference in this analysis. Class-1 DNA–RNA sequences exhibited several key features: (i) an apparent ntDNAhp (orange arrows); (ii) an apparent PH that extends to −12 to −9 (red arrows; relative to −1 pause RNA 3′ nucleotide position); and (iii) the YX gene start codon is at +41, +42. Class-1 YX protein sequences (Fig. 3a) exhibited (i) an identical β2–β3 hairpin sequence in the NGN domain (LPTQFVIRQLYKRR[R/K]RVEVP); (ii) variable sequences (pink) in NGN α1 and α2 that contact the ops ntDNAhp (yellow), RNAP protrusion, and RNAP gate loop; and (iii) variability in the C-terminal KOW domain (Fig. 3a and Supplementary Fig. 8). The variable YX sequences in contacts to the ntDNAhp, protrusion, and gate loop are consistent with YX recognition and potential effects on pausing27,56,57, whereas variability in KOW may enable target specificity or coupling of transcription to other cellular processes.

Fig. 3: YX can be divided into distinct classes.

a Analysis of sequence conservation and solvent accessibility of CPS operon YXs from B. fragilis (Genbank accession NC_003228.3; strain NCTC 9343). Known contacts to RNAP modules or DNA are based on structures of E. coli RNAP in complex with NusG (PDB 6C6U)23 or RfaH (PDB 6C6S)23 (bars on right). The structural model, based on PDB 8PHK33, depicts key NusGSP-interacting modules of a PEC (gate loop, protrusion, clamp, ntDNAhp, upstream and downstream DNA (usDNA and dsDNA)) and features of NusGSP (NGN, KOW, β hairpin). The extent of sequence conservation among Class-1 YX is shown on a magenta color scale mapped to RfaH in the 8PHK model and also compared linearly to conservation among all YX proteins. b Sequence comparisons among opsX annotated with features relevant to pausing and YX action compared to phylograms101 of YX and opsX shown on right. The red lines indicate the alternative clustering of opsC and opsG versus YC and YG relative to the uniform clustering of other YX and opsX sequences into Class 1 and Class 2.

The Class-1 PSA,B PHs have greater potential to extend towards the pause RNA 3′ end (teal highlight) relative to the PSE PH. Extension of PHs past −10 is thought to destabilize PECs at intrinsic terminators58, but we did not observe termination at these sites. An alternative role of PHs extending past −10 could be to aid PEC escape from pause cycles if auxiliary factors like GreA,B are insufficient. Thus, we postulated that base-pairing of the PSA,B PHs at −11, −10, and −9 could explain why, in contrast to YE, YA and YB did not capture PECs in pause cycles (Supplementary Fig. 4 and Fig. 3b red highlight) (see next section). Based on an apparent ability to prevent PEC capture by YX, we call this PH extension the escape duplex (ED).

Pro-pausing Class-2 (PSD,F,H) sequences exhibited features that differed from Class-1 (Fig. 3b and Supplementary Fig. 8). For Class-2 DNA-RNA: (i) opsX lacks an obvious ntDNAhp; (ii) the apparent PH extends only to −14; and (iii) the YX gene start codon is at +9 relative to opsX. For Class-2 YX: (i) the β2–β3 hairpin sequence is variable with a pattern of basic residues distinct from Class-1; (ii) NGN α1 and α2 also are variable but distinct from Class-1 and thus consistent with differential recognition and different effects on pausing; and (iii) the KOW domain exhibits greatly increased positive charge relative to Class-1 (Supplementary Fig. 8).

PSC,G were outliers whose YX and opsX clustered differently relative to Class-1,2. Their apparent PHs extended to −12 or −16, respectively. The YX start codons were at +111, +25 and both YX sequences were relatively divergent compared to Class-1,2. YC enhanced rather than inhibited the opsC pause (Supplementary Fig. 4). Class-2 YX and PSC YC exhibited charge similarity to the LoaP KOW proposed to bind RNA hairpins (Supplementary Fig. 8).

We conclude that YX regulators diverged during evolution to form at least two distinct classes within which the interactions that determine YX–opsX specificity and pro- vs. anti-pausing action appear to have followed different trajectories.

opsX PHs stabilize PECs but also can aid escape of PECs captured by YX-DNA contacts

We next sought to assess the function of the putative opsX PHs (Fig. 3b). We focused on Class-1 opsX to investigate the impact of the PH and ED (Fig. 3b and Supplementary Fig. 9). The strong effect of NusA on Class-1 pauses (Fig. 1d and Supplementary Fig. 4) made it likely the PHs stimulate pausing39,43,48,49,50,59. Further, removal of the PH-encoding region from an opsE scaffold eliminated NusA-stimulation of pausing (Supplementary Fig. 10a). To probe the functions of the conventional opsE PH and the unconventional opsB PH + ED, we used complementary antisense oligonucleotides (asDNAs or asRNAs) to progressively disrupt the 5′ arm of the PSE,B PHs (Fig. 4a, c).
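
The antisense strategy rests on simple base complementarity: an oligonucleotide that pairs with part of the 5′ arm of a nascent RNA hairpin competes with formation of the stem. A minimal sketch of how such oligonucleotides could be derived from a target RNA is shown below. The function name and the toy RNA sequence are hypothetical and do not correspond to the opsE or opsB sequences or to the oligonucleotides used in this study.

```python
DNA_COMP = {"A": "T", "C": "G", "G": "C", "U": "A"}
RNA_COMP = {"A": "U", "C": "G", "G": "C", "U": "A"}

def antisense(rna, start, end, kind="DNA"):
    """Antisense oligo (written 5'->3') against rna[start:end].

    Coordinates are 0-based and end-exclusive; kind selects a DNA (asDNA)
    or RNA (asRNA) oligonucleotide.
    """
    table = DNA_COMP if kind == "DNA" else RNA_COMP
    target = rna.upper()[start:end]
    # Reverse the target, then complement each base.
    return "".join(table[base] for base in reversed(target))

# Toy nascent RNA (invented sequence). Oligos of increasing length that
# share a 3' boundary on the target disrupt progressively more of the arm.
toy_rna = "AUGGCAAGCUUGCCAU"
for length in (3, 5, 7):
    print(length, antisense(toy_rna, 7 - length, 7, kind="DNA"))
```

Extending the oligonucleotide stepwise along the 5′ arm, as in this loop, is one way to map which portion of a hairpin is needed for a given effect, because each longer oligonucleotide removes additional potential base pairs from the stem.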

Fig. 4: Nascent RNA hairpins promote pausing or pausing-then-escape at opsX.

a Experimental scheme. rBfrRNAP was reconstituted upstream of the opsE pause, enabling PH formation upon RNA extension. b Antisense DNA (asDNA; 10 µM final) effects on NusA enhancement of PH-stimulated opsE pausing, where different asDNAs disrupt PH formation to different extents. asDNA oligonucleotides were added concomitantly with NusA (or storage buffer) and NTPs (1 µM NusA and 100 µM each NTP, final). Data are presented as mean values ± SD from n = 3 independent experiments. c Experimental scheme. rBfrRNAP was reconstituted upstream of the opsB pause, enabling PH formation. d Antisense RNAs (asRNAs; 0.5 µM final) pairing with an escape duplex (ED, green; the ED is unique to PSA and PSB). asRNAs inhibited PEC escape, leading to accumulation of a captured RNA (opsB + 3 nt). Assays were performed in the presence of 1 µM NusA and 150 nM YB. The amount of RNA paused or captured 45 s after addition of NTPs is shown. Fold changes are relative to plus NusA and minus YB. Data are presented as mean values with data points from n = 2 independent experiments shown.

asDNAs that disrupt the PSE PH by pairing with the 5′ arm, but not those that pair just upstream, reduced pausing (Fig. 4b). Thus, the PH alone stimulates pausing at opsX and BfrNusA significantly stimulates pausing in a PH-dependent manner. We conclude that opsX sites are type-1 pauses that encode NusA-stabilized PHs, in notable contrast to the type-2 RfaH ops that lacks a PH38.

To test the idea that the apparent ED could aid escape of PECs, we measured the effect on capture of antisense RNAs (asRNAs) that disrupt the ED by pairing to the distal bases of the 5′ arm of the opsB PH. opsB but not opsE encodes an ED, and YB does not cause PEC capture in contrast to YE (Fig. 4c, d and Supplementary Fig. 4). Addition of asRNAs that progressively disrupted the ED caused YB to capture PECs in pause cycles. Thus, opsB, and by analogy opsA, PHs not only stimulate opsX pausing synergistically with NusA to allow time for YX recruitment, but also use an ED to drive forward translocation at the pause. The ED breaks extensive contacts by YX necessary for its initial recruitment but problematic for subsequent EC escape.

YX distinguishes PECs via multipartite NGN interactions with exposed ntDNA and upstream duplex DNA

We next sought to determine how Class-1 YX proteins distinguish cognate vs. non-cognate opsX sites via the PH-encoding region (Fig. 2). Since the ntDNA of opsE and opsB are most similar, particularly at the key −6 ntDNAhp position (Fig. 3b and Supplementary Fig. 10b), we reasoned that the contribution of sequences upstream from the ntDNAhp might be most apparent by swapping regions between opsE and opsB. We used PIVoT to measure YX effects on NusA-stimulated pausing and capture using templates with opsE–B swapped sequences or YE–YB hybrid proteins that separate potential NGN vs. KOW contributions (Fig. 5a). To enable the direct comparison between opsE and opsB, we used a variant of opsB that lacked the ED (opsB, –ED).

Fig. 5: YX distinguishes opsX binding sites through variations in the non-template DNA and upstream duplex DNA.

a Diagram comparing Class 1 opsE and opsB sequences. Regions varied in experiments shown in (b–d) are highlighted in magenta. b Experimental scheme to assay effects of the usDNA and PH. c PIVoT assays (1 µM NusA, 100 µM each NTP, 150 nM YX when added) comparing YB fold effects on capture on WT versus mutant scaffolds. YB causes capture in these experiments because native RNA structures are not permitted to form (including the escape duplex (ED)). The gel panel depicts time-courses of pausing on WT opsB vs a mutant having substitution 2 from (a). Right: results from a representative single time-point (i.e., 10 s after NTP addition) comparison of YB effects across multiple mutant scaffolds. Data are presented as mean values ± SD from n = 3 independent experiments. d Comparison of YB effects in the absence or presence of a PH (the opsB scaffold plus PH was designed to prohibit ED formation to enable this comparison). Data are presented as mean values ± SD from n = 3 independent experiments. e YX fold effect on pausing at the 45 s time point on WT opsE or hybrid sequences progressively mutated from opsE towards opsB with the ED disrupted (see Supplementary Fig. 11 for scaffold sequence). PIVoT assays were performed at a single timepoint (45 s). Data are presented as mean values ± SD from n = 3 or n = 6 independent experiments. f PIVoT assay of pause and capture for WT vs Hybrid NGN–KOW YX (150 nM each) on WT vs hybrid (labeled ‘−14ins’) opsX sequences. Data are presented as mean values ± SD from n = 3 independent experiments.

To ask if YX recognizes the upstream DNA or the PH RNA encoded by it, we first tested whether the upstream DNA sequences affected YX action in the absence of a PH (Fig. 5b). With the PH removed, YB stimulated RNAP capture at the opsB pause site by a factor of ~4.5 (Fig. 5c). When 3-bp segments of the opsB usDNA were replaced with opsE sequence, YB capture of RNAP decreased either modestly (substitutions 1 and 2) or nearly completely (substitution 3). We next asked if changing both the upstream DNA and the PH from opsB to opsE sequences had more effect on YB action than changing just the upstream DNA, as predicted if the PH functions in YB action. However, even combining the 1 + 2 + 3 substitutions in the upstream DNA and PH had no greater defect in YB action than introducing substitution 3 to the upstream DNA alone (Fig. 5d). We conclude that YB recognition of the extended opsB site depends on the usDNA and not on the PH RNA.

We next investigated the contributions of the upstream sequences in progressively interconverted opsE and opsB to PEC capture by YX (Fig. 5e and Supplementary Fig. 11). To simplify this comparison, we used a variant of opsB in which capture was activated by removing the ED (Supplementary Figs. 9, 11). Strikingly, YE continued to function even when the opsE ntDNAhp was changed to the opsB ntDNAhp. However, the YE effect was mostly lost and YB capture progressively increased as the usDNA was increasingly converted to opsB sequence (Fig. 5e). Thus, multiple segments of usDNA contribute to YB recognition of opsB. Consistent with our in vivo experiments (Fig. 2f), we conclude that opsX sequences are multipartite ntDNA and usDNA signals of ~40 nucleotides whose constituent parts variably contribute to YX recruitment in different CPS operons.

We next asked if the NGN alone recognizes opsX as it does for RfaH–ops interaction23,32 or if the KOW domain might also participate, as proposed for LoaP55. Attempts to purify a Class-1 NGN alone yielded only insoluble protein. Instead, we compared NGN–KOW YE–B hybrids to YE and YB on opsE, opsB, and an opsE-B hybrid scaffold (Fig. 5f). For both YE and YB, the effect on capture or pausing was determined completely by the NGN domains. We conclude that recognition of opsX by at least Class-1 YX is mediated by the NGN and not the KOW domain.

Class-1 YX protects upstream DNA from exonucleolytic cleavage

For the YX NGN to contact upstream duplex DNA, the DNA must distort from a canonical B-form trajectory departing the PEC (Fig. 6a). Although protein interactions can easily bend duplex DNA60, we sought direct physical evidence for usDNA–YX-NGN interaction. Exonuclease III (ExoIII) has been used extensively to detect PEC boundaries on DNA61,62,63. Since YE,B variably depend on distal usDNA in our activity assays, we assayed opsE,B with cognate YX.

Fig. 6: YX contacts sequences in the upstream duplex DNA and ntDNAhp to recognize cognate opsX via a capture-then-escape mechanism.

a YX protects the distal upstream DNA from exonucleolytic cleavage. Exonuclease III (ExoIII) cleaves in the 3′-to-5′ direction but temporarily halts when encountering obstacles such as DNA-bound proteins. Protection was assayed in the absence or presence of YX or hybrid NGN–KOW at various timepoints. b Pseudodensitometry traces of template DNA cleavage products separated by 8% Urea-PAGE. Band intensities reflect relative levels of cleavage products after 5 s of exonucleolytic cleavage. Traces are representative of experiments performed in at least duplicate (Supplementary Fig. 12). c Quantification of upstream DNA protection from exonucleolytic cleavage at various regions on WT or mutant opsB scaffolds. Data are presented as mean values with data points from n = 2 independent experiments shown.

Over the full time course, YE,B strongly stabilized a −21 footprint, 6–7 base pairs upstream of RNAP (Supplementary Fig. 12). However, YB but not YE also slowed ExoIII digestion at −24, and −31 to −34. Further, these same upstream protections were caused by a YB,E NGN–KOW hybrid (Fig. 6b and Supplementary Fig. 12). We conclude that YB NGN likely contacts usDNA at least near −21 to −24, and −31 to −34.

As an additional test of the upstream YB contacts, we performed ExoIII assays on scaffolds containing opsB-to-opsE sequence changes to distal usDNA (−36 to −34 and −26 to −24) and proximal usDNA (−18 to −16). These substitutions strongly reduced upstream protection from ExoIII (Fig. 6c and Supplementary Fig. 13). Together, our results suggest a set of YX specificity determinants reflected in both physical contacts detected with ExoIII and sequence effects on YX activity.

To understand these contacts in a structural context, we modeled YE and YB into an RfaH–ops-PEC structure (PDB 8PHK)33. Both YE and YB are predicted to have a much larger positively charged surface approximately in the path of the usDNA (Supplementary Fig. 14). This charge is created largely by basic residues in the beta hairpin mini-domain of YE,B and could position the usDNA for sequence-specific readout by NGN.

Discussion

Human gut Bacteroides strains synthesize numerous surface CPS that are highly regulated to create subpopulations in which primarily a single PS locus is transcribed, providing phenotypic plasticity in the face of environmental challenges. To coordinate CPS gene expression in a manner that maximizes CPS diversity, Bacteroides have developed a complex hierarchy involving locus-specific cognate YX activation and noncognate ZX inhibition.

We have elucidated the biochemical mechanisms of Bacteroides CPS hierarchical control (Fig. 7): (i) BfrRNAP pauses prominently at single CPS leader-region pause sites (opsX); (ii) opsX programs NusA-enhanced, RNA hairpin-stabilized transcriptional pauses that create time windows for YX recruitment; (iii) ZX inhibits non-cognate YX directly via differential binding affinities, forming a heterodimer that precludes YX recruitment by steric clash of ZX with RNAP and opsX; (iv) YX locus-specific recruitment depends on multipartite interactions of the YX NGN domain with the exposed opsX ntDNA and upstream duplex DNA; (v) YXs evolved into functionally distinct classes; and (vi) YX-bound PECs use different mechanisms to escape opsX. This combination of multiple functions at a single RNAP pause site has little precedent and may reflect the strong evolutionary pressure associated with the challenges of discriminating among multiple similar NusGSPs.

Fig. 7: Model for YX-specific recruitment.

BfrRNAP pauses in the 5′ leader of CPS operons to provide time for YX recruitment. These pauses arise initially through RNA–DNA contacts to RNAP (elemental pause), then are stabilized synergistically by a PH and NusA. YX is recruited with high fidelity to cognate operons by multipartite ~40-bp opsX elements, with variable influence of constituent elements depending on the CPS operon. YXs that interact extensively with cognate opsX (YA, YB) are associated with escape duplex (ED)–encoding PHs, which provide force in the form of base-pairing to drive forward translocation and inhibit backtracking. Differences among escape mechanisms (e.g., those with or without EDs) may aid differential regulation.

Bacteroides belong to the greater phylum Bacteroidota, evolutionarily distant from the commonly studied model proteobacterium E. coli and firmicute B. subtilis. Despite the importance of these bacteria to human health, there is a limited understanding of Bacteroides transcription regulation. Our recombinant BfrRNAP overexpression system enables facile production and genetic manipulation of BfrRNAP. Multiple questions can now be addressed, including the roles of uncharacterized RNAP sequence insertions64, the molecular interactions of RNAP with TFs (e.g., σA) and small molecules (e.g., ppGpp), and sequence-dependent effects on transcriptional activities (e.g., backtracking, translocation, etc.). Recombinant RNAPs enable studies of both lineage-specific transcription mechanisms and evolutionary comparisons. rBfrRNAP will enhance mechanistic understanding in the entire field of transcription, as demonstrated by numerous recent studies in M. tuberculosis, C. difficile, and B. subtilis26,27,28,65,66.

We found that opsX recruitment sites for YX are ~40 bp multipartite DNA elements with both upstream duplex and transcription bubble ntDNA components, in striking contrast to the 12-nucleotide ntDNAhp (ops) necessary for RfaH recruitment and the proposed nascent RNA hairpin necessary for LoaP recruitment32,55. The ntDNAhps formed by ops and opsX differ in apparent structure and position relative to the pause site. All eight ops sites in E. coli targeted by the single RfaH encode the same ntDNAhp sequence: 5′-GCGGTAGC67. The longer Bacteroides 5′-YGCGNAGCR ntDNAhps exhibit both similar (GCG..AGC stems) and distinct (loop) features compared to ops. These differences highlight how Bacteroides evolved to manage numerous NusGSPs. Extensive YX–opsX interactions may also accelerate Bacteroides adaptation by expanding the sequence space available for functional bifurcation following gene duplication.

We found that ZX inhibits YX recruitment to opsX–PECs directly, likely by blocking YX interaction with the conserved β′ clamp helices (CH) and the opsX usDNA. ZX could also tune heterologous operon PSX expression or limit self-expression through negative feedback. Ultimately, YX–ZX interactions define the cell surface architecture of Bacteroides. Our findings provide a foundation for understanding them.

The closer Class-2 start-codon proximity to opsX (9 bp) suggests that Class-2 YX may play a stronger role in ribosome association for coupled transcription–translation of the YX gene. Translation is not well studied in Bacteroides68,69,70,71, but both the similarity of anti-pausing by BfrNusG to EcoNusG (Supplementary Fig. 5) and the location of stop codons relative to intrinsic terminators72 suggest that transcription and translation may be coupled in Bacteroides, as in E. coli but unlike in B. subtilis72,73,74,75,76,77,78,79,80,81,82. RfaH is thought to recruit ribosomes for coupled translation in E. coli83,84. Start codon GUG is thought to initiate ribosomes 5–10 times more weakly than AUG in E. coli85. Taken together, these differences are consistent with evolution of Class-2 YX–opsX pairs for tight linkage of YX and ribosome recruitment at opsX sites immediately adjacent to the translation start site. Both these potential distinctions (relative to Class 1) in Class-2 YX–opsX function and the interesting differences evident for YC–opsC and YG–opsG require future experimental investigation.

We also discovered a regulatory RNA element—the opsX PH ED—involved in the regulation of PSA and PSB. The conserved role of PHs at opsX is to enhance pausing with NusA. The opsA,B ED provides a driving force to propel RNAP out of pause-cycling traps created by extensive interactions that occur at these sites. Possibly, opsE does not encode an ED because YE interacts with less sequence (Supplementary Fig. 12) and Gre factor may be sufficient for its escape as it is for RfaH33. Alternatively, the strong kinetic difference in escape mechanisms could be exploited by Bacteroides in CPS expression control. We propose that the ED evolved in response to evolutionary pressure to expand YX specificity.

Our results provide new mechanistic insights into transcriptional regulation by a large class of NusGSP, YX (UpxY). We find that determinants of transcriptional pausing in the phylum Bacteroidota resemble those found for other bacteria, but that recruitment sites for these NusGSPs differ notably both in being multipartite and much more extensive (~40 bp) than found for E. coli RfaH (~12 bp). Two aspects of the YX recruitment mechanisms provide precedent for new types of transcriptional regulation: (1) the upstream DNA is a sequence-specific platform for PEC regulation, and (2) pause hairpins can include escape duplexes that can drive escape from regulator-stabilized pauses. These discoveries highlight the importance of studying transcriptional mechanisms in diverse bacteria.

Methods

Plasmids, oligonucleotides, and strains used in this study are listed in Supplementary Tables S1–4. Nucleic acid scaffolds used in PIVoT assays are organized by figure in Supplementary Fig. 16. All reported measurements were taken from distinct samples.

E. coli strain construction

E. coli strain RL3569 was created by P1 transduction of RL1674 with donor strain RL357086 harboring the rifampicin-resistance mutation S522F in rpoB. Briefly, 5 mL of donor strain RL3570 was grown to saturation (overnight) in LB + 5 mM CaCl2. The next day, 50 µl of the donor strain was mixed with 100 µl of a 10^5-fold dilution (in LB + 5 mM CaCl2) of a freshly made P1 stock, then incubated at 37 °C for 20 min without shaking. Then, 2.5 mL of 45 °C-equilibrated R top agar (0.8% agar, 1% tryptone, 0.8% NaCl, 0.1% yeast extract, supplemented to a final concentration of 2 mM CaCl2 and 0.1% glucose after autoclaving) was added to the bacteria-phage mixture, flicked to mix, then poured evenly onto a thick, moist, freshly made R plate (1.2% agar, 1% tryptone, 0.8% NaCl, and 0.1% yeast extract, supplemented to a final concentration of 2 mM CaCl2 and 0.2% glucose after autoclaving). The plates were incubated at 37 °C overnight in a plastic bag with wet paper towels. The next day, the plate was transferred to a 4 °C room and overlaid with 5 mL of MC solution (10 mM MgSO4 + 5 mM CaCl2). After a 5 h incubation at 4 °C, the overlaid solution containing fresh P1 lysate was collected, 0.2 µm filter-sterilized, then stored in the dark at 4 °C until use. The recipient strain (RL1674) was grown to saturation (overnight) in LB + 5 mM CaCl2 + 20 µg chloramphenicol/mL. The next day, 100 µl of donor P1 phage serial dilutions were separately mixed with 100 µl of recipient strain overnight culture, then incubated at 37 °C for 20 min without shaking. The mixture was plated on LB agar + 20 µg chloramphenicol/mL + 100 µg rifampicin/mL. Candidates were sequence-verified.

B. fragilis strain construction

Bacterial growth

B. fragilis NCTC 9343 (ATCC 25285; GenBank assembly ASM2598v1) strains were grown in basal medium87 or on BHI plates supplemented with 5 mg hemin/L and 2.5 µg vitamin K1/L. Mutants ΔmpiM4417, ΔmpiM44ΔupaZ13 and ΩPSE41 were previously constructed. For selection of cointegrants, gentamycin (200 µg/ml) and erythromycin (5 µg/ml) were added to the plates when indicated.

Construction of mutant PSE ops and HP-ops regions in 9343ΔmpiM44

Two different alterations to the PSE 5′ UTR were made in the ΔmpiM44 strain. In the first mutant, the ops sequence of the PSE locus (CTGCGAAGCATA) was replaced with the ops sequence of the PSA locus (ccgcgtagcgca). In the second mutant, a larger replacement was made and included the hairpin region adjacent to the ops sequence. The sequence from the PSE 5′ UTR (ttggctgagaaaaagagtctcacccaaCTGCGAAGCATA) was replaced with the sequence from the PSA 5′UTR (cggtttgaatgggaaaagatgtctcgtccaaaccgcgtagcgca). The recombinant plasmids were created by PCR amplifying two (ops) or three (HP-ops) DNA segments using Phusion polymerase (NEB) with ΔmpiM44 as template with the primers listed in Table S2. These segments were cloned into BamHI-digested pLGB1388 using NEBuilder (NEB). Plasmids were sequenced to confirm the correct assembly of the segments. Plasmids were conjugally transferred from E. coli S17 λpir to ΔmpiM44 and after overnight co-incubation, were plated on BHIS with gentamycin and erythromycin. The resulting cointegrants were passaged in basal medium for several hours and plated on BHIS with 50 ng anhydrotetracycline to select for double cross-over recombinants. These strains were tested by PCR for replacement of the PSE sequences with the respective PSA sequences and the genomes of these two strains were sequenced to confirm the correct replacements.

Western immunoblot analysis

Bacterial strains were grown overnight to an apparent OD600 of ~1.2. Bacteria were pelleted, resuspended in 1× LDS loading buffer (Invitrogen), and boiled for 5 min. Cell lysates (equivalent to 3.5 µl of the original culture) were loaded onto 4–12% NuPAGE gels (Invitrogen) and run with MES buffer until the 17 kDa molecular weight standard had run to the bottom of the gel, to allow the high molecular weight PSE to migrate further into the gel. The contents of the gel were transferred to PVDF and blocked with 5% skim milk in TBS with 0.5% Tween (TBST). The blot was probed with a mouse monoclonal antibody specific to PSE (Supplementary Fig. 15) used at 1:100 dilution, washed with TBST, and probed with a 1:2000 dilution of alkaline phosphatase-conjugated goat anti-rabbit IgG (Invitrogen Catalog # 31340 Lot YA366475). After washing with TBST, the blot was developed with BCIP/NBT (KPL).

NET-seq

B. fragilis NCTC 9343 rpoC-3xFLAG was streaked onto BHIS plates and incubated at 37 °C anaerobically for 2 days. A swab from a dense area on the plate was used to inoculate overnight cultures. The next day, 10 mL of the overnight culture was used to inoculate 500 mL SBM (starting apparent OD600 0.04 as measured by a Denville® CO8000 Personal Cell Density Meter). When the apparent OD600 measured 0.65, cultures were removed from the anaerobic chamber and 300 mL was used for subsequent steps.

To harvest nascent transcripts for the NET-seq workflow, cultures were filtered between two vacuum filtration systems using a 0.45 µm pore nitrocellulose filter (GVS Micron Sep, 1215305). Cells were scraped off each filter using a spatula and plunged immediately into liquid nitrogen (i.e., cells from the same culture were combined into the same 50 mL conical tube containing ~25 mL liquid nitrogen). Collected cells were cryo-lysed using a RETSCH mixer mill (MM 400) as previously described37, with the exception that 50 mL stainless steel canisters and a 25 mm stainless steel ball were used to perform the cryomilling.

To isolate nascent transcripts, we performed a modified 3xFLAG-IP protocol with previously described buffers37. Specifically, the thawed grindate volume was scaled to 5.5 mL with lysis buffer (1× lysis stock [20 mM Tris, pH 8.0, 0.4% Triton X-100, and 0.1% NP-40 substitute], 100 mM NH4Cl, 1× EDTA-free cOmplete Mini protease inhibitor cocktail [Roche Diagnostics GmbH, 11836170001], 10 mM MnCl2, 50 U/mL RNasin [Promega, N211B], and 0.4 mg/mL puromycin); DNA was partially digested for 20 min with RQ1 DNase (0.054 U/mL [0.02 U/mL for the E. coli-only NET-seq pilot experiment]; Promega, M6101); and digestion reactions were stopped by addition of EDTA to 28 mM (final concentration). RNAP-nascent transcript complexes were directly immunoprecipitated using Anti-FLAG M2 affinity gel (Sigma, A2220) (i.e., without buffer exchange), and the precipitated RNAP-nascent transcript complexes were subsequently washed four times (1× lysis stock, 100 mM NH4Cl, 300 mM KCl, 1 mM EDTA, and 50 U/mL RNasin [Promega, N2515]). RNAP-nascent transcript complexes were eluted twice with 3xFLAG peptide (Sigma, F4799) (1× lysis stock, 100 mM NH4Cl, 2 mg/mL 3xFLAG peptide, 1 mM EDTA, and 50 U/mL RNasin). Nascent transcripts were purified using a miRNeasy kit (Qiagen, 217084) as previously described37. However, to reduce phenol and chaotropic salt contamination, nascent transcripts were subjected to an additional overnight isopropanol-GlycoBlue (Invitrogen, AM9516) precipitation at −20 °C.

For nascent transcript library generation, we followed a modification of a previous NET-seq workflow36,37. Specifically, our workflow included using custom adapters compatible with an Illumina NovaSeq X instrument. Likewise, the DNA adapter used for nascent transcript 3′ end ligation was adenylated using components from a NEB 5′ DNA Adenylation kit (E2610; 6 µM DNA linker [RL15032], 80 µM ATP, 6 µM Mth RNA ligase, and 1× Adenylation Reaction Buffer). The adenylation reaction was incubated for 4 h at 65 °C, inactivated at 85 °C for 5 min, and precipitated overnight at −20 °C with isopropanol and GlycoBlue (Invitrogen AM9516). The precipitated, adenylated DNA linker was ligated to 750 ng of precipitated nascent transcripts, in duplicate, using components of a NEB T4 RNA Ligase 2, truncated (T4 Rnl2tr) kit (M0242; 10% DMSO, 22% PEG8000, 3 µM adenylated DNA linker, T4 Rnl2tr [14.7 U/µL], RNasin [2 U/µL], and 1× T4 RNA Ligase Reaction Buffer). These ligation reactions were incubated at 37 °C for 4 h. After this incubation, T4 Rnl2tr was inactivated by incubation with Proteinase K (0.04 U/µL) (NEB, P8107) at 37 °C for 1 h. RNAs were fragmented, resolved, gel extracted, and precipitated as previously described36,37, with the exception that the gel extraction incubation at 70 °C was increased to 25 min. cDNAs were synthesized using a custom adapter (RL14637) and a previously described protocol36,37, with the exception that the reaction time was increased to 1 h. Circularization of gel extracted and precipitated cDNAs was performed using a protocol previously described36,37, with the exception that the circularization reaction incubation period was increased to 3 h and the gel extraction incubation period was increased as above. After circularization, cDNA libraries were PCR amplified using minimal cycles and custom adapters, gel extracted, and precipitated as previously described36,37.
Library concentration and amplified product size distribution were determined using an Agilent TapeStation 4150. NET-seq libraries were sequenced by the University of Wisconsin-Madison Biotechnology Center on an Illumina NovaSeq X instrument.

NET-seq data were processed using a combination of custom scripts and standard tools. Briefly, adapters, linker, and control oligos potentially contaminating each sample were trimmed from raw reads using cutadapt (v3.4). Reads with a minimum length of 14 nt were mapped to the B. fragilis genome (NC_003228.3) using Bowtie (v1.3.0), allowing both one mismatch and random assignment of reads mapping to multiple loci based on alignment stratum (Bowtie options --best -a -M 1 -v 1). Alignments were converted to BAM and BED files using samtools (v1.16.1) and bedtools (v2.30.0). The specific 3′ end counts for each genome position were determined using bedtools (options -d -strand - -5 [plus strand] or -d -strand + -5 [minus strand]).
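
The 3′-end counting step can be illustrated with a minimal Python sketch. It is not the custom scripts used in this study; it assumes, consistent with the strand options quoted above, that reads are antisense to the nascent RNA, so that the 5′ end of a read marks the RNA 3′ end on the opposite strand. The function name and the example intervals are hypothetical.

```python
from collections import Counter

def count_rna_3prime_ends(bed_records):
    """Count nascent-RNA 3' ends per (chrom, position, RNA strand).

    bed_records: iterable of (chrom, start, end, read_strand) tuples in BED
    conventions (0-based start, end-exclusive).
    Assumption: reads are antisense to the RNA, so the read 5' end is the
    RNA 3' end and the RNA strand is the opposite of the read strand.
    """
    opposite = {"+": "-", "-": "+"}
    counts = Counter()
    for chrom, start, end, read_strand in bed_records:
        # 5' end of a plus-strand read is its start; of a minus-strand read, end - 1
        five_prime = start if read_strand == "+" else end - 1
        counts[(chrom, five_prime, opposite[read_strand])] += 1
    return counts

# Hypothetical aligned reads (not data from this study)
reads = [
    ("NC_003228.3", 100, 130, "-"),
    ("NC_003228.3", 105, 130, "-"),
    ("NC_003228.3", 200, 225, "+"),
]
ends = count_rna_3prime_ends(reads)
for key in sorted(ends):
    print(key, ends[key])
```

Positions in this sketch are 0-based; a pause site would appear as a position whose count is high relative to its neighbors.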

rBfrRNAP cloning and purification

B. fragilis RNAP coding regions were codon-optimized using Gene Designer from DNA2.0 (now ATUM) using E. coli codon frequencies89 and amplified from synthetic DNA (IDT) of B. fragilis NCTC 9343, then cloned into a pRM756 backbone90, incorporating a His10-ppx tag at the C-terminus of β′ and a Strep tag at the N-terminus of β. RBS sites were optimized using denovodna.com91,92. This plasmid enables T7 overexpression of all subunits under IPTG control.

rBfrRNAP was purified similarly to E. coli RNAP93, with changes described below. Following transformation of RL3569 with pJS015, a colony was picked and inoculated into 3 mL LB + 25 µg kanamycin/mL + 20 µg chloramphenicol/mL. Two milliliters of overnight culture were used to inoculate 2 L LB + 25 µg kanamycin/mL + 10 drops Sigma Antifoam Y-30 Emulsion in baffled Fernbach flasks and incubated at 37 °C. When the apparent OD600 reached 0.4, the temperature was dropped to 16 °C, overexpression was induced by addition of 200 µM IPTG, and incubation was continued with shaking at 200 RPM overnight (~18 h). Cell cultures were placed on ice for 20 min, then pelleted by centrifugation at 3000 × g for 15 min at 4 °C.

From this point on, all steps were performed at 4 °C or on ice, and all buffers were filtered through 0.2 µm filters. Pellets were resuspended in 30 mL lysis buffer (50 mM Tris-HCl pH 8.0, 5% glycerol, 100 mM NaCl, 2 mM EDTA, 10 mM BME, 10 mM DTT, 0.1 mg/mL phenylmethylsulfonyl fluoride, with one dissolved tablet of Roche cOmplete™ ULTRA EDTA-Free Protease Inhibitor Cocktail). The resuspended cell solution was sonicated for 20 min total (alternating sonication on/off times of 5 min) with settings Power 8, Duty Cycle 20%. The lysate was then transferred to round-bottom polycarbonate tubes and spun at 27,000 × g for 15 min. The supernatant was transferred to a 100 mL beaker with stir bar, then 6.5% PEI was slowly added to a final concentration of 0.6% while stirring. The solution was stirred for 1 h, then transferred to open-top, round-bottom polycarbonate tubes and spun at 11,000 × g for 15 min. After decanting supernatant, a tissue homogenizer was used to resuspend the pellet in 25 mL of TGEDZ (10 mM Tris-HCl pH 8.0, 5% glycerol, 0.1 mM EDTA, 5 µM ZnCl2, 1 mM dithiothreitol) with added 0.3 M NaCl. The solution was spun at 11,000 × g for 15 min. After decanting supernatant, a tissue homogenizer was used to resuspend the pellet in 25 mL of TGEDZ with added 1 M NaCl. The solution was spun at 11,000 × g for 15 min. The supernatant was transferred into a 100 mL beaker with stir bar, then finely ground ammonium sulfate was added to the stirring solution to a final concentration of ~0.37 g/mL and precipitated overnight. The solution was transferred to Oak Ridge round-bottom tubes and spun at 27,000 × g for 15 min.

The pellet was dissolved in 35 mL of HisTrap Binding Buffer (20 mM Tris-HCl pH 8.0, 500 mM NaCl, 5 mM imidazole, 5 mM beta-mercaptoethanol [BME]), then spun at 27,000 × g for 15 min in the same Oak Ridge round-bottom tube. The supernatant was filtered through 0.2 µm filters and applied at 1 mL/min to a HisTrap HP 5 mL column, pre-equilibrated with HisTrap Binding Buffer. The column was washed with HisTrap Binding Buffer at 5 mL/min until A280 reached baseline, then washed at 5 mL/min with 2% HisTrap Elution Buffer (20 mM Tris-HCl pH 8.0, 500 mM NaCl, 1 M imidazole, 5 mM BME) until A280 reached baseline. rBfrRNAP was eluted at 5 mL/min with a 2–50% gradient of HisTrap Elution Buffer (translating to a 20–500 mM imidazole gradient). Three-milliliter elution fractions containing rBfrRNAP were pooled, filtered through 0.2 µm filters, then the NaCl concentration was reduced to 150 mM for the following purification step by dilution with TGEDZ buffer.
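
As a worked check of the gradient arithmetic, the short Python sketch below converts a percentage of HisTrap Elution Buffer into an imidazole concentration. It assumes the remainder of the mixture is HisTrap Binding Buffer (5 mM imidazole), which the text implies but does not state for the gradient step; the function name is illustrative only.

```python
def imidazole_mM(percent_elution_buffer, binding_mM=5.0, elution_mM=1000.0):
    """Imidazole concentration (mM) of a binding/elution buffer mixture.

    Assumes the balance of the mixture is HisTrap Binding Buffer.
    """
    frac = percent_elution_buffer / 100.0
    return (1.0 - frac) * binding_mM + frac * elution_mM

for pct in (2, 50):
    print(f"{pct}% elution buffer -> {imidazole_mM(pct):.1f} mM imidazole")
```

Under this assumption the gradient runs from about 25 mM to about 500 mM imidazole; the 20–500 mM range quoted above corresponds to the contribution of the elution buffer alone.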

HisTrap elution fractions were pooled then diluted with 100 mM Tris-HCl, pH 8.0, 1 mM EDTA, 10 mM DTT to adjust the salt concentration to 150 mM NaCl. The sample was then applied to a 5 mL Strep-Tactin® XT High Capacity column pre-equilibrated with 2 CV Buffer W (100 mM Tris-HCl, pH 8.0, 150 mM NaCl, 1 mM EDTA, 10 mM DTT) at 2 mL/min. The flow-through was reapplied to the column at 0.037 mL/min. The column was then washed with 5 CV of Buffer W. rBfrRNAP was eluted with Buffer BXT (100 mM Tris-HCl, pH 8.0, 150 mM NaCl, 1 mM EDTA, 10 mM DTT, 50 mM D-(+)-biotin [Acros Organics]).

Pooled fractions from the previous step were applied at 1.5 mL/min to a HiTrap HP column pre-equilibrated with TGEDZ + 200 mM NaCl. The column was then washed with TGEDZ + 200 mM NaCl until A280 reached baseline, then rBfrRNAP was eluted with TGEDZ + 500 mM NaCl at 2.5 mL/min.

Pooled fractions from the previous step were dialyzed overnight in RNAP storage buffer (10 mM Tris-HCl, pH 8.0, 25% glycerol, 100 mM NaCl, 100 µM EDTA, 1 mM MgCl2, 20 µM ZnCl2, 10 mM DTT) using a 10 kDa MWCO cassette, then concentrated using Ultra-4, MWCO 100 kDa (Sigma-Aldrich Z648043-24EA) to a final concentration of 8 µM. The solution was then aliquoted, flash-frozen, and stored at –80 °C.

Cloning and purification of transcription factors

All TFs (NusG, NusA, YA, YB, YC, YE, YF, YH, YB(NGN)–YB(KOW), YE(NGN)–YB(KOW)) were cloned into a pTYB2 backbone (Addgene catalog N6702S) after PCR amplification from Bacteroides fragilis ATCC 25285 (NCTC 9343) genomic DNA by NEB HiFi DNA assembly (Gibson Assembly). This vector enables IPTG-inducible over-expression of proteins fused at the C-terminus to the Saccharomyces cerevisiae VMA intein and chitin-binding domain. Importantly, to ensure efficient self-cleavage via the intein, an Ala residue was incorporated at the C-terminus of all transcription factor coding sequences.

After plasmid sequence verification, RL1674 (E. coli BL21 Rosetta™ (DE3)) was transformed by electroporation with pTYB2-derived constructs, then plated on LB agar with 100 µg ampicillin/mL and 20 µg chloramphenicol/mL (for retention of pRARE2 plasmid). For each expression construct, a single colony was picked and used to inoculate a 3 mL overnight LB culture grown at 37 °C containing the same concentration of antibiotics. The next day, 1 mL of overnight culture was used to inoculate a 200 mL LB culture containing antibiotics (3% ethanol was added for all YX constructs) and grown at 37 °C. When the OD reached 0.2–0.3, the incubation temperature was dropped to 16 °C and shaking continued for 30 min. Subsequently, IPTG was added to a final concentration of 200 µM and incubation continued overnight (16–18 h). The next day, cultures were placed on ice for 20 min, then pelleted at 3000 × g for 15 min at 4 °C.

Pellets were resuspended in 40 mL of Chitin Wash Buffer (CWB; 30 mM Tris-HCl, pH 7.5–8.0 depending on protein pI, 0.5 M NaCl, 1 mM EDTA, 0.05% Tween® 20) plus one dissolved tablet of Roche cOmplete™ ULTRA EDTA-Free Protease Inhibitor Cocktail. The cell suspension was sonicated for 10 min at 20% duty cycle, Power 8. The lysate was pelleted at 30,000 × g for 30 min at 4 °C, then the supernatant was passed through 0.2 µm filters.

The subsequent steps were performed at room temperature closely following manufacturer’s instructions. Briefly, 3 mL of a homogenous suspension of NEB Chitin Resin (Catalog S6651L) were loaded into a 25 mL Poly-Prep Gravity Chromatography Column (Biorad), washed with 5 mL of mQH2O, then equilibrated by washing 3 times each with 10 mL of CWB. The lysate was subsequently loaded onto the column, then washed three times each with 10 mL of CWB. Cleavage Buffer (CB) was made by adding 500 µl of 1 M DTT (prepared fresh from solid reagent) to 10 mL of CWB, then a quick flush was performed by adding 3 mL of CB. SDS-PAGE revealed no premature elution in the quick flush fraction. Immediately after dripping stopped, the bottom and top of the column were capped, parafilmed, and the column was incubated at room temperature overnight (16–18 h) to allow sufficient time for cleavage. The next day, cleaved protein was eluted by addition of 1.5 mL CWB + 10 mM DTT, then dialyzed overnight in 10 mM Tris-HCl, pH 7.5–8.0 depending on pI, 2% glycerol, 100 mM NaCl, 100 µM EDTA, 10 mM DTT using a 10 K MWCO cassette. After removal from the dialysis cassette, additional glycerol was added to a final concentration of 25%. The solution was aliquoted, flash-frozen, then stored at −80 °C until use.

PIVoT assays

A direct reconstitution approach was used to assemble elongation complexes (ECs). Briefly, RNA and template DNA oligonucleotides were mixed at a ratio of 1:1.2 (5 µM: 6 µM) in transcription buffer (TB; 20 mM Tris-OAc, pH 7.7, 40 mM KOAc, 5 mM Mg(OAc)2, 1 mM DTT), then annealed by slow cooling in a thermocycler. To assemble 10× ECs, first the annealed RNA:tDNA scaffold and RNAP were mixed in TB and incubated for 15 min at 37 °C. Then, non-template DNA oligonucleotide was added and incubation continued for an additional 15 min at 37 °C. The solution was diluted with TB to prepare 2× EC (subtracting volume of further additions) and incubated for 1 min at 37 °C. Then, 5 µCi of [α-32P]NTP (depending on the scaffold) was added and incubated for 3 min at 37 °C. Additional GTP was added such that the final concentration of GTP in the solution was 10 µM, and incubation continued for 3 min at 37 °C.

2× ECs were aliquoted, so all comparisons were made with identically formed ECs. The assay was performed at 37 °C: transcription was restarted by addition of 2× NTPs minus/plus TFs or storage buffer. For Fig. 5c, YB was pre-incubated with halted ECs following reconstitution at −3 and incorporation labeling to −2 prior to restarting transcription. Timepoints were taken by mixing 5 µl reaction aliquots with 5 µl of 2× Stop Buffer (25 mM EDTA, 8 M Urea, 1× TBE, 0.1% bromophenol blue, 0.1% xylene cyanol). The ratio and concentrations of EC components in the 1× EC solution were 1:1.2:1.4:1.6 (R:T:RNAP:NT; 50 nM, 60 nM, 70 nM, 80 nM). The final reaction concentrations of TFs are indicated in each figure legend. Unless otherwise indicated, NTPs were added to a final reaction concentration of 100 µM. RNAs were resolved by 8% or 15% Urea-PAGE with 0.5× TBE running buffer until the leading dye ran off the gel. Gels were exposed to PhosphorImager screens and scanned using a Typhoon Phosphorimager. To quantify effects in ImageQuant, boxes were drawn around the pause band opsX, the capture band(s) (if applicable), and beyond. After subtracting background, the fractions of RNA at opsX or at capture positions were averaged; errors reflect standard deviation from at least three replicates (unless indicated otherwise).
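
The band quantification described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ImageQuant procedure itself: a single background value is subtracted from every box in a lane, the sample standard deviation is used, and all intensities below are invented for the example.

```python
from statistics import mean, stdev

def band_fractions(boxes, background):
    """Background-subtract box intensities and convert them to lane fractions.

    boxes: dict mapping box name -> raw intensity (arbitrary units).
    background: one background value subtracted from every box (assumption).
    """
    corrected = {name: max(value - background, 0.0) for name, value in boxes.items()}
    total = sum(corrected.values())
    return {name: value / total for name, value in corrected.items()}

# Hypothetical replicate lanes: (box intensities, lane background)
replicates = [
    ({"opsX": 520.0, "capture": 120.0, "beyond": 420.0}, 20.0),
    ({"opsX": 630.0, "capture": 130.0, "beyond": 330.0}, 30.0),
    ({"opsX": 410.0, "capture": 110.0, "beyond": 510.0}, 10.0),
]

ops_fractions = [band_fractions(boxes, bg)["opsX"] for boxes, bg in replicates]
ops_mean = mean(ops_fractions)
ops_sd = stdev(ops_fractions)  # sample standard deviation (n - 1)
print(f"fraction at opsX = {ops_mean:.2f} +/- {ops_sd:.2f}")
```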

For the Z-titration assay in Fig. 2d, data were fit in Kaleidagraph to a sigmoidal function of the form $y = a + (b - a)/\left(1 + (x/c)^{d}\right)$, where a is ymin, b is ymax, c is the ZX concentration at the mid-point, and d is the slope at the mid-point; fits were weighted by the standard deviation (error bars) from three assays.
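
To make the roles of the four parameters concrete, the sketch below evaluates the sigmoidal function and a standard-deviation-weighted sum of squared residuals of the kind a weighted fit minimizes. It does not reproduce the Kaleidagraph fit; the parameter values and data points are hypothetical.

```python
def sigmoid(x, a, b, c, d):
    """y = a + (b - a) / (1 + (x / c) ** d); a = ymin, b = ymax, c = mid-point."""
    return a + (b - a) / (1.0 + (x / c) ** d)

def weighted_sse(params, xs, ys, sds):
    """Sum of squared residuals, each residual divided by its standard deviation."""
    a, b, c, d = params
    return sum(((y - sigmoid(x, a, b, c, d)) / s) ** 2 for x, y, s in zip(xs, ys, sds))

# Hypothetical parameters: ymin 0.1, ymax 0.9, mid-point 50 nM, steepness 2
params = (0.1, 0.9, 50.0, 2.0)
print(sigmoid(0.0, *params))    # no ZX: y equals ymax
print(sigmoid(50.0, *params))   # x = c: y is halfway between ymin and ymax

# Points lying exactly on the curve give a weighted SSE of zero
xs = [5.0, 50.0, 500.0]
ys = [sigmoid(x, *params) for x in xs]
print(weighted_sse(params, xs, ys, sds=[0.05, 0.05, 0.05]))
```

With d > 0 the function falls from ymax at low ZX to ymin at high ZX, which matches an inhibition titration.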

Biolayer interferometry

Preparation of biotinylated-YE: pJS060 was cloned similarly to other pTYB2-derived constructs (see above), with the exception that two oligos were included in the Gibson assembly to introduce the 16-codon Avi-tag™ onto the N-terminus of upeY. Expression, cell harvesting, and lysis conditions are as described above. Avi-YE was biotinylated on a gravity column as described below:

The subsequent steps were performed at room temperature closely following NEB instructions. Briefly, 3 mL of a homogenous suspension of NEB Chitin Resin (Catalog S6651L) were loaded into a 25 mL Poly-Prep Gravity Chromatography Column (Biorad), washed with 5 mL of mQH2O, then equilibrated by washing 3 times each with 10 mL of CWB. The lysate was subsequently loaded onto the column, then washed three times each with 10 mL of CWB. The column was then washed three times with Avi Chitin Wash Buffer (AviCWB = 10 mM Tris, pH 8.0, 0.5 M KGlu, 0.1% Tween 20). Components from the Avidity BirA500 Kit were used in the subsequent biotinylation reaction: a biotinylating solution (500 µL AviCWB, 70 µL of Biomix A, 70 µL of Biomix B, 10 µL of 1 mg/mL BirA) was added to the column and the reaction was allowed to continue for 2.5 h. The column was subsequently washed three times each with 10 mL of CWB. Cleavage Buffer (CB) was made by adding 500 µl of 1 M DTT (prepared fresh from solid reagent) to 10 mL of CWB, then a quick flush was performed by adding 3 mL of CB. Immediately after dripping stopped, the bottom then the top of the column were capped, parafilmed, and the column was incubated at room temperature overnight (16–18 h) to allow sufficient time for cleavage. The next day, cleaved protein was eluted by addition of 1.5 mL CWB + 10 mM DTT, then dialyzed overnight in 10 mM Tris-HCl pH 7.5, 2% glycerol, 100 mM NaCl, 100 µM EDTA, 1 mM DTT using a 10 K MWCO cassette. After removal from the dialysis cassette, additional glycerol was added to a final concentration of 20%. The solution was aliquoted, flash-frozen, then stored at −80 °C until use. Importantly, Biotin-YE retained activity in vitro.

For each titration, 1 mL of 0.3 µM biotinylated-YE was prepared in Octet Binding Buffer 4.1 (OBB4.1 = PBS + 400 mM NaCl + 0.01% Triton X-100 + 0.25% BSA). ZA solution was prepared at 100 nM in OBB4.1 with twofold serial dilutions down to 1.56 nM. ZE solution was prepared at 500 nM in OBB4.1 with serial dilutions down to 31.3 nM. Plates were prepared for binding assays: in plate 1, 200 µL of OBB4.1 was placed in each well of column 1 containing a biosensor (up to 8 biosensors per experiment); in plate 2 (containing ‘half-area’ wells permitting 100 µL volumes), column 1 contained 100 µL/well of OBB4.1, column 2 contained 100 µL/well of 0.3 µM biotinylated-YE, and column 3 contained 100 µL/well of ZX serial dilutions or buffer (as a blank/reference) prepared above.
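
The dilution series can be enumerated with a small Python sketch. The ZA series is stated to be twofold; treating the ZE series as twofold as well is an assumption made here for illustration, and the helper name is hypothetical.

```python
def twofold_series(top_nM, lowest_nM):
    """List twofold serial dilutions from top_nM down to roughly lowest_nM."""
    series = []
    conc = float(top_nM)
    # the 1% tolerance absorbs rounding in the quoted lower bound (e.g., 1.56 vs 1.5625)
    while conc >= lowest_nM * 0.99:
        series.append(conc)
        conc /= 2.0
    return series

za_series = twofold_series(100, 1.56)
ze_series = twofold_series(500, 31.3)  # twofold steps assumed for ZE
print(za_series)
print(ze_series)
```

Under these assumptions the ZA titration spans seven concentrations and the ZE titration five, each run alongside a buffer-only reference well.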

A basic kinetics assay was performed using standard acquisition rates at 30 °C on a ForteBio Octet RED96 system. Octet® Streptavidin (SA) Biosensors were pre-equilibrated for 10 min at 30 °C. Step times: Baseline (Plate 2 Column 1 (P2C1)) = 60 s; Loading (P2C2) = 320 s (or until 2 nm loading density reached); Baseline (P2C1) = 60 s; Association (P2C3) ≥ 300 s; Dissociation (P2C1) ≥ 300 s.

Data were processed using Octet Data Analysis Software. The reference biosensor curve (bio-YE + buffer in place of ZX) was subtracted from all binding curves. Traces were subsequently aligned along the Y axis at the pre-association baseline, with interstep correction performed at the dissociation step. Noise filtering (Savitzky–Golay filtering, a smoothing function) was performed. Data from each experiment were independently globally fit. For each binding pair tested, two out of three global fits have R² values around 0.95 or greater and chi-squared values less than 3, as recommended by ForteBio. Given the two orders of magnitude difference in binding constants, the limited conclusions we draw, and the parsimonious agreement of these constants among replicates and with our PIVoT assays, we deemed the fits acceptable overall. The average and standard deviation of the kinetic parameters from the global fits are reported. Equilibrium constants are calculated from models. The value ‘Req/Rmax’ is reported as fraction YE bound.
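
For orientation, the relationship between kinetic parameters, the equilibrium constant, and Req/Rmax can be sketched as follows. The sketch assumes a simple 1:1 binding model, which the text does not specify, and the rate constants are hypothetical rather than values measured in this study.

```python
import math

def equilibrium_kd(k_off, k_on):
    """KD (M) from k_off (1/s) and k_on (1/(M*s)), assuming 1:1 binding."""
    return k_off / k_on

def fraction_bound(analyte_M, kd_M):
    """Req/Rmax for a 1:1 model: fraction of immobilized ligand occupied."""
    return analyte_M / (analyte_M + kd_M)

# Hypothetical rate constants (not from this study)
k_on = 1.0e5    # 1/(M*s)
k_off = 1.0e-3  # 1/s
kd = equilibrium_kd(k_off, k_on)
print(f"KD = {kd * 1e9:.1f} nM")
print(f"fraction bound at 10 nM analyte = {fraction_bound(10e-9, kd):.2f}")
```

In this model, half of the immobilized YE is occupied when the ZX concentration equals KD.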

Exonuclease footprinting

Nucleic acid scaffolds used in exonuclease footprinting assays were each composed of: (i) a 32P-labeled template DNA oligo, (ii) a non-template DNA oligo with four consecutive phosphorothioate bonds at the 3′ end, and (iii) an RNA oligo with its 3′ end at the position of pausing in opsX and with noncomplementary bases upstream of the RNA-DNA hybrid to prevent backtracking.

Template DNA oligo (20 μM) was labeled in a T4 PNK reaction with 1 μCi of [γ-32P]ATP for 15 min at 37 °C. ATP (1 μL of 1 mM) was subsequently added and the reaction was allowed to proceed for an additional 30 min at 37 °C. Reactions were stopped by heating at 65 °C for 20 min, and oligos were subsequently purified using G-50 columns pre-equilibrated with TE, following the manufacturer’s instructions.

TECs were reconstituted essentially as described for in vitro transcription assays, except that the molar ratio of T:R:Pol:NT was 1:2:3:5 (50 nM 32P-T: 100 nM R: 150 nM RNAP: 250 nM NT). TECs were subsequently split into 35 μL aliquots and incubated with either storage buffer or YX variants for 3 min at 37 °C. Tubes were shifted to 30 °C and incubated for 3 min before a 5 μL aliquot (time 0) was removed and mixed with an equal volume of 2× Stop Buffer. Exonuclease reactions were initiated by adding 100 μ of ExoIII, and aliquots were removed from reactions and mixed with stop buffer at the times indicated in the figures.

To quantify both transient and stable protection from exonucleolytic cleavage, pseudodensitometry traces were generated for the first timepoint lane. Regions of interest were identified by comparison to a sequencing ladder. Areas under the peaks of these regions were determined by manual integration in Microsoft Excel, and each area was then divided by the sum of the areas under all peaks to its right. These values were determined in the absence or presence of YB, and their ratio is reported as fold change (+YB/−YB) for each sequence variant.
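The normalization and fold-change calculation described above can be expressed as a short Python sketch. The integration itself was done manually in Excel; the function names, the left-to-right ordering of the area list, the reading that the peak of interest is excluded from the denominator, and the example numbers are all assumptions made for illustration.

```python
def normalized_area(areas, index):
    """Divide the area of the peak at `index` by the summed areas to its right.

    `areas` lists integrated peak areas in left-to-right order along the
    lane trace (assumed ordering). The peak at `index` itself is assumed to be
    excluded from the denominator.
    """
    right_sum = sum(areas[index + 1:])
    if right_sum == 0:
        raise ValueError("no signal to the right of the selected peak")
    return areas[index] / right_sum


def fold_change(areas_plus, areas_minus, index):
    """Ratio of normalized areas with and without YB (+YB / -YB)."""
    return normalized_area(areas_plus, index) / normalized_area(areas_minus, index)


# Hypothetical peak areas (arbitrary units), for illustration only.
minus_yb = [2.0, 4.0, 4.0]
plus_yb = [4.0, 2.0, 2.0]
print(fold_change(plus_yb, minus_yb, 0))
```

Normalizing to the signal to the right of each peak expresses every region relative to the remaining material in the same lane, so lanes with different total loading can be compared.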

Structural models

A model of YB was made using Modeller94,95 and fitted to 8PHK33. Additional upstream and downstream DNA were modeled using PyMOL. The YE–ZA complex structure was predicted using AlphaFold 354, yielding an interface predicted template modeling (ipTM) score of 0.89 and a predicted template modeling (pTM) score of 0.9 (values above 0.8 represent confident, high-quality predictions). Additional confidence metrics are illustrated in Supplementary Fig. 6. RNA secondary structures were predicted using RNAfold95.

The Bfr RNA polymerase PEC model was generated using Modeller96, the M. tuberculosis PEC formed on the B. subtilis trpL pause sequence (8E74)27, NusA and NusG NGN models from SWISS-MODEL97, and Porphyromonas gingivalis RNAP (8DKC)98.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.