Biological context

SCoV-2 is a member of the genus Betacoronavirus and contains a large single-stranded positive-sense (+) RNA genome of approx. 30,000 nucleotides (nts) (Hu et al. 2020; V’kovski et al. 2020). The RNA genome of the virus contains not only the coding regions for the viral proteins, but also extended and highly structured 5′- and 3′-UTRs, as well as internal structured RNA elements with important functional roles in genome replication, transcription of subgenomic (sg) mRNAs and the balanced translation of viral proteins (Madhugiri et al. 2016; Kelly et al. 2020; Tidu et al. 2020). While the development of antiviral therapeutics against COVID-19 has primarily focused on the viral proteins, the highly structured RNA elements provide an extensive reservoir of additional drug targets that have yet to be exploited. The architecture of the RNA genome of SCoV-2 and related viruses has so far been investigated mainly by sequence-based computational predictions and by chemical probing approaches in vitro and in vivo (e.g. Manfredonia et al. 2020; Rangan et al. 2020). Although structural probing methods have been established to map RNA–small molecule interactions even in cells (Martin et al. 2019), these tools cannot define the tertiary structure and dynamics of the RNA elements in the SCoV-2 genome at the resolution required for structure-based drug design by virtual screening.

While the sequences of the individual structural elements vary between different coronaviruses, their ubiquitous presence and highly conserved secondary structures suggest that these elements are critically important for viral viability and pathogenesis (reviewed in Madhugiri et al. 2016). One example of such an important structure is stem-loop 5 (SL5), which is structurally conserved in the genomes of Alpha- and Betacoronaviruses and has been shown to be crucial for efficient viral replication (Chen and Olsthoorn 2010; Guan et al. 2011).

In SCoV-2, SL5 consists of four helices formed by nts 149–297, comprising part of the 5′-UTR and the first 29 nts of the Nsp1 coding region (Suppl. Figure 1A). Its sub-elements, termed SL5a, SL5b and SL5c, are joined to the SL5 basal stem by a four-helix junction. SL5a consists of 31 nucleotides and is the largest of the three stem-loops. Intriguingly, the apical loop sequences of SL5a and SL5b are identical (5′-UUUCGU-3′) and belong to the 5′-UUYCGU-3′ motif, which is also found in Alphacoronaviruses. This high level of sequence conservation suggests a functional role, e.g. in viral packaging (Masters 2019). We have recently obtained secondary structure models of SL5a–c and the basal stem segment of SL5 based on initial 1H and 15N assignments (Wacker et al. 2020). To characterize SL5a further, we here provide a near-complete 1H, 13C and 15N chemical shift assignment.

Methods and experiments

Sample preparation

RNA synthesis for NMR experiments: For DNA template production, the sequence of SL5a together with the T7 promoter was generated by hybridization of complementary oligonucleotides and introduced into the EcoRI and NcoI sites of an HDV ribozyme-encoding plasmid (Schürer et al. 2002) based on the pSP64 vector (Promega). RNAs were transcribed as HDV ribozyme fusions to obtain a homogeneous 3′-end. The recombinant vector pHDV-5_SL5a was transformed into and amplified in Escherichia coli strain DH5α. Plasmid DNA was purified using a large-scale DNA isolation kit (Gigaprep; Qiagen) according to the manufacturer’s instructions and linearized with HindIII prior to in vitro transcription with the T7 RNA polymerase P266L mutant, which was prepared as described by Guillerez et al. (2005). To obtain sufficient amounts of SL5a RNA (5′-pppGGGCUGCUUACGGUUUCGUCCGUGUUGCAGCCC-3′), 15 ml transcription reactions [20 mM DTT, 2 mM spermidine, 200 ng/µl template, 200 mM Tris/glutamate (pH 8.1), 40 mM Mg(OAc)2, 12 mM NTPs, 32 µg/ml T7 RNA polymerase, 20% DMSO] were performed. Preparative transcription reactions (6 h at 37 °C and 70 rpm) were terminated by addition of 150 mM EDTA. SL5a RNA was purified as follows. RNAs were precipitated with one sample volume of ice-cold 2-propanol. RNA fragments were separated on 15% denaturing polyacrylamide (PAA) gels and visualized by UV shadowing at 254 nm. SL5a RNA was excised from the gel and eluted: the gel fragments were granulated in two gel volumes of 0.3 M NaOAc solution and incubated for 30 min at −80 °C, followed by 15 min at 65 °C. The RNA was further eluted from the gel fragments overnight by passive diffusion into 0.3 M NaOAc, precipitated with EtOH and desalted via PD10 columns (GE Healthcare). Residual PAA was removed by reversed-phase HPLC using a Kromasil RP 18 column and a gradient of 0–40% acetonitrile in 0.1 M triethylammonium acetate.
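
The reaction composition above translates into a pipetting scheme via C_stock·V_stock = C_final·V_total. A minimal sketch follows; only the final concentrations and the 15 ml reaction volume come from the text, while all stock concentrations (and the names `pipetting_table` and `mass_requirements_ug`) are hypothetical placeholders for illustration.

```python
"""Pipetting sketch for a 15 ml T7 transcription reaction.

Final concentrations and the reaction volume are taken from the text;
all stock concentrations are HYPOTHETICAL placeholders.
"""

REACTION_ML = 15.0

# Final concentrations in mM (from the text).
FINAL_MM = {
    "DTT": 20.0,
    "spermidine": 2.0,
    "Tris/glutamate pH 8.1": 200.0,
    "Mg(OAc)2": 40.0,
    "NTPs": 12.0,
}

# Hypothetical stock concentrations in mM (not stated in the text).
STOCK_MM = {
    "DTT": 1000.0,
    "spermidine": 100.0,
    "Tris/glutamate pH 8.1": 1000.0,
    "Mg(OAc)2": 1000.0,
    "NTPs": 100.0,
}


def stock_volume_ml(final_mm: float, stock_mm: float, total_ml: float) -> float:
    """Solve C_stock * V_stock = C_final * V_total for V_stock."""
    return final_mm * total_ml / stock_mm


def pipetting_table(total_ml: float = REACTION_ML) -> dict:
    """Volumes (ml) of each stock, plus DMSO at 20% (v/v)."""
    volumes = {
        name: stock_volume_ml(FINAL_MM[name], STOCK_MM[name], total_ml)
        for name in FINAL_MM
    }
    volumes["DMSO (20% v/v)"] = 0.20 * total_ml
    return volumes


def mass_requirements_ug(total_ml: float = REACTION_ML) -> dict:
    """Template at 200 ng/µl (= 200 µg/ml) and T7 RNAP P266L at 32 µg/ml."""
    return {"DNA template": 200.0 * total_ml, "T7 RNAP P266L": 32.0 * total_ml}


if __name__ == "__main__":
    for name, ml in pipetting_table().items():
        print(f"{name:>24}: {ml:6.2f} ml")
    for name, ug in mass_requirements_ug().items():
        print(f"{name:>24}: {ug:7.0f} µg")
```

With these placeholder stocks, the buffer components and DMSO add up to 9 ml, and a 15 ml reaction requires 3 mg of linearized template and 480 µg of polymerase.
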
After freeze-drying of the RNA-containing fractions and cation exchange by LiClO4 precipitation (2% in acetone), the RNA was folded in water by heating to 80 °C followed by rapid cooling on ice. Buffer exchange into NMR buffer (25 mM potassium phosphate, pH 6.2, 50 mM potassium chloride) was performed using Vivaspin centrifugal concentrators (2 kDa molecular weight cut-off). The purity of SL5a was verified by denaturing PAA gel electrophoresis, and homogeneous folding was monitored by native PAA gel electrophoresis, loading the same RNA concentration as used in the NMR experiments.

Using this protocol, two NMR samples of SL5a, an 810 µM uniformly 15N- and a 680 µM uniformly 13C,15N-labeled sample, were prepared and used for the assignment presented herein.

NMR experiments

NMR experiments using the 15N-labeled RNA were carried out at the Karolinska Institutet (KI) on a Bruker AVANCE III 600 MHz NMR spectrometer equipped with a 5 mm, z-axis gradient 1H [13C, 15N, 31P]-QCI cryogenic probe. All NMR experiments with the 13C,15N-labeled RNA were conducted at the Center for Biomolecular Magnetic Resonance (BMRZ) at Goethe University (GU) Frankfurt on Bruker AVANCE III HD NMR spectrometers operating at 600 to 800 MHz, equipped with the following cryogenic probes: 5 mm, z-axis gradient 1H [13C, 31P]-TCI cryogenic probe (600 MHz), 5 mm, z-axis gradient 1H [13C, 15N, 31P]-QCI cryogenic probe (700 MHz) and 13C-optimized 5 mm, z-axis gradient 13C, 15N [1H]-TXO cryogenic probe (800 MHz).

At BMRZ and KI, experiments were performed at 298 K unless indicated otherwise. NMR spectra were processed and analyzed using TopSpin versions 4.0.8 (GU) and 3.6.2 (KI). Chemical shifts were assigned using Sparky (Lee et al. 2015). NMR data were managed and archived using the platform LOGS (2020, version 2.1.54, Signals GmbH & Co KG, www.logs.repository.com). 1H chemical shifts were referenced externally to DSS, and 13C and 15N chemical shifts were referenced indirectly via the 1H reference as described previously (Wishart et al. 1995).
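
Indirect referencing multiplies the absolute 1H frequency of DSS (0 ppm) by a nucleus-specific frequency ratio Ξ to obtain the 0 ppm frequency of 13C or 15N. A minimal sketch, assuming the Ξ values recommended by Wishart et al. (1995) (0.251449530 for 13C relative to DSS and 0.101329118 for 15N relative to liquid NH3); the 600.13 MHz spectrometer value in the usage block is hypothetical.

```python
"""Indirect 13C/15N chemical shift referencing from the 1H DSS reference.

Xi ratios as recommended by Wishart et al. (1995); the spectrometer
frequency used in the example is HYPOTHETICAL.
"""

# Frequency ratios nu0(X) / nu0(1H, DSS)
XI = {
    "13C": 0.251449530,  # 13C reference: DSS
    "15N": 0.101329118,  # 15N reference: liquid NH3
}


def zero_ppm_frequency_mhz(h1_dss_mhz: float, nucleus: str) -> float:
    """Absolute frequency (MHz) corresponding to 0 ppm for `nucleus`."""
    return h1_dss_mhz * XI[nucleus]


def to_ppm(freq_mhz: float, ref_mhz: float) -> float:
    """Chemical shift (ppm) of a signal at freq_mhz relative to the 0 ppm frequency ref_mhz."""
    return (freq_mhz - ref_mhz) / ref_mhz * 1e6


if __name__ == "__main__":
    h1_dss = 600.13  # hypothetical absolute 1H frequency of DSS, in MHz
    for nucleus in XI:
        print(nucleus, f"{zero_ppm_frequency_mhz(h1_dss, nucleus):.6f} MHz")
```

Because only the 1H reference is measured, the 13C and 15N scales stay consistent with it across spectrometers without adding separate heteronuclear standards to the sample.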

We have previously reported the imino and cytidine amino resonance assignment of SL5a (Wacker et al. 2020), which allowed us to determine the base pairing in this RNA element. The locations of stable base pairs are confirmed by trans-hydrogen-bond 2hJNN scalar couplings (Dingley et al. 2008), reported in Suppl. Table S1. These assignments, obtained from experiments on a 15N-labeled RNA sample, provided the starting points for the aromatic proton resonance assignment using 1H,1H-NOESY (Tables 1 I, 2 I) and (H)C(CCN)H (Tables 1 IV, 2 V) experiments, which link the imino proton resonances to the aromatic protons and carbons (Fig. 1a and b). The remaining H6/8–C6/8 resonances in the aromatic 1H,13C-HSQC spectrum (Tables 1 II, 2 III) were assigned using a 3D 13C-NOESY-HSQC experiment selective for the aromatic region (Table 1 VII). Cytidine and uridine C5–H5 resonances were assigned using 1H,1H-TOCSY (Table 1 VI, Fig. 1e) and 1H,13C-HSQC spectra (Table 1 III, Fig. 1d). Furthermore, quaternary carbon atoms were assigned using an HNCO-type experiment (Table 2 IV) and the TROSY-relayed HCCH-COSY experiment (Table 1 VIII). The 13C-detected 3D CNC spectrum (Table 1 V, Fig. 1c) linked the aromatic carbons to the anomeric C1′ resonances, and its nitrogen dimension helped to distinguish purine from pyrimidine nucleotides as well as uridines from cytidines. Correlating C6/8 with C1′ also minimizes resonance overlap, because the signals are more widely dispersed in the carbon dimensions than in the corresponding proton dimensions. Based on the C1′ resonances obtained from the CNC spectrum and from the sequential assignment in the NOESY spectra, H1′–C1′ correlations were assigned in the 1H,13C-HSQC spectrum (Table 1 III, Fig. 1f). A continuous sequential H1′-to-H6/H8 walk was possible for both helices (Fig. 1c). The H1′–C1′ assignment was further confirmed with a 3D 13C-NOESY-HSQC experiment selective for the C1′ resonances (Table 1 IX).
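
The sequential walk exploits the A-form NOE pattern, in which the aromatic H6/H8 of residue i+1 is close to the H1′ of residue i. The chaining step can be sketched as follows; the spin systems, peak list, matching tolerance and the function name `sequential_walk` are hypothetical, and the sketch simply stops at gaps or ambiguous branches that an expert would resolve with further spectra.

```python
"""Sketch of an H1'(i) -> H6/H8(i+1) sequential NOE walk.

Spin systems, peaks and the tolerance are HYPOTHETICAL. Each spin system pairs
the aromatic H6/H8 shift with the H1' shift of the same nucleotide.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class SpinSystem:
    label: str   # provisional label, e.g. "S1"
    h68: float   # aromatic H6/H8 shift (ppm)
    h1p: float   # H1' shift (ppm)


def _close(a: float, b: float, tol: float) -> bool:
    return abs(a - b) <= tol


def sequential_walk(start, systems, noes, tol=0.02):
    """Chain spin systems via NOE peaks given as (h68_ppm, h1p_ppm).

    From the current residue i, a successor is any unused spin system whose
    H6/H8 shows an NOE to the H1' of i. The walk stops at a gap (no candidate)
    or an ambiguity (several candidates).
    """
    path = [start.label]
    used = {start.label}
    current = start
    while True:
        candidates = [
            s for s in systems
            if s.label not in used
            and any(_close(h1p, current.h1p, tol) and _close(h68, s.h68, tol)
                    for h68, h1p in noes)
        ]
        if len(candidates) != 1:
            return path
        current = candidates[0]
        path.append(current.label)
        used.add(current.label)


if __name__ == "__main__":
    systems = [SpinSystem("S1", 7.80, 5.80), SpinSystem("S2", 7.60, 5.60),
               SpinSystem("S3", 7.40, 5.50)]
    # Intranucleotide peaks plus two sequential H1'(i)-H6/H8(i+1) contacts.
    noes = [(7.80, 5.80), (7.60, 5.60), (7.40, 5.50), (7.60, 5.80), (7.40, 5.60)]
    print(sequential_walk(systems[0], systems, noes))
```

Anchoring the walk at a residue already assigned from the imino data fixes the register, so the resulting chain maps directly onto the sequence.

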
Using 3D HCCH-TOCSY experiments with two different mixing times (Table 1 X, XI and XII), the remaining ribose carbon resonances C2′–C5′ were assigned. With a short TOCSY mixing time of 6 ms, C2′ and C3′ resonances could be distinguished by their intensity differences, whereas with a long mixing time of 18 ms, the C4′ and C5′ carbons were also correlated to the C1′ resonances.
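
The mixing-time dependence can be rationalized in a simplified two-spin picture of isotropic (TOCSY) mixing between two directly bonded carbons with one-bond coupling J. This is only an approximation for the five-carbon ribose chain, where magnetization reaches C4′ and C5′ through successive transfers; assuming an illustrative 1J(C,C) of about 40 Hz (an assumed value, not taken from the text), complete transfer between neighbouring carbons would require τ_m = 1/(2J) ≈ 12.5 ms.

```latex
% Two-spin approximation, isotropic mixing Hamiltonian H = 2\pi J\,\mathbf{I}_1\cdot\mathbf{I}_2
I_{1z} \xrightarrow{\;\tau_m\;}
  \tfrac{1}{2}\bigl[1+\cos(2\pi J\tau_m)\bigr] I_{1z}
  + \tfrac{1}{2}\bigl[1-\cos(2\pi J\tau_m)\bigr] I_{2z}
  + \text{(zero-quantum terms)}

% Fraction transferred to the neighbouring carbon
f_{1\to 2}(\tau_m) = \sin^2(\pi J \tau_m), \qquad
f_{1\to 2} = 1 \ \text{at} \ \tau_m = \frac{1}{2J}
```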

Table 1 List of NMR experiments conducted at KI and BMRZ at 298 K
Table 2 List of NMR experiments conducted at KI and BMRZ at 283 K
Fig. 1

Resonance assignment of aromatic protons and carbons and the linkage to the ribose. a HCCNH experiment correlating the imino protons of guanosines and uridines to the corresponding intranucleobase C8 and C6 resonances, respectively. b 1H,13C-HSQC spectrum showing the aromatic H6/8–C6/8 correlations. c 2D plane of the 13C-detected CNC experiment correlating C6/8 to C1′. d Transposed 1H,13C-HSQC spectrum showing the H5–C5 correlations of uridines and cytidines. e 1H,1H-TOCSY spectrum linking H5 and H6 in pyrimidines. f Transposed 1H,13C-HSQC spectrum of the H1′–C1′ region. Panel c also shows the secondary structure of SL5a with genomic numbering. Positive contours are shown in black, negative contours in red. Experimental details are given in Table 1. Exemplary connections between the displayed spectra are indicated by gray dashed lines for residues G213 and U191. Assignments of the asymmetric bulge and the apical loop are highlighted in bold font

The U-rich bulge

One of the structural features of the SL5a RNA is an asymmetric U-rich bulge (Fig. 1c). In this likely more dynamic part of the RNA, a nearly complete sequential walk (H6/8 to H6/8 or H1′) was possible, and thus all aromatic H6/8–C6/8 correlations were assigned. With the aromatic assignment at hand, the strong imino resonance of a uridine involved in non-canonical base pairing was assigned to residue U194 using the (H)C(CCN)H experiment at 283 K. The observation of this signal suggests that U194 forms a base pair, most likely with either U211 or U212. This is further supported by an imino-to-imino NOE contact between U194 and a non-canonical uridine at 273 K. Furthermore, from the U194 carbon chemical shifts in the HNCO experiment, we conclude that the hydrogen bonding interaction is mediated through the C2 carbonyl group (Fürtig et al. 2003; Ohlenschläger et al. 2004). The existence of a G·U wobble base pair involving residues U195 and G210 has not yet been confirmed. However, broadened imino proton resonances of an additional guanosine and uridine that take part in non-canonical interactions are observed at low temperature (283 K).

The 5′-UUUCGU-3′ hexaloop

In addition to the U-rich asymmetric bulge (Fig. 1c), SL5a features a 5′-UUUCGU-3′ hexaloop, which also caps the helix of SL5b in the 5′-UTR. Except for residue U205, all aromatic loop assignments were derived from sequential NOE correlations, e.g. H6/8-to-H5 or H1′-to-H6/8 sequential contacts. Since the central residues of this loop, 5′-UUCG-3′, resemble a highly abundant and well-characterized tetraloop sequence (Cheong et al. 1990; Fürtig et al. 2004; Nozinovic et al. 2010), we asked whether structural features of this UUCG tetraloop are also found within the 5′-UUUCGU-3′ hexaloop of SL5a. While the characteristic imino proton resonances of the sheared GU base pair in the 5′-UUCG-3′ tetraloop remained elusive in SL5a spectra (e.g. 1H 1D or 1H,15N-HSQC), 1H,13C-HSQC spectra of the ribose region of SL5a and of a 14 nt RNA with a 5′-cUUCGg-3′ tetraloop (secondary structure in Suppl. Figure 1B) yielded similar peak patterns (Fig. 2a and b). In particular, the chemical shifts of the two central nucleotides of the 5′-UUUCGU-3′ hexaloop, U202 and C203, agree well with those of their respective counterparts in the 5′-cUUCGg-3′ tetraloop. This observation is also reflected in the canonical coordinates (Ebrahimi et al. 2001; Cherepanov et al. 2010), which suggest that the ribofuranose rings of U202 and C203 adopt the C2′-endo conformation, while the remaining nucleotides with a complete ribose carbon assignment adopt the C3′-endo conformation typical of A-form RNA (Fig. 2c). These spectral data suggest a structural similarity between the central parts of the 5′-UUUCGU-3′ hexaloop and the 5′-cUUCGg-3′ tetraloop. This may not hold to the same extent for the flanking residues U201 and G204, as their characteristic resonances are absent from the 1H,13C-HSQC spectrum of the ribose region (Fig. 2a and b). The detailed loop architecture therefore remains subject to further structural investigation.
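
Assuming canonical coordinates are, in essence, linear combinations of ribose 13C shifts chosen to separate sugar-pucker classes, the idea can be illustrated with a two-class Fisher linear discriminant. This is a hedged stand-in, not the published coordinates: their coefficients are not reproduced here, the input is restricted to C1′–C4′ by assumption, and the reference shifts below are hypothetical placeholders rather than curated data.

```python
"""Two-class Fisher discriminant for ribose pucker from 13C shifts (sketch).

Stand-in for canonical-coordinate analysis; the reference shifts below are
HYPOTHETICAL placeholders, not curated or published data.
"""
import numpy as np

# Columns: C1', C2', C3', C4' chemical shifts in ppm (hypothetical values).
C3_ENDO = np.array([
    [92.8, 75.4, 72.6, 82.3],
    [93.2, 75.9, 72.1, 82.0],
    [92.5, 75.1, 73.0, 82.6],
    [93.5, 75.6, 72.4, 81.8],
    [92.9, 75.8, 73.3, 82.4],
])
C2_ENDO = np.array([
    [88.6, 76.8, 76.2, 85.4],
    [89.1, 77.2, 75.8, 85.9],
    [88.3, 76.5, 76.6, 85.1],
    [89.4, 77.0, 76.0, 86.2],
    [88.9, 77.4, 76.9, 85.6],
])


def fisher_axis(a: np.ndarray, b: np.ndarray, ridge: float = 1e-6):
    """Return a unit discriminant axis w and the midpoint threshold on w."""
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    # Pooled within-class scatter matrix; a small ridge keeps it invertible.
    s_w = (np.cov(a, rowvar=False) * (len(a) - 1)
           + np.cov(b, rowvar=False) * (len(b) - 1)
           + ridge * np.eye(a.shape[1]))
    w = np.linalg.solve(s_w, mean_a - mean_b)
    w /= np.linalg.norm(w)
    threshold = float(w @ (mean_a + mean_b) / 2.0)
    return w, threshold


def classify(shifts, w: np.ndarray, threshold: float) -> str:
    """Project one residue's (C1', C2', C3', C4') shifts onto w and assign a class."""
    score = float(np.asarray(shifts, dtype=float) @ w)
    return "C3'-endo" if score > threshold else "C2'-endo"


if __name__ == "__main__":
    w, t = fisher_axis(C3_ENDO, C2_ENDO)
    print(classify([92.7, 75.5, 72.8, 82.2], w, t))
```

The projection onto w plays the role of a single discriminant coordinate; a complete analysis would use a validated reference set and, for syn/anti assignment, a second axis.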

Fig. 2

Comparison of 1H,13C-CT-HSQC spectra of the ribose regions of a SL5a and b a 14 nt RNA with 5′-cUUCGg-3′ tetraloop (Fürtig et al. 2004; Nozinovic et al. 2010). Positive contours are given in black, negative contours in red. Experimental details are given in Table 1. The loop sequences are displayed, and U202/U7 and C203/C8 resonances are highlighted in bold font. c Canonical coordinates for all residues of SL5a with a complete ribose carbon assignment. For comparison, the canonical coordinates of residues U7 and C8 of a 14 nt RNA with 5′-UUCG-3′ tetraloop are given in red

Assignment and data deposition

The nearly complete resonance assignment of SL5a builds on the imino resonance assignment published earlier (Wacker et al. 2020). Starting from this assignment, all 33 aromatic H6–C6 and H8–C8 correlations were unambiguously assigned. Furthermore, the H2–C2 correlations of the two adenosines in this RNA as well as all H5–C5 correlations of the uridines and cytidines were unambiguously assigned. In addition, the quaternary carbon atoms of the nucleobases were partially assigned in purines (C2: 77%, C4: 69%, C5: 62% and C6: 92%) and pyrimidines (C2: 15% and C4: 15%). Uridine C2 and C4 resonances as well as guanosine C2 and the G-1, G188, G198 and G208 C6 resonances were assigned at 283 K. The non-protonated tertiary nitrogen atoms of purines (N3: 15% (only adenosines assigned), N7: 100% and N9: 100%) and pyrimidines (N1: 95% and N3: 80% (cytidines)) were also assigned to a large extent. Within the ribose moieties, 91% of the H1′ and 91% of the C1′ atoms were assigned, as were 77% of the remaining ribose carbon atoms C2′–C5′. In summary, we assigned 97% of the 1H (H6/8, H5, H2, H1′) and 92% of the 13C (C6/8, C5 (pyrimidines), C1′) atoms that are considered most important for an in-depth structural characterization. We updated the BMRB entry with accession code 50346.
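
The 1H completeness refers to a fixed set of atom types whose expected number follows directly from the construct sequence. A small sketch that counts them is shown below; the function names are hypothetical, and the assigned counts in the usage block are an illustrative split consistent with all aromatic, H2 and H5 protons and 30 of 33 H1′ (about 91%) being assigned.

```python
"""Expected number of key 1H positions in the SL5a construct (sketch)."""

# Construct sequence as given in the text (33 nt transcript).
SL5A = "GGGCUGCUUACGGUUUCGUCCGUGUUGCAGCCC"


def expected_protons(seq: str) -> dict:
    """Count H6/H8, H2, H5 and H1' positions expected for an RNA sequence."""
    seq = seq.upper()
    return {
        "H6/H8": len(seq),                      # H8 in purines, H6 in pyrimidines
        "H2": seq.count("A"),                   # adenosine H2
        "H5": seq.count("C") + seq.count("U"),  # pyrimidine H5
        "H1'": len(seq),                        # one anomeric proton per ribose
    }


def completeness(assigned: dict, expected: dict) -> float:
    """Percentage of expected positions (over all categories) that are assigned."""
    return 100.0 * sum(assigned.values()) / sum(expected.values())


if __name__ == "__main__":
    exp = expected_protons(SL5A)
    # Illustrative split: all H6/H8, H2 and H5 assigned, 30 of 33 H1'.
    assigned = {"H6/H8": 33, "H2": 2, "H5": 20, "H1'": 30}
    print(exp, f"{completeness(assigned, exp):.1f} %")
```

For the 33 nt construct this gives 88 protons in these categories (33 H6/H8, 2 H2, 20 H5, 33 H1′); with the split above, the completeness is about 96.6%, which rounds to the 97% quoted.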