Biological context

The spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) represents a serious threat to the stability of human societies throughout the world. In order to eradicate the disease, it is necessary to foster research aimed at the development of vaccines or effective antiviral inhibitors. The establishment of effective therapeutic strategies requires the identification of molecular targets against which inhibitory strategies can be developed.

The nucleoprotein (N) of SARS-CoV-2 is a crucial component of the replication machinery, localizing to the viral replicase–transcriptase complex (4, 5, 7–9). Beta-coronaviral N encapsidates the viral genome within a nucleocapsid, formed from multiple copies of the protein (Zúñiga et al. 2007; McBride et al. 2014), thereby providing protection from the host immune system, and is essential for regulation of viral gene transcription (Wu et al. 2014). N is also produced in abundance in infected cells and is an important marker for infection. In order to identify possible avenues targeting the essential roles of N in the viral cycle, it is important to understand the structure and function of this complex protein at atomic resolution.

N is a 419 amino acid multi-domain protein, comprising folded, RNA-binding (N2) and dimerization (N4) domains spanning residues 45–175 and 266–365 respectively (Chang et al. 2014). N2 has been described using both X-ray crystallography (Kang et al. 2020; Peng et al. 2020) and NMR spectroscopy (Dinesh et al. 2020). The dimerization domain N4 has been described using X-ray crystallography (Ye et al. 2020; Peng et al. 2020) and N has been shown to form higher order oligomers via N4 (Ye et al. 2020; Zeng et al. 2020; Lutomski et al. 2020). The remaining 164 amino acids comprising N1, N3 and N5 are predicted to be intrinsically disordered, but despite evidence that these domains are essential for function in the related Mouse Hepatitis virus (Keane and Giedroc 2013), there is currently no atomic resolution information describing their behaviour in solution. Nuclear magnetic resonance is the ideal tool for studying the conformational behaviour of highly dynamic or intrinsically disordered proteins (Jensen et al. 2014).

Here we present the assignment of the backbone resonances of the first two intrinsically disordered domains (N1, spanning residues 1–44 and N3, the central disordered domain, spanning residues 176–265). The N3 IDR (175–263) comprises a serine-arginine (SR) rich domain that is phosphorylated in virions, a modification that plays a role in both function and localization of SARS-CoV-1 N (Peng et al. 2008). N5 is thought to contribute to the oligomerization of N in SARS-CoV-1 (Lo et al. 2013).

Recent studies used single molecule FRET and molecular modelling (Cubuk et al. 2020) to describe the global conformational behaviour of N1 and N3. Peptides representing the SR region of N3 in its phosphorylated and non-phosphorylated forms were also described using NMR spectroscopy (Savastano et al. 2020). Mass spectrometry has also revealed a number of auto-catalytic sites in N, two of which are present in N3 (Lu et al. 2020; Lutomski et al. 2020).

N from SARS-CoV-2 has been shown to undergo liquid–liquid phase separation (LLPS) upon mixing with RNA (Cubuk et al. 2020; Savastano et al. 2020; Chen et al. 2020; Iserman et al. 2020; Perdikari et al. 2020; Carlson et al. 2020; Lu et al. 2020; Jack et al. 2020). The formation of membraneless condensates by colocalization of components involved in viral replication in liquid droplets is prevalent in negative sense single strand RNA viruses, such as rabies (Nikolic et al. 2017), measles (Zhou et al. 2019; Guseva et al. 2020), and Nipah (Ringel et al. 2019). It is not yet known which interactions are responsible for LLPS in SARS-CoV-2, but it is highly likely that the intrinsically disordered domains are involved in this process. Moreover, N2 and N3 were shown to be essential for phase separation with RNA (Lu et al. 2020).

Here, we present backbone resonance assignment and NMR relaxation of two domains (N1 and N3) of N, providing data prerequisite for characterisation of the functional modes of this enigmatic protein, screening of interaction partners and detailed mapping of interactions.

Methods and experiments

Sample preparation

The primary sequence of N from SARS-CoV-2 was extracted from NCBI genome entry NC_045512.2 [GenBank entry MN908947.3 (Wu et al. 2014)]. Commercially synthesized genes (Genscript Biotech) were codon-optimized for expression in Escherichia coli and subcloned in a pET21b(+) vector. Hexa-histidine and TEV-cleavage tags were included at the N-terminus to facilitate protein purification. After protease cleavage, the proteins contain N-terminal GRR- extensions. For 15N and 13C isotope labelling, cells were grown in M9-minimal medium supplemented with 15N–NH4–Cl and 13C6d–glucose (1 and 2 g/L respectively).

Both nucleoprotein constructs (N123, comprising residues 1–263 and N3, comprising residues 175–263) were cloned into a pESPRIT vector between the AatII and NotI cleavage sites with His8-tag and TEV cleavage sites at the N-terminus (GenScript Biotech Netherlands). Transformation was performed by heat-shock and proteins were expressed in E. coli BL21(DE3) (Novagen) for 5 h at 37°C after induction at an optical density of 0.6 with 1 mM isopropyl–b–d–thiogalactopyranoside. Cells were harvested by centrifugating at 5000 rpm, resuspended in 20 mM Tris (pH 8.0) and 500 mM NaCl buffer, lysed by sonication, and centrifugated again at 18,000 rpm at 4°C. The supernatant was subjected to standard Ni purification. Proteins were eluted with 20 mM Tris (pH 8), 500 mM NaCl and 500 mM imidazole. Samples were then dialysed against 20 mM Tris (pH 8), 500 mM NaCl, 2 mM DTT at room temperature overnight. Following TEV cleavage, samples were concentrated and subjected to size exclusion chromatography (SEC, Superdex 75/200) in 50 mM Na-Phosphate (pH 6.5), 250 mM NaCl 2 mM DTT buffer (NMR buffer). Proteins were studied at 600 and 150 μM for N3 and 0.91 mM for N123.

NMR experiments

BEST and BEST-TROSY (BT) double and triple resonance assignment experiments, including BEST-HNCA, BEST HN(CO)CA, BT-HNCO (Lescop et al. 2007; Solyom et al. 2013), were recorded on 15N,13C-labeled samples at 298 K using Bruker Avance III spectrometers equipped with a cryoprobe at 1H frequencies of 600 and 850 MHz. R1rho relaxation experiments (Lakomek et al. 2012) were recorded at 150 μM protein concentration and 298 K in a 50 mM Na-Phosphate (pH 6.5), 250 mM NaCl 2 mM DTT buffer at 950 MHz.

Residues 217–224 and 231–235 of N3 were assigned using a deuterated 13C 15N labelled sample at 300 µM. BEST-HNCA, BEST-HN(CO)CA with one increment in 15N dimension were recorded at 850 MHz, allowing detection of weak additional peaks of the helical region and connection with those already assigned. All spectra were processed using NMRPipe (Delaglio et al. 1995) and analyzed using CCPNMR Analysis Assign (Skinner et al. 2016) and NMRFAM-SPARKY (Lee et al. 2015). Assignment was further assisted by comparison to BMRB entry 34511 (Dinesh et al. 2020) of the N-terminal RNA-binding domain of the SARS-CoV-2 nucleoprotein, specifically nucleoprotein residues 44 to 180 (N2).

Assignments and data deposition

The HSQC of N3 is typical of an intrinsically disordered protein (Fig. 1). Most peaks in N3 are reproduced in the spectrum of N123, that also reveals the presence of the folded RNA-binding domain N2, in addition to disordered peaks from N1 (intensity of peaks from the first 15 residues of N3 were reduced in N123, probably due to the proximity with N2). The assignment of N2 has been published (Dinesh et al. 2020), and assignment of N1 in N123 (Fig. 2) and N3 in its isolated form and in the context of N123 were accomplished using standard triple resonance approaches. A high percentage of resonances could be assigned in N3 (84% 1HN, 84% 15NH, 82% 13Cα and 76% 13C) and N1 in the context of N123 (94% 1HN, 94% 15NH, 100% 13Cα, 57% 13Cβ and 100% 13C). These assignments have been deposited in the Biological Magnetic Resonance Databank (BMRB ID: 50557, comprising backbone resonance assignment of N3 and 50558, comprising backbone resonance assignment of N1 and of resolved resonances in N3, both in the context of N123).

Fig. 1
figure 1

1H–15N HSQC of N3 (175–263) from SARS-CoV-2. The spectrum was acquired on an 850 MHz spectrometer at 298 K at a concentration of 150 μM in 50 mM Na-phosphate, pH 6.0, 250 mM NaCl. Assigned backbone 1H–15N peaks are labelled

Fig. 2
figure 2

1H–15N BEST-TROSY of N123 (1–263) from SARS-CoV-2. The spectrum was acquired on a 950 MHz spectrometer at 298 K at a concentration of 150 μM in 50 mM Na-phosphate, pH 6.5, 250 mM NaCl. Backbone 1H–15N peaks assigned in this study are labelled, resonances transferred from assignment of N3 (Fig. 1) are starred. The disperse signals from N2 for which an assignment is already available (Dinesh et al. 2020) are present but have not been labelled

Secondary structural propensity and dynamics

Resonance assignment of the two domains confirms the disordered nature of N1 and N3 (Figs. 1 and 2). The linker region (N3) connecting the two folded domains (N2 and N4), comprises intrinsically disordered SR-rich and polar termini, flanking a central hydrophobic strand that exhibits a pronounced helical propensity (> 30% from 216 to 224 with near 100% helical population from position 220) (Fig. 3). Assignment of 5 residues at the centre of this region was not possible, possibly due to dimerization mediated by the helix. This region is also predicted to form a hydrophobic helix in SARS-CoV (Chang et al. 2009). Recent simulations (Cubuk et al. 2020) also proposed the presence of a weakly (< 20%) helical motif in the SR region at the N-terminal region of N (176–185), which is not seen experimentally, although there is overlap of predicted helical propensity (< 30% helix predicted to stretch from 213 to 225) in the vicinity of experimentally observed helical propensity. Studies of isolated peptides (Savastano et al. 2020) also suggested a helical propensity in the SR region which is not seen here.

Fig. 3
figure 3

Analysis of chemical shifts and relaxation rates for N1 and N3 from SARS-CoV-2. a Secondary Cα chemical shifts for the disordered residues in N3. b Secondary Cα chemical shifts for the disordered residues in N123 that are observed in triple resonance assignment experiments. c Rotating frame relaxation rates measured in N123. Measurements were made at 950 MHz at 298 K using sample concentration of 150 μM. Residues from N2 are not shown

Assignment of the backbone resonances of the N-terminal domain of N (N1) within the N123 construct (Fig. 2) reveals the presence of an intrinsically disordered chain with no detectable secondary structural propensity and only very slight differences around residue 248 compared to free N3. Note that a number of resonances that were visible in N3 were not assigned in N123, most probably due to short relaxation times experienced in the vicinity of the helical motif that limits transfer of magnetization in triple resonance experiments. Nevertheless putative transfer of assignment was proposed on the basis of the 15N–1H correlation spectra of N3 and N123 (starred resonances in Fig. 2). Spin relaxation measured in N123 confirms the highly dynamic nature of N1 and N3 in the context of the folded RNA binding domain N2 (Fig. 3), with relaxation rates in the expected range for a disordered domain (Adamski et al. 2019), with the exception of the helical element in N3 and three consecutive residues in N1 (R14-I15-T16) that both show elevated rates.

In conclusion, NMR backbone assignment and preliminary relaxation studies provide the basis for further NMR studies of this important drug target, providing the tools necessary for the identification of inhibitors and for detailed functional studies of this essential protein.