INTRODUCTION

The key role in the adhesion of coronaviruses (CoVs) to host cells belongs to the spike (S) protein. The S protein is a homotrimer, each monomer consisting of two subunits, S1 and S2. The S2 subunit is anchored in the viral membrane and is responsible for fusion of the virus with the host cell [1, 2]. The S1 ectodomain consists of four subdomains (S1A–S1D). Each of these subdomains can bind to a receptor, but it is still unclear whether they act in concert or separately [3]. In MERS (Middle East respiratory syndrome)-CoV, the S1B subdomain is responsible for the interaction with dipeptidyl peptidase 4 and is critical for virus penetration into the host cell, while the lectin-like S1A subdomain binds O-acetylated sialoglycans [4, 5]. The S1B subdomain of SARS-CoV-1 and SARS-CoV-2 recognizes angiotensin-converting enzyme 2 (ACE2) [6]. The specificity of the other subdomains has yet to be established, although ACE2 is not the only possible target of the S proteins of SARS-CoV-1 and SARS-CoV-2 (e.g., the S protein also interacts with the CD147 glycoprotein) [7, 8]. The S proteins of other human coronaviruses, HKU1 and OC43, as well as of bovine BCoV, have a lectin-like S1A domain that binds 9-O-acetylated sialic acid; this interaction is of low affinity, but nevertheless contributes to the virus reaching its main target [9]. The data on the sialic acid-binding activity of the SARS-CoV-2 S protein are contradictory: one study [10] reported no such interaction, whereas other studies have suggested that the S protein may bind to gangliosides [11]. It should be noted that all SARS-CoV-2 surface proteins are sialylated, especially the S protein with its 22 glycosylation sites.
Since the virus should not bind to itself, its glycan-binding protein cannot bind regular sialic acid (Sia)-containing terminal motifs, such as Neu5Acα2-3Gal and Neu5Acα2-6Gal. This does not, however, exclude its interaction with O-acetylated sialosides, which are uncommon for host epithelial cells (as is the case for the MERS-CoV S protein). In other words, if SARS-CoV-2 binds Sia-containing glycosides, the latter should differ from the usual Neu5Acα2-3Gal and Neu5Acα2-6Gal motifs.

It was shown [12] that low-molecular-weight heparin exerts a significant anti-inflammatory effect in COVID-19 patients due to a pronounced decrease in the content of the pro-inflammatory cytokine IL-6. It is also believed that heparin can interact directly with SARS-CoV-2. Glycosaminoglycans (GAGs) can act as adhesion factors for adenoviruses, herpesviruses, papillomavirus, cytomegalovirus, and others, and this binding is efficiently inhibited by the soluble form of GAG [13]. Coronaviruses are no exception: NL63 and SARS-CoV-1 (in the form of a pseudovirus) use GAGs for adhesion to the host cell along with the ACE2 receptor [14]. The trimeric form of the S protein from the pandemic SARS-CoV-2 binds to full-length heparin with a remarkably high affinity of 40 pM, which is several orders of magnitude better than for the MERS-CoV S protein [15]. The presence of 2-O- and 6-O-sulfates is important for the binding [16]. Considering all of the above, we examined the glycan-binding specificity of the recombinant SARS-CoV-2 S protein. Although we expected to discover the ability of the S protein to bind some other carbohydrate receptors, such as sulfated or uncommon sialylated mammalian glycans (see above), we found that the S protein bound primarily lactosamine-type glycans with high affinity.

MATERIALS AND METHODS

Glyc-PAA-biot glycoconjugates (where Glyc is glycan, PAA is polyacrylamide, biot is biotin with the C5 spacer; 20 kDa, containing 20 mol% glycan and 5 mol% biotin) were from GlycoNZ (Auckland, New Zealand).

S protein homotrimer. Recombinant trimeric SARS-CoV-2 S protein was prepared using the original expression plasmid vector SBW4G_S-FdT4 carrying a chimeric gene that encodes the ectodomain of glycoprotein S and the beta-propeller trimeric domain of bacteriophage T4 fibritin with a C-terminal eight-histidine tag (His8-tag) under control of the CAG promoter. Transient expression was achieved by transfection of cultured HEK293 cells with the recombinant plasmid SBW4G_S-FdT4. Affinity chromatography purification was performed on Ni-NTA Superflow resin (Qiagen, Germany). The purity and integrity of all purified recombinant proteins were confirmed by SDS-PAGE; the molecular weight of the protein subunit is 139 kDa (see Supplement for the amino acid sequence).

Screening by enzyme-linked immunosorbent assay (ELISA). The recombinant S protein (5 µg/ml in phosphate buffered saline, PBS) was used for coating the microplate wells (Nunc MaxiSorp, Thermo Fisher, USA) overnight at room temperature. After coating, the plate was left to dry and then washed with PBS containing 0.1% Tween-20 (Merck, USA) (PBS-T). Glyc-PAA-biot glycoconjugates were serially diluted in the serum dilution solution (Cat. # PPO0520, Epitek, Novosibirsk, Russia), starting from a concentration of 10 µg/ml, and then added to the microplate wells. The microplate was incubated for 90 min on a shaker at 37°C and washed, and 100 µl of horseradish-peroxidase conjugate (Epitek, Novosibirsk, Russia; dilution, 1 : 20 in PBS-T) was added to each well. The plate was incubated for another 40 min on the thermoshaker at 37°C and washed. Next, 100 µl of tetramethylbenzidine solution (Thermo Fisher, USA) was added to each well. The plate was incubated for 30 min in the dark at room temperature. The reaction was stopped with 5% H2SO4, and the absorbance of the colored product was measured with a Bio-Rad Model 680 microplate reader (Bio-Rad, USA) at a wavelength of 450 nm.

Measurement of equilibrium dissociation constants. Eleven glycans that demonstrated the highest binding activity in ELISA screening were added to a strip 96-well Nunc MaxiSorp plate (Thermo Fisher, USA) coated with the S protein (see above). The strips were incubated on a shaker at 37°C; every 10 min, one of the strips was washed with PBS-T and placed in a refrigerator (4°C). After 100 min, all strips were processed as described above, and the data for the strips incubated for different time periods were compared.

The concentration of the glycan-S protein complex is described by the ligand-receptor interaction equations (1)-(4):

$$\begin{aligned} \lbrack LR \rbrack = A(1 - e^{-kt}), \end{aligned} \tag{1}$$
$$\begin{aligned} A = \frac{[R]_{0}[L]_{0}}{K_{d}+[R]_{0}}, \end{aligned} \tag{2}$$
$$\begin{aligned} k = k_{+}[R]_{0} + k_{-}, \end{aligned} \tag{3}$$
$$\begin{aligned} K_{d} = \frac{k_{-}}{k_{+}}, \end{aligned} \tag{4}$$

where [LR] is the complex concentration; [R]0 and [L]0 are the initial concentrations of glycan and S protein, respectively; k+ and k− are the association and dissociation rate constants, respectively. The exponent k was calculated by fitting function (1) to the experimental time dependence of the glycan-S protein complex concentration using the OriginPro software package. Since k was determined at several conjugate concentrations, plotting the calculated k values against the concentration of the added glycoconjugate gave the linear dependence described by equation (3), whose slope and intercept are k+ and k−, respectively. Kd was then calculated according to equation (4).
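The two-step analysis described above (fit of equation (1) at each concentration, then the straight line of equation (3)) can be sketched in a few lines of Python. This is only an illustration and not the OriginPro procedure used in this work: the function names, the grid-plus-golden-section fitting routine, and the units (minutes for time, nM for glycan concentration) are assumptions, and the ELISA signal is assumed to be proportional to [LR].

```python
import math


def _golden_min(func, lo, hi, iters=80):
    """Minimise a one-dimensional function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = func(c), func(d)
    for _ in range(iters):
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = func(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = func(d)
    return 0.5 * (lo + hi)


def fit_rate(times, signal, k_min=1e-4, k_max=10.0, n_grid=200):
    """Least-squares fit of signal = A * (1 - exp(-k * t)), equation (1).

    Returns (A, k). The search range for k (per minute) is an assumption.
    """

    def amplitude(k):
        # For a fixed k, the best A follows from linear least squares.
        f = [1.0 - math.exp(-k * t) for t in times]
        return sum(y * fi for y, fi in zip(signal, f)) / sum(fi * fi for fi in f)

    def sse(log_k):
        k = math.exp(log_k)
        a = amplitude(k)
        return sum((y - a * (1.0 - math.exp(-k * t))) ** 2
                   for y, t in zip(signal, times))

    # Coarse log-spaced scan, then local refinement around the best grid point.
    lo, hi = math.log(k_min), math.log(k_max)
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    best = min(range(n_grid), key=lambda i: sse(grid[i]))
    log_k = _golden_min(sse, grid[max(best - 1, 0)], grid[min(best + 1, n_grid - 1)])
    k = math.exp(log_k)
    return amplitude(k), k


def estimate_kd(times, curves):
    """Estimate (k_plus, k_minus, Kd) from time courses at several concentrations.

    `curves` maps glycan concentration -> list of signals measured at `times`.
    The fitted k values are regressed on concentration, equation (3), and
    Kd = k_minus / k_plus, equation (4).
    """
    concs = sorted(curves)
    rates = [fit_rate(times, curves[c])[1] for c in concs]
    n = len(concs)
    mean_c = sum(concs) / n
    mean_k = sum(rates) / n
    k_plus = (sum((c - mean_c) * (k - mean_k) for c, k in zip(concs, rates))
              / sum((c - mean_c) ** 2 for c in concs))
    k_minus = mean_k - k_plus * mean_c
    return k_plus, k_minus, k_minus / k_plus


if __name__ == "__main__":
    # Synthetic, noise-free curves (not experimental data) sampled every 10 min.
    times = [10.0 * i for i in range(1, 11)]
    curves = {c: [2.5 * (1.0 - math.exp(-(1e-3 * c + 0.02) * t)) for t in times]
              for c in (10.0, 20.0, 40.0, 80.0)}
    print(estimate_kd(times, curves))
```

With real data, the quality of the straight line in the second step is a useful check: a poor linear fit suggests that the simple one-site model of equations (1)-(4) does not describe the conjugate well.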

RESULTS AND DISCUSSION

Although our group makes extensive use of the printed glycan array (PGA) [17] to study glycan-binding proteins (including viral ones), in the case of the SARS-CoV-2 S protein we used ELISA, in which polystyrene plates were coated with the S protein and then incubated with Glyc-PAA-biot glycopolymers to study their binding to the immobilized protein. This is a more time-consuming method, and the analysis of the results is complicated by the contribution of non-specific binding due to the presence of excess biotin residues in the Glyc-PAA-biot conjugates (>10 per PAA chain on average [18]). However, we believe that this “reverse” method has a definite advantage: comparison of the optical density (OD) values for the S protein binding to different glycans should give reliable relative values of the interaction affinity, while the PGA [19], where the ligands are immobilized, does not guarantee an equal degree of ligand immobilization on the solid phase. Using ELISA, we were able to examine the S protein interaction with 155 glycoconjugates, including sialoglycans typical for human cells, various sulfated glycans, N-acetyllactosamine oligosaccharides, glycans representing the ABH blood group antigens, and many others. The maximum reliable OD value in this version of ELISA is a little over 3, so signals within the 2.0 to 3.2 range were referred to as strong, and OD values below 1.0 were interpreted as a lack of binding. Although the assay conditions were intentionally “tuned” to amplify the lowest signals, the assay revealed no significant S protein binding to sulfated or common sialylated glycans, i.e., those with the terminal Neu5Acα2-3Gal, Neu5Acα2-6Gal, or Neu5Acα2-6GalNAc moieties or 9-NAc-Neu5Acα structural motifs (9-NAc derivatives were taken as mimetics of the corresponding O-Ac sialosides).
The only sialoligand that exhibited high-affinity binding to the S protein was the Neu5Acα2-8Neu5Acα disaccharide (no. 11 in the table), although the tetrasaccharide of GD2 ganglioside (Neu5Acα2-8Neu5Acα2-3Galβ1-4Glcβ) and the trisaccharide Neu5Acα2-8Neu5Acα2-8Neu5Acα showed no binding in our assay. The binding of the Neu5Acα2-8Neu5Acα disaccharide is mainly due to the two carboxyl groups of its two Neu5Ac residues (i.e., to Coulomb interactions), which is consistent with the recently published paper on the virus binding to clusters of Neu5Ac monosaccharides [20]. These two residues happen to be positioned at a distance that favors their interaction with the lectin-like site; this interaction is abolished by a bulky substituent R in Neu5Acα2-8Neu5Acα-R (as in gangliosides, where R is the inner carbohydrate core). The table shows all glycans that bind the S protein with high affinity (OD value over 2.0).

Table. The structure of glycans that bind the S protein with a high affinity in ELISA
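The OD cut-offs used to read the screening results can be written as a small helper. This is a minimal sketch: the two thresholds (2.0 and 1.0) and the upper limit of reliable readings (3.2) are taken from the text, whereas the function name, the "intermediate" label for values between the thresholds, and the flag for readings above 3.2 are assumptions made for illustration. The OD values in the example are invented and are not results of this study.

```python
def classify_od(od, strong_min=2.0, no_binding_max=1.0, reliable_max=3.2):
    """Classify a single ELISA optical density (OD) reading.

    strong_min and no_binding_max follow the cut-offs given in the text;
    the "intermediate" and "above reliable range" labels are assumptions.
    """
    if od > reliable_max:
        return "above reliable range"
    if od >= strong_min:
        return "strong"
    if od < no_binding_max:
        return "no binding"
    return "intermediate"


if __name__ == "__main__":
    # Hypothetical readings for three unnamed conjugates (not measured values).
    readings = {"glycan_1": 2.7, "glycan_2": 1.4, "glycan_3": 0.3}
    for name, od in readings.items():
        print(name, od, classify_od(od))
```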

Of the top 10 glycans, eight (shown in grey in the table) were typical ligands for human galectins. These glycans were oligolactosamines, the glycan part of the glycosphingolipid asialoGM1, and the tetrasaccharides of blood groups A (type 4) and B (type 4). Among the “second-tier” glycans (OD values in the range from 2.0 to 2.6), four compounds were lactosamine derivatives and two were the galactose monosaccharide and its derivative with a sulfate group at position 3. Therefore, the S protein preferentially bound beta-galactosides. Interestingly, the unsubstituted Galβ1-4GlcNAcβ disaccharide (monomeric LN) showed lower binding efficiency than the top lactosamines from the table, which is also typical for human galectins [21].

The undoubted similarity of the recognition patterns of the SARS-CoV-2 S protein and galectins can be explained by the structural features of the former, namely, the presence of a specific galectin fold in the S1A domain of the S proteins from various CoVs (MERS-CoV, HCoV-OC43, BCoV, and TGEV) [9, 22]. However, until now, no binding of these S proteins to galactose-containing glycans (galectin ligands) has been reported. It is assumed that the galectin fold in the S protein has been evolutionarily borrowed from the host cell in a truncated form and therefore does not function as a full-fledged lectin. That is, this domain is unable to bind βGal motifs; however, in the course of evolution of some viruses, it has acquired the ability to recognize other carbohydrate motifs [23, 24].

Not all of the top glycans are structurally similar to galectin ligands. The disialoside has already been discussed above. The trisaccharide LeX (Galβ1-4(Fucα1-3)GlcNAc) does not bind to any of the known human galectins, but at the same time contains N-acetyllactosamine. Therefore, the recognition of a not entirely conventional ligand by a not entirely conventional galectin does not seem surprising to us. The blood group A trisaccharide GalNAcα1-3(Fucα1-2)Gal is also known not to be a galectin ligand. We explain its presence on the list of top S protein ligands as follows: this glycan differs from the other ligands in the array in its hydrophobicity, as it carries the -O(CH2)3NHCO(CH2)5NH- spacer. The same trisaccharide A without the hydrophobic spacer is a poor ligand for the S protein. Taken together, these observations suggest that the observed binding of the trisaccharide A results from a fortunate combination of two components, nonspecific (hydrophobic) binding and specific (GalNAcα) binding, the latter being relatively weak and therefore insufficient for the manifestation of affinity for the S protein without the “assistance” of the unusual spacer. The revealed ability to recognize glycans with a terminal glucosamine moiety, for instance GlcNAcα1-3GalNAcβ, as well as other glycans that do not belong to the lactosamine family (see above), at first glance does not fit the hypothesis of a galectin-like binding site in the S protein. Nevertheless, there are examples in which replacing just one or two amino acids in a lectin not only abolished the binding of carbohydrates, but also led to the ability to recognize another glycan [25]. In our case, the difference in the amino acid sequences is very significant (data not shown), so we are talking only about the similarity of the folds. Hence, the difference between the S protein and glycan-recognizing galectins is not surprising in itself.
More surprising is the “restoration” of the ability of the galectin fold in the S protein of the latest coronavirus to bind typical galectin ligands, which has been lost by other coronaviruses.

We also used ELISA in the kinetic mode (see the Materials and Methods section) to approximately estimate the affinity of the recombinant S protein for the top 11 glycans in the form of Glyc-PAA-biot conjugates. The highest affinity was found for the disialoside (Kd = 10 nM; all values refer to the glycan and not to its polymeric conjugate) and Galβ1-4GlcNAcβ1-6(Galβ1-3)GalNAcα (LN6TF) (Kd = 20 nM). The Kd values for ACE2 reported in the literature range from 1.2 nM [2] to 95 nM [26]. Although it would be incorrect to directly compare data obtained in completely different experimental systems, the measured Kd values suggest a contribution of carbohydrate-mediated reception in vivo. It is therefore especially important to know which glycans are actual or potential/additional targets of the virus in the human epithelium. Based on the data obtained, it can be assumed that this target is the N-acetyllactosamine glycan of the glycoprotein O-chain, since the tetrasaccharide that showed one of the best Kd values was LN6TF, whereas oligolactosamine glycans typical for the N-chains demonstrated lower affinity.

Most lactosamine motifs on the surface of lung epithelial cells are masked by sialic acid attached to them; therefore, in a healthy person, the additional affinity of the virus for lactosamines is unlikely to make a significant contribution to the primary virus adhesion. However, neuraminidase, which is present in many pathogens (first of all, in the influenza virus), may remove sialic acid residues, thus exposing lactosamine residues. Hence, we hypothesize that co-infection with a pathogen that has strong neuraminidase activity will promote adhesion of SARS-CoV-2 and, therefore, increase its virulence.