NMR Resonance Assignment Methodology: Characterizing Large Sparsely Labeled Glycoproteins

https://doi.org/10.1016/j.jmb.2019.04.029

Highlights

  • A software package for NMR resonance assignment in sparsely labeled proteins is described.

  • NMR resonances in a sparsely labeled protein expressed in mammalian cells are assigned.

  • Paramagnetic effects on assigned resonances facilitate docking of an enzyme substrate.

Abstract

Characterization of proteins using NMR methods begins with assignment of resonances to specific residues. This is usually accomplished using sequential connectivities between nuclear pairs in proteins uniformly labeled with NMR active isotopes. This becomes impractical for larger proteins, and especially for proteins that are best expressed in mammalian cells, including glycoproteins. Here an alternate protocol for the assignment of NMR resonances of sparsely labeled proteins, namely, the ones labeled with a single amino acid type, or a limited subset of types, isotopically enriched with 15N or 13C, is described. The protocol is based on comparison of data collected using extensions of simple two-dimensional NMR experiments (correlated chemical shifts, nuclear Overhauser effects, residual dipolar couplings) to predictions from molecular dynamics trajectories that begin with known protein structures. Optimal pairing of predicted and experimental values is facilitated by a software package that employs a genetic algorithm, ASSIGN_SLP_MD. The approach is applied to the 36-kDa luminal domain of the sialyltransferase, rST6Gal1, in which all phenylalanines are labeled with 15N, and the results are validated by elimination of resonances via single-point mutations of selected phenylalanines to tyrosines. Assignment allows the use of previously published paramagnetic relaxation enhancements to evaluate placement of a substrate analog in the active site of this protein. The protocol will open the way to structural characterization of the many glycosylated and other proteins that are best expressed in mammalian cells.

Introduction

NMR structural studies of uniformly 13C/15N-labeled proteins larger than 40–60 kDa are challenging even when perdeuteration is used to enhance resolution and sensitivity. For glycosylated proteins, which are often expressed in mammalian cell culture to produce native-like glycosylation, perdeuteration is not possible; even structural studies of 20- to 30-kDa proteins are then challenging. Moreover, uniform isotopic labeling in mammalian cells with 13C and 15N can be costly because a mix of isotopically labeled amino acids must be supplied, rather than isotopically labeled metabolic substrates such as glucose and ammonium chloride. An economically viable alternative exists, namely, sparse labeling using a single or small subset of isotopically labeled amino acids [1], [2]. Sparse labels can provide long-range structural constraints through paramagnetic perturbations of resonance positions and intensities, as well as orientational constraints from residual dipolar couplings (RDCs) [3], [4], [5]. These constraints, along with chemical shift perturbations on interaction with other entities, can often be used to position ligands in binding sites and assemble proteins in multi-protein complexes [6], [7]. However, resonances must still be assigned to specific sites in proteins, and this must now be done without the aid of the triple resonance experiments usually applied to uniformly labeled proteins [8].

We recently introduced a strategy for resonance assignment of sparsely labeled proteins that relies on acquisition of nuclear Overhauser effects (NOEs), RDCs, and chemical shifts, all of which are measured directly from, or through modulation of, crosspeaks seen in basic two-dimensional heteronuclear single quantum coherence (HSQC) or multiple quantum coherence spectra. The strategy was implemented in a program package, ASSIGN_SLP, that employed a genetic algorithm to optimize pairing of specific spectral crosspeaks with specific protein sites using scores that compare experimental measurements of these parameters to predictions based on prior structural information, primarily from a single x-ray structure. The package was tested on a set of four small non-glycosylated proteins having known structures and crosspeak assignments, as well as a small glycoprotein [9]. It was subsequently applied to a larger non-glycosylated and perdeuterated protein, for which only the structure of isolated domains was known [10]. While the general approach showed success with smaller systems, it became clear that for larger systems, factors in addition to the technical aspects of associating predictions with experimental measurement would have to be considered. These include degeneracies in data that increase with the number of labeled sites, the greater probability of internal motion affecting observables, and the more extensive spin–spin interactions that occur in larger proteins. Here, we introduce an approach that predicts parameters from molecular dynamics (MD) trajectories, as opposed to single structural snapshots from x-ray structures, to better account for the effects of internal motion and spin–spin interactions on predicted parameters. It also uses an improved procedure for identification of high-confidence assignments in the presence of data degeneracy. This approach, now embodied in a software package entitled ASSIGN_SLP_MD, proves useful in providing key assignments for a challenging 36-kDa glycoprotein, the luminal domain of rST6Gal1 (hereafter just rST6Gal1).
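The pairing problem described above can be illustrated with a toy genetic algorithm. The following Python sketch is not the ASSIGN_SLP or ASSIGN_SLP_MD code (which is described as primarily MATLAB); it is a minimal illustration under stated assumptions: each crosspeak and each site carries the same short list of observables, the score is a sum of squared, error-normalized deviations, and candidate assignments are permutations evolved by order crossover and swap mutation. All function names, parameters, and numerical values are hypothetical.

```python
import random

def score(perm, experimental, predicted, errors):
    """Sum of squared, error-normalized deviations.

    perm[i] is the site index assigned to crosspeak i (assumed scoring form).
    """
    total = 0.0
    for peak, site in enumerate(perm):
        for k, err in enumerate(errors):
            total += ((experimental[peak][k] - predicted[site][k]) / err) ** 2
    return total

def mutate(perm, rng):
    """Swap the sites assigned to two randomly chosen crosspeaks."""
    child = list(perm)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def crossover(a, b, rng):
    """Order crossover: keep a slice of parent a in place and fill the
    remaining positions with the other sites in parent b's order."""
    n = len(a)
    i, j = sorted(rng.sample(range(n + 1), 2))
    middle = a[i:j]
    rest = [s for s in b if s not in middle]
    return rest[:i] + middle + rest[i:]

def assign(experimental, predicted, errors, pop_size=60, generations=200, seed=0):
    """Evolve permutations; the better half of each generation survives unchanged."""
    rng = random.Random(seed)
    n = len(experimental)
    fitness = lambda p: score(p, experimental, predicted, errors)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = survivors + children
    best = min(population, key=fitness)
    return best, fitness(best)

# Toy data, made up for illustration: [1H shift (ppm), RDC (Hz)] per labeled site.
predicted = [[7.1, -5.0], [7.9, 3.0], [8.4, 10.0], [8.8, -12.0], [9.3, 6.0]]
true_site = [3, 0, 4, 1, 2]                       # hidden "correct" assignment
experimental = [predicted[s] for s in true_site]  # noise-free for simplicity
best, best_score = assign(experimental, predicted, errors=[0.02, 2.0])
```

In this noise-free toy case the lowest-scoring permutation is the hidden assignment. Real data are noisy and partly degenerate, which is why the text emphasizes identifying a high-confidence subset of assignments rather than trusting a single best-scoring solution.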

ST6Gal1 is a sialyltransferase that adds a sialic acid to the terminal galactose of N-linked glycans of many glycoproteins and is therefore of importance in mammalian physiology [11]. The bond it forms is from the 2-carbon of sialic acid to the 6-oxygen of galactose, as opposed to the 3-oxygen of galactose. The specificity of the hemagglutinin of the avian influenza virus for the 2–3 linkage, found on glycans in the human gut, but seldom in the upper respiratory tract, is what restricts the transmission of bird flu to humans [12], [13]. Levels of 2–6 linked sialic acid also rise in certain types of cancer, and there is significant effort devoted to understanding the possible role of sialylation in this disease [14], [15]. A decade ago, we began an NMR-based structural study of rST6Gal1 [16]. At the time, there were no crystal structures of ST6Gal1, or of any close structural homolog. Using a sparse labeling approach in which all phenylalanines were labeled with 15N, we demonstrated adequate resolution and sensitivity to detect HSQC crosspeaks from all 16 phenylalanine amide protons in the construct. Using a paramagnetic analog of the sialic acid donor (CMP-sialic acid), in which carboxy-TEMPO replaced the carboxyl-carrying sialic acid, we also showed that 4 of the 16 crosspeaks lost significant intensity. Based on an expected 1/r^6 distance dependence of intensity loss, this number was deemed consistent with the number of phenylalanines in peptide segments believed to form the active site. However, in the absence of assignments, we were unable to use the paramagnetic constraints to dock the donor analog in the active site of a homology model. In 2013, two x-ray structures appeared [17], [18], one of the rat enzyme on which our NMR work had been done [18]. With this structure in hand, along with previously collected RDC data, newly collected 1H–1H NOE data, and our new sparse label assignment strategy, we have proceeded with assignments of a new construct of rST6Gal1, isotopically labeled with 15N in all phenylalanines. A subset of the assignments is validated using a limited set of mutants in which single phenylalanines are changed to tyrosines, and then the assignments are used to place a sugar donor analog in the active site of rST6Gal1 in a manner consistent with paramagnetic perturbation data.
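The inverse sixth-power distance dependence mentioned above explains why only a handful of crosspeaks lose intensity when a paramagnetic analog binds. The short Python sketch below illustrates that scaling only; it assumes a pure 1/r^6 law and ignores correlation times, spin-label mobility, and other factors that a quantitative PRE analysis would need. The site labels and distances are hypothetical and are not taken from the rST6Gal1 data.

```python
def relative_pre(distance, reference=10.0):
    """Relaxation enhancement relative to a nucleus at `reference`
    (same length units), assuming a pure 1/r^6 distance dependence."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return (reference / distance) ** 6

def rank_by_pre(distances):
    """Return site labels ordered from most to least affected."""
    return sorted(distances, key=lambda site: relative_pre(distances[site]), reverse=True)

# Hypothetical distances (angstroms) from the unpaired electron to amide protons.
toy_distances = {"site_A": 12.0, "site_B": 25.0, "site_C": 9.0, "site_D": 18.0}
ranking = rank_by_pre(toy_distances)
```

Under this assumption, doubling the distance reduces the effect 64-fold, so sites within the active site are affected far more strongly than those even moderately farther away.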

Section snippets

Program development

The ASSIGN_SLP_MD package is a collection of programs, primarily MATLAB scripts, that accepts as input a user-supplied MD trajectory, one or more files with experimental NOE peak lists (or NOE vectors derived from NOE strip plots), a file with 1H chemical shifts for labeled sites, a file with 15N or 13C chemical shifts for labeled sites and one or more files with RDC lists. Each of the files ends with a list of error estimates modified by weights for the specific data type. As success is very

Discussion

The data presented on the assignment of 1H–15N crosspeaks in HSQC spectra of the sparsely labeled glycoprotein, rST6Gal1, suggest that similar assignments will be possible on a host of biomedically relevant proteins that are best expressed in mammalian, or other eukaryotic cells. Validation of assignments has confirmed an ability to set reasonable confidence limits on assignment so that, even when total assignments are not possible, a subset can be identified as trusted assignments. In many

Protein expression, mutagenesis, and purification

Protein sample preparations used in collection of RDC data and PRE data were analogous to those described in a previous publication [16]. New samples were prepared for the collection of NOE data and validation by mutagenesis using modified methods for expression, labeling, and purification as described in the literature [34], [40]. Briefly, expression constructs encoding the luminal domain of rat ST6Gal1 (UniProt P13721, residues 103 to 403) in the pGEn2 vector were transiently transfected into

Acknowledgment

This work was supported by grants from the US National Institutes of Health (P41GM103390 and R01GM033225). Manuscript content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

References (43)

  • H.N.B. Moseley et al.

    Complete relaxation and conformational exchange matrix (CORCEMA) analysis of NOESY spectra of interacting systems—2-dimensional transferred NOESY

    J. Magn. Reson. Ser. B

    (1995)
  • A. Liwo et al.

    Computational techniques for efficient conformational sampling of proteins

    Curr. Opin. Struct. Biol.

    (2008)
  • R.C. Bernardi et al.

    Enhanced sampling techniques in molecular dynamics simulations of biological systems

    Biochim. Biophys. Acta Gen. Subj.

    (2015)
  • J. Kim et al.

    A Semiautomated Assignment protocol for methyl group side chains in large proteins

  • J.M. Courtney et al.

    Experimental protein structure verification by scoring with a single, unassigned NMR spectrum

    Structure.

    (2015)
  • L. Meng et al.

    Enzymatic basis for N-glycan sialylation: structure of rat alpha2,6-sialyltransferase (ST6GAL1) reveals conserved and unique features for glycan sialylation

    J. Biol. Chem.

    (2013)
  • K.W. Moremen et al.

    Expression system for structural and functional studies of human glycosylation enzymes

    Nat. Chem. Biol.

    (2018)
  • K. Chen et al.

    The use of residual dipolar coupling in studying proteins by NMR

  • J.H. Prestegard et al.

    Residual dipolar couplings in structure determination of biomolecules

    Chem. Rev.

    (2004)
  • W. Becker et al.

    Investigating protein–ligand interactions by solution nuclear magnetic resonance spectroscopy

    Chemphyschem.

    (2018)
  • Q. Gao et al.

    NMR assignments of sparsely labeled proteins using a genetic algorithm

    J. Biomol. NMR

    (2017)
    G.R.C. and A.E. contributed equally to the project.

    Complex Carbohydrate Research Center, University of Georgia, Athens, GA, USA.

    § Biochemistry and Molecular Biology, Penn State College of Medicine, Hershey, PA, USA.