Section II: An Integrated Platform for Automated Analysis of Protein NMR Structures
Introduction
With the advent of multidimensional and triple-resonance strategies for determining resonance assignments and three-dimensional (3D) structures, it has become increasingly clear that protein nuclear magnetic resonance (NMR) spectra have the quality and information content to allow largely automated and standardized analyses of assignments and structures for small proteins. This potential has been realized over the past few years through the development of automated methods for many of the steps in production NMR protein structure analysis. These advances demonstrate that NMR is a powerful and accessible tool for biophysical chemistry, drug design, and functional genomics. In this chapter, we summarize our efforts in standardizing the NMR data collection process, building an integrated platform for automated NMR structure analysis, and demonstrating its impact on the Northeast Structural Genomics (NESG) Consortium.
Overview of the Automated Protein Structure Analysis Process
The principal steps of automated NMR protein structure analysis are outlined in Fig. 1. These include (1) standardized data collection and organization, (2) processing (including spectral referencing and Fourier transformation), (3) peak picking and peak list editing, (4) resonance assignment, and (5) structure determination (including analysis of conformational constraints, NOESY assignment, residual dipolar coupling (RDC) data analysis, and 3D structure generation). In building an automated
The Organizational Challenge
The process of NMR-based protein structure analysis is challenged by requirements for properly executing, processing, and analyzing many separate NMR experiments. Unlike biomolecular crystallography, which generally involves a single type of data collection experiment, an NMR protein structure determination may require proper collection and analysis of 10–20 individual two-, three-, and four-dimensional (2D, 3D, and 4D) NMR spectra. These data must be highly self-consistent, as the input to the
Local Data Organization and Archiving
Biomolecular NMR research groups require efficient and simple access to archival NMR data, both for routine storage purposes and for the development and testing of novel computational methods for data analysis. Common methods of archiving raw NMR data [usually in the form of time domain free-induction decay (FID) data] in use in most biomolecular NMR laboratories are often inefficient, outdated, and error prone, leading to frequent loss of valuable data that are both hard and expensive to
NMR Spectral Processing
Several NMR spectral processing issues need to be carefully considered for successful automated data analysis. Particularly important is accurate and precise chemical shift referencing in the direct and indirect dimensions using IUPAC-defined referencing methods (Wishart et al., 1995), with dimethylsilapentane-5-sulfonic acid (DSS) as the reference compound. Accurate 13C, 15N, and 1H referencing is essential for ensuring the development of an accurate database of chemical shift values (Zhang et
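As an illustration of indirect referencing against DSS, the following minimal sketch derives the 0 ppm frequency of the 13C and 15N dimensions from the measured 1H frequency of DSS, using the frequency ratios tabulated by Wishart et al. (1995). The function names and the 600 MHz example value are hypothetical and are not taken from the chapter's software.

```python
# Frequency ratios (Xi) relative to the 1H resonance of DSS, as tabulated
# by Wishart et al. (1995). The 15N ratio refers to liquid ammonia,
# obtained indirectly through the DSS 1H frequency.
XI_DSS = {"1H": 1.0, "13C": 0.251449530, "15N": 0.101329118}


def zero_ppm_frequency_mhz(nucleus, dss_1h_mhz):
    """Return the 0 ppm frequency (MHz) of `nucleus`, given the absolute
    1H frequency of DSS (MHz) measured on the same sample."""
    return dss_1h_mhz * XI_DSS[nucleus]


def chemical_shift_ppm(observed_mhz, nucleus, dss_1h_mhz):
    """Convert an absolute frequency (MHz) to a chemical shift (ppm)."""
    ref = zero_ppm_frequency_mhz(nucleus, dss_1h_mhz)
    return (observed_mhz - ref) / ref * 1e6


if __name__ == "__main__":
    dss = 600.0  # hypothetical DSS 1H frequency in MHz
    for nucleus in ("1H", "13C", "15N"):
        print(nucleus, zero_ppm_frequency_mhz(nucleus, dss))
```

Because every heteronuclear dimension is derived from one measured 1H frequency, all spectra recorded on the same sample share a single, self-consistent reference.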
Peak Picking
Peak picking represents one of the crucial steps of NMR data analysis that has resisted successful automation for the purpose of automated resonance assignment and structure determination. This is due largely to cross-peak overlap and artifacts associated with large peaks, especially solvent and diagonal peaks. Multidimensional NMR spectra often exhibit artifacts of baseline distortions, intense solvent lines, ridges, and/or sinc wiggles. These problems are sometimes exacerbated by different
Interspectral Registration and Quality Assessment of Peak Lists
Quality assessment of input peak lists is crucial for the success of the later steps of automated NMR analysis. We use several quality assessments to judge whether peak lists are good enough for these steps. These include (1) peak list registration, (2) the examine_expected_peaks.pl (EEP) script, and (3) the ESS reports of the AutoPeak software suite (Moseley et al., 2001). The first quality assessment is the ability to register peak lists to each
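To make the idea of peak list registration concrete, the sketch below estimates a constant chemical shift offset between two peak lists along one shared dimension and counts how many peaks align once that offset is applied. It is a simple brute-force illustration under assumed tolerances, not the algorithm used by the AutoPeak suite; the function name, the tolerance, and the example shifts are all hypothetical.

```python
def estimate_offset(ref_shifts, other_shifts, tolerance=0.02, max_offset=0.5):
    """Estimate the offset (ppm) to add to `other_shifts` so that they
    line up with `ref_shifts` in a shared dimension.

    Returns (offset, n_matched), where n_matched is the number of peaks in
    `other_shifts` that fall within `tolerance` of some reference peak.
    """
    # Every pairwise difference within max_offset is a candidate offset.
    candidates = [0.0] + [
        r - o
        for r in ref_shifts
        for o in other_shifts
        if abs(r - o) <= max_offset
    ]
    best_offset, best_count = 0.0, -1
    for c in candidates:
        count = sum(
            1
            for o in other_shifts
            if any(abs(o + c - r) <= tolerance for r in ref_shifts)
        )
        if count > best_count:  # keep the first candidate with the top score
            best_offset, best_count = c, count
    return best_offset, best_count


if __name__ == "__main__":
    reference = [7.10, 8.25, 8.90, 9.40]  # hypothetical 1H shifts (ppm)
    other = [7.05, 8.20, 8.85, 6.00]      # same peaks shifted, plus one extra
    print(estimate_offset(reference, other))
```

A low matched count after the best offset is applied suggests that the two lists cannot be registered reliably and should be inspected before they are passed to later steps.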
AutoAssign: Automated Analysis of Backbone Resonance Assignments
Significant progress has been made recently in automated analysis of resonance assignments, particularly using triple-resonance NMR data. Several laboratories are developing programs that automate either backbone or complete resonance assignments [reviewed in Baran 2004, Moseley 1999, Zimmerman 1995]. Most automated programs use the same general analysis scheme that originates from the classical strategy developed by Wüthrich and co-workers (Billeter 1982, Wagner 1982, Wuthrich 1986).
Most
Automated Analysis of Side-Chain Resonance Assignments
Although several approaches have been found to provide robust automation of backbone resonance assignments, a robust approach to automated side-chain assignments is not yet generally available. The program GARANT (Bartels et al., 1996) supports automated backbone and side-chain assignments. Recently, a combined approach using GARANT together with AUTOPSY (Koradi et al., 1998) has shown promising results in automating both peak picking and resonance assignments, including many side-chain
Resonance Assignment Validation Software
As with peak picking, quality assessment of resonance assignments is crucial for robustness in later steps of the automated NMR analysis. For this purpose, we have developed a set of computer utilities called the Assignment Validation Software (AVS) suite (Moseley et al., 2004) for rigorously evaluating and validating a set of protein resonance assignments before submission to the BMRB and/or use in subsequent structure and/or functional analysis, without the need of a 3D structure. They serve
AutoStructure: Automated Analysis of NOESY Data
One of the principal goals of automated structure determination programs involves iterative analysis of multidimensional NOESY data. Several fully automated heuristic approaches for NOESY interpretation and structure calculation have been developed, including NOAH (Mumenthaler 1995, Mumenthaler 1997), ARIA (Nilges 1995, Nilges 1997), CANDID (Herrmann et al., 2002b), AutoStructure (Huang et al., 2003), a simulated annealing assignment approach implemented in XPLOR (Kuszewski et al., 2004), and
Minimal Constraint Approaches to Rapid Automated Fold Determination
Medium-accuracy fold information can often provide key clues about protein evolution and biochemical function(s). Extending ideas originally proposed by Kay and co-workers for determining low-resolution structures of larger proteins (Gardner et al., 1997), a largely automatic strategy has been developed for rapid determination of medium-accuracy protein backbone structures using deuterated, 13C,15N-enriched protein samples with selective protonation of side-chain methyl groups (13CH3) (Zheng et
Structure Quality Assessment Tools
One of the most important challenges in modern protein NMR is to develop a fast and sensitive structure quality assessment measure that can evaluate the “goodness-of-fit” of a 3D structure compared with its NOESY peak lists and indicate the correctness of its fold. This is especially critical for automated NOESY interpretation and structure determination approaches. One approach uses an NMR R factor similar to that used in X-ray crystallography, which often requires computationally intensive,
An Integrated Platform for Automated NMR Structure Analysis
Protein NMR spectroscopists depend on a number of software packages to facilitate the analysis of data. For this reason, the process of solving a protein structure by NMR presents a formidable technical challenge to scientists. Although a number of software packages have been developed for the analysis of NMR data, a comprehensive solution for the complete automated analysis of NMR data from FIDs to three-dimensional structures is not yet available. Users choose between a number of different
Conclusions
Recent developments provide automated analysis of NMR assignments and 3D structures. These approaches are generally applicable to proteins ranging from about 50 to 150 amino acids. Although progress over the past few years is encouraging, even for small proteins more work is required before automated structure analysis is routine. In particular, general methods for automated analysis of side-chain resonance assignments are not yet well developed, though current efforts in this area are quite
Acknowledgements
We thank J. Aramini, A. Bhattacharya, G. Sahota, D. Snyder, G. V. T. Swapna, and D. Zheng for useful discussions and for their efforts over the past several years in developing automated NMR data analysis algorithms and software. The authors' recent work on automated NMR data analysis has been supported by the NIH Protein Structure Initiative (P50-GM62413).
References (109)
- A Metropolis Monte Carlo implementation of Bayesian time-domain parameter estimation: Application to coupling constant estimation from antiphase multiplets. J. Magn. Reson. (1998)
- Practical aspect of proton–carbon–carbon–proton three-dimensional correlation spectroscopy of labeled proteins. J. Magn. Reson. (1990)
- 1H-1H correlation via isotropic mixing of 13C magnetization, a new three-dimensional approach for assigning 1H and 13C spectra of 13C-enriched proteins. J. Magn. Reson. (1990)
- Sequential resonance assignments in protein 1H nuclear magnetic resonance spectra. Computation of sterically allowed proton-proton distances and statistical analysis of proton-proton distances in single crystal protein conformations. J. Mol. Biol. (1982)
- Protein heteronuclear NMR assignments using mean-field simulated annealing. J. Magn. Reson. (1997)
- Total correlation spectroscopy (TOCSY) of proteins using co-addition of spectra recorded with several mixing times. J. Magn. Reson. (1993)
- Helix to helix packing in proteins. J. Mol. Biol. (1981)
- Analysis and prediction of the packing of alpha-helices against a beta-sheet in the tertiary structure of globular proteins. J. Mol. Biol. (1982)
- VERIFY3D: Assessment of protein models with three-dimensional profiles. Methods Enzymol. (1997)
- Toward an NMR R factor. J. Magn. Reson. (1991)
- Solution NMR structure and folding dynamics of the N terminus of a rat non-muscle alpha-tropomyosin in an engineered chimeric protein. J. Mol. Biol.
- Correlation of backbone amide and aliphatic side-chain resonances in 13C/15N-enriched proteins by isotropic mixing of 13C magnetization. J. Magn. Reson.
- Torsion angle dynamics for NMR structure calculation with the new program DYANA. J. Mol. Biol.
- An integrated approach to structural genomics. Prog. Biophys. Mol. Biol.
- Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J. Mol. Biol.
- Solution NMR structure of ribosome-binding factor A (RbfA), a cold-shock adaptation protein from Escherichia coli. J. Mol. Biol.
- Packing of alpha-helices onto beta-pleated sheets and the anatomy of alpha/beta proteins. J. Mol. Biol.
- Linear prediction spectral analysis of NMR data. Prog. NMR Spectrosc.
- Automated peak picking and peak integration in macromolecular NMR spectra using AUTOPSY. J. Magn. Reson.
- Side chain and backbone assignments in isotopically labeled proteins from two heteronuclear triple resonance experiments. FEBS Lett.
- Automated analysis of NMR assignments and structures for proteins. Curr. Opin. Struct. Biol.
- Automatic determination of protein backbone resonance assignments from triple resonance nuclear magnetic resonance data. Methods Enzymol.
- Automated assignment of simulated and experimental NOESY spectra of proteins by feedback filtering and self-correcting distance geometry. J. Mol. Biol.
- Calculation of protein structures with ambiguous distance restraints. Automated assignment of ambiguous NOE crosspeaks and disulphide connectivities. J. Mol. Biol.
- Automated NOESY interpretation with ambiguous distance restraints: The refined NMR solution structure of the pleckstrin homology domain from beta-spectrin. J. Mol. Biol.
- The Xplor-NIH NMR molecular structure determination package. J. Magn. Reson.
- WHAT IF: A molecular modeling and drug design program. J. Mol. Graph.
- Sequential resonance assignments in protein 1H nuclear magnetic resonance spectra. Basic pancreatic trypsin inhibitor. J. Mol. Biol.
- Validation of protein structures for the protein data bank. Methods Enzymol.
- Pseudo-structures for the 20 common amino acids for use in studies of protein conformations by measurements of intramolecular proton-proton distance constraints with nuclear magnetic resonance. J. Mol. Biol.
- Modified genetic algorithm resolves ambiguous NOE restraints and reduces unsightly NOE violations. Proteins
- Protein sequential resonance assignments by combinatorial enumeration using 13C alpha chemical shifts and their (i, i−1) sequential connectivities. J. Biomol. NMR
- Solution NMR structure of the 30S ribosomal protein S28E from Pyrococcus horikoshii. Protein Sci.
- A tracked approach for automated NMR assignments in proteins (TATAPRO). J. Biomol. NMR
- SPINS: Standardized protein NMR storage. A data dictionary and object-oriented relational database for archiving protein NMR spectra. J. Biomol. NMR
- Automated analysis of protein NMR assignments and structures. Chem. Rev.
- Automated sequence-specific NMR assignment of homologous proteins using the program GARANT. J. Biomol. NMR
- GARANT: A general algorithm for resonance assignment of multidimensional nuclear magnetic resonance spectra. J. Comput. Chem.
- The Protein Data Bank. Nucleic Acids Res.
- Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr. D Biol. Crystallogr.
- Structural genomics: A pipeline for providing structures for the biologist. Protein Sci.
- Principles that determine the structure of proteins. Annu. Rev. Biochem.
- High-resolution structure of the oligomerization domain of p53 by multidimensional NMR. Science
- PACES: Protein sequential assignment by computer-assisted exhaustive search. J. Biomol. NMR
- Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J. Biomol. NMR
- NMRPipe: A multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR
- Efficient analysis of protein 2D NMR spectra using the software package EASY. J. Biomol. NMR
- Assignment of side-chain 13C resonances in perdeuterated proteins. J. Am. Chem. Soc.
- 2D and 3D NMR spectroscopy employing 13C-13C magnetization transfer by isotropic mixing. Spin system identification in large proteins. J. Am. Chem. Soc.