Elsevier

Methods in Enzymology

Volume 394, 2005, Pages 111-141
Section II
An Integrated Platform for Automated Analysis of Protein NMR Structures

https://doi.org/10.1016/S0076-6879(05)94005-6

Abstract

Recent developments provide automated analysis of NMR assignments and three-dimensional (3D) structures of proteins. These approaches are generally applicable to proteins ranging from about 50 to 150 amino acids. In this chapter, we summarize progress by the Northeast Structural Genomics Consortium in standardizing the NMR data collection process for protein structure determination and in building an integrated platform for automated protein NMR structure analysis. Our integrated platform includes the following principal steps: (1) standardized NMR data collection, (2) standardized data processing (including spectral referencing and Fourier transformation), (3) automated peak picking and peak list editing, (4) automated analysis of resonance assignments, (5) automated analysis of NOESY data together with 3D structure determination, and (6) methods for protein structure validation. In particular, the software AutoStructure for automated NOESY data analysis is described in this chapter, together with a discussion of practical considerations for its use in high-throughput structure production efforts. The critical area of data quality assessment has evolved significantly over the past few years and involves evaluation of both intermediate and final peak lists, resonance assignments, and structural information derived from the NMR data. Methods for quality control of each of the major automated analysis steps in our platform are also discussed. Despite significant remaining challenges, when good quality data are available, automated analysis of protein NMR assignments and structures with this platform is both fast and reliable.

Introduction

With the advent of multidimensional and triple-resonance strategies for determining resonance assignments and three-dimensional (3D) structures, it has become increasingly clear that protein NMR spectra have the quality and information content to allow largely automated and standardized analyses of assignments and structures for small proteins. This has been realized over the past few years in the development of automated methods for many of the steps in production nuclear magnetic resonance (NMR) protein structure analysis. These advances are significant demonstrations of NMR as a powerful and accessible tool for biophysical chemistry, drug design, and functional genomics. In this chapter, we summarize our efforts in standardizing the NMR data collection process, building an integrated platform for automated NMR structure analysis, and demonstrating its impact for the Northeast Structural Genomics (NESG) Consortium.

Section snippets

Overview of the Automated Protein Structure Analysis Process

The principal steps of automated NMR protein structure analysis are outlined in Fig. 1. These include (1) standardized data collection and organization, (2) processing (including spectral referencing and Fourier transformation), (3) peak picking and peak list editing, (4) resonance assignment, and (5) structure determination (including analysis of conformational constraints, NOESY assignment, residual dipolar coupling (RDC) data analysis, and 3D structure generation). In building an automated

The Organizational Challenge

The process of NMR-based protein structure analysis is challenged by requirements for properly executing, processing, and analyzing many separate NMR experiments. Unlike biomolecular crystallography, which generally involves a single type of data collection experiment, an NMR protein structure determination may require proper collection and analysis of 10–20 individual two-, three-, and four-dimensional (2D, 3D, and 4D) NMR spectra. These data must be highly self-consistent, as the input to the

Local Data Organization and Archiving

Biomolecular NMR research groups require efficient and simple access to archival NMR data, both for routine storage purposes and for the development and testing of novel computational methods for data analysis. Common methods of archiving raw NMR data [usually in the form of time domain free-induction decay (FID) data] in use in most biomolecular NMR laboratories are often inefficient, outdated, and error prone, leading to frequent loss of valuable data that are both hard and expensive to

NMR Spectral Processing

Several NMR spectral processing issues need to be carefully considered for successful automated data analysis. Particularly important is accurate and precise chemical shift referencing in the direct and indirect dimensions using IUPAC-defined referencing methods (Wishart et al., 1995), with dimethylsilapentane-5-sulfonic acid (DSS) as the reference compound. Accurate 13C, 15N, and 1H referencing is essential for ensuring the development of an accurate database of chemical shift values (Zhang et
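As an illustration of indirect referencing, the following minimal Python sketch derives the 13C and 15N zero-ppm frequencies from the 1H zero-ppm frequency of DSS using frequency ratios (Xi). The Xi values are the commonly tabulated IUPAC-recommended ratios for DSS-referenced biomolecular NMR; the function names and the 600.13 MHz example are assumptions made for illustration, and the constants should be checked against the cited recommendations before use.

```python
# Frequency ratios (Xi) relative to the 1H resonance of DSS at 0 ppm.
# Assumption: commonly tabulated IUPAC-recommended values; verify before use.
XI = {
    "1H": 1.000000000,
    "13C": 0.251449530,
    "15N": 0.101329118,
}


def zero_ppm_frequency_mhz(h1_zero_mhz: float, nucleus: str) -> float:
    """Return the 0 ppm frequency (MHz) of `nucleus`, given the measured
    1H frequency (MHz) of the DSS methyl signal in the same sample."""
    return h1_zero_mhz * XI[nucleus]


def ppm(freq_mhz: float, zero_mhz: float) -> float:
    """Convert an absolute frequency to a chemical shift in ppm."""
    return (freq_mhz - zero_mhz) / zero_mhz * 1e6


if __name__ == "__main__":
    h1_zero = 600.13  # hypothetical DSS 1H frequency on a 600 MHz instrument
    for nucleus in ("13C", "15N"):
        print(nucleus, round(zero_ppm_frequency_mhz(h1_zero, nucleus), 6), "MHz")
```

Deriving every indirect dimension from one measured 1H reference keeps all spectra from a sample on a common scale, which matters when peak lists from different experiments are later compared.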

Peak Picking

Peak picking represents one of the crucial steps of NMR data analysis that has resisted successful automation for the purpose of automated resonance assignment and structure determination. This is due largely to cross-peak overlap and artifacts associated with large peaks, especially solvent and diagonal peaks. Multidimensional NMR spectra often exhibit artifacts of baseline distortions, intense solvent lines, ridges, and/or sinc wiggles. These problems are sometimes exacerbated by different

Interspectral Registration and Quality Assessment of Peak Lists

Quality assessment of input peak lists for further steps in the automated NMR analysis is crucial for the success of automation. We use several quality assessments of peak lists when judging if the peak lists are good enough for the later steps of automation. These include (1) peak list registration, (2) the examine_expected_peaks.pl (EEP), and (3) the ESS reports of the AutoPeak software suite (Moseley et al., 2001). The first quality assessment is the ability to register peak lists to each
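The idea of registering one peak list against another can be sketched as a small grid search: try candidate (1H, 15N) offsets and keep the one that lets the most peaks land on a reference peak, breaking ties by the smallest residual. This is a generic illustration only, not the AutoPeak implementation; the function names, tolerances, step sizes, and example shifts are all assumptions.

```python
def score_offset(ref, peaks, offset, tol):
    """Count peaks that fall within `tol` of some reference peak after
    applying `offset`, and sum their tolerance-normalized residuals."""
    dh, dn = offset
    matched, residual = 0, 0.0
    for h, n in peaks:
        best = None
        for rh, rn in ref:
            eh = abs(h + dh - rh) / tol[0]
            en = abs(n + dn - rn) / tol[1]
            if eh <= 1.0 and en <= 1.0 and (best is None or eh + en < best):
                best = eh + en
        if best is not None:
            matched += 1
            residual += best
    return matched, residual


def register(ref, peaks, steps=(5, 5), step=(0.01, 0.1), tol=(0.02, 0.2)):
    """Grid-search the (1H, 15N) offset, in ppm, that best maps `peaks`
    onto `ref`. Returns (offset, number_of_matched_peaks)."""
    best_key, best_offset = (-1, 0.0), (0.0, 0.0)
    for i in range(-steps[0], steps[0] + 1):
        for j in range(-steps[1], steps[1] + 1):
            offset = (i * step[0], j * step[1])
            matched, residual = score_offset(ref, peaks, offset, tol)
            key = (matched, -residual)  # more matches first, then tighter fit
            if key > best_key:
                best_key, best_offset = key, offset
    return best_offset, best_key[0]


if __name__ == "__main__":
    hsqc = [(8.10, 120.0), (7.50, 115.3), (9.02, 128.7)]   # reference list
    other = [(h - 0.03, n - 0.3) for h, n in hsqc]          # misregistered list
    print(register(hsqc, other))
```

A low match count at the best offset is itself a useful warning sign that a peak list is incomplete, noisy, or inconsistently referenced.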

AutoAssign: Automated Analysis of Backbone Resonance Assignments

Significant progress has been made recently in automated analysis of resonance assignments, particularly using triple-resonance NMR data. Several laboratories are developing programs that automate either backbone or complete resonance assignments [reviewed in Baran 2004, Moseley 1999, Zimmerman 1995]. Most automated programs use the same general analysis scheme that originates from the classical strategy developed by Wüthrich and co-workers (Billeter 1982, Wagner 1982, Wüthrich 1986).
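One step that such schemes generally share, when applied to triple-resonance data, is linking spin systems sequentially by matching the intraresidue 13C shifts of one spin system to the preceding-residue (i-1) shifts recorded for another. The Python sketch below illustrates only that matching idea; it is not the algorithm of AutoAssign or any other named program, and the data layout, key names, and 0.2 ppm tolerance are assumptions.

```python
def link_spin_systems(spin_systems, tol=0.2):
    """Return (predecessor_id, successor_id) pairs for which the successor's
    i-1 CA/CB shifts match the predecessor's own CA/CB shifts within `tol` ppm.

    `spin_systems` maps an id to a dict with keys
    "CA", "CB" (intraresidue) and "CA_prev", "CB_prev" (residue i-1).
    """
    links = []
    for succ_id, succ in spin_systems.items():
        for pred_id, pred in spin_systems.items():
            if pred_id == succ_id:
                continue
            ca_ok = abs(succ["CA_prev"] - pred["CA"]) <= tol
            cb_ok = abs(succ["CB_prev"] - pred["CB"]) <= tol
            if ca_ok and cb_ok:
                links.append((pred_id, succ_id))
    return links


if __name__ == "__main__":
    # Hypothetical shifts (ppm) for three spin systems forming a short chain.
    systems = {
        "s1": {"CA": 56.1, "CB": 30.2, "CA_prev": 60.0, "CB_prev": 38.0},
        "s2": {"CA": 62.5, "CB": 69.8, "CA_prev": 56.15, "CB_prev": 30.1},
        "s3": {"CA": 52.3, "CB": 19.0, "CA_prev": 62.4, "CB_prev": 69.9},
    }
    print(link_spin_systems(systems))
```

In practice several candidates often match within tolerance, so real programs must resolve ambiguous links and then map the linked segments onto the protein sequence.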

Most

Automated Analysis of Side-Chain Resonance Assignments

Although several approaches have been found to provide robust automation of backbone resonance assignments, a robust approach to automated side-chain assignments is not yet generally available. The program GARANT (Bartels et al., 1996) supports automated backbone and side-chain assignments. Recently, a combined approach of using GARANT and AUTOPSY (Koradi et al., 1998) together demonstrates promising results in automating both peak picking and resonance assignments, including many side-chain

Resonance Assignment Validation Software

As with peak picking, quality assessment of resonance assignments is crucial for robustness in later steps of the automated NMR analysis. For this purpose, we have developed a set of computer utilities called the Assignment Validation Software (AVS) suite (Moseley et al., 2004) for rigorously evaluating and validating a set of protein resonance assignments before submission to the BMRB and/or use in subsequent structure and/or functional analysis, without the need of a 3D structure. They serve

AutoStructure: Automated Analysis of NOESY Data

One of the principal goals of automated structure determination programs involves iterative analysis of multidimensional NOESY data. Several fully automated heuristic approaches for NOESY interpretation and structure calculation have been developed, including NOAH (Mumenthaler 1995, Mumenthaler 1997), ARIA (Nilges 1995, Nilges 1997), CANDID (Herrmann et al., 2002b), AutoStructure (Huang et al., 2003), a simulated annealing assignment approach implemented in XPLOR (Kuszewski et al., 2004), and

Minimal Constraint Approaches to Rapid Automated Fold Determination

Medium-accuracy fold information can often provide key clues about protein evolution and biochemical function(s). Extending ideas originally proposed by Kay and co-workers for determining low-resolution structures of larger proteins (Gardner et al., 1997), a largely automatic strategy has been developed for rapid determination of medium-accuracy protein backbone structures using deuterated, 13C,15N-enriched protein samples with selective protonation of side-chain methyl groups (13CH3) (Zheng et

Structure Quality Assessment Tools

One of the most important challenges in modern protein NMR is to develop a fast and sensitive structure quality assessment measure that can evaluate the “goodness-of-fit” of a 3D structure compared with its NOESY peak lists and indicate the correctness of its fold. This is especially critical for automated NOESY interpretation and structure determination approaches. One approach uses an NMR R factor similar to that used in X-ray crystallography, which often requires computationally intensive,
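A simple way to picture a goodness-of-fit comparison between a structure and NOESY data is to treat it as a recall and precision problem: which observed proton-proton contacts are explained by short distances in the model, and which short distances in the model were actually observed. The Python sketch below is a deliberately simplified illustration under that framing, not the scoring method described in this chapter; the 5 Å cutoff, proton labels, and coordinates are assumptions.

```python
import math


def predicted_contacts(coords, cutoff=5.0):
    """Return the set of proton pairs closer than `cutoff` (in angstroms).

    `coords` maps a proton label to an (x, y, z) tuple.
    """
    names = sorted(coords)
    contacts = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(coords[a], coords[b]) <= cutoff:
                contacts.add(frozenset((a, b)))
    return contacts


def recall_precision(observed, predicted):
    """Recall: fraction of observed contacts explained by the structure.
    Precision: fraction of predicted contacts that were observed.
    Also returns their harmonic mean (F-measure)."""
    tp = len(observed & predicted)
    recall = tp / len(observed) if observed else 0.0
    precision = tp / len(predicted) if predicted else 0.0
    f_measure = 2 * recall * precision / (recall + precision) if tp else 0.0
    return recall, precision, f_measure


if __name__ == "__main__":
    # Hypothetical proton coordinates (angstroms) and observed NOE contacts.
    coords = {"A.HN": (0, 0, 0), "B.HN": (3, 0, 0),
              "C.HA": (10, 0, 0), "D.HB": (12, 0, 0)}
    observed = {frozenset(("A.HN", "B.HN")), frozenset(("A.HN", "C.HA"))}
    print(recall_precision(observed, predicted_contacts(coords)))
```

Low recall suggests the model fails to explain the data, while low precision suggests the model predicts close contacts for which no peaks were found.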

An Integrated Platform for Automated NMR Structure Analysis

Protein NMR spectroscopists depend on a number of software packages to facilitate the analysis of data. For this reason, the process of solving a protein structure by NMR presents a formidable technical challenge to scientists. Although a number of software packages have been developed for the analysis of NMR data, a comprehensive solution for the complete automated analysis of NMR data from FIDs to three-dimensional structures is not yet available. Users choose between a number of different

Conclusions

Recent developments provide automated analysis of NMR assignments and 3D structures. These approaches are generally applicable to proteins ranging from about 50 to 150 amino acids. Although progress over the past few years is encouraging, even for small proteins more work is required before automated structure analysis is routine. In particular, general methods for automated analysis of side-chain resonance assignments are not yet well developed, though current efforts in this area are quite

Acknowledgements

We thank J. Aramini, A. Bhattacharya, G. Sahota, D. Snyder, G. V. T. Swapna, and D. Zheng for useful discussions and for their efforts over the past several years in developing automated NMR data analysis algorithms and software. The authors' recent work on automated NMR data analysis has been supported by the NIH Protein Structure Initiative (P50-GM62413).

References (109)

  • N.J. Greenfield et al.

    Solution NMR structure and folding dynamics of the N terminus of a rat non-muscle alpha-tropomyosin in an engineered chimeric protein

    J. Mol. Biol.

    (2001)
  • S. Grzesiek et al.

    Correlation of backbone amide and aliphatic side-chain resonances in 13C/15N-enriched proteins by isotropic mixing of 13C magnetization

    J. Magn. Reson.

    (1993)
  • P. Güntert et al.

    Torsion angle dynamics for NMR structure calculation with the new program DYANA

    J. Mol. Biol.

    (1997)
  • U. Heinemann et al.

    An integrated approach to structural genomics

    Prog. Biophys. Mol. Biol.

    (2000)
  • T. Herrmann et al.

    Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA

    J. Mol. Biol.

    (2002)
  • Y.J. Huang et al.

    Solution NMR structure of ribosome-binding factor A (RbfA), a cold-shock adaptation protein from Escherichia coli

    J. Mol. Biol.

    (2003)
  • J. Janin et al.

    Packing of alpha-helices onto beta-pleated sheets and the anatomy of alpha/beta proteins

    J. Mol. Biol.

    (1980)
  • P. Koehl

    Linear prediction spectral analysis of NMR data

    Prog. NMR Spectrosc.

    (1999)
  • R. Koradi et al.

    Automated peak picking and peak integration in macromolecular NMR spectra using AUTOPSY

    J. Magn. Reson.

    (1998)
  • T.M. Logan et al.

    Side chain and backbone assignments in isotopically labeled proteins from two heteronuclear triple resonance experiments

    FEBS Lett.

    (1992)
  • H.N. Moseley et al.

    Automated analysis of NMR assignments and structures for proteins

    Curr. Opin. Struct. Biol.

    (1999)
  • H.N. Moseley et al.

    Automatic determination of protein backbone resonance assignments from triple resonance nuclear magnetic resonance data

    Methods Enzymol.

    (2001)
  • C. Mumenthaler et al.

    Automated assignment of simulated and experimental NOESY spectra of proteins by feedback filtering and self-correcting distance geometry

    J. Mol. Biol.

    (1995)
  • M. Nilges

    Calculation of protein structures with ambiguous distance restraints. Automated assignment of ambiguous NOE crosspeaks and disulphide connectivities

    J. Mol. Biol.

    (1995)
  • M. Nilges et al.

    Automated NOESY interpretation with ambiguous distance restraints: The refined NMR solution structure of the pleckstrin homology domain from beta-spectrin

    J. Mol. Biol.

    (1997)
  • C.D. Schwieters et al.

    The Xplor-NIH NMR molecular structure determination package

    J. Magn. Reson.

    (2003)
  • G. Vriend

    WHAT IF: A molecular modeling and drug design program

    J. Mol. Graph.

    (1990)
  • G. Wagner et al.

    Sequential resonance assignments in protein 1H nuclear magnetic resonance spectra. Basic pancreatic trypsin inhibitor

    J. Mol. Biol.

    (1982)
  • J. Westbrook et al.

    Validation of protein structures for the protein data bank

    Methods Enzymol.

    (2003)
  • K. Wüthrich et al.

    Pseudo-structures for the 20 common amino acids for use in studies of protein conformations by measurements of intramolecular proton-proton distance constraints with nuclear magnetic resonance

    J. Mol. Biol.

    (1983)
  • M. Adler

    Modified genetic algorithm resolves ambiguous NOE restraints and reduces unsightly NOE violations

    Proteins

    (2000)
  • M. Andrec et al.

    Protein sequential resonance assignments by combinatorial enumeration using 13C alpha chemical shifts and their (i, i−1) sequential connectivities

    J. Biomol. NMR

    (2002)
  • J.M. Aramini et al.

    Solution NMR structure of the 30S ribosomal protein S28E from Pyrococcus horikoshii

    Protein Sci.

    (2003)
  • H.S. Atreya et al.

    A tracked approach for automated NMR assignments in proteins (TATAPRO)

    J. Biomol. NMR

    (2000)
  • M.C. Baran et al.

    SPINS: Standardized protein NMR storage. A data dictionary and object-oriented relational database for archiving protein NMR spectra

    J. Biomol. NMR

    (2002)
  • M.C. Baran et al.

    Automated analysis of protein NMR assignments and structures

    Chem. Rev.

    (2004)
  • C. Bartels et al.

    Automated sequence-specific NMR assignment of homologous proteins using the program GARANT

    J. Biomol. NMR

    (1996)
  • C. Bartels et al.

    GARANT—A general algorithm for resonance assignment of multidimensional nuclear magnetic resonance spectra

    J. Comput. Chem.

    (1997)
  • H.M. Berman et al.

    The Protein Data Bank

    Nucleic Acids Res.

    (2000)
  • A.T. Brünger
  • A.T. Brünger et al.

    Crystallography & NMR system: A new software suite for macromolecular structure determination

    Acta Crystallogr. D Biol. Crystallogr.

    (1998)
  • M.R. Chance et al.

    Structural genomics: A pipeline for providing structures for the biologist

    Protein Sci.

    (2002)
  • C. Chothia

    Principles that determine the structure of proteins

    Annu. Rev. Biochem.

    (1984)
  • G.M. Clore et al.

    High-resolution structure of the oligomerization domain of p53 by multidimensional NMR

    Science

    (1994)
  • B.E. Coggins et al.

    PACES: Protein sequential assignment by computer-assisted exhaustive search

    J. Biomol. NMR

    (2003)
  • G. Cornilescu et al.

    Protein backbone angle restraints from searching a database for chemical shift and sequence homology

    J. Biomol. NMR

    (1999)
  • F. Delaglio et al.

    NMRPipe: A multidimensional spectral processing system based on UNIX pipes

    J. Biomol. NMR

    (1995)
  • C. Eccles et al.

    Efficient analysis of protein 2D NMR spectra using the software package EASY

    J. Biomol. NMR

    (1991)
  • B.T. Farmer et al.

    Assignment of side-chain 13C resonances in perdeuterated proteins

    J. Am. Chem. Soc.

    (1995)
  • S.W. Fesik et al.

    2D and 3D NMR spectroscopy employing 13C-13C magnetization transfer by isotropic mixing. Spins system identification in large proteins

    J. Am. Chem. Soc.

    (1990)