Elsevier

Methods in Enzymology

Volume 523, 2013, Pages 109-143
Chapter Six - Scientific Benchmarks for Guiding Macromolecular Energy Function Improvement

https://doi.org/10.1016/B978-0-12-394292-0.00006-0

Abstract

Accurate energy functions are critical to macromolecular modeling and design. We describe new tools for identifying inaccuracies in energy functions and guiding their improvement, and illustrate the application of these tools to the improvement of the Rosetta energy function. The feature analysis tool identifies discrepancies between structures deposited in the PDB and low-energy structures generated by Rosetta; these likely arise from inaccuracies in the energy function. The optE tool optimizes the weights on the different components of the energy function by maximizing the recapitulation of a wide range of experimental observations. We use the tools to examine three proposed modifications to the Rosetta energy function: improving the unfolded state energy model (reference energies), using bicubic spline interpolation to generate knowledge-based torsional potentials, and incorporating the recently developed Dunbrack 2010 rotamer library (Shapovalov & Dunbrack, 2011).

Introduction

Scientific benchmarks are essential for the development and parameterization of molecular modeling energy functions. Widely used molecular mechanics energy functions such as Amber and OPLS were originally parameterized with experimental and quantum chemistry data from small molecules and benchmarked against experimental observables such as intermolecular energies in the gas phase, solution phase densities, and heats of vaporization (Jorgensen et al., 1996, Weiner et al., 1984). More recently, thermodynamic measurements and high-resolution structures of macromolecules have provided a valuable testing ground for energy function development. Commonly used scientific tests include discriminating the ground state conformation of a macromolecule from higher energy conformations (Novotný et al., 1984, Park and Levitt, 1996, Simons et al., 1999), and predicting amino acid sidechain conformations (Bower et al., 1997, Jacobson et al., 2002) and free energy changes associated with protein mutations (Gilis and Rooman, 1997, Guerois et al., 2002, Potapov et al., 2009).

Many studies have focused on optimizing an energy function for a particular problem in macromolecular modeling. For instance, the FoldX energy function was empirically parameterized for predicting changes to the free energy of a protein when it is mutated (Guerois et al., 2002). Often, energy functions of this type are well suited only to the task they were trained for. Kellogg, Leaver-Fay, and Baker (2011) showed that an energy function explicitly trained to predict energies of mutation did not produce native-like sequences when redesigning proteins. For many projects, it is advantageous to have a single energy function that can be used for diverse modeling tasks. For example, protocols in the molecular modeling program Rosetta for ligand docking (Meiler & Baker, 2003), protein design (Kuhlman et al., 2003), and loop modeling (Wang, Bradley, & Baker, 2007) share a common energy function, which allowed Murphy, Bolduc, Gallaher, Stoddard, and Baker (2009) to combine them to shift an enzyme's substrate specificity.

Sharing a single energy function between modeling applications presents both opportunities and challenges. Researchers applying the energy function to new tasks sometimes uncover deficiencies in it. The opportunity is that correcting the deficiencies exposed by new tasks will also improve performance on older tasks; after all, nature uses only one energy function. Sometimes, however, modifications to the energy function that improve its performance at one task degrade its performance at others. The challenge is then to discriminate beneficial from deleterious modifications and to reconcile task-specific objectives.

To address these challenges, we have developed three tools based on benchmarking Rosetta against macromolecular data. The first tool (Section 3), a suite we call “feature analysis,” can be used to contrast ensembles of structural details from structures in the PDB and from structures generated by Rosetta. The second tool (Section 4), a program we call “optE,” relies on fast, small-scale benchmarks to train the weights in the energy function. These two tools can help identify and fix flaws in the energy function, facilitating the process of integrating a proposed modification. We follow (Section 5) with a curated set of large-scale benchmarks meant to provide sufficient coverage of Rosetta's applications. The use of these benchmarks will provide evidence that a proposed energy function modification should be widely adopted. To conclude (Section 6), we demonstrate our tools and benchmarks by evaluating three incremental modifications to the Rosetta energy function.

Alongside this chapter, we have created an online appendix, which documents usage of the tools, input files, instructions for running the benchmarks, and current testing results: http://rosettatests.graylab.jhu.edu/guided_energy_function_improvement.

Section snippets

Energy Function Model

The Rosetta energy function is a linear combination of terms that model interactions between atoms, solvation effects, and torsion energies. More specifically, Score12 (Rohl, Strauss, Misura, & Baker, 2004), the default fullatom energy function in Rosetta, consists of a Lennard–Jones term, an implicit solvation term (Lazaridis & Karplus, 1999), an orientation-dependent H-bond term (Kortemme, Morozov, & Baker, 2003), sidechain and backbone torsion potentials derived from the PDB, a short-ranged
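As a reading aid, the "linear combination of terms" described above can be written in a generic form. The chapter's own numbered equation is not part of this excerpt, so the symbols here (E for the total energy, w_i for the weight on term i, T_i for the i-th energy term, x for a conformation) are assumptions chosen for illustration, not the chapter's notation.

```latex
E(x) \;=\; \sum_{i} w_i \, T_i(x)
```

Each T_i stands for one component (for example, the Lennard–Jones or solvation term), and the weights w_i set the relative contribution of each component to the total.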

Feature Analysis

We aim to facilitate the analysis of distributions of measurable properties of molecular conformations, which we call “feature analysis.” By formalizing the analysis process, we are able to create a suite of tools and benchmarks that unify the collection, visualization, and comparison of feature distributions. After motivating our work, we describe the components (Section 3.1) and illustrate how they can be integrated into a workflow (Section 3.2) by investigating the distribution of the
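The idea of comparing a feature distribution between two sets of structures can be illustrated with a minimal sketch. This is not the chapter's feature analysis suite: the function names, the example values, and the choice of total variation distance as the comparison statistic are all assumptions made for illustration.

```python
def histogram(values, lo, hi, n_bins):
    """Normalized bin frequencies for the values that fall in [lo, hi)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point rounding at the upper edge.
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside [lo, hi)")
    return [c / total for c in counts]


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    0 means the distributions are identical; 1 means they do not overlap.
    """
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical feature values (e.g. a donor-acceptor distance in angstroms)
# measured in experimental structures and in computationally generated ones.
native_values = [1.8, 1.9, 1.9, 2.0, 2.1, 2.0]
decoy_values = [2.1, 2.2, 2.3, 2.3, 2.4, 2.2]

p_native = histogram(native_values, 1.5, 2.5, 5)
p_decoy = histogram(decoy_values, 1.5, 2.5, 5)
print(total_variation(p_native, p_decoy))
```

A large distance for some feature would flag it as a candidate for closer inspection; which features to extract and how to visualize them is what the chapter's own tools address.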

Maximum Likelihood Parameter Estimation with optE

Recall that the Rosetta energy function is a weighted linear combination of energy terms that capture different aspects of molecular structure, as defined in Eq. (6.1). The weights, w, balance the contribution of each term to give the overall energy. Because the weights often need adjusting after modifying an energy term, we have developed a tool called “optE” to facilitate fitting them against scientific benchmarks. The benchmarks are small, tractable tests of Rosetta's ability to recapitulate
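To make the idea of fitting weights by maximum likelihood concrete, here is a minimal sketch under stated assumptions. It is not optE itself: the Boltzmann-style likelihood with kT = 1, the plain gradient descent, the function names, and the example term values are all hypothetical choices for illustration. The sketch scores a native structure against a few decoys, each described by a vector of unweighted energy terms, and adjusts the weights to raise the probability assigned to the native.

```python
import math


def energy(weights, terms):
    """Weighted linear combination of energy terms."""
    return sum(w * t for w, t in zip(weights, terms))


def boltzmann_probs(weights, structures):
    """Boltzmann probabilities (kT = 1) over a list of term vectors."""
    energies = [energy(weights, s) for s in structures]
    e_min = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - e_min)) for e in energies]
    z = sum(factors)
    return [f / z for f in factors]


def neg_log_likelihood(weights, native, decoys):
    """-ln P(native) among the native plus the decoys."""
    return -math.log(boltzmann_probs(weights, [native] + decoys)[0])


def gradient(weights, native, decoys):
    """Gradient of -ln P(native): native term minus its ensemble average."""
    structures = [native] + decoys
    probs = boltzmann_probs(weights, structures)
    return [
        native[k] - sum(p * s[k] for p, s in zip(probs, structures))
        for k in range(len(weights))
    ]


def fit(weights, native, decoys, lr=0.1, steps=200):
    """Plain gradient descent on the negative log-likelihood."""
    w = list(weights)
    for _ in range(steps):
        g = gradient(w, native, decoys)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


# Hypothetical unweighted term values: [attractive term, repulsive term].
native = [-2.0, 1.0]
decoys = [[-1.0, 0.5], [-1.5, 2.0], [-0.5, 1.5]]

initial = [1.0, 1.0]
fitted = fit(initial, native, decoys)
print(neg_log_likelihood(initial, native, decoys))
print(neg_log_likelihood(fitted, native, decoys))
```

With only one native and no regularization, the weights in this toy problem can keep growing as long as the native stays separable from the decoys, which is one reason a practical fit would draw on many benchmark cases at once.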

Large-Scale Benchmarks

Scientific benchmarking allows energy function comparison. The tests most pertinent to the Rosetta community often aim toward recapitulating observations from crystal structures. In this section, we describe a curated set of previously published benchmarks, which together provide a comprehensive view of an energy function's strengths and weaknesses. We continually test the benchmarks on the RosettaTests server to allow us to immediately detect changes to Rosetta that degrade its overall

Three Proposed Changes to the Rosetta Energy Function

In this final section, we describe three changes to Rosetta's energy function. After describing each change and its rationale, we present the results of the benchmarks described above.

Conclusion

We have described three tools that can be used to evaluate and improve macromolecular energy functions. Inaccuracies in the energy function can be identified by comparing features from crystal structures and computationally generated structures. New or reparameterized energy terms can be rapidly tested with optE to determine if the change improves structure prediction and sequence design. When a new term is ready to be rigorously tested, we can test for unintended changes to feature

Acknowledgments

Support for A. L. F., M. J. O., and B. K. came from GM073151 and GM073960. Support for J. S. R. came from NIH R01 GM073930. Thanks to Steven Combs for bringing the bicubic-spline implementation to Rosetta.

References (59)

  • J. Novotný et al. An analysis of incorrectly folded protein models. Implications for structure predictions. Journal of Molecular Biology (1984)
  • B. Park et al. Energy functions that discriminate X-ray and near native folds from well-constructed decoys. Journal of Molecular Biology (1996)
  • R.J. Petrella et al. Protein sidechain conformer prediction: A test of the energy function. Folding and Design (1998)
  • J.W. Ponder et al. Tertiary templates for proteins. Use of packing criteria in the enumeration of allowed sequences for different structural classes. Journal of Molecular Biology (1987)
  • C.A. Rohl et al. Protein structure prediction using Rosetta. Methods in Enzymology (2004)
  • M.V. Shapovalov et al. A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure (2011)
  • K. Simons et al. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and bayesian scoring functions. Journal of Molecular Biology (1997)
  • M.J. Sippl. Calculation of conformational ensembles from potentials of mean force: An approach to the knowledge-based prediction of local structures in globular proteins. Journal of Molecular Biology (1990)
  • C. Wang et al. Protein-protein docking with backbone flexibility. Journal of Molecular Biology (2007)
  • J.M. Word et al. Asparagine and glutamine: Using hydrogen atom contacts in the choice of side-chain amide orientation. Journal of Molecular Biology (1999)
  • V.B. Chen et al. MolProbity: All-atom structure validation for macromolecular crystallography. Acta Crystallographica. Section D: Biological Crystallography (2010)
  • H.-M. Chen et al. Sodock: Swarm optimization for highly flexible protein-ligand docking. Journal of Computational Chemistry (2007)
  • R. Das et al. Atomic accuracy in predicting and designing noncanonical RNA structure. Nature Methods (2010)
  • F. Ding et al. Emergence of protein fold families through rational design. PLoS Computational Biology (2006)
  • G.F. Fabiola et al. C-H⋯O hydrogen bonds in β-sheets. Acta Crystallographica. Section D: Biological Crystallography (1997)
  • S.J. Fleishman et al. RosettaScripts: A scripting language interface to the Rosetta macromolecular modeling suite. PLoS One (2011)
  • T. Hamelryck et al. Potentials of mean force for protein structure prediction vindicated, formalized and generalized. PLoS One (2010)
  • R. Jacak et al. Computational protein design with explicit consideration of surface hydrophobic patches. Proteins (2012)
  • M.P. Jacobson et al. Force field validation using protein side chain prediction. The Journal of Physical Chemistry B (2002)

1 These authors contributed equally to this work.