Predicting Changes in the Stability of Proteins and Protein Complexes: A Study of More Than 1000 Mutations

https://doi.org/10.1016/S0022-2836(02)00442-4

Abstract

We have developed a computer algorithm, FOLDEF (for FOLD-X energy function), to provide a fast and quantitative estimation of the importance of the interactions contributing to the stability of proteins and protein complexes. The predictive power of FOLDEF was tested on a very large set of point mutants (1088 mutants) spanning most of the structural environments found in proteins. FOLDEF uses a full atomic description of the structure of the proteins. The different energy terms taken into account in FOLDEF have been weighted using empirical data obtained from protein engineering experiments. First, we considered a training database of 339 mutants in nine different proteins and optimised the set of parameters and weighting factors that best accounted for the changes in stability of the mutants. The predictive power of the method was then tested using a blind test mutant database of 667 mutants, as well as a database of 82 protein–protein complex mutants. The global correlation obtained for 95% of the entire mutant database (1030 mutants) is 0.83, with a standard deviation of 0.81 kcal mol−1 and a slope of 0.76. The present energy function uses a minimum of computational resources and can therefore easily be used in protein design algorithms, and in the prediction of protein structure and folding pathways, where a fast and accurate energy function is required. FOLDEF is available via a web interface at http://fold-x.embl-heidelberg.de.

Introduction

The translation of structural data into energetic parameters is one of the long-term goals of protein structure analysis. Moreover, there is a need for accurate and fast algorithms for protein energy calculations, in particular for the development of algorithms with complex search procedures and numerous combinatorial calculations. Typical examples of such algorithms are protein design and protein docking algorithms [1, 2] and protein structure prediction methods [3]. Recently, the possibility of predicting protein folding pathways from their folded structure prompted an interest in obtaining fast and reliable energy calculations from a static protein structure [4–7].

The development of a fast and reliable protein force-field is a complex task, given the delicate balance between the different energy terms that contribute to protein stability [8, 9]. Many different force-fields have been constructed for predicting protein stability changes. These range from force-fields based on pure statistical analysis of structural sequence preferences [10–14] and force-fields based on multiple sequence alignments [15–17] to detailed molecular dynamics force-fields [18, 19].

These force-fields can be divided into three major categories: (i) those using a physical effective energy function (PEEF); (ii) those based on statistical potentials for which energies are derived from the frequency of residue or atom contacts in the protein database (SEEF), as reviewed by Lazaridis & Karplus [3]; and (iii) those using empirical data obtained from experiments run on proteins (EEEF).

The main drawbacks of the PEEF potentials are that they are computationally very expensive and can therefore be used only on small sets of protein mutants. The computation time can be reduced somewhat by using implicit terms for solvation energies and side-chain entropies, but the time required to get a reliable estimate of a free energy difference between a wild-type and mutant protein is still significant [20].

The power of SEEFs is that they contain terms that account for complex effects that are difficult to describe separately, and they contain empirical approximations for the denatured state. A drawback of this approach is that once an SEEF potential has been constructed, improvements cannot be added easily without introducing overlaps in the underlying energies.

EEEF approaches combine a physical description of the interactions with lessons learned from experiments. Good examples of such algorithms are the helix/coil transition algorithm AGADIR [21, 22] or the SPMP method [23]. The AGADIR algorithm is accurate at predicting the helical content of peptides in solution and has been used to design mutations that increase the thermostability of a protein through local interactions [24–26]. A limitation of this algorithm is that it can be applied only to α-helices and cannot take tertiary interactions into account.

Here, we have developed an energy function based on the EEEF approach using a strategy similar to that used for the development of AGADIR. We have taken advantage of the large amount of experimental work that has been devoted to understanding protein energetics. In particular, we have relied on the body of data that probed, through single and multiple-residue mutation analysis, the roles of particular interactions that contribute to protein stability [27, 28]. We followed a two-step procedure. First, we considered a training database of 339 mutants in nine different proteins and optimised the set of parameters and weighting factors that best accounted for the changes in stability of the mutants. The predictive power of the method was then tested using a blind test mutant database of 667 mutants, as well as a database of 82 protein–protein complex mutants.
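
The two-step procedure (fit on a training set, then evaluate on unseen mutants) can be illustrated with a minimal Python sketch. It is not the optimisation actually used for FOLDEF, which is not described in this excerpt. It assumes, for illustration only, that just two weights are fitted by ordinary least squares without an intercept; the function names `fit_two_weights` and `rms_error` and all numerical values are hypothetical.

```python
from math import sqrt


def fit_two_weights(rows):
    """Least-squares fit of ddg_exp ~ w_a * a + w_b * b (no intercept).

    rows: iterable of (a, b, ddg_exp) tuples, where a and b are two
    unweighted energy terms for one mutant (hypothetical inputs).
    """
    saa = sab = sbb = say = sby = 0.0
    for a, b, y in rows:
        saa += a * a
        sab += a * b
        sbb += b * b
        say += a * y
        sby += b * y
    # Solve the 2x2 normal equations with Cramer's rule.
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("energy terms are collinear; weights not identifiable")
    w_a = (say * sbb - sab * sby) / det
    w_b = (saa * sby - sab * say) / det
    return w_a, w_b


def rms_error(rows, w_a, w_b):
    """Root-mean-square error of the weighted prediction on a data set."""
    sq = [(w_a * a + w_b * b - y) ** 2 for a, b, y in rows]
    return sqrt(sum(sq) / len(sq))


# Made-up numbers, for illustration only (not data from the paper).
training = [(1.0, 0.0, 0.5), (0.0, 1.0, 2.0), (2.0, 1.0, 3.0)]
blind_test = [(1.0, 1.0, 2.5), (3.0, 0.0, 1.5)]

w_a, w_b = fit_two_weights(training)        # step 1: fit on training set
blind_rms = rms_error(blind_test, w_a, w_b)  # step 2: score unseen mutants
```

The key design point is that the blind test set plays no part in the fit, so its error gives an estimate of predictive power rather than of goodness of fit.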

Considering the training and the blind test databases together, the algorithm was tested on 1088 mutants. Most of the important interactions that govern protein stability are represented in this combined database. All types of secondary structure are substantially represented (turn, 17%; α- and 3₁₀-helix, 30%; β-sheet, 32%; coil, 21%). The numbers of mutations that involve only hydrophobic residues and of mutations that involve deletion or substitution of polar atoms are similar (47% and 53%, respectively). Finally, the percentages of mutated residues having a solvent accessibility higher or lower than 30% are similar: 45% and 55% of the mutant database, respectively. The global correlation obtained for 95% of the entire mutant database is 0.83, with a standard deviation of 0.81 kcal mol−1 (1 cal = 4.184 J) and a slope of 0.76. The present energy function FOLDEF (FOLD-X energy function, in the following) uses a minimum of computational resources and can therefore be used easily in protein design algorithms where one requires a fast and accurate energy function.
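
As a rough illustration of how the three summary statistics quoted above (correlation, slope, standard deviation) could be computed from paired experimental and predicted stability changes, here is a short Python sketch. Several choices are assumptions rather than details given in the text: the slope is taken from regressing predicted on experimental values, the standard deviation is taken over the prediction errors, and the function name `agreement_stats` and the toy values are hypothetical. How the 95% subset was selected is not stated here, so no outlier filtering is attempted.

```python
from math import sqrt


def agreement_stats(experimental, predicted):
    """Return (pearson_r, slope, sd_error) for paired values in kcal/mol."""
    n = len(experimental)
    if n < 3 or n != len(predicted):
        raise ValueError("need at least three paired values")
    mean_x = sum(experimental) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in experimental)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(experimental, predicted))
    r = sxy / sqrt(sxx * syy)   # Pearson correlation coefficient
    slope = sxy / sxx           # assumption: predicted regressed on experimental
    errors = [y - x for x, y in zip(experimental, predicted)]
    mean_err = sum(errors) / n
    # Sample standard deviation of the prediction errors (assumption).
    sd_error = sqrt(sum((e - mean_err) ** 2 for e in errors) / (n - 1))
    return r, slope, sd_error


# Database sizes quoted in the Introduction.
TRAINING, BLIND_TEST, COMPLEXES = 339, 667, 82
TOTAL = TRAINING + BLIND_TEST + COMPLEXES

# Made-up values, for illustration only (not data from the paper).
exp_ddg = [0.0, 1.0, 2.0, 3.0]
pred_ddg = [0.0, 0.5, 1.0, 1.5]
r, slope, sd = agreement_stats(exp_ddg, pred_ddg)
```

The toy example shows why the slope is reported alongside the correlation: predictions that are perfectly correlated with experiment can still systematically underestimate the size of the stability changes, which appears as a slope below 1.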

Energy terms in the FOLD-X energy function

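The FOLD-X energy function (FOLDEF) includes terms that have been found to be important for protein stability. The free energy of unfolding (ΔG) of a target protein is calculated using equation (1):

$$\Delta G = W_{\mathrm{vdw}}\,\Delta G_{\mathrm{vdw}} + W_{\mathrm{solvH}}\,\Delta G_{\mathrm{solvH}} + W_{\mathrm{solvP}}\,\Delta G_{\mathrm{solvP}} + \Delta G_{\mathrm{wb}} + \Delta G_{\mathrm{hbond}} + \Delta G_{\mathrm{el}} + W_{\mathrm{mc}}\,T\Delta S_{\mathrm{mc}} + W_{\mathrm{sc}}\,T\Delta S_{\mathrm{sc}} \qquad (1)$$

where $\Delta G_{\mathrm{vdw}}$ is the sum of the van der Waals contributions of all atoms. $\Delta G_{\mathrm{solvH}}$ and $\Delta G_{\mathrm{solvP}}$ are the differences in solvation energy for apolar and polar groups, respectively, when going from the unfolded to the folded state. ΔG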

Discussion

The strategy used in this work is based on the large number of protein mutants whose thermodynamic properties have been studied experimentally. Hence, the FOLDEF energy function includes the energy data derived from model compound studies, and accounts for the features specific to the protein world. These features are, for instance, the importance of the structural flexibility, the existence of the unfolded state as a reference state, and the dielectric properties of the protein in the core or

Conclusion

FOLDEF was developed to provide a fast and quantitative estimation of the importance of the interactions contributing to the stability of proteins and protein complexes. The predictive power of FOLDEF was tested on a very large set of point mutants spanning most of the structural environments found in proteins. The standard deviations (Table 3) indicate that for 70% of the mutants the error was below 0.81 kcal mol−1. This value provides a confidence interval that can be used to assess the

Composition of the potential

The FOLD-X energy function (FOLDEF) includes several terms: van der Waals interactions, solvation effects, hydrogen bonds, water bridges, electrostatics, and entropy effects for the backbone and the side-chains (see equation (1) in Results).
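
Equation (1) is a weighted linear sum of these terms, which the following Python sketch makes explicit. It only mirrors the form of the equation: the class names, the default weights of 1.0, and the sign convention for the mutant-minus-wild-type comparison are assumptions for illustration, not the fitted FOLDEF parameters or the authors' implementation. Computing the individual terms from a structure is outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass
class EnergyTerms:
    """Unweighted terms of equation (1) for one structure, in kcal/mol."""
    vdw: float      # van der Waals contribution
    solv_h: float   # solvation, apolar groups
    solv_p: float   # solvation, polar groups
    wb: float       # water bridges
    hbond: float    # hydrogen bonds
    el: float       # electrostatics
    t_ds_mc: float  # T * (main-chain entropy change)
    t_ds_sc: float  # T * (side-chain entropy change)


@dataclass
class Weights:
    """Weighting factors; the defaults are placeholders, not fitted values."""
    vdw: float = 1.0
    solv_h: float = 1.0
    solv_p: float = 1.0
    mc: float = 1.0
    sc: float = 1.0


def delta_g(t: EnergyTerms, w: Weights) -> float:
    """Weighted sum with the same structure as equation (1)."""
    return (w.vdw * t.vdw + w.solv_h * t.solv_h + w.solv_p * t.solv_p
            + t.wb + t.hbond + t.el          # these terms carry no weight
            + w.mc * t.t_ds_mc + w.sc * t.t_ds_sc)


def delta_delta_g(wild: EnergyTerms, mutant: EnergyTerms, w: Weights) -> float:
    """Stability change on mutation; the sign convention is an assumption."""
    return delta_g(wild, w) - delta_g(mutant, w)


# Made-up term values, for illustration only.
wt = EnergyTerms(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
mut = EnergyTerms(0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
dg_wt = delta_g(wt, Weights())
ddg = delta_delta_g(wt, mut, Weights(vdw=0.5))
```

Because the function is a plain sum over precomputed terms, evaluating it is cheap, which is consistent with the stated aim of an energy function fast enough for design and search algorithms.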

Solvent exposure

In FOLDEF, interaction energies are scaled with the solvent accessibility of the atoms involved in the interaction. The solvent accessibility is estimated using the atomic occupancy method (Occ) [36–38], which sums the volumes of the atoms j surrounding a given

Acknowledgements

We thank N. J. C. Strynadka for giving us the coordinates of the TEM–BLIP structure before release in the PDB (1JTD), and E. Lacroix for providing us with the statistical analysis of the ϕ/ψ dihedral angles of each amino acid. This work was supported by EU grants BIO4-CT97-2086 and CT96-0013, and by the Ramon Areces Foundation (Spain). R.G. was supported by a fellowship from the Human Frontier Science Program. This work was supported, in part, by NIH and HHMI grants made to J. Andrew McCammon,

References (93)

  • H. Domingues et al. Improving the refolding yield of interleukin-4 through the optimization of local interactions. J. Biotechnol. (2000)
  • N. Taddei et al. Stabilisation of alpha-helices by site-directed mutagenesis reveals the importance of secondary structure in the transition state for acylphosphatase folding. J. Mol. Biol. (2000)
  • K. Takano et al. Contribution of water molecules in the interior of a protein to the conformational stability. J. Mol. Biol. (1997)
  • R. Abagyan et al. Biased probability Monte Carlo conformational searches and electrostatic calculations for peptides and proteins. J. Mol. Biol. (1994)
  • L. Serrano et al. The folding of an enzyme. II. Substructure of barnase and the contribution of different interactions to protein stability. J. Mol. Biol. (1992)
  • B.W. Matthews. Studies on protein stability with T4 lysozyme. Advan. Protein Chem. (1995)
  • F. Colonna-Cesari et al. Excluded volume approximation to protein–solvent interaction. The solvent contact model. Biophys. J. (1990)
  • L. Holm et al. Evaluation of protein models by atomic solvation preference. J. Mol. Biol. (1992)
  • L.S. Itzhaki et al. The structure of the transition state for folding of chymotrypsin inhibitor 2 analysed by protein engineering methods: evidence for a nucleation-condensation mechanism for protein folding. J. Mol. Biol. (1995)
  • S.J. Hamill et al. The folding of an immunoglobulin-like Greek key protein is defined by a common-core nucleus and regions constrained by topology. J. Mol. Biol. (2000)
  • K.F. Fulton et al. Mapping the interactions present in the transition state for unfolding/folding of FKBP12. J. Mol. Biol. (1999)
  • V. Villegas et al. Structure of the transition state in the folding process of human procarboxypeptidase A2 activation domain. J. Mol. Biol. (1998)
  • D.E. Kim et al. A breakdown of symmetry in the folding transition state of protein L. J. Mol. Biol. (2000)
  • D. Shortle. Staphylococcal nuclease: a showcase of m-value effects. Advan. Protein Chem. (1995)
  • S. Albeck et al. Evaluation of direct and cooperative contributions towards the strength of buried hydrogen bonds and salt bridges. J. Mol. Biol. (2000)
  • C. Zhang et al. Determination of atomic desolvation energies from the structures of crystallized proteins. J. Mol. Biol. (1997)
  • Y. Nozaki et al. The solubility of amino acids and two glycine peptides in aqueous ethanol and dioxane solutions. Establishment of a hydrophobicity scale. J. Biol. Chem. (1971)
  • M. Levitt. A simplified representation of protein conformations for rapid simulation of protein folding. J. Mol. Biol. (1976)
  • M.A. Roseman. Hydrophilicity of polar amino acid side-chains is markedly reduced by flanking peptide bonds. J. Mol. Biol. (1988)
  • Y.W. Chen et al. Contribution of buried hydrogen bonds to protein stability. The crystal structures of two barnase mutants. J. Mol. Biol. (1993)
  • T. Lazaridis et al. Discrimination of the native from misfolded protein models with an energy function including implicit solvation. J. Mol. Biol. (1999)
  • I. Motoc et al. Van der Waals volume fragmental constants. Chem. Phys. Letters (1985)
  • J.A. Ippolito et al. Hydrogen bond stereochemistry in protein structure and function. J. Mol. Biol. (1990)
  • J.C. Covalt et al. Core and surface mutations affect folding kinetics, stability and cooperativity in IL-1 beta: does alteration in buried water play a role? J. Mol. Biol. (2001)
  • S.M. Roe et al. Patterns for prediction of hydration around polar residues in proteins. J. Mol. Biol. (1993)
  • T. Hage et al. Crystal structure of the interleukin-4/receptor alpha chain complex reveals a mosaic binding interface. Cell (1999)
  • J.W. Wray et al. Structural analysis of a non-contiguous second-site revertant in T4 lysozyme shows that increasing the rigidity of a protein can enhance its stability. J. Mol. Biol. (1999)
  • C.J. Camacho et al. Protein docking along smooth association pathways. Proc. Natl Acad. Sci. USA (2001)
  • E. Alm et al. Prediction of protein-folding mechanisms from free-energy landscapes derived from native structures. Proc. Natl Acad. Sci. USA (1999)
  • O.V. Galzitskaya et al. A theoretical search for folding/unfolding nuclei in three-dimensional protein structures. Proc. Natl Acad. Sci. USA (1999)
  • V. Munoz et al. A simple model for calculating the kinetics of protein folding from three-dimensional structures. Proc. Natl Acad. Sci. USA (1999)
  • C.N. Pace et al. Forces contributing to the conformational stability of proteins. FASEB J. (1996)
  • M.J. Rooman et al. Are database-derived potentials valid for scoring both forward and inverted protein folding? Protein Eng. (1995)
  • C.M. Topham et al. Prediction of the stability of protein mutants based on structural environment-dependent amino acid substitution and propensity tables. Protein Eng. (1997)
  • K.L. Maxwell et al. Mutagenesis of a buried polar interaction in an SH3 domain: sequence conservation provides the best prediction of stability effects. Biochemistry (1998)
  • S.M. Larson et al. The identification of conserved interactions within the SH3 domain by alignment of sequences and structures. Protein Sci. (2000)