Journal of Molecular Biology
Predicting Changes in the Stability of Proteins and Protein Complexes: A Study of More Than 1000 Mutations
Introduction
The translation of structural data into energetic parameters is one of the long-term goals of protein structure analysis. Moreover, there is a need for accurate and fast algorithms for protein energy calculations, in particular for the development of algorithms with complex search procedures and numerous combinatorial calculations. Typical examples of such algorithms are protein design and protein docking algorithms1., 2. and protein structure prediction methods.3 Recently, the possibility of predicting protein folding pathways from folded structures has prompted interest in obtaining fast and reliable energy calculations from a static protein structure.4., 5., 6., 7.
The development of a fast and reliable protein force-field is a complex task, given the delicate balance between the different energy terms that contribute to protein stability.8., 9. Many different force-fields have been constructed for predicting protein stability changes. These range from force-fields based on purely statistical analysis of structural sequence preferences10., 11., 12., 13., 14. or on multiple sequence alignments15., 16., 17. to detailed molecular dynamics force-fields.18., 19.
These force-fields can be divided into three major categories: (i) those using a physical effective energy function (PEEF); (ii) those based on statistical potentials, for which energies are derived from the frequency of residue or atom contacts in the protein database (SEEF), as reviewed by Lazaridis & Karplus;3 and (iii) those using empirical data obtained from experiments run on proteins (EEEF).
The main drawback of PEEF potentials is that they are computationally very expensive and can therefore be applied only to small sets of protein mutants. The computation time can be reduced somewhat by using implicit terms for solvation energies and side-chain entropies, but the time required to obtain a reliable estimate of the free energy difference between a wild-type and a mutant protein is still significant.20
The power of SEEFs is that they contain terms that account for complex effects that are difficult to describe separately, and they contain empirical approximations for the denatured state. A drawback of this approach is that once an SEEF potential has been constructed, improvements cannot be added easily without introducing overlaps in the underlying energies.
EEEF approaches combine a physical description of the interactions with lessons learned from experiments. Good examples of such algorithms are the helix/coil transition algorithm AGADIR21., 22. or the SPMP method.23 The AGADIR algorithm is accurate at predicting the helical content of peptides in solution and has been used to design mutations that increase the thermostability of a protein through local interactions.24., 25., 26. A limitation of this algorithm is that it can be applied only to α-helices and cannot take tertiary interactions into account.
Here, we have developed an energy function based on the EEEF approach using a strategy similar to that used for the development of AGADIR. We have taken advantage of the large amount of experimental work that has been devoted to understanding protein energetics. In particular, we have relied on the body of data that probed, through single and multiple-residue mutation analysis, the roles of particular interactions that contribute to protein stability.27., 28. We followed a two-step procedure. First, we considered a training database of 339 mutants in nine different proteins and optimised the set of parameters and weighting factors that best accounted for the changes in stability of the mutants. The predictive power of the method was then tested using a blind test mutant database of 667 mutants, as well as a database of 82 protein–protein complex mutants.
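The first step of this procedure, choosing weighting factors that best account for measured stability changes, can be illustrated as a linear least-squares problem. This is a sketch only: the text does not state which optimisation method was used, so ordinary least squares is swapped in here for illustration, and the helper `fit_weights` and its input layout (one row of energy-term differences per mutant) are hypothetical.

```python
from typing import List, Sequence


def fit_weights(term_rows: Sequence[Sequence[float]],
                ddg_exp: Sequence[float]) -> List[float]:
    """Least-squares weights w minimising sum_k (sum_i w_i * x_ki - y_k)^2.

    term_rows: hypothetical per-mutant energy-term differences (mutant - wild type).
    ddg_exp:   experimental stability changes (kcal/mol), same order as term_rows.
    Solves the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination.
    """
    k = len(term_rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y], one row per weight.
    a = [[sum(r[i] * r[j] for r in term_rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(term_rows, ddg_exp))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("terms are linearly dependent; weights not identifiable")
        a[col], a[pivot] = a[pivot], a[col]
        # Eliminate this column from every other row.
        for row in range(k):
            if row != col:
                f = a[row][col] / a[col][col]
                a[row] = [v - f * p for v, p in zip(a[row], a[col])]
    # The left block is now diagonal, so each weight is a simple ratio.
    return [a[i][k] / a[i][i] for i in range(k)]
```

In a two-step scheme like the one described, weights fitted on the training mutants would then be frozen and applied unchanged to the blind test set.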
Considering the training and the blind test databases together, the algorithm was tested on 1088 mutants. This combined database covers most of the important interactions that govern protein stability. All types of secondary structure are well represented (turn, 17%; α- and 3₁₀-helix, 30%; β-sheet, 32%; coil, 21%). Mutations involving only hydrophobic residues and mutations involving deletion or substitution of polar atoms occur in similar numbers (47% and 53%, respectively). Finally, mutated residues with a solvent accessibility higher or lower than 30% are similarly represented, making up 45% and 55% of the mutant database, respectively. The global correlation obtained for 95% of the entire mutant database is 0.83, with a standard deviation of 0.81 kcal mol−1 and a slope of 0.76. The present energy function, the FOLD-X energy function (FOLDEF in the following), uses minimal computational resources and can therefore easily be used in protein design algorithms that require a fast and accurate energy function.
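The three summary statistics quoted for the mutant database (correlation, slope and standard deviation) can be computed from paired experimental and predicted stability changes as in the sketch below. Assumptions: the slope and standard deviation are taken from an ordinary least-squares fit of experimental on predicted values, the exclusion of the remaining 5% of mutants is not reproduced, and the function name `evaluate` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def evaluate(experimental: Sequence[float],
             predicted: Sequence[float]) -> Tuple[float, float, float]:
    """Return (pearson_r, slope, residual_sd) for paired ddG values in kcal/mol.

    Assumption: slope and residual SD come from a least-squares fit of
    experimental on predicted values; the paper's exact convention may differ.
    """
    n = len(experimental)
    if n != len(predicted) or n < 3:
        raise ValueError("need at least three paired values")
    mx = sum(predicted) / n
    my = sum(experimental) / n
    sxx = sum((x - mx) ** 2 for x in predicted)
    syy = sum((y - my) ** 2 for y in experimental)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, experimental))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant data")
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(predicted, experimental)]
    # n - 2 degrees of freedom for a two-parameter linear fit.
    sd = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return r, slope, sd
```

A slope below 1, as reported here, means the predictions span a wider range than the measurements under this fitting convention; swapping the axes would change the slope but not the correlation.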
Energy terms in the FOLD-X energy function
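The FOLD-X energy function (FOLDEF) includes terms that have been found to be important for protein stability. The free energy of unfolding (ΔG) of a target protein is calculated using equation (1):

$$\Delta G = W_{vdw}\,\Delta G_{vdw} + W_{solvH}\,\Delta G_{solvH} + W_{solvP}\,\Delta G_{solvP} + \Delta G_{wb} + \Delta G_{hbond} + \Delta G_{el} + W_{mc}\,T\,\Delta S_{mc} + W_{sc}\,T\,\Delta S_{sc} \qquad (1)$$

where ΔGvdw is the sum of the van der Waals contributions of all atoms, and ΔGsolvH and ΔGsolvP are the differences in solvation energy for apolar and polar groups, respectively, when going from the unfolded to the folded state. ΔG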
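Assuming equation (1) is a weighted sum in which the van der Waals, apolar and polar solvation, and backbone and side-chain entropy terms carry fitted weights while the water-bridge, hydrogen-bond and electrostatic terms enter unweighted, the bookkeeping reduces to a few lines. The sketch below is hypothetical throughout: the class and function names are illustrative, the weights are placeholders (all 1.0), not the fitted FOLDEF values, and the stability change is assumed to be ΔG(mutant) minus ΔG(wild type).

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class EnergyTerms:
    """Energy terms for one structure, in kcal/mol (entropy terms already multiplied by T)."""
    vdw: float      # van der Waals
    solv_h: float   # apolar solvation
    solv_p: float   # polar solvation
    wb: float       # water bridges
    hbond: float    # hydrogen bonds
    el: float       # electrostatics
    t_ds_mc: float  # T * dS, backbone (main chain)
    t_ds_sc: float  # T * dS, side chains


# HYPOTHETICAL placeholder weights; the fitted FOLDEF weights are not reproduced here.
PLACEHOLDER_WEIGHTS: Mapping[str, float] = {
    "vdw": 1.0, "solv_h": 1.0, "solv_p": 1.0, "mc": 1.0, "sc": 1.0,
}


def total_dg(t: EnergyTerms, w: Mapping[str, float] = PLACEHOLDER_WEIGHTS) -> float:
    """Weighted sum of the terms; water bridges, H-bonds and electrostatics are unweighted."""
    return (w["vdw"] * t.vdw + w["solv_h"] * t.solv_h + w["solv_p"] * t.solv_p
            + t.wb + t.hbond + t.el
            + w["mc"] * t.t_ds_mc + w["sc"] * t.t_ds_sc)


def ddg(wild_type: EnergyTerms, mutant: EnergyTerms,
        w: Mapping[str, float] = PLACEHOLDER_WEIGHTS) -> float:
    """Stability change, assumed here to be dG(mutant) - dG(wild type)."""
    return total_dg(mutant, w) - total_dg(wild_type, w)
```

Because every term enters linearly, a mutation's predicted effect decomposes into per-term contributions, which is what makes an energy function of this shape cheap enough for design loops.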
Discussion
The strategy used in this work is based on the large number of protein mutants whose thermodynamic properties have been studied experimentally. Hence, the FOLDEF energy function incorporates energy data derived from model compound studies and also accounts for features specific to proteins, for instance the importance of structural flexibility, the existence of the unfolded state as a reference state, and the dielectric properties of the protein in the core or
Conclusion
FOLDEF was developed to provide a fast and quantitative estimation of the importance of the interactions contributing to the stability of proteins and protein complexes. The predictive power of FOLDEF was tested on a very large set of point mutants spanning most of the structural environments found in proteins. The standard deviations (Table 3) indicate that for 70% of the mutants the error was below 0.81 kcal mol−1. This value provides a confidence interval that can be used to assess the
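The figure of 70% of mutants with an error below 0.81 kcal mol−1 is a coverage fraction, which can be computed as sketched below. Here the error is assumed to be the absolute difference between predicted and experimental values, and `fraction_within` is a hypothetical helper. If errors were normally distributed with zero mean and that standard deviation, about 68% would fall within one standard deviation, so a value near 70% is consistent with that picture.

```python
from typing import Sequence


def fraction_within(experimental: Sequence[float],
                    predicted: Sequence[float],
                    threshold: float = 0.81) -> float:
    """Fraction of mutants whose absolute prediction error is below `threshold` (kcal/mol)."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("need equal-length, non-empty sequences")
    errors = [abs(p - e) for e, p in zip(experimental, predicted)]
    # Booleans sum as 0/1, giving the count of mutants under the threshold.
    return sum(err < threshold for err in errors) / len(errors)
```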
Composition of the potential
The FOLD-X energy function (FOLDEF) includes several terms: van der Waals interactions, solvation effects, hydrogen bonds, water bridges, electrostatic interactions, and entropy effects for the backbone and the side-chains (see equation (1) in Results).
Solvent exposure
In FOLDEF, interaction energies are scaled with the solvent accessibility of the atoms involved in the interaction. The solvent accessibility is estimated using the atomic occupancy method (Occ),36., 37., 38. which sums the volumes of the atoms j surrounding a given
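The occupancy idea, summing the volumes of atoms near a given atom as a proxy for burial, can be sketched as follows. The cutoff radius, the atomic volumes and the linear mapping from occupancy to a burial factor used to scale an interaction energy are all hypothetical choices for illustration; the exact FOLDEF definitions are not reproduced here.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def atomic_occupancy(coords: Sequence[Coord], volumes: Sequence[float],
                     i: int, cutoff: float) -> float:
    """Sum the volumes of atoms j != i lying within `cutoff` (Angstrom) of atom i.

    The cutoff value is a hypothetical parameter chosen by the caller.
    """
    occ = 0.0
    for j, xyz in enumerate(coords):
        if j != i and math.dist(coords[i], xyz) <= cutoff:
            occ += volumes[j]
    return occ


def scale_by_burial(energy: float, occ: float, occ_max: float) -> float:
    """HYPOTHETICAL linear scaling: full energy once occupancy reaches occ_max."""
    return energy * min(occ / occ_max, 1.0)
```

Higher occupancy means more neighbouring atom volume and hence lower solvent exposure, so a scheme of this kind weakens interactions of exposed atoms relative to buried ones.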
Acknowledgements
We thank N. J. C. Strynadka for giving us the coordinates of the TEM–BLIP structure before its release in the PDB (1JTD), and E. Lacroix for providing us with the statistical analysis of the ϕ/ψ dihedral angles of each amino acid. This work was supported by EU grants BIO4-CT97-2086 and CT96-0013, and by the Ramón Areces Foundation (Spain). R.G. was supported by a fellowship from the Human Frontier Science Program. This work was supported, in part, by NIH and HHMI grants made to J. Andrew McCammon,
References (93)
- et al. Empirical potentials and functions for protein folding and binding. Curr. Opin. Struct. Biol. (1997)
- et al. Effective energy functions for protein structure prediction. Curr. Opin. Struct. Biol. (2000)
- et al. The SH3-fold family: experimental evidence and prediction of variations in the folding pathways. J. Mol. Biol. (2000)
- et al. Thermodynamics of structural stability and cooperative folding behavior in proteins. Advan. Protein Chem. (1992)
- Knowledge-based potentials for proteins. Curr. Opin. Struct. Biol. (1995)
- et al. Stability changes upon mutation of solvent-accessible residues in proteins evaluated by database-derived potentials. J. Mol. Biol. (1996)
- et al. Predicting protein stability changes upon mutation using database-derived potentials: solvent accessibility determines the importance of local versus non-local interactions along the sequence. J. Mol. Biol. (1997)
- et al. Suggestions for safe residue substitutions in site-directed mutagenesis. J. Mol. Biol. (1991)
- et al. Elucidating the folding problem of alpha-helices: local motifs, long-range electrostatics, ionic-strength dependence and prediction of NMR parameters. J. Mol. Biol. (1998)
- et al. Stabilization of proteins by rational design of alpha-helix stability using helix/coil transition theory. Fold. Des. (1996)