Atomistic molecular simulations of protein folding
Highlights
► Advances in sampling now allow atomistic simulations of protein folding.
► Developments in energy functions have reduced secondary structure bias.
► Atomistic simulations give an estimate of the transition path time.
► Comparison to experimental results is critical, but very challenging.
► Studying the unfolded state is one of the frontiers for all-atom simulation.
Introduction
Theory and coarse-grained molecular simulations can give powerful insights into the nature of protein folding. Many properties of folding can be understood from the hypothesis that the energy landscape of proteins is ‘funneled’ [1, 2, 3], with both the energy and the configurational entropy decreasing smoothly as the structure becomes more native-like, and with only minimal ‘frustration’ arising from non-native contacts [4]. Such a landscape can only arise through evolution or design, since the landscapes of random heteropolymers lack these features [5]. Funnel-based approaches, including both theory and coarse-grained simulation models (Gō models [6]), have been very successful: they can explain why proteins fold fast, the relative folding rates of different folds and, at coarse resolution, the folding mechanism [7, 8, 9]. Such models can even explain misfolding events, provided that these are driven by native-like interactions [10, 11].
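To make the funnel idea concrete, a minimal sketch of a Gō-type (structure-based) energy is given below, assuming a one-bead-per-residue representation: only contacts present in the native structure are attractive, so the energy decreases as the fraction of native contacts Q increases, and non-native contacts carry no energy at all. The function names, cutoff, sequence separation and tolerance values are illustrative assumptions; practical Gō models also include bonded terms and smooth contact potentials.

```python
import numpy as np

def native_contacts(native_xyz, cutoff=0.8, min_seq_sep=3):
    """Pairs (i, j) of beads closer than `cutoff` (nm) in the native structure,
    skipping pairs fewer than `min_seq_sep` residues apart along the chain."""
    n = len(native_xyz)
    pairs = []
    for i in range(n):
        for j in range(i + min_seq_sep, n):
            if np.linalg.norm(native_xyz[i] - native_xyz[j]) < cutoff:
                pairs.append((i, j))
    return pairs


def go_energy(xyz, native_xyz, pairs, epsilon=1.0, tol=1.2):
    """Structure-based energy: -epsilon for every native contact that is formed,
    i.e. whose current distance is below tol times its native distance.
    Non-native contacts contribute nothing (no frustration by construction)."""
    formed = 0
    for i, j in pairs:
        r = np.linalg.norm(xyz[i] - xyz[j])
        r_native = np.linalg.norm(native_xyz[i] - native_xyz[j])
        if r < tol * r_native:
            formed += 1
    q = formed / len(pairs)  # fraction of native contacts, a measure of nativeness
    return -epsilon * formed, q
```

Because every native contact lowers the energy by the same amount, the energy is a monotonic function of Q, which is exactly the minimally frustrated, funneled landscape assumed by such models.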
An alternative to assuming a particular form for the energy landscape is to model explicitly the specific physical interactions that give rise to it. This type of model does not depend on knowing the folded structure and can therefore fully account for non-native contributions to the energy. The most feasible way of doing this is to use classical dynamics with an empirically parameterized potential energy surface, or ‘force field’ [12]. Both the protein and the solvent are represented in atomic detail, with energy terms describing bond stretching, angle bending, torsional rotation, dispersion, exchange repulsion and long-range electrostatics. In principle, if these interactions are described accurately, all of the interactions present in the real system are captured and the energy function should fold proteins. The disadvantage of this approach, however, is its enormous computational cost relative to the coarse-grained approaches described above. Furthermore, the energy functions currently in use are certainly approximations, neglecting effects such as electronic polarizability that are known to contribute significantly to the total energy [13]. One might therefore ask whether such simulations are worth the effort, given that very useful results can be obtained easily from funnel models derived from higher-level physical considerations. The devil's advocate might even ask: are atomistic simulations just very expensive and detailed, but possibly not very realistic, movies?
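The additive energy terms listed above can be sketched using the functional forms common to Amber- and CHARMM-style force fields. This is a minimal sketch with assumed units (kJ/mol, nm, radians, elementary charges) and no parameters taken from any real force field; the function names are hypothetical, and production codes add exclusions, 1-4 scaling, cutoffs and Ewald summation for long-range electrostatics. The fixed point charges in the Coulomb term are precisely where electronic polarizability is neglected.

```python
import math

# Units assumed here: kJ/mol, nm, radians, elementary charges.
COULOMB_CONST = 138.935458  # 1/(4*pi*eps0) in kJ mol^-1 nm e^-2


def bond_energy(r, r0, kb):
    """Harmonic bond stretching."""
    return kb * (r - r0) ** 2


def angle_energy(theta, theta0, ka):
    """Harmonic angle bending."""
    return ka * (theta - theta0) ** 2


def torsion_energy(phi, terms):
    """Periodic torsion: sum of k_n * (1 + cos(n*phi - delta_n)) over (k_n, n, delta_n)."""
    return sum(k * (1.0 + math.cos(n * phi - delta)) for k, n, delta in terms)


def lj_energy(r, epsilon, sigma):
    """Lennard-Jones: the r^-12 term models exchange repulsion, the r^-6 term dispersion."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb_energy(r, qi, qj):
    """Bare Coulomb interaction between two fixed point charges (no polarizability)."""
    return COULOMB_CONST * qi * qj / r
```

The total energy is simply the sum of these terms over all bonds, angles, torsions and non-bonded pairs, which is what makes such force fields "additive".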
In fact, thanks to a general increase in computational power, purpose-built computer hardware, novel computational algorithms and improved energy functions, it has become possible in the last few years to fold a number of small proteins with all-atom simulations [14••, 15••, 16, 17, 18•]. In this article, I review the advances that have made this feat possible, with an emphasis on sampling algorithms and energy functions. I examine what additional insights into protein folding we have gained from running atomistic molecular dynamics simulations, and what we may hope to gain from them in future, in particular the advantages that they may hold over coarse-grained approaches. Lastly, I consider how energy functions for folding might evolve to represent the molecular energy surface more accurately, and how far they might be systematically simplified, given that the relatively simple additive energy functions used today have already been quite successful.
Advances in sampling of folding events
The first approach successfully used to fold proteins and compute folding rates was based on distributed computing, in which a large number of independent simulations are run on different computers. If many short simulations of tens to hundreds of nanoseconds are run, a handful of folding events will be observed; this concept was pioneered in the Folding@Home project, initially using implicit solvent [19]. Because of the dependence on fast folding events, the rate calculation can be sensitive to
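Why many short runs suffice can be seen from a single-exponential model: a run of length t_max folds with probability 1 − exp(−k t_max) ≈ k t_max when k t_max ≪ 1, so N runs yield roughly N k t_max events. A minimal sketch of the resulting rate estimate, assuming single-exponential kinetics with no lag phase, treats runs that never fold as right-censored and computes the standard maximum-likelihood estimate k = n_folded / (total simulated time up to folding or the end of the run). The function name, the synthetic data and the time units (ns) are hypothetical.

```python
import math
import random


def folding_rate_mle(folding_times, n_runs, t_max):
    """Maximum-likelihood rate for single-exponential folding from n_runs
    independent runs of length t_max. `folding_times` holds the first folding
    time of each run that folded; runs that never folded are treated as
    right-censored at t_max."""
    n_folded = len(folding_times)
    if n_folded == 0:
        raise ValueError("no folding events observed")
    exposure = sum(folding_times) + (n_runs - n_folded) * t_max
    k = n_folded / exposure
    return k, k / math.sqrt(n_folded)  # estimate and approximate standard error


# Hypothetical usage on synthetic data: 10 us folding time, 10000 runs of 100 ns.
random.seed(0)
k_true, t_max, n_runs = 1e-4, 100.0, 10000
draws = (random.expovariate(k_true) for _ in range(n_runs))
observed = [t for t in draws if t <= t_max]
k_est, k_err = folding_rate_mle(observed, n_runs, t_max)
```

When k t_max ≪ 1 the estimate reduces to n_folded / (N t_max). Any deviation from single-exponential kinetics at short times (for example a lag phase) biases it, because only the earliest part of the folding-time distribution is ever sampled.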
Improvements in energy functions for folding
Clearly, the quality of any folding simulation will depend critically on the quality of the underlying energy function. Problems with older force fields have motivated a number of recent efforts to improve force field parameters, with particular emphasis on what is important for folding. A long-standing concern about all-atom force fields has been that many tend to favour formation of either α-helical or β-sheet structures. Common examples include the known bias of the Amber ff94 force field [44
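Secondary-structure bias is often diagnosed by simulating short peptides and comparing the population of, for example, the α-helical region of the Ramachandran map with experimental estimates. A minimal sketch of such a population estimate follows; the function name, the helical centre (φ, ψ) = (−63°, −42°) and the 30° radius are illustrative assumptions rather than a standard definition (tools such as DSSP instead assign secondary structure from backbone hydrogen-bond patterns).

```python
import numpy as np


def helix_fraction(phi, psi, centre=(-63.0, -42.0), radius=30.0):
    """Fraction of (phi, psi) backbone dihedral pairs, in degrees, lying within
    `radius` degrees of an assumed alpha-helical centre. Angular differences
    are wrapped into [-180, 180) so that periodicity is respected."""
    phi = np.asarray(phi, dtype=float)
    psi = np.asarray(psi, dtype=float)
    dphi = (phi - centre[0] + 180.0) % 360.0 - 180.0
    dpsi = (psi - centre[1] + 180.0) % 360.0 - 180.0
    in_helix = dphi ** 2 + dpsi ** 2 < radius ** 2
    return float(in_helix.mean())
```

Applied to all residues and frames of a peptide trajectory, a value consistently above or below the experimental helix content would indicate a helical or anti-helical bias of the force field.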
What have we learnt and what could we learn?
In principle, atomistic simulation trajectories allow a wealth of detail about the folding mechanism to be determined. For example, in a recent benchmark study, Lindorff-Larsen et al. [15••] folded twelve different proteins to their native structure, obtaining in each case reasonable agreement with the experimentally determined folding rate. From these data, they were able to draw some general conclusions about folding, within the context of the simulation model. They found the
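The highlights note that atomistic simulations give an estimate of the transition path time. As a minimal sketch of how such an estimate can be extracted from a trajectory, assume a one-dimensional folding coordinate q(t), for example the fraction of native contacts, and two hypothetical boundaries q_unfolded and q_folded: a transition path is a segment that leaves one boundary and reaches the other without first returning to where it started. The function name and boundary values are illustrative assumptions.

```python
def transition_path_times(q, dt, q_unfolded, q_folded):
    """Durations of all folding (U->F) and unfolding (F->U) transition paths in q(t).
    Frames with q <= q_unfolded count as unfolded, q >= q_folded as folded;
    anything in between lies in the transition region."""
    times = []
    state = None       # boundary most recently visited: "U" or "F"
    last_frame = None  # index of the most recent frame at that boundary
    for i, x in enumerate(q):
        if x <= q_unfolded:
            new_state = "U"
        elif x >= q_folded:
            new_state = "F"
        else:
            continue  # transition region: keep waiting
        if state is not None and new_state != state:
            # Crossed from one boundary to the other without returning: a transition path.
            times.append((i - last_frame) * dt)
        state, last_frame = new_state, i
    return times


# Hypothetical usage: one folding and one unfolding path in a short toy series.
tp = transition_path_times([0.1, 0.1, 0.5, 0.6, 0.9, 0.9, 0.5, 0.2, 0.1],
                           dt=1.0, q_unfolded=0.2, q_folded=0.8)
```

Excursions that enter the transition region but fall back to their starting boundary are not counted, which is what distinguishes transition paths from ordinary barrier-region fluctuations.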
How will we know if we have the right answer?
Obtaining a microscopically detailed picture of folding is clearly of little value if the picture is incorrect. The ultimate test of the accuracy of a simulation must be comparison with experimental observables. For folding, this means obtaining the correct equilibrium observables in each stable state, and the correct relaxation rates between stable states. If the experimental signal could be calculated, it would be straightforward to evaluate this from long equilibrium simulations [56] or from
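For a two-state folder, the relaxation rate observed in a kinetic experiment is the sum of the folding and unfolding rates, k_f + k_u. A minimal sketch of how both this rate and the equilibrium folded population could be estimated from a long equilibrium trajectory counts transitions and divides by the time spent in the state of origin. It assumes each frame has already been assigned to the unfolded (0) or folded (1) state, for example by a transition-based assignment with well-separated boundaries; the function name and labelling are hypothetical, and computing the actual experimental signal (such as a FRET efficiency) would need an additional forward model.

```python
import numpy as np


def two_state_kinetics(states, dt):
    """Rates from a long equilibrium trajectory already assigned frame-by-frame
    to unfolded (0) or folded (1). Returns k_f, k_u, the relaxation rate
    k_f + k_u, and the equilibrium folded population."""
    s = np.asarray(states, dtype=int)
    before, after = s[:-1], s[1:]
    time_u = np.count_nonzero(before == 0) * dt  # time spent unfolded
    time_f = np.count_nonzero(before == 1) * dt  # time spent folded
    if time_u == 0 or time_f == 0:
        raise ValueError("trajectory must visit both states")
    n_fold = np.count_nonzero((before == 0) & (after == 1))
    n_unfold = np.count_nonzero((before == 1) & (after == 0))
    k_f = n_fold / time_u
    k_u = n_unfold / time_f
    p_folded = np.count_nonzero(s == 1) / s.size
    return k_f, k_u, k_f + k_u, p_folded
```

The statistical error of both rates is governed by the number of observed transitions, which is why long equilibrium simulations must contain many folding and unfolding events before such comparisons with experiment become meaningful.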
Room for further improvements
Whilst folding simulations with atomistic force fields have been very successful, they have some well-known shortcomings, which mainly pertain to the energy functions used. Firstly, although simulations very often obtain the correct folding rate near the folding midpoint, and sometimes even the correct stability near 300 K, folding cooperativity is usually too weak. That is, even if the free energy may be approximately correct near 300 K, the temperature dependence is
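In a simple two-state picture, weak cooperativity corresponds to an unfolding enthalpy that is too small: the stability at one temperature can be right while the melting curve is far too broad. A minimal sketch, assuming two-state thermodynamics and neglecting the heat-capacity change (ΔCp = 0), compares two hypothetical models tuned to the same stability at 300 K; the function names and ΔH values are illustrative, not taken from any simulation.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def fraction_folded(T, Tm, dH):
    """Two-state model with dCp = 0: unfolding free energy dG(T) = dH * (1 - T/Tm),
    positive when the folded state is stable."""
    dG = dH * (1.0 - T / Tm)
    return 1.0 / (1.0 + math.exp(-dG / (R * T)))


def tm_for_stability(dG_ref, dH, T_ref=300.0):
    """Melting temperature that gives unfolding free energy dG_ref at T_ref (dCp = 0)."""
    return T_ref / (1.0 - dG_ref / dH)


# Two hypothetical models with the same stability (10 kJ/mol) at 300 K.
tm_high = tm_for_stability(10.0, 200.0)  # cooperative model
tm_low = tm_for_stability(10.0, 50.0)    # weakly cooperative model
```

With these numbers the weakly cooperative model (ΔH = 50 kJ/mol) melts near 375 K rather than near 316 K and is still mostly folded at 320 K, where the cooperative model is already mostly unfolded, even though both agree at 300 K.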
Conclusion and outlook
Five years ago, it was not clear whether the same atomistic force field would, in general, be able to fold proteins from different structural classes in explicit solvent; and owing to computational limits, it was not possible to find out. Since then, very substantial advances in computing hardware and software have made it feasible to fold a range of proteins with different topologies. This result was facilitated by complementary refinements of the energy functions. Knowing this gives us a lot
References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest
Acknowledgement
David de Sancho and Kresten Lindorff-Larsen are thanked for helpful comments on the manuscript. The author is supported by a Royal Society University Research Fellowship.
References (78)
Topological and energetic factors: what determines the structural details of the transition state ensemble and “en-route” intermediates for protein folding? An investigation for small globular proteins. J Mol Biol (2000).
A survey of flexible protein binding mechanisms and their transition states using native topology based energy landscapes. J Mol Biol (2005).
Absolute comparison of simulated and experimental protein-folding dynamics. Nature (2002).
Making connections between ultrafast protein folding kinetics and molecular dynamics simulations. Proc Natl Acad Sci USA (2011).
Force field bias in protein folding simulations. Biophys J (2009).
Rate constant and reaction coordinate of trp-cage folding in explicit water. Biophys J (2008).
Finite temperature string method for the study of rare events. J Phys Chem B (2005).
Microsecond simulations of the folding/unfolding thermodynamics of the Trp-cage miniprotein. Proteins (2010).
All-atom empirical potential for molecular modeling and dynamics studies of proteins. J Phys Chem B (2000).
Extending the treatment of backbone energetics in protein force fields: limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations. J Comp Chem (2004).
Protein simulations with an optimized water model: cooperative helix formation and temperature-induced unfolded state collapse. J Phys Chem B.
Unfolded state dynamics and structure of protein L characterized by simulation and experiment. J Am Chem Soc.
Characterizing the unfolded states of proteins using single molecule FRET spectroscopy and molecular simulations. Proc Natl Acad Sci USA.
Ab initio prediction of tryptophan fluorescence quenching by protein electric field enabled electron transfer. J Phys Chem B.
Toward an outline of the topography of a realistic protein-folding funnel. Proc Natl Acad Sci USA.
From Levinthal to pathways to funnels. Nat Struct Biol.
Navigating the folding routes. Science.
Behind the folding funnel diagram. Nat Chem Biol.
Intermediates and barrier crossing in a random energy model (with applications to protein folding). J Phys Chem.
Studies on protein folding, unfolding and fluctuations by computer simulation. II. A three-dimensional lattice model of lysozyme. Biopolymers.
Recent successes of the energy landscape theory of protein folding and function. Q Rev Biophys.
Quantifying the roughness on the free energy landscape: entropic bottlenecks and protein folding rates. J Am Chem Soc.
Single molecule fluorescence reveals sequence-specific misfolding in multidomain proteins. Nature.
Empirical force fields for biological macromolecules: overview and issues. J Comp Chem.
Intermolecular potentials. Science.
Atomic-level characterization of the structural dynamics of proteins. Science.
How fast-folding proteins fold. Science.
Folding a protein on a computer: an atomic description of the folding pathway of protein A. Proc Natl Acad Sci USA.
Heterogeneity even at the speed limit of folding: large scale molecular dynamics study of a fast-folding variant of the Villin headpiece. J Mol Biol.
Microscopic events in β-hairpin folding from alternative unfolded ensembles. Proc Natl Acad Sci USA.
GROMACS 4: algorithms for highly efficient, load-balanced, and scalable molecular simulation. J Chem Theory Comput.
Scalable molecular dynamics with NAMD. J Comp Chem.
Ten-microsecond molecular dynamics simulation of a fast-folding WW domain. Biophys J.
Common structural transitions in explicit-solvent simulations of villin headpiece folding. Biophys J.
Chemical, physical and theoretical kinetics of an ultrafast folding protein. Proc Natl Acad Sci USA.
Millisecond-scale molecular dynamics simulations on Anton.
Transition path sampling: throwing ropes over rough mountain passes, in the dark. Annu Rev Phys Chem.
Elaborating transition interface sampling methods. J Comp Phys.
Transition-path sampling of beta-hairpin folding. Proc Natl Acad Sci USA.