Journal of Molecular Biology
Sequence Optimization for Native State Stability Determines the Evolution and Folding Kinetics of a Small Protein
Introduction
A major difficulty in studying protein evolution has always been the lack of a meaningful baseline against which to compare natural protein sequences. Since the only available sources of data on protein evolution have been modern natural protein sequences and their engineered mutants, we have been limited to comparing natural sequences and structures to each other in the hope of gaining insight into how contemporary proteins evolved in the first place. Natural protein sequences are believed to be the result of selection for stability and specificity of native structure, correct folding kinetics, and ultimately, biological function. Teasing apart the relative roles these factors play in driving protein evolution is further complicated by two facts: these properties are all highly interrelated, and the available sampling of protein sequences has been highly biased, both by evolutionary history and by research preferences toward sequencing model systems and organisms of scientific interest. Several recent studies have used computational protein models to simulate protein evolution in silico, in attempts to isolate the effects of various evolutionary pressures.1, 2, 3, 4, 5
One way to study the relative importance of these factors is to sample and select protein sequences based on only one criterion (in this study, native state stability) and to compare the resulting sequences to natural protein sequences.6 There is currently no experimental method to explore sequence space based purely on native state stability. Since all protein folding experiments depend on some threshold folding rate, kinetics exert an influence on every experiment. For example, a protein that folded to a stable native state, but with a folding time-constant of the order of a day, would likely not survive a typical experimental selection or screen. We therefore sought to investigate the factors that drive protein evolution through computational means, which allow us to explicitly isolate native state stability as the only selective pressure. We have recently developed methods to computationally generate large, diverse families of protein sequences that are selected by maximizing the energy gap between the native and unfolded states.7, 8 In this study, we use these methods to simulate the evolution of SH3 domains, and compare the resulting in silico evolved sequences to a large alignment of natural SH3 sequences.
SH3 domains provide an ideal model system for detailed studies of protein evolution due to the availability of many natural sequences (over 300) and structures (over 50 structures of 20 distinct domains), and their popularity as a model system for studies of protein folding thermodynamics and kinetics.9 By comparing a large, diverse set of computationally evolved SH3 domain sequences to a large, diverse alignment of natural SH3 sequences,10 we find that 86% of the sequence conservation pattern seen in the SH3 family is recreated by our fairly simple in silico protein evolution. By further comparing these results to several recent detailed studies of SH3 folding kinetics, we also see that residues optimized for the native state structure play consistent roles in the folding transition state across the SH3 family, and vice versa. This suggests that sequence optimization for native state stability is the major driving force in the evolution of SH3 domains and that fast folding is a direct result of this sequence optimization.
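As a rough illustration of what comparing conservation patterns between two alignments can involve, the sketch below computes the per-column Shannon entropy of each alignment and reports the fraction of positions that fall on the same side of a conservation cutoff. The excerpt does not state how the 86% figure was obtained, so the entropy measure, the `threshold` value, and the function names `column_entropies` and `conservation_agreement` are all assumptions made for illustration, not the authors' procedure.

```python
import math
from collections import Counter


def column_entropies(alignment):
    """Shannon entropy (in bits) of each column of an equal-length alignment.

    Gap characters, if present, are counted like any other symbol
    (a simplifying assumption).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(entropy)
    return entropies


def conservation_agreement(natural, designed, threshold=1.0):
    """Fraction of positions classified the same way in both alignments.

    A position counts as conserved when its entropy is at or below
    `threshold` bits; the cutoff is an illustrative choice.
    """
    nat = column_entropies(natural)
    des = column_entropies(designed)
    if len(nat) != len(des):
        raise ValueError("alignments must cover the same positions")
    same = sum((a <= threshold) == (b <= threshold) for a, b in zip(nat, des))
    return same / len(nat)


# Toy alignments (made-up sequences, three positions each).
natural_toy = ["ACA", "ACC", "ACG", "ACT"]
designed_toy = ["AGA", "ATC", "ACG", "AAT"]
agreement = conservation_agreement(natural_toy, designed_toy)
```

In the toy data the first position is conserved in both alignments and the third is variable in both, while the second is conserved only in the natural set, so two of the three positions agree.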
Section snippets
In silico evolution of SH3 domain structural ensembles
Eleven X-ray structures of SH3 domains were used as templates for in silico protein evolution (PDB codes: 1abo, 1abq, 1ad5, 1fmk, 1fyn, 1lck, 1pht, 1sem, 1shf, 1shg, 2abl). The SH3 domain peptide backbone was isolated from each PDB structure and used to create the target structural ensembles. Importantly, all side-chain atoms were removed so that no information about the native sequence was included in any phase of the in silico protein evolution. Fifty non-redundant structurally optimized
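The backbone-isolation step described above can be sketched as a filter over PDB `ATOM` records. This is a minimal sketch under stated assumptions: it keeps only the N, CA, C and O atoms, and it additionally overwrites the residue-name field with a placeholder (`GLY`) so that the record itself no longer reveals the native sequence. The placeholder choice and the helper names `strip_side_chains` and `format_atom` are hypothetical and are not taken from the paper.

```python
BACKBONE_ATOMS = {"N", "CA", "C", "O"}
PLACEHOLDER_RESIDUE = "GLY"  # assumption: any fixed 3-letter name would do


def strip_side_chains(pdb_text):
    """Return backbone-only ATOM records with residue names masked.

    HETATM and all other record types are dropped.
    """
    kept = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # PDB columns 13-16 hold the atom name, columns 18-20 the residue name.
        atom_name = line[12:16].strip()
        if atom_name in BACKBONE_ATOMS:
            kept.append(line[:17] + PLACEHOLDER_RESIDUE + line[20:])
    return "\n".join(kept)


def format_atom(serial, name, resname, resseq, x, y, z):
    """Build a minimal fixed-column ATOM record for the toy example below."""
    return (
        f"ATOM  {serial:5d}  {name:<3s} {resname:3s} A{resseq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


# Toy input: one leucine residue with made-up coordinates.
example = "\n".join(
    [
        format_atom(1, "N", "LEU", 1, 0.0, 0.0, 0.0),
        format_atom(2, "CA", "LEU", 1, 1.5, 0.0, 0.0),
        format_atom(3, "C", "LEU", 1, 2.0, 1.4, 0.0),
        format_atom(4, "O", "LEU", 1, 1.3, 2.4, 0.0),
        format_atom(5, "CB", "LEU", 1, 2.0, -0.8, 1.2),
        format_atom(6, "CG", "LEU", 1, 3.5, -0.9, 1.3),
    ]
)
backbone = strip_side_chains(example)
```

For the toy residue, the four backbone records survive and the two side-chain atoms (CB, CG) are dropped.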
Discussion
By comparing the end-results of natural and in silico evolution, we have found that the selective pressures operating in both processes are very similar. Thus, selection for native stability has likely been the dominant selective pressure in the recent evolution of SH3 domains. Eleven positions in the SH3 domain show no evidence of having been under any significant selective pressure at all throughout recent evolution. In silico protein evolution, selecting only for native state stability,
In silico protein evolution
Protein structures are modeled by placing side-chain rotamers onto a fixed target backbone, stripped of all native side-chains. The resulting all-atom protein models are scored using a linear combination of the Amber potential function59 with OPLS non-bonded parameters,60 a surface-area term that accounts implicitly for solvation effects,61 and a set of amino acid baseline corrections which provides an implicit model of the unfolded state. A population of 300 models is then minimized by a
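To make the structure of such a scoring function concrete, the sketch below combines two precomputed energy terms linearly and subtracts a per-residue baseline that stands in for the unfolded state. Every number here is invented for illustration: the weights, the baseline values, the sign convention (lower score means a larger gap in favor of the folded model), and the names `EnergyTerms`, `WEIGHTS`, `BASELINE` and `score` are assumptions, not the parameters or code used in the study.

```python
from dataclasses import dataclass


@dataclass
class EnergyTerms:
    """Precomputed energies for one all-atom model (arbitrary units)."""

    force_field: float   # molecular-mechanics term
    surface_area: float  # implicit-solvation surface-area term


# Hypothetical weights and per-residue baselines, for illustration only.
WEIGHTS = {"force_field": 1.0, "surface_area": 0.5}
BASELINE = {"A": -0.2, "L": -1.1, "K": 0.4}


def score(sequence, terms, weights=WEIGHTS, baseline=BASELINE):
    """Weighted folded-state energy minus a sequence-only unfolded reference.

    The unfolded state is modeled implicitly as a sum of amino acid
    baseline values, so it depends on composition but not on structure.
    """
    folded = (
        weights["force_field"] * terms.force_field
        + weights["surface_area"] * terms.surface_area
    )
    unfolded = sum(baseline[aa] for aa in sequence)
    return folded - unfolded


toy_score = score("AL", EnergyTerms(force_field=-10.0, surface_area=4.0))
```

Because the baseline depends only on composition, changing a residue shifts the unfolded reference as well as the folded energy, which is what lets a single score stand in for an energy gap rather than a raw folded-state energy.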
Acknowledgements
This work would not have been possible without the enthusiastic participation of thousands of Genome@home users around the world. The authors are greatly indebted to everyone who contributed processor time to this study. A full list of users is available at http://gah.stanford.edu/userstats.txt. We thank Alan Davidson for insightful comments on the manuscript. S.M.L. is a James Clark Fellow of the SGF program. We thank John Desjarlais for generously providing the SPA source code and for his
References (61)
- et al. Why are proteins so robust to site mutations? J. Mol. Biol. (2002)
- In silico design for protein stabilization. Curr. Opin. Biotechnol. (1999)
- et al. De novo protein design: towards fully automated sequence selection. J. Mol. Biol. (1997)
- et al. Computer search algorithms in protein modification and design. Curr. Opin. Struct. Biol. (1998)
- et al. New strategies in protein design. Curr. Opin. Biotechnol. (1995)
- et al. Energy functions for protein design. Curr. Opin. Struct. Biol. (1999)
- et al. De novo protein design. I. In search of stability and specificity. J. Mol. Biol. (1999)
- et al. Review: protein design—where we were, where we are, where we're going. J. Struct. Biol. (2001)
- et al. Computational protein design. Struct. Fold. Des. (1999)
- et al. Trading accuracy for speed: a quantitative comparison of search algorithms in protein sequence design. J. Mol. Biol. (2000)
- Automatic protein design with all atom force-fields by exact and heuristic optimization. J. Mol. Biol.
- Side-chain and backbone flexibility in protein core design. J. Mol. Biol.
- Solution structure and dynamics of a designed hydrophobic core variant of ubiquitin. Struct. Fold. Des.
- Analysis of covariation in an SH3 domain sequence alignment: applications in tertiary contact prediction and the design of compensating hydrophobic core substitutions. J. Mol. Biol.
- Dramatic stabilization of an SH3 domain by a single substitution: roles of the folded and unfolded states. J. Mol. Biol.
- Bergerac-SH3: “frustation” induced by stabilizing the folding nucleus. J. Mol. Biol.
- Computer-assisted re-design of spectrin SH3 residue clusters. Biomol. Eng.
- Protein folding kinetics beyond the phi value: using multiple amino acid substitutions to investigate the structure of the SH3 domain folding transition state. J. Mol. Biol.
- De novo protein design. II. Plasticity in sequence space. J. Mol. Biol.
- Statistical theory for protein combinatorial libraries. Packing interactions, backbone flexibility, and the sequence variability of a main-chain structure. J. Mol. Biol.
- Evolutionary conservation in protein folding kinetics. J. Mol. Biol.
- Residues participating in the protein folding nucleus do not exhibit preferential evolutionary conservation. J. Mol. Biol.
- Stiffness of the distal loop restricts the structural heterogeneity of the transition state ensemble in SH3 domains. J. Mol. Biol.
- NMR and temperature-jump measurements of de novo designed proteins demonstrate rapid folding in the absence of explicit selection for kinetics. J. Mol. Biol.
- Kinetics, thermodynamics and evolution of non-native interactions in a protein folding nucleus. Nature Struct. Biol.
- Hiking in the energy landscape in sequence space: a bumpy road to good folders. Proteins: Struct. Funct. Genet.
- Modeling evolutionary landscapes: mutational stability, topology, and superfunnels in sequence space. Proc. Natl Acad. Sci. USA
- Roles of mutation and recombination in the evolution of protein thermodynamics. Proc. Natl Acad. Sci. USA
- Native protein sequences are close to optimal for their structures. Proc. Natl Acad. Sci. USA
- Thoroughly sampling sequence space: large-scale protein design of structural ensembles. Protein Sci.
Cited by (10)
Substitutions of Amino Acids with Large Number of Contacts in the Native State Have no Effect on the Rates of Protein Folding
2016, Biochimica et Biophysica Acta - Proteins and Proteomics
De Novo Evolutionary Emergence of a Symmetrical Protein Is Shaped by Folding Constraints
2016, Cell
Citation excerpt: At the later evolutionary stage, the duplicated fusions (AncA5B, Anc5V, WT45V) decreased in stability when selectively diversified (WTV). That higher foldability comes jointly with lower stability seemed counterintuitive—a stable native state suggests a deep native energy well, and thus smoother funneling and also lower tendency of the native state to misfold and aggregate, as routinely described for globular domains (Gillespie and Plaxco, 2000; Larson and Pande, 2003). Are the loss of native stability and the parallel decrease in levels of insoluble aggregates accompanying the transition from identical to diversified fusions a mechanistic underpinning of higher foldability, or are they perhaps the result of selection for another biophysical property or simply the outcome of most mutations having a destabilizing effect (Tokuriki et al., 2007)?
Identification of the PXW Sequence as a Structural Gatekeeper of the Headpiece C-terminal Subdomain Fold
2006, Journal of Molecular Biology
Simulating protein evolution in sequence and structure space
2004, Current Opinion in Structural Biology