Journal of Molecular Biology
Volume 332, Issue 1, 5 September 2003, Pages 275-286

Sequence Optimization for Native State Stability Determines the Evolution and Folding Kinetics of a Small Protein

https://doi.org/10.1016/S0022-2836(03)00832-5

Abstract

Investigating the relative importance of protein stability, function, and folding kinetics in driving protein evolution has long been hindered by the fact that we can only compare modern natural proteins, the products of the very process we seek to understand, to each other, with no external references or baselines. Through a large-scale all-atom simulation of protein evolution, we have created a large diverse alignment of SH3 domain sequences which have been selected only for native state stability, with no other influencing factors. Although the average pairwise identity between computationally evolved and natural sequences is only 17%, the residue frequency distributions of the computationally evolved sequences are similar to natural SH3 sequences at 86% of the positions in the domain, suggesting that optimization for the native state structure has dominated the evolution of natural SH3 domains. Additionally, the positions which play a consistent role in the transition state of three well-characterized SH3 domains (by phi-value analysis) are structurally optimized for the native state, and vice versa. Indeed, we see a specific and significant correlation between sequence optimization for native state stability and conservation of transition state structure.

Introduction

A major difficulty in studying protein evolution has always been the lack of a meaningful baseline against which to compare natural protein sequences. Since the only available sources of data on protein evolution have been modern natural protein sequences and their engineered mutants, we have been limited to comparing natural sequences and structures to each other in hopes of gaining insight into how contemporary proteins have evolved in the first place. Natural protein sequences are believed to be the result of selection for stability and specificity of native structure, correct folding kinetics, and ultimately, biological function. Teasing apart the relative roles these factors play in driving protein evolution is further complicated by the fact that these are all highly interrelated properties and the available sampling of protein sequences has been highly biased by evolutionary history and research preferences toward sequencing model systems and organisms of relevant scientific interest. Several recent studies have used computational protein models to simulate protein evolution in silico, in attempts to isolate the effects of various evolutionary pressures.1, 2, 3, 4, 5

One way to study the relative importance of these factors is to sample and select protein sequences based on only one criterion (in this study, native state stability) and to compare the resulting sequences to natural protein sequences.6 There is currently no experimental method to explore sequence space based purely on native state stability. Since all protein folding experiments depend on some threshold folding rate, kinetics do exert an influence on all experiments. For example, a protein that folded to a stable native state, but with a folding time-constant of the order of a day, would likely not survive a typical experimental selection or screen. We sought to investigate the factors that drive protein evolution through computational means that would allow us to explicitly isolate native state stability as the only selective pressure. We have recently developed methods to computationally generate large diverse families of protein sequences which are selected by maximizing the energy gap between the native and unfolded state.7, 8 In this study, we use these methods to simulate the evolution of SH3 domains, and compare the resulting in silico evolved sequences to a large alignment of natural SH3 sequences.

SH3 domains provide an ideal model system for detailed studies of protein evolution due to the availability of many natural sequences (over 300) and structures (over 50 structures of 20 distinct domains), and their popularity as a model system for studies of protein folding thermodynamics and kinetics.9 By comparing a large, diverse set of computationally evolved SH3 domain sequences to a large, diverse alignment of natural SH3 sequences,10 we find that 86% of the sequence conservation pattern seen in the SH3 family is recreated by our fairly simple in silico protein evolution. By further comparing these results to several recent detailed studies of SH3 folding kinetics, we also see that residues optimized for the native state structure play consistent roles in the folding transition state across the SH3 family, and vice versa. This suggests that sequence optimization for native state stability is the major driving force in the evolution of SH3 domains and that fast folding is a direct result of this sequence optimization.
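The two comparisons described above (average pairwise identity between evolved and natural sequences, and per-position residue frequencies) can be illustrated with a minimal sketch. It assumes both sequence sets are already aligned to the same columns with `-` as the gap character; the function names and the toy four-column sequences are hypothetical, and the statistical test used to call a position "similar" is not given in this text, so none is shown.

```python
from collections import Counter
from itertools import product


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def mean_cross_identity(set_a, set_b) -> float:
    """Average percent identity over every (sequence in A, sequence in B) pair."""
    scores = [percent_identity(a, b) for a, b in product(set_a, set_b)]
    return sum(scores) / len(scores)


def column_frequencies(alignment, col):
    """Residue frequencies at one alignment column, ignoring gaps."""
    counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


# Toy data only (hypothetical, not taken from the study).
evolved = ["ACDE", "ACDF"]
natural = ["ACGE", "TCGE"]
print(mean_cross_identity(evolved, natural))
print(column_frequencies(natural, 0))
```

Averaging only over evolved-versus-natural pairs, rather than over all pairs in a pooled set, keeps the figure from being inflated by similarity within either family.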

In silico evolution of SH3 domain structural ensembles

Eleven X-ray structures of SH3 domains were used as templates for in silico protein evolution (PDB codes: 1abo, 1abq, 1ad5, 1fmk, 1fyn, 1lck, 1pht, 1sem, 1shf, 1shg, 2abl). The SH3 domain peptide backbone was isolated from each PDB structure and used to create the target structural ensembles. Importantly, all side-chain atoms were removed so that no information about the native sequence was included in any phase of the in silico protein evolution. Fifty non-redundant structurally optimized
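As an illustration of the backbone isolation step, the sketch below keeps only backbone `ATOM` records from PDB-format text. It is an assumption about how such a filter could be written, not the authors' code: the function name is hypothetical, and the backbone is taken to be the N, CA, C and O atoms, read from the fixed atom-name columns of the PDB format.

```python
BACKBONE_ATOMS = {"N", "CA", "C", "O"}  # assumed definition of the peptide backbone


def strip_side_chains(pdb_lines):
    """Return only the ATOM records whose atom name is a backbone atom.

    The atom name occupies columns 13-16 of a PDB ATOM record
    (string indices 12:16). HETATM and other record types are dropped.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() in BACKBONE_ATOMS:
            kept.append(line)
    return kept


# Toy records (hypothetical coordinates).
example = [
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C",
    "ATOM      3  CB  ALA A   1      12.000   7.400  -4.600  1.00  0.00           C",
    "HETATM    4  O   HOH A 101       1.000   2.000   3.000  1.00  0.00           O",
]
for record in strip_side_chains(example):
    print(record)
```

The retained records still carry the original residue names, so a design procedure that must see no native sequence information would also have to ignore or overwrite that field.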

Discussion

By comparing the end-results of natural and in silico evolution, we have found that the selective pressures operating in both processes are very similar. Thus, selection for native stability has likely been the dominant selective pressure in the recent evolution of SH3 domains. Eleven positions in the SH3 domain show no evidence of having been under any significant selective pressure at all throughout recent evolution. In silico protein evolution, selecting only for native state stability,

In silico protein evolution

Protein structures are modeled by placing side-chain rotamers onto a fixed target backbone, stripped of all native side-chains. The resulting all-atom protein models are scored using a linear combination of the Amber potential function59 with OPLS non-bonded parameters,60 a surface-area term that accounts implicitly for solvation effects,61 and a set of amino acid baseline corrections which provides an implicit model of the unfolded state. A population of 300 models is then minimized by a
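The scoring scheme named above, a linear combination of a force-field energy, a surface-area solvation term and per-residue baseline corrections, might be sketched as follows. Everything specific here is an assumption made for illustration: the function name, the unit weights, the toy baseline values, and in particular the choice to subtract the summed baselines so that the score reads as a gap relative to the implicit unfolded state. The text gives none of these details.

```python
def score_model(sequence, force_field_energy, solvation_energy, baseline,
                weights=(1.0, 1.0, 1.0)):
    """Combine precomputed energy terms into one score (lower is better).

    sequence           -- one-letter amino acid string of the model
    force_field_energy -- precomputed bonded and non-bonded energy of the model
    solvation_energy   -- precomputed surface-area solvation term
    baseline           -- dict of per-amino-acid reference energies (hypothetical)
    weights            -- (w_ff, w_sa, w_ref); placeholder values, not from the paper
    """
    w_ff, w_sa, w_ref = weights
    # Sum of per-residue baselines stands in for the unfolded-state energy.
    reference = sum(baseline[aa] for aa in sequence)
    # Assumed sign convention: folded-state terms minus the unfolded reference.
    return w_ff * force_field_energy + w_sa * solvation_energy - w_ref * reference


# Toy numbers only.
toy_baseline = {"A": -1.0, "G": -0.5}
print(score_model("AG", -10.0, 2.0, toy_baseline))
```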

Acknowledgements

This work would not have been possible without the enthusiastic participation of thousands of Genome@home users around the world. The authors are greatly indebted to everyone who contributed processor time to this study. A full list of users is available at http://gah.stanford.edu/userstats.txt. We thank Alan Davidson for insightful comments on the manuscript. S.M.L. is a James Clark Fellow of the SGF program. We thank John Desjarlais for generously providing the SPA source code and for his

References (61)

  • L. Wernisch et al. Automatic protein design with all atom force-fields by exact and heuristic optimization. J. Mol. Biol. (2000)
  • J.R. Desjarlais et al. Side-chain and backbone flexibility in protein core design. J. Mol. Biol. (1999)
  • E.C. Johnson et al. Solution structure and dynamics of a designed hydrophobic core variant of ubiquitin. Struct. Fold. Des. (1999)
  • S.M. Larson et al. Analysis of covariation in an SH3 domain sequence alignment: applications in tertiary contact prediction and the design of compensating hydrophobic core substitutions. J. Mol. Biol. (2000)
  • Y.K. Mok et al. Dramatic stabilization of an SH3 domain by a single substitution: roles of the folded and unfolded states. J. Mol. Biol. (2001)
  • A.R. Viguera et al. Bergerac-SH3: “frustation” induced by stabilizing the folding nucleus. J. Mol. Biol. (2001)
  • I. Angrand et al. Computer-assisted re-design of spectrin SH3 residue clusters. Biomol. Eng. (2001)
  • J.G. Northey et al. Protein folding kinetics beyond the phi value: using multiple amino acid substitutions to investigate the structure of the SH3 domain folding transition state. J. Mol. Biol. (2002)
  • P. Koehl et al. De novo protein design. II. Plasticity in sequence space. J. Mol. Biol. (1999)
  • H. Kono et al. Statistical theory for protein combinatorial libraries. Packing interactions, backbone flexibility, and the sequence variability of a main-chain structure. J. Mol. Biol. (2001)
  • K.W. Plaxco et al. Evolutionary conservation in protein folding kinetics. J. Mol. Biol. (2000)
  • S.M. Larson et al. Residues participating in the protein folding nucleus do not exhibit preferential evolutionary conservation. J. Mol. Biol. (2002)
  • D.K. Klimov et al. Stiffness of the distal loop restricts the structural heterogeneity of the transition state ensemble in SH3 domains. J. Mol. Biol. (2002)
  • B. Gillespie et al. NMR and temperature-jump measurements of de novo designed proteins demonstrate rapid folding in the absence of explicit selection for kinetics. J. Mol. Biol. (2003)
  • L. Li et al. Kinetics, thermodynamics and evolution of non-native interactions in a protein folding nucleus. Nature Struct. Biol. (2000)
  • G. Tiana et al. Hiking in the energy landscape in sequence space: a bumpy road to good folders. Proteins: Struct. Funct. Genet. (2000)
  • E. Bornberg-Bauer et al. Modeling evolutionary landscapes: mutational stability, topology, and superfunnels in sequence space. Proc. Natl Acad. Sci. USA (1999)
  • Y. Xia et al. Roles of mutation and recombination in the evolution of protein thermodynamics. Proc. Natl Acad. Sci. USA (2002)
  • B. Kuhlman et al. Native protein sequences are close to optimal for their structures. Proc. Natl Acad. Sci. USA (2000)
  • S.M. Larson et al. Thoroughly sampling sequence space: large-scale protein design of structural ensembles. Protein Sci. (2002)