De Novo Proteins with Life-Sustaining Functions Are Structurally Dynamic

https://doi.org/10.1016/j.jmb.2015.12.008

Highlights

  • Well-ordered protein structure is commonly assumed to be an essential feature of protein function.

  • Here we investigated the biophysical and structural properties of a family of de novo designed proteins that provide life-sustaining functions in E. coli.

  • We discovered that the de novo proteins tested here do not form well-ordered structures and instead form dynamic dimer structures.

  • These results highlight the importance of protein dynamics in protein function and also suggest that dynamic structures may have been important in early evolution.

Abstract

Designing and producing novel proteins that fold into stable structures and provide essential biological functions are key goals in synthetic biology. In initial steps toward achieving these goals, we constructed a combinatorial library of de novo proteins designed to fold into 4-helix bundles. As described previously, screening this library for sequences that function in vivo to rescue conditionally lethal mutants of Escherichia coli (auxotrophs) yielded several de novo sequences, termed SynRescue proteins, which rescued four different E. coli auxotrophs. In an effort to understand the structural requirements necessary for auxotroph rescue, we investigated the biophysical properties of the SynRescue proteins, using both computational and experimental approaches. Results from circular dichroism, size-exclusion chromatography, and NMR demonstrate that the SynRescue proteins are α-helical and relatively stable. Surprisingly, however, they do not form well-ordered structures. Instead, they form dynamic structures that fluctuate between monomeric and dimeric states. These findings show that a well-ordered structure is not a prerequisite for life-sustaining functions, and suggest that dynamic structures may have been important in the early evolution of protein function.

Introduction

The two central challenges of protein design are (i) to devise novel amino acid sequences that fold into stable three-dimensional structures and (ii) to devise sequences that perform chemically and/or biologically significant functions. Early work in protein design began approximately 25 years ago, with attempts to design 4-helix bundles [1], [2]. Those pioneering studies focused exclusively on folding and stability, and they paid little attention to protein function. This seemed reasonable at the time because it was assumed that achieving a well-ordered structure was an essential prerequisite for protein function. Because of this assumption, it was only in recent years, as the design of stably folded structures achieved some level of success [3], [4], [5], [6], [7], [8], that protein designers began to consider the possibility of devising novel proteins that bind targets and/or catalyze reactions [9], [10], [11], [12].

The presumption that uniquely folded structures are essential for function arose from the pioneering achievements of structural biology. The first crystal structures, solved more than half a century ago, revealed ordered structures with well-defined active sites that accounted for their biochemical functions [13]. After observing such structures, it is not surprising that researchers assumed that a well-ordered structure was a prerequisite for a well-defined function. Indeed, these early findings led to a central paradigm of structural biology: amino acid sequence determines three-dimensional structure, and structure—typically denoting a well-ordered structure—determines function.

In recent years, however, numerous studies have demonstrated that many natural proteins responsible for essential cellular functions are, in fact, intrinsically disordered and/or dynamic [14], [15]. In light of these findings, it may be time to reconsider assumptions about the relationship between well-ordered structures and biological function—both for naturally evolved proteins and for proteins designed de novo.

In the current study, we question these assumptions by probing the structural and biophysical properties of several α-helical proteins, which were designed de novo in our laboratory and shown previously to function in vivo by providing life-sustaining activities in Escherichia coli [16]. Using a range of experimental techniques, we probe whether these functional de novo proteins fold into well-ordered, kinetically stable structures or, alternatively, fluctuate between dynamic states.

The de novo α-helical proteins that are the subject of the current study were drawn from a large combinatorial library of binary patterned sequences that we described previously [16], [17], [18]. Briefly, binary patterning is a strategy for protein design, which is built on the premise that the overall structure of a protein can be specified by designing the sequence periodicity of polar and nonpolar amino acids to match the structural periodicity of the desired secondary structure. Thus, a pattern that places a nonpolar amino acid every 3 or 4 residues along a sequence would match the structural repeat of 3.6 residues per turn of a canonical α-helix and thereby would generate an amphiphilic α-helical segment. When four such helices are linked together, the hydrophobic effect drives them to pack against one another, thereby forming a 4-helix bundle with nonpolar residues pointing toward the protein core and polar residues exposed to solvent (Fig. 1a). Since only the type of residue—polar versus nonpolar—is designed explicitly, the strategy is inherently binary. However, because the identities of the polar and nonpolar side chains are not specified, the strategy is inherently combinatorial and facilitates the construction of vast libraries of novel sequences.
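The periodicity argument above can be checked with a short calculation. The following is a minimal sketch, not the authors' design procedure: it assumes an ideal α-helix with exactly 3.6 residues per turn (100° of rotation per residue), and the 14-residue pattern string is a hypothetical example chosen only to place a nonpolar residue (N) every 3 or 4 positions among polar residues (P).

```python
# Sketch: where do polar (P) and nonpolar (N) positions fall on a helical wheel?
# Assumption: ideal alpha-helix, 3.6 residues per turn -> 360 / 3.6 = 100 degrees per residue.
DEGREES_PER_RESIDUE = 100

# Hypothetical illustrative pattern (not taken from the paper):
# a nonpolar residue recurs every 3 or 4 positions.
PATTERN = "PNPPNNPPNPPNNP"


def wheel_angles(pattern: str, residue_type: str) -> list[int]:
    """Return the sorted helical-wheel angles (degrees) of all positions of one type."""
    return sorted(
        (DEGREES_PER_RESIDUE * i) % 360
        for i, kind in enumerate(pattern)
        if kind == residue_type
    )


nonpolar = wheel_angles(PATTERN, "N")
polar = wheel_angles(PATTERN, "P")

print("nonpolar face:", nonpolar)
print("polar face:   ", polar)
```

Under these assumptions, every nonpolar position falls within a single 120° arc of the wheel while the polar positions occupy the remainder, which is the amphiphilic arrangement the binary pattern is meant to produce.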

The combinatorial diversity of the protein library is encoded at the DNA level by using degenerate codons, such as NTN (N = A, T, C, or G) to encode five nonpolar amino acids (Phe, Leu, Ile, Met, and Val) and VAN (V = A, C, or G) to encode six polar amino acids (His, Glu, Gln, Asp, Asn, and Lys). These degenerate codons can be assembled in a pattern compatible with the desired structure to produce a collection of synthetic genes, which can be translated in E. coli to produce a large library of de novo proteins.
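The amino acid sets encoded by these two degenerate codons can be verified by enumeration. The sketch below assumes the standard genetic code and the IUPAC meanings of N and V given above; the function names are illustrative and are not drawn from the original work.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# IUPAC degenerate base symbols used in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "V": "ACG"}


def expand(degenerate: str) -> list[str]:
    """List every concrete codon matched by a degenerate codon such as 'NTN'."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate))]


def encoded_amino_acids(degenerate: str) -> list[str]:
    """Return the sorted one-letter amino acids ('*' = stop) a degenerate codon can encode."""
    return sorted({CODON_TABLE[c] for c in expand(degenerate)})


print("NTN ->", encoded_amino_acids("NTN"))  # nonpolar set
print("VAN ->", encoded_amino_acids("VAN"))  # polar set
```

The enumeration also shows why the first base of the polar codon is restricted to V rather than N: excluding T removes the TAN codons, which include the stop codons TAA and TAG.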

Previously, we reported the construction of three binary patterned libraries of sequences designed to fold into 4-helix bundles [17], [19], [20]. The sequences in these libraries do not share homology with naturally occurring proteins. They were not selected by eons of evolution, and they may share features with primordial sequences that existed in the early history of life on earth.

Previous studies of proteins from these binary patterned libraries showed that many of the sequences fold into stable structures [20]. Three structures were determined by NMR or crystallography to reveal 4-helix bundles with hydrophobic interiors and polar surfaces, as envisioned by the binary patterned design. Two proteins from our second-generation library formed monomeric 4-helix bundles [4], [21], while an X-ray structure solved from a sequence from the third-generation library revealed a domain-swapped dimer [22]. We have also identified de novo proteins from these libraries that bind small molecules, including drugs and cofactors [18], [23]. Furthermore, we identified sequences that possess weak catalytic activity for simple reactions and substrates, such as the hydrolysis of p-nitrophenyl esters [18].

The results summarized in the previous paragraph demonstrated that proteins from binary patterned libraries possess structural and functional properties in vitro resembling those of natural proteins. More recently, we have become interested in the possibility of designing collections of novel sequences as an initial step toward constructing artificial “proteomes”. This interest led to experiments probing the ability of our novel sequences to provide essential functions in vivo. Since the proteins in our libraries were designed for structure, but not explicitly designed for any particular function, we used unbiased high-throughput genetic selections to search for novel sequences that functioned in vivo. These selections relied on a series of E. coli auxotrophs: strains that are deleted for individual genes that encode enzymes necessary for survival on minimal medium. In a typical auxotroph rescue experiment, an E. coli auxotrophic strain was transformed with a binary patterned library encoding 10⁶ de novo proteins. In most cases, the auxotroph was not rescued by sequences from our library; however, four auxotrophic strains of E. coli were rescued by sequences from our third-generation binary patterned library [16]. The four rescued auxotrophic strains are deleted for a range of functions: Δfes is missing enterobactin esterase, ΔilvA is missing threonine deaminase, ΔserB is missing phosphoserine phosphatase, and ΔgltA is missing citrate synthase. In all, more than 20 de novo sequences were found to rescue one of these four deletion strains. We denote these novel sequences the SynRescue proteins because they are synthetic (not derived from nature) and they rescue the given deletion strain. Individual proteins are named SynΔstrain#, such that SynFes2 is the second de novo protein identified that rescued Δfes.

It is tempting to assume that the SynRescue proteins rescue the deletion strains in a direct manner by performing the same biochemical activity as the deleted protein. However, this need not be the case. It is also possible for a SynRescue protein to compensate for a deleted protein by increasing the expression, enhancing the activity, or altering the specificity of an endogenous E. coli protein. Irrespective of the mechanism of rescue, structural and biophysical characterization of the SynRescue proteins may help elucidate their functions.

The SynRescue proteins also present an unusual opportunity to revisit the relationship between well-ordered structure and biological function. Moreover, because these sequences were devised de novo in the laboratory, we can ask whether uniquely folded three-dimensional structures are essential for function in vivo in a system that is not biased by eons of evolutionary history. To address these questions, we investigated the biophysical properties of the SynRescue proteins, using both computational and experimental approaches. Results from circular dichroism (CD), size-exclusion chromatography (SEC), and NMR demonstrate that the SynRescue proteins are α-helical and relatively stable. Surprisingly, however, they do not form well-ordered structures. Instead, they form dynamic structures that fluctuate between monomeric and dimeric states. These findings show that well-ordered structure is not a prerequisite for function in vivo, and they suggest that dynamic structures may have been important in the early evolution of protein function.

The SynRescue proteins

For this investigation, we explored the biophysical and structural properties of seven SynRescue proteins: SynFes2, which rescues Δfes; SynGltA1, which rescues ΔgltA; SynIlvA1, which rescues ΔilvA; and SynSerB1, SynSerB2, SynSerB3, and SynSerB4, which rescue ΔserB. We compared their properties to three control proteins S824, S23, and WA20. The proteins S23 and S824 are sequences from the second-generation library (hence the “S” prefix). We previously reported the solution NMR structure of S824,

Discussion

We investigated the biophysical and structural properties of several de novo proteins that were shown previously to provide activities capable of sustaining the growth of living cells. We determined that the SynRescue proteins are α-helical and thermostable and that they denature reversibly. However, ¹H-¹⁵N HSQC NMR experiments demonstrate that their structures are dynamic and undergo kinetic exchange on an intermediate timescale. SEC indicates that the SynRescue proteins do not form long-lived

Computational simulations using Rosetta

Protein structure prediction simulations were performed using the Rosetta macromolecular modeling software fragment assembly protocol [24]. Briefly, this protocol combines 3-residue and 9-residue fragments (from high-resolution crystal structures) using a reduced centroid model of the protein, coarse-grained energy functions, and a Monte Carlo search procedure, followed by an all-atom high-resolution structure refinement step. The 3-residue and 9-residue fragments are chosen based on sequence

Acknowledgements

We thank Dr. Istvan Pelczer and Ken Conover from the Princeton University Chemistry Department NMR facility for helpful discussions on NMR pulse sequences and results. We also thank the Princeton University Research Computing center for access to the Tiger cluster. We also thank Ann Mularz and Katherine Digianantonio for helpful discussions on this research and manuscript. This work was funded by National Science Foundation grants MCB-1050510 and MCB-1409402 to M.H.H. and a National Institutes

References (32)

  • S.Y. Lau et al. Synthesis of a model protein of defined secondary and quaternary structure. Effect of chain length on the stabilization and formation of two-stranded alpha-helical coiled-coils. J. Biol. Chem. (1984)
  • M.H. Hecht et al. De novo design, expression, and characterization of Felix: A four-helix bundle protein of native-like sequence. Science (1990)
  • L. Regan et al. Characterization of a helical protein designed from first principles. Science (1988)
  • P.B. Harbury et al. High-resolution protein design with backbone freedom. Science (1998)
  • Y. Wei et al. Solution structure of a de novo protein from a designed combinatorial library. Proc. Natl. Acad. Sci. U. S. A. (2003)
  • B. Kuhlman et al. Design of a novel globular protein fold with atomic-level accuracy. Science (2003)
  • G.S. Murphy et al. Computational de novo design of a four-helix bundle protein—DND_4HB. Protein Sci. (2015)
  • N. Koga et al. Principles for designing ideal protein structures. Nature (2012)
  • P.S. Huang et al. High thermodynamic stability of parametrically designed helical bundles. Science (2014)
  • S.J. Fleishman et al. Computational design of proteins targeting the conserved stem region of influenza hemagglutinin. Science (2011)
  • D. Rothlisberger et al. Kemp elimination catalysts by computational enzyme design. Nature (2008)
  • J.B. Siegel et al. Computational design of an enzyme catalyst for a stereoselective bimolecular Diels-Alder reaction. Science (2010)
  • L. Jiang et al. De novo computational design of retro-aldol enzymes. Science (2008)
  • J.C. Kendrew et al. A three-dimensional model of the myoglobin molecule obtained by X-ray analysis. Nature (1958)
  • P.E. Wright et al. Intrinsically disordered proteins in cellular signalling and regulation. Nat. Rev. Mol. Cell Biol. (2015)
  • C.J. Oldfield et al. Intrinsically disordered proteins and intrinsically disordered protein regions. Annu. Rev. Biochem. (2014)

1 Present address: J. B. Greisman, D. E. Shaw Research, New York, NY 10036, USA.
