Using multi-objective computational design to extend protein promiscuity

https://doi.org/10.1016/j.bpc.2009.12.003

Abstract

Many enzymes possess, besides their native function, additional promiscuous activities. Proteins with several activities (multipurpose catalysts) may have a wide range of biotechnological and biomedical applications. Natural promiscuity, however, appears to be of limited scope in this context, because the latent (promiscuous) function is often related to the evolved one (sharing the active site and even the chemical mechanism), and its enhancement upon suitable mutations usually brings about a decrease in the native activity. Here we explore the use of computational protein design to overcome these limitations. The high-plasticity positions close to the original (“native”) active site are the most promising candidates for mutations that create a second active site associated with a new function. To avoid compromising protein folding and native activity, we propose a minimal-perturbation approach based on the combinatorial optimization of both the de novo catalytic activity and the folding free energy: essentially, we construct the Pareto Set of optimal stability/promiscuous-function solutions. We validate our approach by introducing a promiscuous esterase activity in E. coli thioredoxin through mutations at positions close to the disulfide bridge of the native active site. Native oxidoreductase activity is not compromised; in fact, it is enhanced 1.5-fold, as determined by an insulin-reduction assay. This work provides general guidelines on how computational design can be used to expand the scope and applications of protein promiscuity. More generally, it illustrates the potential of multi-objective optimization as the computational analogue of multi-feature natural selection.

Introduction

Enzymes are natural protein catalysts that can enhance reaction rates by up to 23 orders of magnitude [1], with impressive affinity and/or specificity. Their applications are enormous, and enzyme design promises an important impact in areas such as medicine (e.g. treatment of neurodegenerative diseases [2], design of therapeutic antibodies that bind tumor-associated antigenic determinants while maintaining low immunogenicity, and development of peptide-based vaccines), biotechnology (biosensors [3], [4], and biocatalysts active in non-natural environments) and bioremediation (design of enzymes that reduce waste by-products and toxicity [5]).

In the last few years, studies on enzyme evolvability have provided convincing evidence for a mechanism by which new protein functions evolve from cross-reactive proteins [6], [7], [8], [9], [10], starting from the observation that most proteins possess, besides their native function, additional promiscuous functions with specificities in the range $k_{\mathrm{cat}}/K_{\mathrm{M}} \approx 10^{-2}$–$10^{6}$ [10]. In the proposed mechanism, a weak promiscuous function arises through neutral evolution, protein robustness (the ability of proteins to tolerate mutations without compromising fitness) and plasticity (the ability to gain new functions through a small number of mutations). Under the right selection pressure, natural selection can then improve the new function until, at some point, the protein may become specialized for it.

In the classic view, selection constraints on the native function were believed to be decisive in this process (if a given function arose by natural selection, its loss must carry a large penalty), which restricted the possible evolutionary scenarios to gene duplication at an early stage of specialization. However, modern studies have shown that in most cases the coupling between the nascent and the original functions is weaker than expected (see [10] and references therein). This allows for several generations of “generalist” proteins [11] that are able to perform both functions, and suggests that gene duplication acts after the new function has appeared [8], not before.

Proteins with several activities (multipurpose enzymes) may have a wide range of biotechnological applications, related (but not limited) to industrial organic synthesis and metabolic engineering [12], [13], [14], [15], [16], [17]. However, natural protein promiscuity (as described in the two preceding paragraphs) may be of limited use in the development of such multipurpose catalysts. First, natural promiscuous activities are often related to the evolved ones: they share the same active sites and even the basic chemical mechanisms and, in general, bear a significant resemblance to the original function [10]. Second, enhancing the promiscuous activity through suitable mutations usually decreases the evolved activity (see Table 2 in [10]). In this work, we explore the use of computational design to overcome these limitations.

We selected thioredoxin from E. coli (PDB code 2trx, 1.5 Å resolution [18]) as the scaffold for our studies. It is a small (108-residue) general disulfide oxidoreductase found in all kingdoms of life, and it is a common model for protein design studies because of its high stability and good expression properties [19]. We aim to introduce an esterase activity [the nucleophilic hydrolysis of p-nitrophenyl acetate (PNPA) into p-nitrophenol (PNP) and acetate] into E. coli thioredoxin, following an approach similar to that used in ref. [20]. Unlike that previous study [20], however, we intend to preserve the natural thioredoxin activity (see below for details).

Natural promiscuous activities appear to be shaped by residues at the “wall and perimeter” of the native active site [10]. These residues show high plasticity, likely because they belong neither to the protein scaffold nor to the native catalytic machinery [10], and they provide a suitable target for the introduction of new, non-natural promiscuous activities [21], [22]. However, designing a new active site introduces “unsatisfied” destabilizing interactions (which, ideally, will be satisfied upon ligand and/or transition-state binding). Designing a new active site in close proximity to the native one poses the additional problem that these destabilizing interactions may disrupt the native active site and affect the original activity. Our computational design approach is therefore based on multi-objective optimization. A measure of protein stability and a measure of de novo catalytic activity are optimized simultaneously, using two competing score functions, one for the folding free energy and one for the binding free energy of the protein–ligand (i.e. transition-state-model) complex, to obtain the Pareto Set [23] of optimal stability/promiscuous-function solutions. The goal of this procedure is to develop the new activity at the lowest possible stability cost, which has two advantages. First, it minimizes the need to introduce additional stabilizing mutations to compensate for the destabilizing effect of the new active-site mutations. This is important because the original (native) active site would be an obvious target for stabilizing mutations (active-site residues are optimized for function, not for stability), and here we aim to preserve the original activity. Indeed, in a previous work [20], the catalytic residue D26 was mutated to isoleucine, a change that enhances stability but would impair the natural oxidoreductase activity of thioredoxin.
Second, and most importantly, since our designed active site is spatially close to the original one, the low-stability-cost strategy ensures minimal perturbation of the native site, which is required to maintain the original activity.
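The notion of a Pareto Set of stability/function solutions can be illustrated with a minimal sketch. It assumes that every candidate design has already been scored with two quantities, both to be minimized: a stability cost (loss of the unfolding-free-energy proxy relative to wild type) and a transition-state score (lower meaning better transition-state stabilization). The `Design` class, its field names and the numbers are hypothetical; this is a generic non-dominated filter, not the DESIGNER implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """A candidate design scored on two objectives, both minimized (hypothetical)."""
    name: str
    stability_cost: float  # destabilization relative to wild type (e.g. kcal/mol)
    ts_score: float        # transition-state proxy; lower = better TS stabilization


def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b in both objectives and strictly better in at least one."""
    no_worse = a.stability_cost <= b.stability_cost and a.ts_score <= b.ts_score
    strictly_better = a.stability_cost < b.stability_cost or a.ts_score < b.ts_score
    return no_worse and strictly_better


def pareto_set(designs: list[Design]) -> list[Design]:
    """Return the non-dominated designs, ordered by increasing stability cost."""
    front = [d for d in designs if not any(dominates(o, d) for o in designs)]
    return sorted(front, key=lambda d: d.stability_cost)


if __name__ == "__main__":
    # Hypothetical scores for four candidate designs.
    candidates = [
        Design("A", stability_cost=0.5, ts_score=-2.0),
        Design("B", stability_cost=1.5, ts_score=-4.0),
        Design("C", stability_cost=1.0, ts_score=-1.5),  # dominated by A
        Design("D", stability_cost=3.0, ts_score=-4.5),
    ]
    for d in pareto_set(candidates):
        print(d.name, d.stability_cost, d.ts_score)
```

Reading the front from low to high stability cost exposes the trade-off; a minimal-perturbation strategy would look first at the low-cost end of the front for designs whose transition-state score is still acceptable.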

Our computational design approach is based on the DESIGNER software [24], [25], which optimizes the protein sequence for a given target structure. This procedure uses atomic models and rotamer libraries to represent side-chain conformations. The free energies of the different models are calculated with the CHARMM force field [26] plus a solvation free-energy term proportional to the surface area [27]. Calculations on a reference state (taken to represent the unfolded state) allow a quantity akin to the unfolding free energy to be computed for each model.
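As a rough illustration of this energy model, a model's free energy can be written as a force-field energy plus a solvation term proportional to surface area, and the stability proxy as the difference between the reference-state and folded-model values. The sketch below assumes, purely for illustration, that the reference (unfolded) state is a sum of per-residue reference energies; the coefficient `SIGMA`, the function names and all numbers are hypothetical, and no CHARMM calculation is performed.

```python
# Hypothetical solvation coefficient (kcal/mol per square angstrom); placeholder value.
SIGMA = 0.025


def model_free_energy(force_field_energy: float, surface_area: float,
                      sigma: float = SIGMA) -> float:
    """Force-field energy plus a solvation term proportional to the surface area."""
    return force_field_energy + sigma * surface_area


def unfolding_free_energy_proxy(folded_energy: float, sequence: str,
                                reference_energy: dict[str, float]) -> float:
    """Quantity akin to the unfolding free energy: G(reference state) - G(folded model).

    Assumption (illustration only): the reference state is approximated as a
    sum of per-residue reference energies looked up by one-letter code.
    """
    g_reference = sum(reference_energy[aa] for aa in sequence)
    return g_reference - folded_energy


if __name__ == "__main__":
    # Hypothetical numbers for a three-residue toy "protein".
    g_folded = model_free_energy(force_field_energy=-120.0, surface_area=400.0)
    ref = {"C": -30.0, "G": -25.0, "H": -40.0}
    print(g_folded, unfolding_free_energy_proxy(g_folded, "CGH", ref))
```

A larger proxy value means the folded model is more favorable relative to the reference state, i.e. more stable.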

The DESIGNER program was originally developed to address the inverse folding problem. However, as we show in this work, it can also be used to design new active sites. This requires approaching design as a sequence-space optimization of a protein structure that includes a model of the transition state of the chemical reaction (here, the tetrahedral PNPA intermediate), referenced to the same structure without the transition-state model bound; the comparison yields a quantity akin to the free-energy barrier of the reaction. Because DESIGNER also yields a quantity akin to the unfolding free energy, multi-objective stability/function optimization is feasible.
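The activity objective described above can be sketched as the free energy of a design model with the transition-state model bound, minus that of the same design without it; a more negative value means the transition state is better stabilized. Pairing this score with a stability proxy gives the two objectives per sequence. The names `Objectives`, `ts_stabilization_score` and `evaluate`, and the example energies, are hypothetical, and the sketch ignores any separate energy term for the isolated transition-state model.

```python
from typing import NamedTuple


class Objectives(NamedTuple):
    """The two objectives for one candidate sequence (hypothetical container)."""
    stability_proxy: float  # quantity akin to the unfolding free energy (larger = more stable)
    ts_score: float         # G(with TS) - G(without TS); more negative = TS better stabilized


def ts_stabilization_score(g_with_ts: float, g_without_ts: float) -> float:
    """Free energy of the design with the TS model bound, relative to the design without it."""
    return g_with_ts - g_without_ts


def evaluate(stability_proxy: float, g_with_ts: float, g_without_ts: float) -> Objectives:
    """Bundle the stability proxy and the transition-state score for one sequence."""
    return Objectives(stability_proxy, ts_stabilization_score(g_with_ts, g_without_ts))


if __name__ == "__main__":
    # Hypothetical energies (kcal/mol) for one candidate sequence.
    print(evaluate(stability_proxy=15.0, g_with_ts=-118.5, g_without_ts=-110.0))
```

Keeping the two numbers separate, rather than summing them into one score, is what lets a multi-objective method explore the trade-off between them.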

General approach

We aim to introduce a promiscuous esterase activity into E. coli thioredoxin: the nucleophilic hydrolysis of p-nitrophenyl acetate (PNPA) into p-nitrophenol (PNP) and acetate. The general approach we use is similar to that described in ref. [20]; i.e., we choose a histidine residue as the nucleophile for the reaction, and we model the tetrahedral transition state of the reaction as a PNPA–histidine structure constructed as a generalized rotamer of the histidine. An initial exploration of the
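The idea of treating the histidine–PNPA tetrahedral intermediate as a “generalized rotamer” of histidine can be illustrated with a small data-structure sketch: each generalized rotamer combines a histidine side-chain conformation (chi1, chi2) with extra torsions that place the attached intermediate. The `GeneralizedRotamer` class, the use of two extra torsions and all torsion values are hypothetical assumptions, not the rotamer library used in the original work.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class GeneralizedRotamer:
    """His side-chain torsions plus torsions of the attached intermediate (hypothetical)."""
    chi1: float
    chi2: float
    ligand_torsions: tuple[float, ...]


def build_generalized_rotamers(his_rotamers: list[tuple[float, float]],
                               ligand_torsion_values: list[list[float]]) -> list[GeneralizedRotamer]:
    """Cross every His (chi1, chi2) pair with every combination of ligand torsions."""
    rotamers = []
    for (chi1, chi2), lig in product(his_rotamers, product(*ligand_torsion_values)):
        rotamers.append(GeneralizedRotamer(chi1, chi2, tuple(lig)))
    return rotamers


if __name__ == "__main__":
    # Hypothetical torsion values in degrees.
    his = [(-60.0, -75.0), (180.0, 60.0)]
    ligand = [[60.0, 180.0, -60.0], [0.0, 180.0]]
    library = build_generalized_rotamers(his, ligand)
    print(len(library), library[0])
```

In a design run, such rotamers would presumably be allowed only at the position chosen for the nucleophilic histidine, while every other designable position keeps its ordinary rotamer library; clashes and energies would then be scored with the same energy function.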

Multi-objective optimization and construction of the Pareto Set

We approach enzyme design as a two-objective problem. One objective is the effect of the designed region on stability, which is coupled to the primary function of thioredoxin through spatial proximity. It is estimated from an approximation to the folding free energy, as defined previously [24], and does not involve the presence of the PNPA tetrahedral intermediate. The other objective is a quantity akin to the activation free energy of the reaction, which is estimated from calculations
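To show how two such objectives can be optimized jointly over sequence space, here is a minimal sketch of a random point-mutation search that keeps an archive of mutually non-dominated sequences. It is a generic stand-in, not the optimizer used with DESIGNER: the toy objective functions `toy_stability_cost` and `toy_ts_score`, the wild-type string and all parameters are hypothetical placeholders for the energy-based objectives described above.

```python
import random

# Standard 20-letter amino-acid alphabet.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def toy_stability_cost(seq: str, wild_type: str) -> float:
    """Placeholder objective 1: one unit of stability cost per mutation from wild type."""
    return float(sum(a != b for a, b in zip(seq, wild_type)))


def toy_ts_score(seq: str) -> float:
    """Placeholder objective 2: each H, S, D or E lowers (improves) the transition-state score."""
    return -float(sum(aa in "HSDE" for aa in seq))


def dominates(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """Both objectives minimized: a is no worse than b in both and differs in at least one."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def pareto_search(wild_type: str, steps: int = 2000, seed: int = 0) -> dict[str, tuple[float, float]]:
    """Random point-mutation search keeping an archive of mutually non-dominated sequences."""
    rng = random.Random(seed)

    def score(seq: str) -> tuple[float, float]:
        return toy_stability_cost(seq, wild_type), toy_ts_score(seq)

    archive = {wild_type: score(wild_type)}
    for _ in range(steps):
        parent = rng.choice(list(archive))
        pos = rng.randrange(len(parent))
        child = parent[:pos] + rng.choice(ALPHABET) + parent[pos + 1:]
        s = score(child)
        # Reject children dominated by, or tied with, an archived design.
        if any(dominates(v, s) or v == s for v in archive.values()):
            continue
        # Drop archived designs that the child dominates, then add the child.
        archive = {k: v for k, v in archive.items() if not dominates(s, v)}
        archive[child] = s
    return archive


if __name__ == "__main__":
    front = pareto_search("AAAAA", steps=1000, seed=1)
    for seq, (cost, ts) in sorted(front.items(), key=lambda kv: kv[1]):
        print(seq, cost, ts)
```

Because the stability objective is evaluated without the transition-state model and the activity objective with it, each sequence needs both calculations; an exhaustive enumeration or a genetic algorithm could replace the random search without changing the archive logic.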

Acknowledgments

P.T. acknowledges an EMBO long-term fellowship. A.J. acknowledges support from HPC-EUROPA2 (project 228398) and the use of the BSC and IDRIS supercomputer facilities to perform the calculations reported here. This research was supported by FEDER funds and Grant BIO2006-07332 (Spanish Ministry of Education and Science) to J.M.S.-R., Grant CVI-1668 (Junta de Andalucía) to B.I.-M., and Grants BioModularH2 (FP6-NEST-043340), ATIGE (Genopole) and TARPOL (FP7-KBBE-212894) to A.J.

References (42)

  • A. Holmgren, Thioredoxin catalyzes the reduction of insulin disulfides by dithiothreitol and dihydrolipoamide, J. Biol. Chem. (1979).
  • W. Kauzmann, Some factors in the interpretation of protein denaturation, Adv. Protein Chem. (1959).
  • D.A. Kraut et al., Challenges in enzyme mechanism and energetics, Annu. Rev. Biochem. (2003).
  • L. Lecanu, V. Papadopoulos, Cutting-edge patents in Alzheimer's disease drug discovery: anticipation of potential...
  • J.S. Marvin et al., The rational design of allosteric interactions in a monomeric protein and its applications to the construction of biosensors, Proc. Natl. Acad. Sci. U. S. A. (1997).
  • L.L. Looger et al., Computational design of receptor and sensor proteins with novel functions, Nature (2003).
  • E.L. Ang et al., Recent advances in the bioremediation of persistent organic pollutants via biomolecular engineering, Enzyme Microb. Technol. (2005).
  • R.A. Jensen, Enzyme recruitment in evolution of new function, Annu. Rev. Microbiol. (1976).
  • A. Aharoni et al., The ‘evolvability’ of promiscuous protein functions, Nat. Genet. (2005).
  • C. Jürgens et al., Directed evolution of a (βα)8-barrel enzyme to catalyze related reactions in two different metabolic pathways, Proc. Natl. Acad. Sci. U. S. A. (2000).
  • K.A. Canada et al., Directed evolution of toluene ortho-monooxygenase for enhanced 1-naphthol synthesis and chlorinated ethene degradation, J. Bacteriol. (2002).

1 Present address: Department of Chemistry and Biochemistry and Center for Biomolecular Structure and Organization, University of Maryland, College Park, Maryland 20742, USA.
