Learning epistatic interactions from sequence-activity data to predict enantioselectivity

Zaugg, Julian; Gumulya, Yosephine; Malde, Alpeshkumar K.; Bodén, Mikael

doi:10.1007/s10822-017-0090-x

Learning epistatic interactions from sequence-activity data to predict enantioselectivity

Published: 12 December 2017

Volume 31, pages 1085–1096, (2017)
Cite this article

Journal of Computer-Aided Molecular Design Aims and scope Submit manuscript

966 Accesses
20 Citations
2 Altmetric
Explore all metrics

Abstract

Enzymes with a high selectivity are desirable for improving economics of chemical synthesis of enantiopure compounds. To improve enzyme selectivity mutations are often introduced near the catalytic active site. In this compact environment epistatic interactions between residues, where contributions to selectivity are non-additive, play a significant role in determining the degree of selectivity. Using support vector machine regression models we map mutations to the experimentally characterised enantioselectivities for a set of 136 variants of the epoxide hydrolase from the fungus Aspergillus niger (AnEH). We investigate whether the influence a mutation has on enzyme selectivity can be accurately predicted through linear models, and whether prediction accuracy can be improved using higher-order counterparts. Comparing linear and polynomial degree = 2 models, mean Pearson coefficients (r) from \(50\,{\times }\,5\)-fold cross-validation increase from 0.84 to 0.91 respectively. Equivalent models tested on interaction-minimised sequences achieve values of \(r=0.90\) and \(r=0.93\). As expected, testing on a simulated control data set with no interactions results in no significant improvements from higher-order models. Additional experimentally derived AnEH mutants are tested with linear and polynomial degree = 2 models, with values increasing from \(r=0.51\) to \(r=0.87\) respectively. The study demonstrates that linear models perform well, however the representation of epistatic interactions in predictive models improves identification of selectivity-enhancing mutations. The improvement is attributed to higher-order kernel functions that represent epistatic interactions between residues.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A machine learning approach for reliable prediction of amino acid interactions and its application in the directed evolution of enantioselective enzymes

Article Open access 13 November 2018

Investigating the Performance of Machine Learning Methods in Predicting Functional Properties of the Hydrogenase Variants

Article 11 February 2023

Application of fourier transform and proteochemometrics principles to protein engineering

Article Open access 16 October 2018

References

Agranat I, Caner H, Caldwell J (2002) Putting chirality to work: the strategy of chiral switches. Nat Rev Drug Discov 1(10):753–768
Article CAS Google Scholar
Agranat I, Wainschtein SR, Zusman EZ (2012) The predicated demise of racemic new molecular entities is an exaggeration. Nat Rev Drug Discov 11(12):972–973
Article CAS Google Scholar
Branch SK, Agranat I (2014) “New drug” designations for new therapeutic entities: new active substance, new chemical entity, new biological entity, new molecular entity. J Med Chem 57(21):8729–8765
Article CAS Google Scholar
Morley KL, Kazlauskas RJ (2005) Improving enzyme properties: when are closer mutations better? Trends Biotechnol 23(5):231–237
Article CAS Google Scholar
Miton CM, Tokuriki N (2016) How mutational epistasis impairs predictability in protein evolution and design. Protein Sci. 25(7):1260–1272
Article CAS Google Scholar
Starr TN, Thornton JW (2016) Epistasis in protein evolution. Protein Sci 25(7):1204–1218
Article CAS Google Scholar
Kondrashov DA, Kondrashov FA (2015) Topological features of rugged fitness landscapes in sequence space. Trends Genet 31(1):24–33
Article CAS Google Scholar
Li Y, Drummond DA, Sawayama AM, Snow CD, Bloom JD, Arnold FH (2007) A diverse family of thermostable cytochrome P450s created by recombination of stabilizing fragments. Nat Biotechnol 25(9):1051–1056
Article CAS Google Scholar
Fox RJ, Davis SC, Mundorff EC, Newman LM, Gavrilovic V, Ma SK, Chung LM, Ching C, Tam S, Muley S, Grate J, Gruber J, Whitman JC, Sheldon RA, Huisman GW (2007) Improving catalytic function by ProSAR-driven enzyme evolution. Nat Biotechnol 25(3):338–344
Article CAS Google Scholar
Liao J, Warmuth MK, Govindarajan S, Ness JE, Wang RP, Gustafsson C, Minshull J (2007) Engineering proteinase K using machine learning and synthetic genes. BMC Biotechnol 7(1):16
Article Google Scholar
Romero PA, Arnold FH (2012) Random field model reveals structure of the protein recombinational landscape. PLoS Comput Biol 8(10):e1002,713
Article CAS Google Scholar
Fox R (2005) Directed molecular evolution by machine learning and the influence of nonlinear interactions. J Theor Biol 234(2):187–199
Article CAS Google Scholar
Buske FA, Their R, Gillam EMJ, Bodén M (2009) In silico characterization of protein chimeras: Relating sequence and function within the same fold. Proteins 77(1):111–120
Article CAS Google Scholar
Romero PA, Krause A, Arnold FH (2013) Navigating the protein fitness landscape with Gaussian processes. Proc Natl Acad Sci (USA) 110(3):E193–201
Article CAS Google Scholar
Funar-Timofei S, Suzuki T, Paier JA, Steinreiber A, Faber K, Fabian WMF (2003) Quantitative structure-activity relationships for the enantioselectivity of oxirane ring-opening catalyzed by epoxide hydrolases. J Chem Inf Comput Sci 43(3):934–940
Article CAS Google Scholar
Caetano S, Aires-de Sousa J, Daszykowski M, Heyden YV (2005) Prediction of enantioselectivity using chirality codes and classification and regression trees. Anal Chim Acta 544(1–2):315–326
Article CAS Google Scholar
Gu J, Liu J, Yu H (2011) Quantitative prediction of enantioselectivity of Candida antarctica lipase B by combining docking simulations and quantitative structure–activity relationship (QSAR) analysis. J Mol Catal B 72(3–4):238–247
Article CAS Google Scholar
Hartman JH, Cothren SD, Park SH, Yun CH, Darsey JA, Miller GP (2013) Predicting CYP2C19 catalytic parameters for enantioselective oxidations using artificial neural networks and a chirality code. Bioorg Med Chem 21(13):3749–3759
Article CAS Google Scholar
Tomić S, Kojić-Prodić B (2002) A quantitative model for predicting enzyme enantioselectivity: application to Burkholderia cepacia lipase and 3-(aryloxy)-1,2-propanediol derivatives. J Mol Graph Model 21(3):241–252
Article Google Scholar
Wijma HJ, Marrink SJ, Janssen DB (2014) Computationally efficient and accurate enantioselectivity modeling by clusters of molecular dynamics simulations. J Chem Inf Model 54(7):2079–2092
Article CAS Google Scholar
Wijma HJ, Floor RJ, Bjelic S, Marrink SJ, Baker D, Janssen DB (2015) Enantioselective enzymes by computational design and in silico screening. Angew Chem Int Ed 54(12):3726–3730
Article CAS Google Scholar
Braiuca P, Lorena K, Ferrario V, Ebert C, Gardossi L (2009) A three-dimensional quanititative structure-activity relationship (3D-QSAR) model for predicting the enantioselectivity of Candida antarctica Lipase B. Adv Synth Catal 351(9):1293–1302
Article CAS Google Scholar
Feng X, Sanchis J, Reetz MT, Rabitz H (2012) Enhancing the efficiency of directed evolution in focused enzyme libraries by the adaptive substituent reordering algorithm. Chem Eur J 18(18):5646–5654
Article CAS Google Scholar
Liang J, Mundorff E, Voladri R, Jenne S, Gilson L, Conway A, Krebber A, Wong J, Huisman G, Truesdell S, Lalonde J (2010) Highly enantioselective reduction of a small heterocyclic ketone: biocatalytic reduction of tetrahydrothiophene-3-one to the corresponding (R)-alcohol. Org Process Res Dev 14(1):188–192
Article CAS Google Scholar
Chaput L, Sanejouand YH, Balloumi A, Tran V, Graber M (2012) Contribution of both catalytic constant and Michaelis constant to CALB enantioselectivity: Use of FEP calculations for prediction studies. J Mol Catal B 76:29–36
Article CAS Google Scholar
Noey EL, Tibrewal N, Jiménez-Osés G, Osuna S, Park J, Bond CM, Cascio D, Liang J, Zhang X, Huisman GW, Tang Y, Houk KN (2015) Origins of stereoselectivity in evolved ketoreductases. Proc Natl Acad Sci (USA) 112(51):E7065–72
CAS Google Scholar
Minshull J, Ness JE, Gustafsson C, Govindarajan S (2005) Predicting enzyme function from protein sequence. Curr Opin Chem Biol 9(2):202–209
Article CAS Google Scholar
Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge
Book Google Scholar
Bedbrook CN, Yang KK, Rice AJ, Gradinaru V, Arnold FH (2017) Machine learning to design integral membrane channelrhodopsins for efficient eukaryotic expression and plasma membrane localization. PLoS Comput Biol 13(10):e1005,786
Article Google Scholar
Romero P, Stone E, Lamb C, Chantranupong L, Krause A, Miklos A, Hughes R, Fechtel B, Ellington A, Arnold FH (2012) SCHEMA-designed variants of human Arginase I and II reveal sequence elements important to stability and catalysis. ACS Synth Biol 1(6):221–228
Article CAS Google Scholar
Smith MA, Rentmeister A, Snow CD, Wu T, Farrow MF, Mingardon F, Arnold FH (2012) A diverse set of family 48 bacterial glycoside hydrolase cellulases created by structure-guided recombination. FEBS J 279(24):4453–4465
Article CAS Google Scholar
Pissurlenkar RRS, Malde AK, Khedkar SA, Coutinho EC (2007) Encoding type and position in peptide QSAR: application to peptides binding to class I MHC molecule HLA-A*0201. Mol Inform 26(2):189–203
CAS Google Scholar
Verma J, Khedkar VM, Prabhu AS, Khedkar SA, Malde AK, Coutinho EC (2008) A comprehensive analysis of the thermodynamic events involved in ligand–receptor binding using CoRIA and its variants. J Comput Aided Mol Des 22(2):91–104
Article CAS Google Scholar
Voigt CA, Martinez C, Wang ZG, Mayo SL, Arnold FH (2002) Protein building blocks preserved by recombination. Nat Struct Biol 9(7):553–558
CAS Google Scholar
Silberg JJ, Endelman JB, Arnold FH (2004) SCHEMA-guided protein recombination. Meth Enzymol 388:35–42
Article CAS Google Scholar
Zaugg J, Gumulya Y, Gillam EMJ, Bodén M (2014) Computational tools for directed evolution: a comparison of prospective and retrospective strategies. Methods Mol Biol 1179:315–333
Article Google Scholar
Endelman JB, Silberg JJ, Wang ZG, Arnold FH (2004) Site-directed protein recombination as a shortest-path problem. Protein Eng Des Sel 17:589–594
Article CAS Google Scholar
Heinzelman P, Snow CD, Wu I, Nguyen C, Villalobos A, Govindarajan S, Minshull J, Arnold FH (2009) A family of thermostable fungal cellulases created by structure-guided recombination. Proc Natl Acad Sci (USA) 106(14):5610–5615
Article CAS Google Scholar
Packer MS, Liu DR (2015) Methods for the directed evolution of proteins. Nat Rev Genet 16(7):379–394
Article CAS Google Scholar
Reetz MT, Bocola M, Carballeira JD, Zha D, Vogel A (2005) Expanding the range of substrate acceptance of enzymes: combinatorial active-site saturation test. Angew Chem Int Ed 44(27):4192–4196
Article CAS Google Scholar
Reetz MT, Carballeira JD (2007) Iterative saturation mutagenesis (ISM) for rapid directed evolution of functional enzymes. Nat Protoc 2(4):891–903
Article CAS Google Scholar
Gumulya Y, Sanchis J, Reetz MT (2012) Many pathways in laboratory evolution can lead to improved enzymes: how to escape from local minima. Chembiochem 13(7):1060–1066
Article CAS Google Scholar
Reetz MT, Wang LW, Bocola M (2006) Directed evolution of enantioselective enzymes: iterative cycles of CASTing for probing protein-sequence space. Angew Chem 118(8):1258–1263
Article Google Scholar
Reetz MT, Sanchis J (2008) Constructing and analyzing the fitness landscape of an experimental evolutionary process. Chembiochem 9(14):2260–2267
Article CAS Google Scholar
Wang LW (2006) Directed evolution of the Aspergillus niger Epoxide Hydrolase. PhD thesis, Ruhr-Universität Bochum, Bochum
Straathof AJJ, Jongejan JA (1997) The enantiomeric ratio: origin, determination and prediction. Enzyme Microb Technol 21(8):559–571
Article CAS Google Scholar
Faber K (2011) Biotransformations In Organic Chemistry, 6th edn. Springer, Berlin
Book Google Scholar
Rakels JL, Straathof AJ, Heijnen JJ (1993) A simple method to determine the enantiomeric ratio in enantioselective biocatalysis. Enzyme Microb Technol 15(12):1051–1056
Article CAS Google Scholar
Kauffman SA, Weinberger ED (1989) The NK model of rugged fitness landscapes and its application to maturation of the immune response. J Theor Biol 141(2):211–245
Article CAS Google Scholar
Fox R, Roy A, Govindarajan S, Minshull J, Gustafsson C, Jones JT, Emig R (2003) Optimizing the search algorithm for protein engineering by directed evolution. Protein Eng 16(8):589–597
Article CAS Google Scholar
Vapnik VN, Vapnik V (1998) Statistical learning theory, vol 1. Wiley, New York
Google Scholar
Vapnik VN (1995) The nature of statistical learning theory. Springer, New York
Book Google Scholar
Smola AJ, Schölkopf B (2004) A tutorial on support vector regression. Stat Comput 14(3):199–222
Article Google Scholar
Ben-Hur A, Ong CS, Sonnenburg S, Schölkopf B, Rätsch G (2008) Support vector machines and kernels for computational biology. PLoS Comput Biol 4(10):e1000,173
Article Google Scholar
van Westen GJP, Wegner JK, IJzerman AP, van Vlijmen HWT, Bender A (2011) Proteochemometric modeling as a tool to design selective compounds and for extrapolating to novel targets. Med Chem Commun 2(1):16–30
Article Google Scholar
Kawashima S, Kanehisa M (2000) AAindex: amino acid index database. Nucleic Acids Res 28:374
Article CAS Google Scholar
Saraf MC, Horswill AR, Benkovic SJ, Maranas CD (2004) FamClash: a method for ranking the activity of engineered enzymes. Proc Natl Acad Sci (USA) 101(12):4142–4147
Article CAS Google Scholar
Pantazes RJ, Saraf MC, Maranas CD (2007) Optimal protein library design using recombination or point mutations based on sequence-based scoring functions. Protein Eng Des Sel 20(8):361–373
Article CAS Google Scholar
Sulimova V, Mottl V, Kulikowski C, Muchnik I (2008) Probabilistic evolutionary model for substitution matrices of PAM and BLOSUM families. DIMACS Tech Report
Dayhoff MO, Schwartz RM, Orcutt BC (1978) A model of evolutionary change in proteins. Atlas Protein Seq Struct 5:345–358
Google Scholar
Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8(3):275–282
CAS Google Scholar
Whelan S, Goldman N (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol 18(5):691–699
Article CAS Google Scholar
Le SQ, Gascuel O (2008) An improved general amino acid replacement matrix. Mol Biol Evol 25(7):1307–1320
Article CAS Google Scholar
Liò P, Goldman N (1998) Models of molecular evolution and phylogeny. Genome Res 8(12):1233–1244
Article Google Scholar
Henikoff S, Henikoff JG (1992) Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA 89(22):10,915–10,919
Article CAS Google Scholar
Leslie CS, Eskin E, Noble WS (2002) The spectrum kernel: a string kernel for svm protein classification. In: Pacific symposium on biocomputing, Hawaii, USA, vol 7, pp 566–575
Chen CS, Fujimoto Y, Girdaukas G, Sih CJ (1982) Quantitative analyses of biochemical kinetic resolutions of enantiomers. J Am Chem Soc 104(25):7294–7299
Article CAS Google Scholar
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol (TIST) 2(3):27–27
Google Scholar
Bornscheuer UT, Huisman GW, Kazlauskas RJ, Lutz S, Moore JC, Robins K (2012) Engineering the third wave of biocatalysis. Nature 485(7397):185–194
Article CAS Google Scholar
Ness JE, Cox T, Govindarajan S, Gustafsson C, Gross RA, Minshull J (2005) Empirical biocatalyst engineering: escaping the tyranny of high-throughput screening. ACS Symp Ser 900:37–50
Article CAS Google Scholar
van den Berg BA, Reinders MJT, van der Laan JM, Roubos JA, de Ridder D (2014) Protein redesign by learning from data. Protein Eng Des Sel 27(9):281–288
Article Google Scholar
Dai DZ, Xia LM (2006) Resolution of (R, S)-2-octanol by Penicillium expansum PED-03 lipase immobilized on modified ultrastable-Y molecular sieve in microaqueous media. Process Biochem 41(6):1455–1460
Article CAS Google Scholar
Berglund P, Holmquist M, Hult K, Högberg HE (1995) Alcohols as enantioselective inhibitors in a lipase catalysed esterification of a chiral acyl donor. Biotechnol Lett 17(1):55–60
Article CAS Google Scholar
Machado SS, Wandel U, Jongejan JA, Straathof AJ, Duine JA (1999) Characterization of the enantioselective properties of the quinohemoprotein alcohol dehydrogenase of Acetobacter pasteurianus LMG 1635. 1. different enantiomeric ratios of whole cells and purified enzyme in the kinetic resolution of racemic glycidol. Biosci Biotechnol Biochem 63(1):10–20
Article CAS Google Scholar
Horsman GP, Liu AMF, Henke E, Bornscheuer UT, Kazlauskas RJ (2003) Mutations in distant residues moderately increase the enantioselectivity of Pseudomonas fluorescens esterase towards methyl 3-bromo-2-methylpropanoate and ethyl 3-phenylbutyrate. Chem Eur J 9(9):1933–1939
Article CAS Google Scholar
Sun Z, Wikmark Y, Bäckvall JE, Reetz MT (2016) New concepts for increasing the efficiency in directed evolution of stereoselective enzymes. Chem Eur J 22(15):5046–5054
Article CAS Google Scholar
Léonard V, Fransson L, Lamare S, Hult K, Graber M (2007) A water molecule in the stereospecificity pocket of Candida antarctica lipase B enhances enantioselectivity towards pentan-2-ol. Chembiochem 8(6):662–667
Article Google Scholar

Download references

Acknowledgements

The authors thank G. Foley (School of Chemistry and Molecular Biosciences, University of Queensland) for reviewing the manuscript. Funding to support this study was provided by the Australian Research Council Discovery Project scheme (DP160100865) and by the Australian Government Research Training Program (RTP).

Author information

Authors and Affiliations

School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, 4072, Australia
Julian Zaugg, Yosephine Gumulya, Alpeshkumar K. Malde & Mikael Bodén

Authors

Julian Zaugg
View author publications
You can also search for this author in PubMed Google Scholar
Yosephine Gumulya
View author publications
You can also search for this author in PubMed Google Scholar
Alpeshkumar K. Malde
View author publications
You can also search for this author in PubMed Google Scholar
Mikael Bodén
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Julian Zaugg or Mikael Bodén.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (XLSX 44 kb)

Supplementary material 2 (XLSX 66 kb)

Supplementary material 3 (XLSX 54 kb)

Supplementary material 4 (PDF 181 kb)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zaugg, J., Gumulya, Y., Malde, A.K. et al. Learning epistatic interactions from sequence-activity data to predict enantioselectivity. J Comput Aided Mol Des 31, 1085–1096 (2017). https://doi.org/10.1007/s10822-017-0090-x

Download citation

Received: 10 August 2017
Accepted: 04 December 2017
Published: 12 December 2017
Issue Date: December 2017
DOI: https://doi.org/10.1007/s10822-017-0090-x

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Learning epistatic interactions from sequence-activity data to predict enantioselectivity

Abstract

Access this article

Similar content being viewed by others

A machine learning approach for reliable prediction of amino acid interactions and its application in the directed evolution of enantioselective enzymes

Investigating the Performance of Machine Learning Methods in Predicting Functional Properties of the Hydrogenase Variants

Application of fourier transform and proteochemometrics principles to protein engineering

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding authors

Electronic supplementary material

Supplementary material 1 (XLSX 44 kb)

Supplementary material 2 (XLSX 66 kb)

Supplementary material 3 (XLSX 54 kb)

Supplementary material 4 (PDF 181 kb)

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Learning epistatic interactions from sequence-activity data to predict enantioselectivity

Abstract

Access this article

Similar content being viewed by others

A machine learning approach for reliable prediction of amino acid interactions and its application in the directed evolution of enantioselective enzymes

Investigating the Performance of Machine Learning Methods in Predicting Functional Properties of the Hydrogenase Variants

Application of fourier transform and proteochemometrics principles to protein engineering

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding authors

Electronic supplementary material

Supplementary material 1 (XLSX 44 kb)

Supplementary material 2 (XLSX 66 kb)

Supplementary material 3 (XLSX 54 kb)

Supplementary material 4 (PDF 181 kb)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation