Learning epistatic interactions from sequence-activity data to predict enantioselectivity

Journal of Computer-Aided Molecular Design

Abstract

Enzymes with high selectivity are desirable for improving the economics of chemical synthesis of enantiopure compounds. To improve enzyme selectivity, mutations are often introduced near the catalytic active site. In this compact environment, epistatic interactions between residues, where contributions to selectivity are non-additive, play a significant role in determining the degree of selectivity. Using support vector machine regression models, we map mutations to the experimentally characterised enantioselectivities of a set of 136 variants of the epoxide hydrolase from the fungus Aspergillus niger (AnEH). We investigate whether the influence of a mutation on enzyme selectivity can be accurately predicted by linear models, and whether prediction accuracy can be improved by their higher-order counterparts. Comparing linear and degree-2 polynomial models, the mean Pearson coefficient (r) from \(50\,{\times }\,5\)-fold cross-validation increases from 0.84 (linear) to 0.91 (polynomial). Equivalent models tested on interaction-minimised sequences achieve \(r=0.90\) and \(r=0.93\), respectively. As expected, testing on a simulated control data set with no interactions yields no significant improvement from higher-order models. When additional experimentally derived AnEH mutants are tested, the correlation increases from \(r=0.51\) with the linear model to \(r=0.87\) with the degree-2 polynomial model. The study demonstrates that linear models perform well; however, representing epistatic interactions in predictive models improves the identification of selectivity-enhancing mutations. The improvement is attributed to higher-order kernel functions that represent epistatic interactions between residues.
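To illustrate how a higher-order kernel can represent pairwise interactions between residues, the following minimal sketch compares a degree-2 polynomial kernel with the inner product of an explicit feature map that contains one term per pair of positions. The binary encoding (1 for a mutated position, 0 for wild type), the two variants, and the function names are assumptions made for illustration only; the abstract does not state the encoding or kernel parameters used in the study.

```python
import itertools
import math


def poly2_kernel(x, y, c=1.0):
    """Degree-2 polynomial kernel: (x . y + c) ** 2."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + c) ** 2


def poly2_features(x, c=1.0):
    """Explicit feature map whose inner product equals poly2_kernel."""
    feats = [c]                                   # constant term
    feats += [math.sqrt(2 * c) * a for a in x]    # single-site (additive) terms
    feats += [a * a for a in x]                   # squared single-site terms
    feats += [math.sqrt(2) * x[i] * x[j]          # pairwise (interaction) terms
              for i, j in itertools.combinations(range(len(x)), 2)]
    return feats


# Hypothetical variants over four positions (illustrative, not study data).
variant_a = [1, 0, 1, 1]
variant_b = [1, 1, 1, 0]

k_implicit = poly2_kernel(variant_a, variant_b)
k_explicit = sum(p * q for p, q in zip(poly2_features(variant_a),
                                       poly2_features(variant_b)))

print(k_implicit, k_explicit)
```

The two values agree up to floating-point rounding, which shows that a model using the degree-2 kernel is implicitly linear in a feature space that includes a product term for every pair of positions. A linear kernel has only the single-site terms, so it cannot assign a weight to a specific combination of mutations.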



Acknowledgements

The authors thank G. Foley (School of Chemistry and Molecular Biosciences, University of Queensland) for reviewing the manuscript. Funding to support this study was provided by the Australian Research Council Discovery Project scheme (DP160100865) and by the Australian Government Research Training Program (RTP).

Author information

Corresponding authors

Correspondence to Julian Zaugg or Mikael Bodén.

Cite this article

Zaugg, J., Gumulya, Y., Malde, A.K. et al. Learning epistatic interactions from sequence-activity data to predict enantioselectivity. J Comput Aided Mol Des 31, 1085–1096 (2017). https://doi.org/10.1007/s10822-017-0090-x
