Convex-PL: a novel knowledge-based potential for protein-ligand interactions deduced from structural databases using convex optimization

Journal of Computer-Aided Molecular Design

Abstract

We present a novel optimization approach to train a free-shape distance-dependent protein-ligand scoring function called Convex-PL. We do not impose any functional form of the scoring function. Instead, we decompose it into a polynomial basis and deduce the expansion coefficients from the structural knowledge base using a convex formulation of the optimization problem. Also, for the training set we do not generate false poses with molecular docking packages, but use constant RMSD rigid-body deformations of the ligands inside the binding pockets. This allows the obtained scoring function to be generally applicable to scoring of structural ensembles generated with different docking methods. We assess the Convex-PL scoring function using data from D3R Grand Challenge 2 submissions and the docking test of the CASF 2013 study. We demonstrate that our results outperform the other 20 methods previously assessed in CASF 2013. The method is available at http://team.inria.fr/nano-d/software/Convex-PL/.
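The core idea of the abstract, expanding a distance-dependent score in a polynomial basis and fitting the expansion coefficients with a convex objective, can be illustrated with a small toy example. This is only a sketch under stated assumptions, not the authors' formulation: the Legendre basis, the cutoff `R_MAX`, the basis size `ORDER`, the hinge-loss objective, the subgradient solver, and the synthetic "native" and "decoy" distance sets are all hypothetical choices made for illustration. In particular, the decoys here are random distances, not the constant-RMSD rigid-body deformations the paper describes.

```python
import numpy as np
from numpy.polynomial import legendre

R_MAX = 10.0  # hypothetical distance cutoff, in angstroms
ORDER = 6     # hypothetical number of basis polynomials


def basis_features(distances, r_max=R_MAX, order=ORDER):
    """Average Legendre basis values over the atom-pair distances of one pose."""
    d = np.clip(np.asarray(distances, dtype=float), 0.0, r_max)
    x = 2.0 * d / r_max - 1.0  # map [0, r_max] onto [-1, 1]
    # legvander gives an (n_pairs, order) matrix holding P_0..P_{order-1} at each x
    return legendre.legvander(x, order - 1).mean(axis=0)


def fit_coefficients(native_feats, decoy_feats, lam=1e-2, lr=0.05, steps=500):
    """Subgradient descent on an L2-regularised hinge loss (a convex objective).

    Lower score means a better pose, so each decoy is asked to score at
    least one unit above its native counterpart.
    """
    diffs = decoy_feats - native_feats
    w = np.zeros(diffs.shape[1])
    for _ in range(steps):
        active = diffs @ w < 1.0  # pairs that still violate the margin
        grad = lam * w - diffs[active].sum(axis=0) / len(diffs)
        w -= lr * grad
    return w


def make_toy_data(n_poses=40, n_pairs=50, seed=0):
    """Synthetic stand-in data: native distances cluster, decoy distances do not."""
    rng = np.random.default_rng(seed)
    native = rng.normal(4.0, 0.5, size=(n_poses, n_pairs))
    decoy = rng.uniform(1.0, R_MAX, size=(n_poses, n_pairs))
    native_feats = np.array([basis_features(p) for p in native])
    decoy_feats = np.array([basis_features(p) for p in decoy])
    return native_feats, decoy_feats


native_feats, decoy_feats = make_toy_data()
w = fit_coefficients(native_feats, decoy_feats)
margins = (decoy_feats - native_feats) @ w
print("fraction of decoys scored above their native pose:", np.mean(margins > 0))
```

Because the score is linear in the coefficients `w`, the regularised hinge loss is convex in `w`, so any local minimum is also global. That property is what makes a free-shape potential trainable without fixing a functional form in advance.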

Acknowledgements

The authors thank Georgy Cheremovskiy from Moscow Institute of Physics and Technology for the initial development of the potential, and Georgy Derevyanko from Concordia University who proposed the initial formulation of the optimization problem. The authors also thank Valentin Gordeliy from IBS Grenoble, and Vladimir Chupin and Petr Popov from MIPT Moscow for fruitful discussions during this work. This work was partially supported by RSF research Project 14-14-00995.

Author information

Corresponding author

Correspondence to Sergei Grudinin.

Electronic supplementary material

Supplementary material 1 (pdf 235 KB)

Cite this article

Kadukova, M., Grudinin, S. Convex-PL: a novel knowledge-based potential for protein-ligand interactions deduced from structural databases using convex optimization. J Comput Aided Mol Des 31, 943–958 (2017). https://doi.org/10.1007/s10822-017-0068-8

  • DOI: https://doi.org/10.1007/s10822-017-0068-8
