Abstract
We present a novel optimization approach to train a free-shape, distance-dependent protein-ligand scoring function called Convex-PL. We do not impose any functional form on the scoring function. Instead, we decompose it into a polynomial basis and deduce the expansion coefficients from the structural knowledge base using a convex formulation of the optimization problem. Also, to build the training set we do not generate false poses with molecular docking packages; instead, we apply constant-RMSD rigid-body deformations to the ligands inside their binding pockets. This makes the obtained scoring function generally applicable to scoring structural ensembles generated with different docking methods. We assess the Convex-PL scoring function using data from the D3R Grand Challenge 2 submissions and the docking test of the CASF 2013 study. We demonstrate that our results outperform those of the other 20 methods previously assessed in CASF 2013. The method is available at http://team.inria.fr/nano-d/software/Convex-PL/.
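The constant-RMSD rigid-body deformations mentioned above can be illustrated with a minimal sketch. It is not the authors' code: the sampling scheme (a random rotation about the ligand centroid drawn first, then a random translation whose length fills the remaining RMSD budget) is an assumption made for illustration, and the names `constant_rmsd_deformation` and `rmsd` are hypothetical. RMSD is computed in a fixed frame without superposition, as for a ligand moved inside a rigid pocket. The scheme relies on the identity RMSD² = mean|(R − I)rᵢ|² + |t|², which holds because centroid-relative coordinates rᵢ sum to zero, so the cross term vanishes.

```python
import math
import random


def _random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]


def _rotate(p, axis, angle):
    """Rotate point p about a unit axis through the origin (Rodrigues' formula)."""
    c, s = math.cos(angle), math.sin(angle)
    kx, ky, kz = axis
    px, py, pz = p
    dot = kx * px + ky * py + kz * pz
    cross = (ky * pz - kz * py, kz * px - kx * pz, kx * py - ky * px)
    return [p_i * c + x_i * s + k_i * dot * (1.0 - c)
            for p_i, x_i, k_i in zip(p, cross, axis)]


def _mean_sq_dev(a, b):
    """Mean over atoms of the squared displacement between two atom lists."""
    return sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q)) / len(a)


def rmsd(a, b):
    """Coordinate RMSD in a fixed frame (no superposition)."""
    return math.sqrt(_mean_sq_dev(a, b))


def constant_rmsd_deformation(coords, target, max_angle=0.5, rng=None):
    """Hypothetical helper: move `coords` rigidly so that RMSD(coords, result) == target.

    For centroid-relative coordinates r_i, sum_i (R - I) r_i = 0, so
    RMSD^2 = mean |(R - I) r_i|^2 + |t|^2 and the two parts can be set independently.
    """
    rng = rng or random.Random()
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    rel = [[p[i] - centroid[i] for i in range(3)] for p in coords]

    # Random rotation about the centroid; halve the angle until it fits the RMSD budget.
    axis = _random_unit_vector(rng)
    angle = rng.uniform(0.0, max_angle)
    while True:
        rotated = [_rotate(p, axis, angle) for p in rel]
        rot_sq = _mean_sq_dev(rotated, rel)
        if rot_sq <= target ** 2:
            break
        angle *= 0.5

    # A translation along a random direction supplies the remaining RMSD.
    shift = math.sqrt(max(target ** 2 - rot_sq, 0.0))
    t = [shift * d for d in _random_unit_vector(rng)]
    return [[centroid[i] + p[i] + t[i] for i in range(3)] for p in rotated]


ligand = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0], [0.0, 1.5, 1.0]]
decoy = constant_rmsd_deformation(ligand, target=2.0, rng=random.Random(0))
print(round(rmsd(ligand, decoy), 6))  # 2.0
```

Drawing the rotation first and letting the translation absorb the rest keeps every decoy at exactly the requested RMSD while preserving all intramolecular distances; any other split between rotation and translation would satisfy the same constraint.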
Acknowledgements
The authors thank Georgy Cheremovskiy from the Moscow Institute of Physics and Technology for the initial development of the potential, and Georgy Derevyanko from Concordia University, who proposed the initial formulation of the optimization problem. The authors also thank Valentin Gordeliy from IBS Grenoble, and Vladimir Chupin and Petr Popov from MIPT Moscow for fruitful discussions during this work. This work was partially supported by RSF research project 14-14-00995.
Cite this article
Kadukova, M., Grudinin, S. Convex-PL: a novel knowledge-based potential for protein-ligand interactions deduced from structural databases using convex optimization. J Comput Aided Mol Des 31, 943–958 (2017). https://doi.org/10.1007/s10822-017-0068-8