Abstract
Insights into the potential of target proteins to bind small molecules with high affinity can be derived from their three-dimensional structures, especially the details of their binding pockets. The present study uses high-throughput screening (HTS) results on various targets to derive predictive mathematical models that identify a minimal set of structural parameters contributing significantly to the hit rates, or the affinity, of protein binding pockets for small molecular entities. The analysis focuses on the target-variation aspect of the data by considering compounds commonly tested against the HTS targets. We identify ‘four-parameter’ models with \( R^{2} \), \( R_{\text{adj}}^{2} \), SEE, and LOO \( q^{2} \) values of 0.70, 0.60, 0.27 and 0.50, respectively, or better. We demonstrate through cross-validation exercises that our regression models apply well to varied data sets. These models can therefore be used to estimate hit rates for HTS campaigns and thereby to prioritize drug targets before they undergo such resource-intensive experimental screening and follow-up.
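The four statistics quoted in the abstract can be illustrated with a minimal sketch, assuming an ordinary least-squares fit with an intercept and the common textbook definitions of each quantity. The data below are synthetic and the names (`fit_ols`, `predict`, `regression_stats`) are hypothetical; nothing here reproduces the study's descriptors, hit rates, or software.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X (intercept first)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients to one row or a matrix of descriptors."""
    return coef[0] + X @ coef[1:]


def regression_stats(X, y):
    """Return R^2, adjusted R^2, SEE and leave-one-out q^2 for an OLS model."""
    n, p = X.shape
    coef = fit_ols(X, y)
    resid = y - predict(coef, X)
    sse = float(resid @ resid)                  # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())    # total sum of squares
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    see = float(np.sqrt(sse / (n - p - 1)))     # standard error of estimate

    # Leave-one-out: refit without sample i, then predict sample i.
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        c = fit_ols(X[keep], y[keep])
        press += float((y[i] - predict(c, X[i])) ** 2)
    q2 = 1.0 - press / sst

    return {"r2": r2, "r2_adj": r2_adj, "see": see, "q2": q2}


# Synthetic example (assumption): 20 "targets", 4 pocket descriptors each.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
true_beta = np.array([0.8, -0.5, 0.3, 0.2])     # arbitrary illustrative weights
y = 1.0 + X @ true_beta + rng.normal(scale=0.3, size=20)

stats = regression_stats(X, y)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Because each leave-one-out prediction comes from a model that never saw the held-out sample, \( q^{2} \) cannot exceed \( R^{2} \) for the same data, which is consistent with the ordering of the values reported in the abstract.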
Notes
Tanimoto TT, IBM Internal Report, November 17, 1957.
Abbreviations
- CCT: Common compounds tested
- FDA: Food and Drug Administration (U.S. Department of Health and Human Services)
- HR: Hit rate
- HTS: High-throughput screening
- IQR: Interquartile range
- LOO: Leave-one-out
- MWSS: Model without site score
- NME: New/novel molecular entity
- NMR: Nuclear magnetic resonance
- SEE: Standard error of estimate
Acknowledgments
We would like to thank Dr. Stefan Schmitt (SS) for offering valuable suggestions during the course of this project. We are also grateful to SS, Drs. Bheemarao Ugarkar, Manoranjan Panda and Raghuram Tangirala for their comments on the manuscript.
Cite this article
Gupta, A., Gupta, A.K. & Seshadri, K. Structural models in the assessment of protein druggability based on HTS data. J Comput Aided Mol Des 23, 583–592 (2009). https://doi.org/10.1007/s10822-009-9279-y
DOI: https://doi.org/10.1007/s10822-009-9279-y