Abstract
Machine-learning methods can be used for virtual screening by analysing the structural characteristics of molecules of known activity or inactivity, and here we discuss the use of kernel discrimination and naive Bayesian classifier (NBC) methods for this purpose. We report a kernel method that can process molecules represented by binary, integer and real-valued descriptors, and show that its screening performance differs little from that of a previously described kernel developed specifically for binary fingerprint representations of molecular structure. We then evaluate the performance of an NBC when the training set contains very few active molecules. In such cases, a simpler approach based on group fusion appears to provide superior screening performance, especially when structurally heterogeneous datasets are processed.
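The three kinds of scoring compared above can be sketched for binary fingerprints as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Fingerprints are assumed to be Python sets of "on" bit indices. Group fusion is assumed to use the Tanimoto coefficient with the MAX fusion rule. The kernel is an assumed Aitchison–Aitken-style binary kernel, λ^(n−d)(1−λ)^d with Hamming distance d, and a hypothetical smoothing parameter `lam`. The classifier is a generic Bernoulli naive Bayes with Laplace smoothing. The paper's kernel for integer and real-valued descriptors is not reproduced here.

```python
import math
from typing import Iterable, List, Set

Fingerprint = Set[int]  # assumed encoding: indices of the "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient between two binary fingerprints."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def group_fusion_max(references: Iterable[Fingerprint], candidate: Fingerprint) -> float:
    """Group fusion, MAX rule: the score is the highest similarity to any reference active."""
    return max(tanimoto(ref, candidate) for ref in references)


def _logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def bkd_log_score(candidate: Fingerprint, actives: List[Fingerprint],
                  inactives: List[Fingerprint], n_bits: int, lam: float = 0.9) -> float:
    """Binary kernel discrimination score, computed in log space to avoid underflow.

    Assumed kernel: K = lam**(n_bits - d) * (1 - lam)**d, d = Hamming distance.
    Returns log(sum over actives of K) - log(sum over inactives of K).
    """
    def log_kernel(x: Fingerprint) -> float:
        d = len(candidate ^ x)  # Hamming distance = size of the symmetric difference
        return (n_bits - d) * math.log(lam) + d * math.log(1.0 - lam)

    return (_logsumexp([log_kernel(x) for x in actives])
            - _logsumexp([log_kernel(x) for x in inactives]))


class BernoulliNaiveBayes:
    """Bernoulli naive Bayes with Laplace (add-alpha) smoothing of per-bit frequencies."""

    def __init__(self, n_bits: int, alpha: float = 1.0):
        self.n_bits = n_bits
        self.alpha = alpha

    def _bit_probs(self, fps: List[Fingerprint]) -> List[float]:
        # Smoothed estimate of P(bit is on | class) for every bit position.
        counts = [0] * self.n_bits
        for fp in fps:
            for bit in fp:
                counts[bit] += 1
        denom = len(fps) + 2 * self.alpha
        return [(c + self.alpha) / denom for c in counts]

    def fit(self, actives: List[Fingerprint], inactives: List[Fingerprint]) -> "BernoulliNaiveBayes":
        self.p_act = self._bit_probs(actives)
        self.p_inact = self._bit_probs(inactives)
        self.log_prior_odds = math.log(len(actives) / len(inactives))
        return self

    def log_odds(self, fp: Fingerprint) -> float:
        """log P(active | fp) - log P(inactive | fp), assuming independent bits."""
        score = self.log_prior_odds
        for bit in range(self.n_bits):
            pa, pi = self.p_act[bit], self.p_inact[bit]
            if bit in fp:
                score += math.log(pa) - math.log(pi)
            else:
                score += math.log(1.0 - pa) - math.log(1.0 - pi)
        return score


if __name__ == "__main__":
    # Toy 8-bit data, purely illustrative.
    actives = [{0, 1, 2}, {0, 1, 3}]
    inactives = [{5, 6, 7}, {4, 5, 6}]
    nb = BernoulliNaiveBayes(n_bits=8).fit(actives, inactives)
    for cand in ({0, 1, 2, 3}, {5, 6}):
        print(sorted(cand),
              group_fusion_max(actives, cand),
              bkd_log_score(cand, actives, inactives, n_bits=8),
              nb.log_odds(cand))
```

In each function a larger score means a more active-like candidate, so a database would be ranked by sorting on that score. Note that group fusion needs only the reference actives, whereas the kernel and naive Bayes sketches also need inactives. The kernel score is accumulated in log space because products of per-bit kernel terms can underflow for fingerprints of realistic length.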
Acknowledgements
We thank the following: the Alexander S. Onassis Public Benefit Foundation, the Engineering and Physical Sciences Research Council and the Novartis Institutes for Biomedical Research for funding George Papadatos; the Biotechnology and Biological Sciences Research Council and GlaxoSmithKline for funding David Wood; MDL Information Systems Inc. for provision of the MDL Drug Data Report database; and the Royal Society, SciTegic Inc., Tripos Inc. and the Wolfson Foundation for hardware, laboratory and software support.
About this article
Cite this article
Chen, B., Harrison, R.F., Papadatos, G. et al. Evaluation of machine-learning methods for ligand-based virtual screening. J Comput Aided Mol Des 21, 53–62 (2007). https://doi.org/10.1007/s10822-006-9096-5