
Chemoinformatics—an introduction for computer scientists

Published: 23 February 2009

Abstract

Chemoinformatics is an interface science aimed primarily at discovering novel chemical entities that will ultimately lead to new treatments for unmet medical needs, although the same methods are also applied in other fields that design new molecules. The field combines expertise from, among others, chemistry, biology, physics, biochemistry, statistics, mathematics, and computer science. This general review of chemoinformatics describes the methods that are routinely applied in molecular discovery, in a form intended to be accessible to computer scientists as well as scientists from other numerate disciplines.

Published in

ACM Computing Surveys, Volume 41, Issue 2
February 2009
248 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/1459352

Copyright © 2009 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 23 February 2009
• Accepted: 1 March 2008
• Revised: 1 September 2007
• Received: 1 June 2006

Qualifiers

• research-article
• Research
• Refereed
