Abstract
Several quantitative models for predicting the aqueous solubility of organic compounds were developed from a diverse dataset of 2084 compounds using multilinear regression analysis and backpropagation neural networks. The compounds were described by two different structure representation methods: (1) 18 topological descriptors; and (2) 32 radial distribution function codes representing the 3D structure of a molecule, plus eight additional descriptors. The dataset was divided into a training set and a test set using Kohonen's self-organizing neural network. The backpropagation neural network models gave good predictions. With the 18 topological descriptors, a correlation coefficient of 0.92 and a standard deviation of 0.62 were achieved for the 936 compounds in the test set. With the 3D descriptors, a correlation coefficient of 0.90 and a standard deviation of 0.73 were achieved for the 866 compounds in the test set. The models were also tested on another dataset, and the relationship between the two datasets was examined with Kohonen's self-organizing neural network.
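The abstract says only that the training/test division was based on Kohonen's self-organizing neural network. One common way to do this is to train a map on the descriptor vectors, assign each compound to its winning neuron, and then draw both training and test compounds from every occupied neuron so that the two sets cover the same region of descriptor space. The sketch below follows that idea under stated assumptions: the map size, the learning-rate and neighbourhood schedules, and the alternating train/test rule within each neuron are hypothetical choices, and `train_som`, `best_neuron` and `som_split` are illustrative names, not the authors' code.

```python
import math
import random


def best_neuron(weights, x):
    """Return the grid position (row, col) of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rectangular Kohonen map; all schedule parameters are assumed, not from the paper."""
    rng = random.Random(seed)
    # Initialise each neuron with a copy of a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total
            lr = lr0 * (1.0 - frac)                     # linearly decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)   # shrinking neighbourhood radius
            winner = best_neuron(weights, x)
            for node, w in weights.items():
                d = math.dist(node, winner)             # distance measured on the map grid
                if d <= radius:
                    h = math.exp(-(d * d) / (2.0 * radius * radius))
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])  # pull the neuron towards the input
            step += 1
    return weights


def som_split(data, rows=4, cols=4, seed=0):
    """Hypothetical split rule: within each neuron, compounds alternate train, test, train, ..."""
    weights = train_som(data, rows, cols, seed=seed)
    counts = {}
    train, test = [], []
    for i, x in enumerate(data):
        node = best_neuron(weights, x)
        n = counts.get(node, 0)
        (train if n % 2 == 0 else test).append(i)
        counts[node] = n + 1
    return train, test
```

Because the first compound mapped to each neuron goes to the training set under this assumed rule, every occupied region of the map is represented in training, and the training set is never smaller than the test set. With real data, `data` would hold scaled descriptor vectors (for example the 18 topological descriptors), and the map size would be tuned to the dataset.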
Abbreviations: BPG – backpropagation; KNN – Kohonen's self-organizing neural network; MLRA – multilinear regression analysis; MMP – mean molecular polarizability; RDF – radial distribution function.
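The RDF codes used as 3D descriptors are commonly written as g(r) = f Σ_{i<j} A_i A_j exp(-B (r - r_ij)^2), where r_ij is the distance between atoms i and j, A_i is an atomic property, B is a smoothing parameter and f is a scaling factor. The sketch below samples g(r) at 32 evenly spaced distances, matching the number of RDF codes mentioned in the abstract. The distance range, the value of B, f = 1 and the choice of atomic property are assumptions for illustration, not values taken from the paper, and `rdf_code` is a hypothetical helper name.

```python
import math


def rdf_code(coords, props, n_points=32, r_min=1.0, r_max=12.0, b=25.0, scale=1.0):
    """Sample g(r) = f * sum_{i<j} A_i A_j exp(-B (r - r_ij)^2) at n_points distances.

    coords: list of (x, y, z) atom positions; props: atomic property A_i per atom.
    r_min, r_max, b (B) and scale (f) are assumed defaults, not the paper's settings.
    """
    n = len(coords)
    # Precompute every interatomic distance r_ij together with the product A_i * A_j.
    pairs = [(math.dist(coords[i], coords[j]), props[i] * props[j])
             for i in range(n - 1) for j in range(i + 1, n)]
    step = (r_max - r_min) / (n_points - 1)
    code = []
    for k in range(n_points):
        r = r_min + k * step
        # Each atom pair contributes a Gaussian peak centred on its distance r_ij.
        g = sum(a * math.exp(-b * (r - rij) ** 2) for rij, a in pairs)
        code.append(scale * g)
    return code
```

For a two-atom "molecule" with atoms 1.5 apart and A = 1 for both, the code has a single peak of height 1 at r = 1.5. In practice A_i could be, for example, a partial atomic charge, and the coordinates would come from a 3D structure generator.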
Cite this article
Yan, A., Gasteiger, J., Krug, M. et al. Linear and nonlinear functions on modeling of aqueous solubility of organic compounds by two structure representation methods. J Comput Aided Mol Des 18, 75–87 (2004). https://doi.org/10.1023/B:jcam.0000030031.81235.05
DOI: https://doi.org/10.1023/B:jcam.0000030031.81235.05