J. Chem. Inf. Comput. Sci., 44 (6), 2061-2069, 2004. 10.1021/ci040023n S0095-2338(04)00023-X
Web Release Date: October 22, 2004

Copyright © 2004 American Chemical Society

Use of Classification Regression Tree in Predicting Oral Absorption in Humans

Jane P. F. Bai,* Andrey Utis,# Gordon Crippen, Han-Dan He, Volker Fischer, Robert Tullman, He-Qun Yin, Cheng-Pang Hsu, Lan Jiang, and Kin-Kai Hwang

ZyxBio, LLC, P.O. Box 2255, Hudson, Ohio 44236, ZyxBio, LLC, 11,000 Cedar Avenue, Cleveland, Ohio 4106, College of Pharmacy, University of Michigan, Ann Arbor, Michigan, Novartis Pharmaceuticals, East Hanover, New Jersey, Johnson & Johnson Pharmaceutical Research & Development, LLC, and Aventis Pharmaceuticals, Bridgewater, New Jersey

Received March 31, 2004

Abstract:

The purpose of this study is to explore the use of classification regression trees (CART) in predicting the fraction of dose absorbed in humans in the dose-independent range. Because results from clinical formulations in humans were used to train the model, a hypothetical state in which drug molecules are already dissolved in the intestinal fluid was adopted; molecular attributes affecting dissolution were therefore not considered in the model. As a result, the model projects the highest achievable fraction of dose absorbed, providing a reference point for manipulating formulations or solid states to optimize oral clinical efficacy. A set of approximately 1260 structures and their human oral pharmacokinetic data, including bioavailability and/or absorption and/or radiolabeled studies, was used, with 899 compounds as the training set and 362 as the test set. The numerical range of the fraction of dose absorbed, 0 to 1, was divided into 6 classes, each with a width of approximately 0.16. A set of 28 structural descriptors was used for modeling oral absorption without considering active transport; a separate branch was then created for modeling oral absorption involving active transport. The average absolute error (AAE) of the training set was 0.12, and those of five test sets ranged from 0.17 to 0.2. In terms of classification, two test sets of unpublished, proprietary compounds showed 79% to 86% correct prediction when predicted values falling within ± one class of the actual values were counted as correct. Overall, the computational errors from all the test sets of diverse structures were similar and reasonably acceptable. Compared with artificial membranes for ranking drug absorption potential, prediction by the CART model is considered fast and reasonably accurate for accelerating drug discovery. One can not only continuously improve the accuracy of CART computations by expanding the chemical space of the training set but also calculate the statistical errors associated with individual decision paths resulting from the training set to decide whether to accept individual computations for any test set.
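
The two evaluation measures quoted in the abstract, the AAE and the share of predictions falling within ± one class of the actual value, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the six classes are equal-width bins of 1/6 (the abstract gives only an approximate class size of 0.16) and reads AAE as the average absolute error between actual and predicted fractions. The function names and the sample values are hypothetical and chosen only for illustration; they are not data from the study.

```python
N_CLASSES = 6  # the abstract divides the 0-to-1 range into 6 classes


def fraction_to_class(fraction, n_classes=N_CLASSES):
    """Map a fraction absorbed in [0, 1] to a class index 0..n_classes-1.

    Assumption: equal-width bins; the paper's exact boundaries may differ.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    # min() keeps a fraction of exactly 1.0 in the top class
    return min(int(fraction * n_classes), n_classes - 1)


def average_absolute_error(actual, predicted):
    """Mean of |actual - predicted| over paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def within_one_class_rate(actual, predicted):
    """Share of pairs whose class indices differ by at most one."""
    hits = sum(
        abs(fraction_to_class(a) - fraction_to_class(p)) <= 1
        for a, p in zip(actual, predicted)
    )
    return hits / len(actual)


# Hypothetical values for illustration only, not data from the study
actual = [0.10, 0.50, 0.90, 0.30]
predicted = [0.20, 0.45, 0.40, 0.35]

aae = average_absolute_error(actual, predicted)
rate = within_one_class_rate(actual, predicted)
print(f"AAE = {aae:.3f}, within one class = {rate:.0%}")
```

The ± one class criterion is more lenient than exact class agreement, so a value such as the reported 79% to 86% should be read together with the AAE rather than as a strict classification accuracy.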

