Abstract
Knowing whether an enzyme can interact with a given small molecule is essential for understanding the molecular and cellular functions of organisms. In this paper, we introduce a classifier that predicts small molecule–enzyme interaction, i.e., whether a given small molecule and enzyme can interact with each other. Small molecules are represented by their chemical functional groups, and enzymes by their biochemical and physicochemical properties, giving 160 features in total. These features are fed into an AdaBoost classifier, which is known for its good generalization ability, to predict interaction. The overall prediction accuracy is 81.76% under tenfold cross-validation and 83.35% on independent sets, suggesting that this strategy is effective. We focus on interactions between small molecules and enzymes involved in metabolism, with the aim of improving the understanding of metabolic pathways. An online predictor developed in this research is available at http://chemdata.shu.edu.cn/small_m.
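The workflow summarized above (feature vectors passed to AdaBoost and scored by k-fold cross-validation) can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation: the decision-stump base learner, the function names (`train_stump`, `adaboost_fit`, `cross_val_accuracy`), and the tiny random dataset are all assumptions made for illustration, and the 160 real features and the paper's datasets are not reproduced.

```python
import math
import random

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                pred = [pol if row[j] >= thr else -pol for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def stump_predict(stump, row):
    _, j, thr, pol = stump
    return pol if row[j] >= thr else -pol

def adaboost_fit(X, y, rounds=15):
    """Discrete AdaBoost with labels in {-1, +1}; returns [(alpha, stump), ...]."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in model)
    return 1 if score >= 0 else -1

def cross_val_accuracy(X, y, k=10, rounds=15, seed=0):
    """Overall accuracy pooled over k held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = adaboost_fit([X[i] for i in train], [y[i] for i in train], rounds)
        correct += sum(adaboost_predict(model, X[i]) == y[i] for i in fold)
    return correct / len(X)

# Toy stand-in data: 3 random features per "pair", label +1 = interacts (hypothetical).
rng = random.Random(42)
X_demo = [[rng.random() for _ in range(3)] for _ in range(60)]
y_demo = [1 if row[0] + row[1] > 1.0 else -1 for row in X_demo]
print("tenfold CV accuracy on toy data:", cross_val_accuracy(X_demo, y_demo))
```

In practice a library implementation with a stronger base learner would normally replace the hand-written stumps; the sketch only shows how boosting rounds and the tenfold split fit together.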
Cite this article
Niu, B., Jin, Y., Lu, L. et al. Prediction of interaction between small molecule and enzyme using AdaBoost. Mol Divers 13, 313–320 (2009). https://doi.org/10.1007/s11030-009-9116-1
DOI: https://doi.org/10.1007/s11030-009-9116-1