
Prediction of interaction between small molecule and enzyme using AdaBoost

  • Full Length Paper
Molecular Diversity

Abstract

Knowing whether an enzyme can interact with a given small molecule is essential for understanding the molecular and cellular functions of organisms. In this paper, we introduce a classifier that predicts small molecule–enzyme interaction, i.e., whether a given small molecule and enzyme can interact with each other. Small molecules are represented by their chemical functional groups, and enzymes are represented by their biochemical and physicochemical properties, resulting in a total of 160 features. These features are input into an AdaBoost classifier, a method known for good generalization ability, to predict interaction. The overall prediction accuracy is 81.76% under tenfold cross-validation and 83.35% on independent sets, suggesting that this strategy is effective. We focus on interactions between small molecules and enzymes involved in metabolism, with the aim of improving the understanding of metabolic pathways. An online predictor developed in this research is available at http://chemdata.shu.edu.cn/small_m.
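
The pipeline described in the abstract (concatenated molecule and enzyme descriptors, an AdaBoost classifier, tenfold cross-validation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the weak learner is a single-feature decision stump, the toy data use 8 features instead of 160, the labelling rule is invented, and names such as `pair_features` and `cross_val_accuracy` are hypothetical.

```python
import math
import random

def pair_features(molecule, enzyme):
    """Concatenate molecule and enzyme descriptors into one feature vector."""
    return list(molecule) + list(enzyme)

def stump_predict(stump, row):
    """A stump votes `polarity` when the feature reaches the threshold."""
    feature, threshold, polarity = stump
    return polarity if row[feature] >= threshold else -polarity

def train_stump(X, y, w):
    """Return (weighted_error, stump) for the best single-feature threshold rule."""
    best_err, best_stump = None, None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump

def adaboost_fit(X, y, rounds=10):
    """Discrete AdaBoost with labels in {-1, +1}; returns [(alpha, stump), ...]."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        err, stump = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified pairs, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

def cross_val_accuracy(X, y, k=10, rounds=10, seed=0):
    """Pooled accuracy over k held-out folds; each sample is tested exactly once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for fold in range(k):
        test_idx = set(idx[fold::k])
        train = [i for i in idx if i not in test_idx]
        model = adaboost_fit([X[i] for i in train], [y[i] for i in train], rounds)
        correct += sum(adaboost_predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)

if __name__ == "__main__":
    rng = random.Random(42)
    X, y = [], []
    for _ in range(60):
        molecule = [rng.randint(0, 1) for _ in range(4)]  # toy functional-group flags
        enzyme = [rng.random() for _ in range(4)]         # toy property values
        X.append(pair_features(molecule, enzyme))
        # Invented rule (not from the paper): the pair "interacts" when group 0
        # is present and enzyme property 0 is at least 0.5.
        y.append(1 if molecule[0] == 1 and enzyme[0] >= 0.5 else -1)
    print("tenfold CV accuracy:", cross_val_accuracy(X, y, k=10))
```

Stumps are used here only because they keep the example short; the reweighting step is the part that is characteristic of AdaBoost, since each round concentrates on the pairs the previous stumps got wrong. The accuracy printed for the toy data says nothing about the figures reported in the abstract.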



Author information

Corresponding authors

Correspondence to Wencong Lu or Yudong Cai.

About this article

Cite this article

Niu, B., Jin, Y., Lu, L. et al. Prediction of interaction between small molecule and enzyme using AdaBoost. Mol Divers 13, 313–320 (2009). https://doi.org/10.1007/s11030-009-9116-1

  • DOI: https://doi.org/10.1007/s11030-009-9116-1
