Multiple Instance Learning Based on Mol2vec Molecular Substructure Embeddings for Discovery of NDM-1 Inhibitors

  • Conference paper

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 553))

Abstract

In this paper, we first present a new dataset of NDM-1 biological activities, compiled from a cleaned version of the NMDI database. A literature review enriched the former database with 741 new compounds, with activities against NDM-1 classified into three classes (inactive, weakly active, and strongly active compounds) through a unified labeling procedure that covers a range of different activity properties. Second, we restate the classification problem in the Multiple Instance Learning (MIL) setting by representing each compound as a collection of Mol2vec vectors, each corresponding to a specific substructure (either an atom or an atom together with its first neighbors). We observe an improvement of up to 45.7% and 38.47% in balanced accuracy and F1-score, respectively, for the strongly active class with the MIL approach compared to the classical Machine Learning paradigm. Finally, we present a classification and ranking framework based on classifiers learned by a k-fold cross-validation (CV) procedure, each with its own per-fold hyper-parameters learned by a Bayesian optimization procedure. We observe that the top-3 and top-5 ranked accuracies of the compounds classified as strongly active reach 100% in the MIL setting.
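
To make the bag representation and the ranking step concrete, the following is a minimal sketch in plain Python and not the authors' implementation. The abstract does not specify the MIL classifier, how instance scores are pooled, how the per-fold classifiers are combined, or the exact definition of top-k ranked accuracy. The sketch therefore assumes max-pooling over instances, averaging of per-fold scores, and top-k accuracy defined as the fraction of the k highest-scored compounds that are truly strongly active. The 2-dimensional vectors and the weight vectors are made-up stand-ins for Mol2vec embeddings and trained models.

```python
from statistics import mean

# A "bag" is one compound: a list of instance vectors, one per substructure.
# The 2-dimensional vectors used below are invented stand-ins for Mol2vec
# embeddings (assumption: real embeddings are much higher-dimensional).

def max_pool_score(bag, weights):
    """Score a bag by its best-scoring instance (assumed MIL pooling rule)."""
    return max(sum(w * x for w, x in zip(weights, inst)) for inst in bag)

def ensemble_score(bag, fold_weights):
    """Average the bag score over the per-fold models (assumed combination)."""
    return mean(max_pool_score(bag, w) for w in fold_weights)

def top_k_accuracy(scores, labels, k):
    """Fraction of the k highest-scored compounds whose true label is 1."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k

# Hypothetical linear models, one weight vector per CV fold.
fold_weights = [[1.0, 0.0], [0.8, 0.2], [0.9, -0.1]]

# Four toy compounds; label 1 marks "strongly active", 0 anything else.
bags = [
    [[0.9, 0.1], [0.1, 0.2]],
    [[0.2, 0.1], [0.1, 0.3]],
    [[0.8, 0.3]],
    [[0.3, 0.0], [0.2, 0.2]],
]
labels = [1, 0, 1, 0]

scores = [ensemble_score(bag, fold_weights) for bag in bags]
print(top_k_accuracy(scores, labels, k=2))
```

On this toy data the two truly active compounds receive the two highest scores, so the top-2 value is 1.0. Bags may contain different numbers of instances, which is what distinguishes the MIL setting from a fixed-length feature vector per compound.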

Notes

  1. https://www.chemcomp.com/Products.htm

  2. https://pubmed.ncbi.nlm.nih.gov/

  3. https://github.com/rdkit/rdkit

Acknowledgements

This project was publicly funded through ANR (the French National Research Agency) under the “Investissements d’avenir” programme with the reference ANR-16-IDEX-0006.

Author information

Corresponding author

Correspondence to Thomas Papastergiou.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Papastergiou, T., Azé, J., Bringay, S., Louet, M., Poncelet, P., Gavara, L. (2023). Multiple Instance Learning Based on Mol2vec Molecular Substructure Embeddings for Discovery of NDM-1 Inhibitors. In: Fdez-Riverola, F., Rocha, M., Mohamad, M.S., Caraiman, S., Gil-González, A.B. (eds) Practical Applications of Computational Biology and Bioinformatics, 16th International Conference (PACBB 2022). PACBB 2022. Lecture Notes in Networks and Systems, vol 553. Springer, Cham. https://doi.org/10.1007/978-3-031-17024-9_6
