Abstract
In this paper, we first present a new dataset of NDM-1 biological activities, compiled from a cleaned version of the NMDI database. A literature review enriched this database with 741 new compounds, whose activities against NDM-1 are classified into three classes (inactive, weakly active, and strongly active) using a unifying labeling procedure that covers a range of different activity properties. Second, we restate the classification problem in the Multiple Instance Learning (MIL) setting by representing each compound as a collection of Mol2vec vectors, each corresponding to a specific substructure (either an atom, or an atom together with its first neighbors). For the strongly active class, we observe an improvement of up to 45.7% in balanced accuracy and up to 38.47% in F1-score for the MIL approach compared with the classical Machine Learning paradigm. Finally, we present a classification and ranking framework based on classifiers learned by a k-fold cross-validation (CV) procedure, in which each fold has its own hyper-parameters, tuned by Bayesian optimization. In the MIL setting, the top-3 and top-5 ranked accuracies of the compounds classified as strongly active reach 100%.
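The abstract does not specify the MIL classifier or the exact definition of the top-k ranked accuracy, so the following Python sketch only illustrates, under stated assumptions, the two representations being compared and the evaluation quantities reported. The substructure-to-vector lookup table is a hypothetical stand-in for a trained Mol2vec model, and all helper names are hypothetical. The classical view sums the substructure vectors into one compound vector (as Mol2vec does), while the MIL view keeps them as a bag. The max-over-instances bag score is the standard MIL assumption, not necessarily the classifier used in the paper. Top-k accuracy is read here as the fraction of the k highest-scoring compounds that are truly strongly active, which is one plausible interpretation.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def build_bag(substructure_ids: Sequence[str], table: Dict[str, Vector]) -> List[Vector]:
    """MIL view: one instance (vector) per substructure of the compound.

    Identifiers absent from the (hypothetical) lookup table are skipped;
    this is an assumption of the sketch, not the paper's documented choice.
    """
    return [list(table[s]) for s in substructure_ids if s in table]


def compound_vector(bag: List[Vector]) -> Vector:
    """Classical single-vector view: sum the substructure vectors."""
    if not bag:
        raise ValueError("empty bag")
    return [sum(column) for column in zip(*bag)]


def bag_score(bag: List[Vector], weights: Vector) -> float:
    """Standard MIL assumption: a bag is as positive as its most positive
    instance, so the bag score is the maximum linear instance score."""
    return max(sum(w * x for w, x in zip(weights, inst)) for inst in bag)


def one_vs_rest(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> Tuple[float, float]:
    """Balanced accuracy and F1-score of one class against the other classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return (recall + specificity) / 2, f1


def top_k_accuracy(scores: Sequence[float], y_true: Sequence[str], positive: str, k: int) -> float:
    """Fraction of the k highest-scoring compounds whose true label is `positive`."""
    if k <= 0 or not scores:
        raise ValueError("need k > 0 and at least one score")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(y_true[i] == positive for i in ranked) / len(ranked)


if __name__ == "__main__":
    # Toy, hypothetical data: 3-dimensional vectors instead of Mol2vec's 300.
    table = {"a0": [1.0, 0.0, 0.0], "a1": [0.0, 1.0, 0.0], "b0": [0.0, 0.0, 1.0]}
    bag = build_bag(["a0", "a1", "unknown"], table)
    print(bag)                               # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(compound_vector(bag))              # [1.0, 1.0, 0.0]
    print(bag_score(bag, [0.5, 2.0, -1.0]))  # 2.0

    y_true = ["strong", "strong", "weak", "inactive", "strong", "inactive"]
    y_pred = ["strong", "weak", "weak", "inactive", "strong", "inactive"]
    ba, f1 = one_vs_rest(y_true, y_pred, "strong")
    print(round(ba, 3), round(f1, 3))        # 0.833 0.8

    scores = [0.9, 0.4, 0.2, 0.1, 0.8, 0.3]  # hypothetical "strong" scores
    print(top_k_accuracy(scores, y_true, "strong", 3))  # 1.0
    print(top_k_accuracy(scores, y_true, "strong", 4))  # 0.75
```

In the toy bag, the MIL score (2.0) comes from the single most relevant substructure, whereas a linear score on the summed vector would mix all substructures (0.5 + 2.0 = 2.5); this difference in how substructure evidence is aggregated is what the MIL reformulation exploits.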
Acknowledgements
This project was publicly funded through ANR (the French National Research Agency) under the “Investissements d’avenir” programme with the reference ANR-16-IDEX-0006.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Papastergiou, T., Azé, J., Bringay, S., Louet, M., Poncelet, P., Gavara, L. (2023). Multiple Instance Learning Based on Mol2vec Molecular Substructure Embeddings for Discovery of NDM-1 Inhibitors. In: Fdez-Riverola, F., Rocha, M., Mohamad, M.S., Caraiman, S., Gil-González, A.B. (eds) Practical Applications of Computational Biology and Bioinformatics, 16th International Conference (PACBB 2022). PACBB 2022. Lecture Notes in Networks and Systems, vol 553. Springer, Cham. https://doi.org/10.1007/978-3-031-17024-9_6
DOI: https://doi.org/10.1007/978-3-031-17024-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17023-2
Online ISBN: 978-3-031-17024-9
eBook Packages: Intelligent Technologies and Robotics (R0)