DOI: 10.1145/3534678.3542676
research-article

MolSearch: Search-based Multi-objective Molecular Generation and Property Optimization

Published: 14 August 2022

ABSTRACT

Leveraging computational methods to generate small molecules with desired properties has been an active research area in the drug discovery field. Towards real-world applications, however, efficient generation of molecules that satisfy multiple property requirements simultaneously remains a key challenge. In this paper, we tackle this challenge using a search-based approach and propose a simple yet effective framework called MolSearch for multi-objective molecular generation (optimization). We show that given proper design and sufficient domain information, search-based methods can achieve performance comparable to or even better than that of deep learning methods while being computationally efficient. Such efficiency enables massive exploration of chemical space given constrained computational resources. In particular, MolSearch starts with existing molecules and uses a two-stage search strategy to gradually modify them into new ones, based on transformation rules derived systematically and exhaustively from large compound libraries. We evaluate MolSearch in multiple benchmark generation settings and demonstrate its effectiveness and efficiency.
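To make the general idea of rule-based, multi-objective search concrete, the following is a minimal toy sketch and not the MolSearch algorithm itself. Everything in it is an assumption made for illustration: molecules are plain strings, `RULES` is a hypothetical set of rewrite rules standing in for transformation rules mined from compound libraries, the two objectives are arbitrary, and candidates are kept by Pareto non-dominance, which the abstract does not specify. The two-stage strategy mentioned above is not modeled.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical rewrite rules (substring -> replacement). These are toy
# stand-ins for transformation rules derived from compound libraries.
RULES: List[Tuple[str, str]] = [("C", "CC"), ("C", "N"), ("N", "O")]

Score = Tuple[float, ...]


def neighbors(mol: str) -> List[str]:
    """Apply each rule at every matching position, one edit per neighbor."""
    out = set()
    for old, new in RULES:
        start = mol.find(old)
        while start != -1:
            out.add(mol[:start] + new + mol[start + len(old):])
            start = mol.find(old, start + 1)
    out.discard(mol)
    return sorted(out)


def dominates(a: Score, b: Score) -> bool:
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scored: Dict[str, Score]) -> List[str]:
    """Keep candidates whose score vector no other candidate dominates."""
    return sorted(
        mol for mol, s in scored.items()
        if not any(dominates(t, s) for t in scored.values())
    )


def search(seed: str,
           objectives: Sequence[Callable[[str], float]],
           steps: int = 3) -> List[str]:
    """Start from an existing molecule and repeatedly expand the front."""
    seen: Dict[str, Score] = {seed: tuple(f(seed) for f in objectives)}
    frontier = [seed]
    for _ in range(steps):
        for mol in frontier:
            for nxt in neighbors(mol):
                if nxt not in seen:
                    seen[nxt] = tuple(f(nxt) for f in objectives)
        frontier = pareto_front(seen)
    return frontier


if __name__ == "__main__":
    # Toy objectives (higher is better): more "N" characters, shorter string.
    toy_objectives = [lambda m: float(m.count("N")), lambda m: -float(len(m))]
    print(search("CC", toy_objectives, steps=2))
```

In a realistic setting the string rewrites would be replaced by chemically valid transformations and the toy objectives by property predictors; the sketch only shows how starting molecules, transformation rules, and several objectives can fit together in a search loop.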


Published in

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2022
5033 pages
ISBN: 9781450393850
DOI: 10.1145/3534678

Copyright © 2022 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
