ABSTRACT
Leveraging computational methods to generate small molecules with desired properties has been an active research area in the drug discovery field. For real-world applications, however, efficiently generating molecules that satisfy multiple property requirements simultaneously remains a key challenge. In this paper, we tackle this challenge using a search-based approach and propose a simple yet effective framework called MolSearch for multi-objective molecular generation (optimization). We show that, given proper design and sufficient domain information, search-based methods can achieve performance comparable to, or even better than, that of deep learning methods while being computationally efficient. Such efficiency enables massive exploration of chemical space under constrained computational resources. In particular, MolSearch starts with existing molecules and uses a two-stage search strategy to gradually modify them into new ones, based on transformation rules derived systematically and exhaustively from large compound libraries. We evaluate MolSearch in multiple benchmark generation settings and demonstrate its effectiveness and efficiency.
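To make "satisfying multiple property requirements simultaneously" concrete, the following is a minimal sketch of two common ways to compare candidates on several objectives: a per-property threshold check and a Pareto (non-dominated) filter. The abstract does not state how MolSearch scores or compares molecules, so both criteria are assumptions made for illustration, and the function names, molecule labels, and scores are hypothetical.

```python
from typing import Dict, List, Tuple

# One score per objective; higher is assumed to be better (an assumption).
Scores = Tuple[float, ...]


def satisfies_all(scores: Scores, thresholds: Scores) -> bool:
    """True if every property score meets its threshold."""
    return all(s >= t for s, t in zip(scores, thresholds))


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Names of candidates that no other candidate dominates."""
    return [
        name
        for name, s in candidates.items()
        if not any(dominates(t, s) for other, t in candidates.items() if other != name)
    ]


# Hypothetical two-objective scores, invented for this example only.
candidates = {
    "mol_a": (0.9, 0.4),
    "mol_b": (0.7, 0.7),
    "mol_c": (0.6, 0.6),
    "mol_d": (0.3, 0.9),
}

front = pareto_front(candidates)
hits = [name for name, s in candidates.items() if satisfies_all(s, (0.6, 0.6))]
print("Pareto front:", front)
print("Meets all thresholds:", hits)
```

The two criteria answer different questions: the threshold check keeps only molecules that clear every requirement, while the Pareto filter keeps every molecule that represents a distinct trade-off, even one that is weak on a single objective.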
Index Terms
- MolSearch: Search-based Multi-objective Molecular Generation and Property Optimization