MolSearch: Search-based Multi-objective Molecular Generation and Property Optimization

Authors:
Mengying Sun, Michigan State University, East Lansing, MI, USA
Jing Xing, Michigan State University, Grand Rapids, MI, USA
Han Meng, Michigan State University, East Lansing, MI, USA
Huijun Wang, Agios Pharmaceuticals, Cambridge, MA, USA
Bin Chen, Michigan State University, Grand Rapids, MI, USA
Jiayu Zhou, Michigan State University, East Lansing, MI, USA

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 2022, Pages 4724–4732
https://doi.org/10.1145/3534678.3542676

Published: 14 August 2022

ABSTRACT

Leveraging computational methods to generate small molecules with desired properties has been an active research area in the drug discovery field. Towards real-world applications, however, efficient generation of molecules that satisfy multiple property requirements simultaneously remains a key challenge. In this paper, we tackle this challenge using a search-based approach and propose a simple yet effective framework called MolSearch for multi-objective molecular generation (optimization). We show that, given proper design and sufficient domain information, search-based methods can achieve performance comparable to, or even better than, that of deep learning methods while being computationally efficient. Such efficiency enables massive exploration of chemical space under constrained computational resources. In particular, MolSearch starts with existing molecules and uses a two-stage search strategy to gradually modify them into new ones, based on transformation rules derived systematically and exhaustively from large compound libraries. We evaluate MolSearch in multiple benchmark generation settings and demonstrate its effectiveness and efficiency.

Index Terms

1. Applied computing
  1. Life and medical sciences
    1. Bioinformatics
2. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms

Published in
KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2022
5033 pages
ISBN: 9781450393850
DOI: 10.1145/3534678
General Chairs: Aidong Zhang (University of Virginia), Huzefa Rangwala (Amazon/George Mason University)
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
design moves
molecular generation and optimization
monte carlo tree search
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%