Abstract
We consider the problem of affinity prediction for protein ligands. For this purpose, small molecule candidates can easily become regression algorithm inputs if they are represented as vectors indexed by a set of physico-chemical properties or structural features of their molecular graphs. There are plenty of so-called molecular fingerprints, each with a characteristic composition or generation of features. This raises the question of which fingerprint to choose for a given learning task. Moreover, none of the standard fingerprints systematically gathers all circular and tree patterns independent of size, as well as the adjacency information of atoms. Since structural and neighborhood information is crucial for the binding capacity of small molecules, we combine the features of existing graph kernels in a novel way such that both aspects are covered and the fingerprint choice becomes part of the learning process. More precisely, we apply the Weisfeiler-Lehman labeling algorithm to encode neighborhood information in the vertex labels. Based on the relabeled graphs, we calculate four types of structural features: cyclic patterns, tree patterns, shortest paths, and the Weisfeiler-Lehman labels. We combine these views using several multi-view regression algorithms. Our experiments demonstrate that affinity prediction profits from the application of multiple views, outperforming state-of-the-art single-fingerprint approaches.
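The Weisfeiler-Lehman labeling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `wl_relabel` and `wl_feature_counts`, the string encoding of the new labels, and the toy three-atom graph are all assumptions made for illustration. In each round, every vertex label is replaced by its old label together with the sorted labels of its neighbors, so that the label encodes progressively larger neighborhoods.

```python
from collections import Counter


def wl_relabel(adjacency, labels, iterations=1):
    """Return the list of vertex labelings after 0..iterations WL rounds.

    adjacency: dict mapping each vertex to a list of its neighbors
    labels:    dict mapping each vertex to an initial string label
    (Both argument formats are assumptions of this sketch.)
    """
    current = dict(labels)
    history = [current]
    for _ in range(iterations):
        # New label = own label plus the sorted multiset of neighbor labels.
        current = {
            v: current[v] + "(" + ",".join(sorted(current[u] for u in adjacency[v])) + ")"
            for v in adjacency
        }
        history.append(current)
    return history


def wl_feature_counts(adjacency, labels, iterations=1):
    """Count how often each label occurs over all refinement rounds."""
    counts = Counter()
    for round_labels in wl_relabel(adjacency, labels, iterations):
        counts.update(round_labels.values())
    return counts


# Hypothetical toy graph: a chain of heavy atoms C-C-O.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
labels = {0: "C", 1: "C", 2: "O"}
features = wl_feature_counts(adjacency, labels, iterations=1)
```

After one round, the two carbon atoms receive different labels because only one of them is adjacent to the oxygen. The resulting label counts can serve as one feature view; the relabeled graph could likewise be the input for the other pattern types.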
Notes
- 1.
See [8] for a definition of such a canonical representation.
- 2.
Binding database, https://www.bindingdb.org/bind/index.jsp.
References
Balfer, J., Bajorath, J.: Artifacts in support vector regression-based compound potency prediction revealed by statistical and activity landscape analysis. PLoS ONE 10 (2015)
Bender, A., Jenkins, J.L., Scheiber, J., Sukuru, S.C.K., Glick, M., Davies, J.W.: How similar are similarity searching methods? A principal component analysis of molecular descriptor space. J. Chem. Inf. Model. 49, 108–119 (2009)
Borgwardt, K.M., Kriegel, H.-P.: Shortest-path kernels on graphs. In: Proceedings of ICDM, pp. 74–81 (2005)
Cherkasov, A., Muratov, E.N., Fourches, D., Varnek, A., Baskin, I., Cronin, M., et al.: QSAR modeling: where have you been? Where are you going to? J. Med. Chem. 57, 4977–5010 (2014)
Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, New York (2000)
Cortes, C., Mohri, M., Rostamizadeh, A.: \({L}_2\) regularization for learning kernels. In: Proceedings of UAI, pp. 109–116 (2009)
Gaüzère, B., Brun, L., Villemin, D.: Treelet kernel incorporating cyclic, stereo and inter pattern information in Chemoinformatics. Pattern Recogn. 48, 356–367 (2014)
Horváth, T., Gärtner, T., Wrobel, S.: Cyclic pattern kernels for predictive graph mining. In: Proceedings of KDD, pp. 158–167 (2004)
Liu, W., Meng, X., Xu, Q., Flower, D.R., Li, T.: Quantitative prediction of mouse class I MHC peptide binding affinity using support vector machine regression (SVR) models. BMC Bioinform. 7 (2006)
Myint, K.-Z., Wang, L., Tong, Q., Xie, X.-Q.: Molecular fingerprint-based artificial neural networks QSAR for ligand biological activity predictions. Mol. Pharm. 9, 2912–2923 (2012)
Ning, X., Rangwala, H., Karypis, G.: Multi-assay-based structure-activity-relationship models: improving structure-activity-relationship models by incorporating activity information from related targets. J. Chem. Inf. Model. 49, 2444–2456 (2009)
Ralaivola, L., Swamidass, S.J., Saigo, H., Baldi, P.: Graph kernels for chemical informatics. Neural Netw. 18, 1093–1110 (2005)
Rogers, D., Hahn, M.: Extended connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010)
Shervashidze, N., Schweitzer, P., van Leeuwen, E.J., Mehlhorn, K., Borgwardt, K.M.: Weisfeiler-Lehman graph kernels. J. Mach. Learn. Res. 12, 2539–2561 (2011)
Schölkopf, B., Herbrich, R., Smola, A.J.: A generalized representer theorem. In: Helmbold, D., Williamson, B. (eds.) COLT 2001. LNCS (LNAI), vol. 2111, pp. 416–426. Springer, Heidelberg (2001). doi:10.1007/3-540-44581-1_27
Smola, A.J., Schölkopf, B.: A tutorial on support vector regression. Stat. Comput. 14, 199–222 (2004)
Sugaya, N.: Ligand efficiency-based support vector regression models for predicting bioactivities of ligands to drug target proteins. J. Chem. Inf. Model. 54, 2751–2763 (2014)
Qiu, S., Lane, T.: Multiple kernel support vector regression for siRNA efficacy prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 4983, 367–378 (2008)
Vishwanathan, S.V.N., Sun, Z., Theera-Ampornpunt, N., Varma, M.: Multiple kernel learning and the SMO algorithm. In: Proceedings of NIPS, pp. 2361–2369 (2010)
Acknowledgements
We want to thank Dr. Martin Vogt from the Department of Life Science Informatics, B-IT, of the University of Bonn for preparing the protein dataset and making it available to us. Furthermore, we thank Dr. Martin Vogt and his colleagues for many valuable discussions on this topic. We would also like to thank Prof. Thomas Gärtner for guidance and advice.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Ullrich, K., Mack, J., Welke, P. (2016). Ligand Affinity Prediction with Multi-pattern Kernels. In: Calders, T., Ceci, M., Malerba, D. (eds) Discovery Science. DS 2016. Lecture Notes in Computer Science, vol 9956. Springer, Cham. https://doi.org/10.1007/978-3-319-46307-0_30
DOI: https://doi.org/10.1007/978-3-319-46307-0_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46306-3
Online ISBN: 978-3-319-46307-0
eBook Packages: Computer Science (R0)