Abstract
Tropomyosin receptor kinases (TRKs) are important broad-spectrum anticancer targets. Oncogenic rearrangement of the NTRK gene disrupts the extracellular domain and, with it, the epitopes for therapeutic antibodies, making small-molecule inhibitors essential for treating NTRK fusion-driven tumors. In this work, several algorithms were used to construct descriptor-based and nondescriptor-based models, which were evaluated by outer 10-fold cross-validation. To find a model with good generalization ability, the dataset was partitioned by random splitting and by cluster splitting to construct in-domain and cross-domain models, respectively. Among the 48 models built, the combination of the deep neural network (DNN) algorithm with extended-connectivity fingerprint (ECFP4) descriptors achieved excellent performance under both dataset partitions. The results indicate that the DNN algorithm generalizes well and that the richness of the features plays a vital role in predicting molecules from unexplored regions of chemical space. Additionally, we combined the clustering results with decision tree models built on fingerprint descriptors to perform structure–activity relationship (SAR) analysis. Nitrogen-containing aromatic heterocycles and benzo-fused heterocycles were found to play a crucial role in enhancing the activity of TRK inhibitors.
Graphical abstract
Workflow for generating predictive models for TRK inhibitors.
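The difference between the random and cluster-based partitions mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fingerprints are toy sets of on-bit indices rather than real ECFP4 output, the clustering is a simple Tanimoto leader algorithm swapped in for whatever clustering the study used, and the function names, threshold, and test fraction are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def leader_cluster(fps, threshold=0.5):
    """Assign each fingerprint to the first leader it resembles, else make it a new leader.

    This is a stand-in clustering method chosen for brevity (an assumption).
    """
    leaders, labels = [], []
    for fp in fps:
        for cid, leader in enumerate(leaders):
            if tanimoto(fp, leader) >= threshold:
                labels.append(cid)
                break
        else:
            leaders.append(fp)
            labels.append(len(leaders) - 1)
    return labels


def random_split(n, test_frac=0.2, seed=0):
    """In-domain split: molecules are shuffled individually."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n * test_frac))
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def cluster_split(labels, test_frac=0.2, seed=0):
    """Cross-domain split: whole clusters are held out, so no cluster spans both sets."""
    members = defaultdict(list)
    for i, c in enumerate(labels):
        members[c].append(i)
    cluster_ids = sorted(members)
    random.Random(seed).shuffle(cluster_ids)
    target = test_frac * len(labels)
    test = []
    for c in cluster_ids:
        if len(test) >= target:
            break
        test.extend(members[c])
    held_out = set(test)
    train = [i for i in range(len(labels)) if i not in held_out]
    return train, sorted(test)


# Toy fingerprints (placeholder data, not real molecules or real ECFP4 bits).
fps = [
    {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 6},                # family A
    {10, 11, 12, 13}, {10, 11, 12, 14},                         # family B
    {20, 21, 22}, {20, 21, 22, 23},                             # family C
    {30, 31, 32, 33}, {30, 31, 32, 34}, {30, 31, 32, 33, 35},   # family D
]

labels = leader_cluster(fps, threshold=0.5)
rs_train, rs_test = random_split(len(fps), test_frac=0.2, seed=0)
cs_train, cs_test = cluster_split(labels, test_frac=0.2, seed=0)

print("cluster labels:", labels)
print("random split test indices:", rs_test)
print("cluster split test indices:", cs_test)
```

In the random split, close analogues of a test molecule can remain in the training set, so the resulting score reflects in-domain performance. In the cluster split, every held-out molecule belongs to a structural family the model never saw, which is the harder cross-domain setting.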
Data availability
The datasets used in this work can be found in the Supplementary Information. The source of the Reaxys dataset can be found at https://www.reaxys.com/. The source of the ChEMBL dataset can be found at https://www.ebi.ac.uk/chembl/.
Code availability
The source code used in this work is freely available in the GitHub repository (https://github.com/CathyZakZhao/Classifier-for-Trk-Inhibitor).
Funding
This work was supported by The Research on National Reference Material and Product Development of Natural Products (SG030801).
Author information
Contributions
XZ and YK conceived the experiments. XZ and YJ collected and organized data. XZ evaluated the models. XX and LC performed analysis. CY and GC modified the language. CY contributed to project administration. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Conflicts of interest
The authors confirm that they have no conflicts of interest.
About this article
Cite this article
Zhao, X., Kong, Y., Ji, Y. et al. Classification models for predicting the bioactivity of pan-TRK inhibitors and SAR analysis. Mol Divers (2023). https://doi.org/10.1007/s11030-023-10735-2
DOI: https://doi.org/10.1007/s11030-023-10735-2