Classification models for predicting the bioactivity of pan-TRK inhibitors and SAR analysis

  • Original Article
  • Molecular Diversity

Abstract

Tropomyosin receptor kinases (TRKs) are important broad-spectrum anticancer targets. Oncogenic rearrangements of the NTRK genes disrupt the extracellular domain and the epitopes recognized by therapeutic antibodies, making small-molecule inhibitors essential for treating NTRK fusion-driven tumors. In this work, several algorithms were used to construct descriptor-based and nondescriptor-based models, which were evaluated by outer 10-fold cross-validation. To identify a model with good generalization ability, the dataset was partitioned by random splitting and by cluster splitting to build in-domain and cross-domain models, respectively. Among the 48 models built, the one combining the deep neural network (DNN) algorithm with extended-connectivity fingerprint 4 (ECFP4) descriptors performed excellently under both dataset partitions. The results indicate that the DNN algorithm generalizes well and that feature richness plays a vital role in predicting molecules from unexplored regions of chemical space. Additionally, we combined the clustering results with decision tree models built on fingerprint descriptors to perform a structure–activity relationship (SAR) analysis, which showed that nitrogen-containing aromatic heterocycles and benzo-fused heterocycles play a crucial role in enhancing the activity of TRK inhibitors.
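The decision-tree SAR step described in the abstract can be illustrated with a minimal sketch. It assumes that each molecule's fingerprint is already available as a set of on-bit indices (for example, ECFP4 bits computed with a cheminformatics toolkit). The sketch scores every bit by the information gain obtained by splitting on its presence, which is the criterion an entropy-based decision tree uses to choose its root split. A bit that cleanly separates actives from inactives can then be traced back to the substructure that sets it. The function names, bit numbers, and toy labels are hypothetical and are not taken from the authors' code or data.

```python
import math
from typing import List, Sequence, Set, Tuple


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of binary activity labels (1 = active, 0 = inactive)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_bit_split(fingerprints: List[Set[int]], labels: List[int]) -> Tuple[int, float]:
    """Return (bit, information_gain) for the fingerprint bit whose presence/absence
    best separates actives from inactives, i.e. the root split an entropy-based
    decision tree would choose. Returns (-1, 0.0) if no bit gives a positive gain."""
    n = len(labels)
    parent = entropy(labels)
    best_bit, best_gain = -1, 0.0
    for bit in sorted(set().union(*fingerprints)):
        with_bit = [y for fp, y in zip(fingerprints, labels) if bit in fp]
        without_bit = [y for fp, y in zip(fingerprints, labels) if bit not in fp]
        # Weighted entropy of the two child nodes after splitting on this bit
        child = (len(with_bit) / n) * entropy(with_bit) + (len(without_bit) / n) * entropy(without_bit)
        gain = parent - child
        if gain > best_gain:
            best_bit, best_gain = bit, gain
    return best_bit, best_gain


if __name__ == "__main__":
    # Hypothetical toy data: bit 101 stands in for a nitrogen-containing aromatic
    # heterocycle environment; the other bit numbers are arbitrary.
    fps = [{101, 7}, {101, 202}, {101, 3}, {202, 3}, {7}, {202}]
    y = [1, 1, 1, 0, 0, 0]
    print(best_bit_split(fps, y))  # (101, 1.0)
```

A full tree applies the same search recursively to each child node. In practice a library implementation such as scikit-learn's `DecisionTreeClassifier(criterion="entropy")` does this on a fingerprint matrix, and the highest-ranked bits are mapped back to the atom environments that produced them.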

Graphical abstract

Workflow for generating predictive models for TRK inhibitors.
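The abstract distinguishes in-domain models, built on a random split, from cross-domain models, built on a cluster-based split. The difference lies in how the outer cross-validation folds are formed. The minimal sketch below assumes that cluster labels have already been computed, for instance by clustering fingerprint vectors; the clustering itself is not shown. The function names, the greedy fold-balancing rule, and the toy labels are illustrative assumptions and are not the authors' implementation.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def random_kfold(n_samples: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """In-domain split: shuffle all indices and deal them round-robin into k folds,
    so members of one chemical cluster usually land in both train and test."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cluster_kfold(cluster_labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Cross-domain split: each cluster is assigned to exactly one fold, so a test
    fold only contains clusters absent from its training folds."""
    members: Dict[int, List[int]] = defaultdict(list)
    for i, c in enumerate(cluster_labels):
        members[c].append(i)
    clusters = sorted(members)
    random.Random(seed).shuffle(clusters)                       # random tie-breaking
    clusters.sort(key=lambda c: len(members[c]), reverse=True)  # largest clusters first
    folds: List[List[int]] = [[] for _ in range(k)]
    for c in clusters:
        target = min(range(k), key=lambda f: len(folds[f]))     # currently smallest fold
        folds[target].extend(members[c])
    return folds


def train_test_pairs(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for outer k-fold cross-validation."""
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


if __name__ == "__main__":
    # Hypothetical toy data: 200 molecules in 25 clusters of 8
    labels = [i % 25 for i in range(200)]
    for name, folds in [("random", random_kfold(len(labels))),
                        ("cluster", cluster_kfold(labels))]:
        shared = sum(
            len({labels[j] for j in train} & {labels[j] for j in test})
            for train, test in train_test_pairs(folds)
        )
        print(name, "clusters shared between train and test:", shared)
```

On the toy data, the cluster-based split reports zero clusters shared between training and test folds, while the random split shares many. This is why the cluster-based setting is the harder test of how well a model extrapolates to unseen chemotypes.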

Data availability

The datasets used in this work can be found in the Supplementary Information. The source of the Reaxys dataset can be found at https://www.reaxys.com/. The source of the ChEMBL dataset can be found at https://www.ebi.ac.uk/chembl/.

Code availability

The source code used in this work is freely available in the GitHub repository (https://github.com/CathyZakZhao/Classifier-for-Trk-Inhibitor).

Funding

This work was supported by The Research on National Reference Material and Product Development of Natural Products (SG030801).

Author information

Contributions

XZ and YK conceived the experiments. XZ and YJ collected and organized the data. XZ evaluated the models. XX and LC performed the analysis. CY and GC revised the language of the manuscript. CY was responsible for project administration. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Changyuan Yu.

Ethics declarations

Conflicts of interest

The authors confirm that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF, 997 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, X., Kong, Y., Ji, Y. et al. Classification models for predicting the bioactivity of pan-TRK inhibitors and SAR analysis. Mol Divers (2023). https://doi.org/10.1007/s11030-023-10735-2

  • DOI: https://doi.org/10.1007/s11030-023-10735-2
