ABSTRACT
This study develops an automated model, based on supervised learning, for classifying the histological subtypes of non-small cell lung cancer (NSCLC). The machine learning (ML) approach is applied to gene expression profiles for the diagnosis of lung cancer, which is the leading cause of cancer deaths worldwide. The performance of five classical ML estimators and four ensemble ML classifiers is evaluated on an RNA-sequencing (RNA-Seq) dataset of 127 NSCLC cases. The Decision Tree (DT) and Bagging models show promising results, with classification accuracy of up to 100% and areas under the curve (AUCs) above 0.97. The implemented ensemble methods collectively perform well in terms of AUC (0.68 -- 1.00). The findings are comparable to those of high-precision ML models, and the results provide insight into which supervised models can achieve higher diagnostic accuracy on RNA-Seq-based gene expression profiles of NSCLC subtypes.
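The abstract reports two metrics, classification accuracy and AUC. The sketch below shows how each could be computed for a two-class subtype problem. It is a minimal illustration under stated assumptions, not the authors' pipeline: the labels, scores, and the 0.5 decision threshold are hypothetical toy values, and the helper names `accuracy` and `roc_auc` are introduced here only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case from each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 and 0 stand in for two NSCLC subtypes,
# and `scores` stands in for a classifier's predicted probability of class 1.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

# Assumed decision threshold of 0.5 to turn scores into hard labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

Accuracy depends on the chosen threshold, while AUC depends only on how the scores rank the cases, which is why the two metrics can disagree for the same classifier.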
Index Terms
- Histological classification of non-small cell lung cancer with RNA-seq data using machine learning models