DOI: 10.1145/3459930.3471168
research-article

Histological classification of non-small cell lung cancer with RNA-seq data using machine learning models

Published: 13 September 2021

ABSTRACT

This study develops an automated supervised learning model for classifying the histological subtypes of non-small cell lung cancer (NSCLC). The machine learning (ML) approach is applied to gene expression profiles to support the diagnosis of lung cancer, the leading cause of cancer deaths worldwide. The performance of five classical ML estimators and four ensemble ML classifiers is evaluated on an RNA-seq dataset of 127 NSCLC cases. The Decision Tree (DT) and Bagging models show promising results, with classification accuracy of up to 100% and areas under the curve (AUCs) above 0.97. The ensemble methods collectively perform well, with AUCs ranging from 0.68 to 1.00. The findings are comparable to those of high-precision ML models, and the results provide insight into the supervised models that can achieve higher diagnostic accuracy on RNA-seq-based gene expression profiles of NSCLC subtypes.
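The abstract reports results as classification accuracy and AUC. As a minimal sketch of how these two metrics are computed, assuming a binary setting with two subtype labels, the example below uses only the Python standard library. The function names, labels, and scores are hypothetical illustrations; they are not the authors' code, data, or results.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative counts as half a win
    (the Mann-Whitney U formulation of the ROC AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 and 0 stand for two subtype labels,
# and each score is a classifier's confidence in label 1.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

print(accuracy(y_true, y_pred))  # 4 of 6 labels correct
print(roc_auc(y_true, scores))   # 8 of 9 positive/negative pairs ranked correctly
```

Accuracy depends on the chosen decision threshold, while AUC summarizes ranking quality across all thresholds, which is why a model can show a high AUC alongside a lower accuracy.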


    • Published in

      BCB '21: Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
      August 2021
      603 pages
      ISBN:9781450384506
      DOI:10.1145/3459930

      Copyright © 2021 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 254 of 885 submissions, 29%
