Abstract
Cancer is heterogeneous in its molecular, functional, and clinical behaviour, which makes timely detection and treatment challenging. Early detection and prognosis of the cancer type may facilitate more refined clinical management. Recent technological developments, such as next-generation sequencing, have generated a large number of omics datasets in cancer genomics. Genome-wide biological information, such as cancer driver mutations, aberrantly methylated regions, and gene and miRNA expression profiles, is helpful for predicting cancer onset, subtypes, and treatment response, and is valuable for improving diagnosis and for therapeutic and clinical decisions. In this context, machine learning (ML) algorithms and artificial intelligence have been both beneficial and essential for improving the accuracy of cancer-related predictions. Here, we focus on research based on these omics data, paying close attention to machine learning methods. We summarize the various kinds of omics data and the ML algorithms that are effective in cancer prediction. We also highlight applications of ML algorithms to genomic information in cancer, including cancer classification, therapy response, survival, metastasis, and biomarker identification. Furthermore, we discuss novel machine learning approaches for improving cancer prediction. These data-driven approaches can potentially provide new solutions for enhancing the precision of cancer treatment.
Acknowledgements
CM acknowledges the Department of Science and Technology (DST) for support as an INSPIRE Faculty (Award No. IFA19-PH248).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Gawade, P., Nandi, S., Meena, C., Sarkar, R.R. (2022). Artificial Intelligence and Machine Learning Techniques Using Omics Data for Cancer Diagnosis and Treatment. In: Singh, S. (eds) Systems Biomedicine Approaches in Cancer Research. Springer, Singapore. https://doi.org/10.1007/978-981-19-1953-4_2
DOI: https://doi.org/10.1007/978-981-19-1953-4_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-1952-7
Online ISBN: 978-981-19-1953-4
eBook Packages: Biomedical and Life Sciences (R0)