Machine Learning and Deep Learning in Genetics and Genomics

  • Chapter
  • In: Machine Learning in Dentistry

Abstract

In this chapter, we introduce machine learning (ML) methods and deep learning (DL) algorithms commonly adopted in genomics data analysis. We begin with a general introduction to genomics data and present a multi-omics study investigating early childhood oral health. We then review statistical and ML/DL methods and their applications in genomics data analysis, covering the following aspects: (1) association between genetic markers, mostly single nucleotide polymorphisms (SNPs), and complex diseases or traits in genome-wide association studies (GWAS); (2) copy number variation (CNV) and single nucleotide variant (SNV) calling in whole genome sequencing (WGS) or whole exome sequencing (WES) data of tumor samples; (3) association between DNA methylation status and phenotypes, commonly referred to as epigenome-wide association studies (EWAS); (4) analysis of genome-wide high-throughput chromosome conformation capture (Hi-C) data; (5) inference related to transcription factor (TF) binding sites; and (6) single-cell RNA-seq data analysis. To round out the chapter, we present the results of a systematic review of the machine learning landscape in oral diseases. We conclude with a discussion of potential future applications of ML/DL in genetics and genomics in oral health.

References

  1. McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys. 1943;5(4):115–33.

    Article  Google Scholar 

  2. Park WJ, Park J-B. History and application of artificial neural networks in dentistry. Eur J Dent. 2018;12(04):594–601.

    Article  PubMed  PubMed Central  Google Scholar 

  3. Lin E, Lane H-Y. Machine learning and systems genomics approaches for multi-omics data. Biomarker Res. 2017;5(1):2.

    Article  Google Scholar 

  4. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Adv Neural Informn Proc Syst. 2012;25:1097–105.

    Google Scholar 

  5. Hung M, Voss MW, Rosales MN, Li W, Su W, Xu J, et al. Application of machine learning for diagnostic prediction of root caries. Gerodontology. 2019;36(4):395–404.

    Article  PubMed  PubMed Central  Google Scholar 

  6. Liu Z, Liu J, Zhou Z, Zhang Q, Wu H, Zhai G, et al. Differential diagnosis of ameloblastoma and odontogenic keratocyst by machine learning of panoramic radiographs. Int J Comput Assist Radiol Surg. 2021;16(3):415–22

    Article  PubMed  PubMed Central  Google Scholar 

  7. Abdalla-Aslan R, Yeshua T, Kabla D, Leichter I, Nadler C. An artificial intelligence system using machine-learning for automatic detection and classification of dental restorations in panoramic radiography. Oral Surg Oral Med Oral Pathol Oral Radiol. 2020;130(5):593–602.

    Article  PubMed  Google Scholar 

  8. Xie X, Wang L, Wang A. Artificial neural network modeling for deciding if extractions are necessary prior to orthodontic treatment. Angle Orthod. 2010;80(2):262–6.

    Article  PubMed  PubMed Central  Google Scholar 

  9. Montenegro RD, Oliveira AL, Cabral GG, Katz CR, Rosenblatt A. A comparative study of machine learning techniques for caries prediction. In: 2008 20th IEEE International Conference on tools with artificial intelligence. Piscataway, NJ: IEEE; 2008. p. 477–81.

    Chapter  Google Scholar 

  10. Patil S, Habib Awan K, Arakeri G, Jayampath Seneviratne C, Muddur N, Malik S, et al. Machine learning and its potential applications to the genomic study of head and neck cancer—a systematic review. J Oral Pathol Med. 2019;48(9):773–9.

    Article  PubMed  Google Scholar 

  11. Kebschull M, Papapanou PN. Exploring genome-wide expression profiles using machine learning techniques. Methods Oral Biol. 2017;1537:347–64. Springer

    Article  Google Scholar 

  12. Ritchie MD, Holzinger ER, Li R, Pendergrass SA, Kim D. Methods of integrating data to uncover genotype–phenotype interactions. Nat Rev Genet. 2015;16(2):85–97.

    Article  PubMed  Google Scholar 

  13. Misra BB, Langefeld C, Olivier M, Cox LA. Integrated omics: tools, advances and future approaches. J Mol Endocrinol. 2019;62(1):R21–45.

    Article  Google Scholar 

  14. Fröhlich H, Patjoshi S, Yeghiazaryan K, Kehrer C, Kuhn W, Golubnitschaja O. Premenopausal breast cancer: potential clinical utility of a multi-omics based machine learning approach for patient stratification. EPMA J. 2018;9(2):175–86.

    Article  PubMed  PubMed Central  Google Scholar 

  15. Divaris K. Fundamentals of precision medicine. Compend Contin Educ Dent. 2017;38(8 Suppl):30–2.

    PubMed  PubMed Central  Google Scholar 

  16. Selwitz RH, Ismail AI, Pitts NB. Dental caries. Lancet. 2007;369(9555):51–9. https://doi.org/10.1016/S0140-6736(07)60031-2.

    Article  PubMed  Google Scholar 

  17. Divaris K. Predicting dental caries outcomes in children: a “risky” concept. J Dent Res. 2016;95(3):248–54. https://doi.org/10.1177/0022034515620779.

    Article  PubMed  Google Scholar 

  18. Burne RA, Zeng L, Ahn SJ, Palmer SR, Liu Y, Lefebure T, et al. Progress dissecting the oral microbiome in caries and health. Adv Dent Res. 2012;24(2):77–80. https://doi.org/10.1177/0022034512449462.

    Article  PubMed  PubMed Central  Google Scholar 

  19. Marsh PD. Microbial ecology of dental plaque and its significance in health and disease. Adv Dent Res. 1994;8(2):263–71. https://doi.org/10.1177/08959374940080022001.

    Article  PubMed  Google Scholar 

  20. Nyvad B, Crielaard W, Mira A, Takahashi N, Beighton D. Dental caries from a molecular microbiological perspective. Caries Res. 2013;47(2):89–102. https://doi.org/10.1159/000345367.

    Article  PubMed  Google Scholar 

  21. Falsetta ML, Klein MI, Colonne PM, Scott-Anne K, Gregoire S, Pai CH, et al. Symbiotic relationship between Streptococcus mutants and Candida albicans synergizes virulence of plaque biofilms in vivo. Infect Immun. 2014;82(5):1968–81. https://doi.org/10.1128/IAI.00087-14.

    Article  PubMed  PubMed Central  Google Scholar 

  22. Delisle AL, Guo M, Chalmers NI, Barcak GJ, Rousseau GM, Moineau S. Biology and genome sequence of Streptococcus mutans phage M102AD. Appl Environ Microbiol. 2012;78(7):2264–71. https://doi.org/10.1128/AEM.07726-11.

    Article  PubMed  PubMed Central  Google Scholar 

  23. Divaris K, Joshi A. The building blocks of precision oral health in early childhood: the ZOE 2.0 study. J Public Health Dent. 2018;80(Suppl 1):S31–6. https://doi.org/10.1111/jphd.12303.

    Article  PubMed  Google Scholar 

  24. Ginnis J, Ferreira Zandona AG, Slade GD, Cantrell J, Antonio ME, Pahel BT, et al. Measurement of early childhood Oral health for research purposes: dental caries experience and developmental defects of the enamel in the primary dentition. Methods Mol Biol. 1922;2019:511–23. https://doi.org/10.1007/978-1-4939-9012-2_39.

    Article  Google Scholar 

  25. Divaris K, Shungin D, Rodriguez-Cortes A, Basta PV, Roach J, Cho H, et al. The Supragingival biofilm in early childhood caries: clinical and laboratory protocols and bioinformatics pipelines supporting metagenomics, Metatranscriptomics, and metabolomics studies of the Oral microbiome. Methods Mol Biol. 1922;2019:525–48. https://doi.org/10.1007/978-1-4939-9012-2_40.

    Article  Google Scholar 

  26. Haworth S, Esberg A, Lif Holgerson P, Kuja-Halkola R, Timpson NJ, Magnusson PKE, et al. Heritability of caries scores, trajectories, and disease subtypes. J Dent Res. 2020;99(3):264–70. https://doi.org/10.1177/0022034519897910.

    Article  PubMed  Google Scholar 

  27. Shaffer JR, Feingold E, Wang X, Tcuenco KT, Weeks DE, DeSensi RS, et al. Heritable patterns of tooth decay in the permanent dentition: principal components and factor analyses. BMC Oral Health. 2012;12:7. https://doi.org/10.1186/1472-6831-12-7.

    Article  PubMed  PubMed Central  Google Scholar 

  28. GlobalSurg C. Writing g, patient r, statistical a, protocol d, project s, et al. global variation in anastomosis and end colostomy formation following left-sided colorectal resection. BJS Open. 2019;3(3):403–14. https://doi.org/10.1002/bjs5.50138.

    Article  Google Scholar 

  29. Divaris K. Searching deep and wide: advances in the molecular understanding of dental caries and periodontal disease. Adv Dent Res. 2019;30(2):40–4. https://doi.org/10.1177/0022034519877387.

    Article  PubMed  PubMed Central  Google Scholar 

  30. Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014;15(3):R46. https://doi.org/10.1186/gb-2014-15-3-r46.

    Article  PubMed  PubMed Central  Google Scholar 

  31. Huson DH, Auch AF, Qi J, Schuster SC. MEGAN analysis of metagenomic data. Genome Res. 2007;17(3):377–86. https://doi.org/10.1101/gr.5969107.

    Article  PubMed  PubMed Central  Google Scholar 

  32. Brady A, Salzberg S. PhymmBL expanded: confidence scores, custom databases, parallelization and more. Nat Methods. 2011;8(5):367. https://doi.org/10.1038/nmeth0511-367.

    Article  PubMed  PubMed Central  Google Scholar 

  33. Craig J. Complex diseases: research and applications. Nature Education. 2008;1(1):184.

    Google Scholar 

  34. The Human Genome Project. https://www.genome.gov/human-genome-project. 2018; Accessed 2020.

  35. The International HapMap Consortium. The international HapMap project. Nature. 2003;426(6968):789–96.

    Article  Google Scholar 

  36. The International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437:1299–320.

    Article  PubMed Central  Google Scholar 

  37. The International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–61.

    Article  PubMed Central  Google Scholar 

  38. The International HapMap Consortium. Integrating common and rare genetic variation in diverse human populations. Nature. 2010;467(7311):52–8. https://doi.org/10.1038/nature09298.

    Article  Google Scholar 

  39. The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010;467(7319):1061–73. http://www.nature.com/nature/journal/v467/n7319/abs/nature09534.html#supplementary-information

    Article  PubMed Central  Google Scholar 

  40. Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, et al. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491(7422):56–65. https://doi.org/10.1038/nature11632.

    Article  PubMed  Google Scholar 

  41. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526(7571):68–74. https://doi.org/10.1038/nature15393.

    Article  Google Scholar 

  42. MacArthur J, Bowler E, Cerezo M, Gil L, Hall P, Hastings E, et al. The new NHGRI-EBI catalog of published genome-wide association studies (GWAS catalog). Nucleic Acids Res. 2017;45(D1):D896–d901. https://doi.org/10.1093/nar/gkw1133.

    Article  PubMed  Google Scholar 

  43. Zhang Y, Liu JS. Bayesian inference of epistatic interactions in case-control studies. Nat Genet. 2007;39(9):1167–73.

    Article  PubMed  Google Scholar 

  44. Han B, Chen X-W, Talebizadeh Z. FEPI-MB: identifying SNPs-disease association using a Markov Blanket-based approach. BMC Bioinform. 2011;12(Suppl 12):S3.

    Article  Google Scholar 

  45. Uppu S, Krishna A, Gopalan RP. A review on methods for detecting SNP interactions in high-dimensional genomic data. IEEE/ACM Trans Comput Biol Bioinform. 2016;15(2):599–612.

    Article  PubMed  Google Scholar 

  46. Jiang R, Tang W, Wu X, Fu W. A random forest approach to the detection of epistatic interactions in case-control studies. BMC Bioinform. 2009;10(1):S65.

    Article  Google Scholar 

  47. De Lobel L, Geurts P, Baele G, Castro-Giner F, Kogevinas M, Van Steen K. A screening methodology based on random forests to improve the detection of gene–gene interactions. Eur J Hum Genet. 2010;18(10):1127–32.

    Article  PubMed  PubMed Central  Google Scholar 

  48. Yoshida M, Koike A. SNPInterForest: a new method for detecting epistatic interactions. BMC Bioinform. 2011;12(1):469.

    Article  Google Scholar 

  49. Schwarz DF, König IR, Ziegler A. On safari to random jungle: a fast implementation of random forests for high-dimensional data. Bioinformatics. 2010;26(14):1752–8.

    Article  PubMed  PubMed Central  Google Scholar 

  50. Wu Q, Ye Y, Liu Y, Ng MK. SNP selection and classification of genome-wide SNP data using stratified sampling random forests. IEEE Trans Nanobioscience. 2012;11(3):216–27.

    Article  PubMed  Google Scholar 

  51. Lin HY, Ann Chen Y, Tsai YY, Qu X, Tseng TS, Park JY. TRM: a powerful two-stage machine learning approach for identifying SNP-SNP interactions. Ann Hum Genet. 2012;76(1):53–62.

    Article  PubMed  Google Scholar 

  52. Pan Q, Hu T, Malley JD, Andrew AS, Karagas MR, Moore JH. Supervising random forest using attribute interaction networks. European conference on evolutionary computation, machine learning and data mining in bioinformatics. Berlin: Springer; 2013. p. 104–16.

    Google Scholar 

  53. Chen SH, Sun J, Dimitrov L, Turner AR, Adams TS, Meyers DA, et al. A support vector machine approach for detecting gene-gene interaction. Genetic Epidemiology: The Official Publication of the International Genetic Epidemiology Society. 2008;32(2):152–67.

    Article  Google Scholar 

  54. Özgür A, Vu T, Erkan G, Radev DR. Identifying gene-disease associations using centrality on a literature mined gene-interaction network. Bioinformatics. 2008;24(13):i277–i85.

    Article  PubMed  PubMed Central  Google Scholar 

  55. Shen Y, Liu Z, Ott J. Support vector machines with L 1 penalty for detecting gene-gene interactions. Int J Data Min Bioinform. 2012;6(5):463–70.

    Article  PubMed  Google Scholar 

  56. Fang YH, Chiu YF. SVM-based generalized multifactor dimensionality reduction approaches for detecting gene-gene interactions in family studies. Genet Epidemiol. 2012;36(2):88–98.

    Article  PubMed  Google Scholar 

  57. Marvel S, Motsinger-Reif A. Grammatical evolution support vector machines for predicting human genetic disease association. Proceedings of the 14th annual conference companion on Genetic and evolutionary computation 2012. p. 595–8.

    Google Scholar 

  58. Zhang H, Wang H, Dai Z, Chen M-S, Yuan Z. Improving accuracy for cancer classification with a new algorithm for genes selection. BMC Bioinform. 2012;13(1):298.

    Article  Google Scholar 

  59. Lin Y, Jeon Y. Random forests and adaptive nearest neighbors. J Am Stat Assoc. 2006;101(474):578–90. https://doi.org/10.1198/016214505000001230.

    Article  Google Scholar 

  60. Koo CL, Liew MJ, Mohamad MS, Salleh M, Hakim A. A review for detecting gene-gene interactions using machine learning methods in genetic epidemiology. Biomed Res Int. 2013;2013:432375.

    Article  PubMed  PubMed Central  Google Scholar 

  61. Roller E, Ivakhno S, Lee S, Royce T, Tanner S. Canvas: versatile and scalable detection of copy number variants. Bioinformatics. 2016;32(15):2375–7.

    Article  PubMed  Google Scholar 

  62. Ivakhno S, Roller E, Colombo C, Tedder P, Cox AJ. Canvas SPW: calling de novo copy number variants in pedigrees. Bioinformatics. 2018;34(3):516–8.

    Article  PubMed  Google Scholar 

  63. Wang Z, Hormozdiari F, Yang W-Y, Halperin E, Eskin E. CNVeM: copy number variation detection using uncertainty of read mapping. J Comput Biol. 2013;20(3):224–36.

    Article  PubMed  PubMed Central  Google Scholar 

  64. Nguyen HT, Merriman TR, Black MA. The CNVrd2 package: measurement of copy number at complex loci using high-throughput sequencing data. Front Genet. 2014;5:248.

    Article  PubMed  PubMed Central  Google Scholar 

  65. Miller CA, Hampton O, Coarfa C, Milosavljevic A. ReadDepth: a parallel R package for detecting copy number alterations from short sequencing reads. PLoS One. 2011;6(1):e16327.

    Article  PubMed  PubMed Central  Google Scholar 

  66. Aure MR, Vitelli V, Jernström S, Kumar S, Krohn M, Due EU, et al. Integrative clustering reveals a novel split in the luminal a subtype of breast cancer with impact on outcome. Breast Cancer Res. 2017;19(1):44. https://doi.org/10.1186/s13058-017-0812-y.

    Article  PubMed  PubMed Central  Google Scholar 

  67. Karim MR, Rahman A, Jares JB, Decker S, Beyan O. A snapshot neural ensemble method for cancer-type prediction based on copy number variations. Neural Comput & Applic. 2019:1–19.

    Google Scholar 

  68. AlShibli A, Mathkour H. A shallow convolutional learning network for classification of cancers based on copy number variations. Sensors. 2019;19(19):4207.

    Article  PubMed Central  Google Scholar 

  69. Fortin J-P, Triche TJ Jr, Hansen KD. Preprocessing, normalization and integration of the Illumina HumanMethylationEPIC array with minfi. Bioinformatics. 2017;33(4):558–60.

    Article  PubMed  Google Scholar 

  70. Robertson KD. DNA methylation and human disease. Nat Rev Genet. 2005;6(8):597–610.

    Article  PubMed  Google Scholar 

  71. Jiang Y, Oldridge DA, Diskin SJ, Zhang NR. CODEX: a normalization and copy number variation detection method for whole exome sequencing. Nucleic Acids Res. 2015;43(6):e39-e.

    Article  Google Scholar 

  72. Wang K, Li M, Hadley D, Liu R, Glessner J, Grant SF, et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 2007;17(11):1665–74.

    Article  PubMed  PubMed Central  Google Scholar 

  73. Colella S, Yau C, Taylor JM, Mirza G, Butler H, Clouston P, et al. QuantiSNP: an objective Bayes hidden-Markov model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res. 2007;35(6):2013–25.

    Article  PubMed  PubMed Central  Google Scholar 

  74. Zhang Z, Cheng H, Hong X, Di Narzo AF, Franzen O, Peng S, et al. EnsembleCNV: an ensemble machine learning algorithm to identify and genotype copy number variation using SNP array data. Nucleic Acids Res. 2019;47(7):e39-e.

    Article  Google Scholar 

  75. Pounraja VK, Jayakar G, Jensen M, Kelkar N, Girirajan S. A machine-learning approach for accurate detection of copy number variants from exome sequencing. Genome Res. 2019;29(7):1134–43.

    Article  PubMed  PubMed Central  Google Scholar 

  76. Poplin R, Chang P-C, Alexander D, Schwartz S, Colthurst T, Ku A, et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol. 2018;36(10):983–7.

    Article  PubMed  Google Scholar 

  77. Hill T, Unckless RL. A deep learning approach for detecting copy number variation in next-generation sequencing data. G3: Genes, Genomes, Genetics. 2019;9(11):3575–82.

    Article  Google Scholar 

  78. Zhang Y, Jin L, Wang B, Hu D, Wang L, Li P, et al. DL-CNV: a deep learning method for identifying copy number variations based on next generation target sequencing. Math Biosci Eng: MBE. 2019;17(1):202–15.

    Article  PubMed  Google Scholar 

  79. Jiang Y, Qiu Y, Minn AJ, Zhang NR. Assessing intratumor heterogeneity and tracking longitudinal and spatial clonal evolutionary history by next-generation sequencing. Proc Natl Acad Sci. 2016;113(37):E5528–E37.

    Article  PubMed  PubMed Central  Google Scholar 

  80. Liu J, Halloran JT, Bilmes JA, Daza RM, Lee C, Mahen EM, et al. Comprehensive statistical inference of the clonal structure of cancer from multiple biopsies. Sci Rep. 2017;7(1):1–13.

    Google Scholar 

  81. Holder LB, Haque MM, Skinner MK. Machine learning for epigenetics and future medical applications. Epigenetics. 2017;12(7):505–14.

    Article  PubMed  PubMed Central  Google Scholar 

  82. Ni P, Huang N, Zhang Z, Wang D-P, Liang F, Miao Y, et al. DeepSignal: detecting DNA methylation state from Nanopore sequencing reads using deep-learning. Bioinformatics. 2019;35(22):4586–95.

    Article  PubMed  Google Scholar 

  83. Angermueller C, Lee HJ, Reik W, Stegle O. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol. 2017;18(1):67.

    Article  PubMed  PubMed Central  Google Scholar 

  84. Zhang W, Spector TD, Deloukas P, Bell JT, Engelhardt BE. Predicting genome-wide DNA methylation using methylation marks, genomic position, and DNA regulatory elements. Genome Biol. 2015;16(1):14.

    Article  PubMed  PubMed Central  Google Scholar 

  85. Zhang G, Huang KC, Xu Z, Tzeng JY, Conneely KN, Guan W, et al. Across-platform imputation of DNA methylation levels incorporating nonlocal information using penalized functional regression. Genet Epidemiol. 2016;40(4):333–40. https://doi.org/10.1002/gepi.21969.

    Article  PubMed  PubMed Central  Google Scholar 

  86. Zeng H, Gifford DK. Predicting the impact of non-coding variants on DNA methylation. Nucleic Acids Res. 2017;45(11):e99-e.

    Article  Google Scholar 

  87. Capper D, Jones DT, Sill M, Hovestadt V, Schrimpf D, Sturm D, et al. DNA methylation-based classification of central nervous system tumours. Nature. 2018;555(7697):469–74.

    Article  PubMed  PubMed Central  Google Scholar 

  88. Cai Z, Xu D, Zhang Q, Zhang J, Ngai S-M, Shao J. Classification of lung cancer using ensemble-based feature selection and machine learning methods. Mol BioSyst. 2015;11(3):791–800.

    Article  PubMed  Google Scholar 

  89. Wei SH, Balch C, Paik HH, Kim Y-S, Baldwin RL, Liyanarachchi S, et al. Prognostic DNA methylation biomarkers in ovarian cancer. Clin Cancer Res. 2006;12(9):2788–94.

    Article  PubMed  Google Scholar 

  90. Aran D, Sabato S, Hellman A. DNA methylation of distal regulatory sites characterizes dysregulation of cancer genes. Genome Biol. 2013;14(3):R21.

    Article  PubMed  PubMed Central  Google Scholar 

  91. Forcato M, Nicoletti C, Pal K, Livi CM, Ferrari F, Bicciato S. Comparison of computational methods for Hi-C data analysis. Nat Methods. 2017;14(7):679–85. https://doi.org/10.1038/nmeth.4325.

    Article  PubMed  PubMed Central  Google Scholar 

  92. Rao SS, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159(7):1665–80.

    Article  PubMed  PubMed Central  Google Scholar 

  93. Bonev B, Mendelson Cohen N, Szabo Q, Fritsch L, Papadopoulos GL, Lubling Y, et al. Multiscale 3D genome rewiring during mouse neural development. Cell. 2017;171(3):557–72.e24. https://doi.org/10.1016/j.cell.2017.09.043.

    Article  PubMed  PubMed Central  Google Scholar 

  94. Jin F, Li Y, Dixon JR, Selvaraj S, Ye Z, Lee AY, et al. A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature. 2013;503(7475):290–4. https://doi.org/10.1038/nature12644.

    Article  PubMed  PubMed Central  Google Scholar 

  95. Zhang Y, An L, Xu J, Zhang B, Zheng WJ, Hu M, et al. Enhancing Hi-C data resolution with deep convolutional neural network HiCPlus. Nat Commun. 2018;9(1):750. https://doi.org/10.1038/s41467-018-03113-2.

    Article  PubMed  PubMed Central  Google Scholar 

  96. Liu T, Wang Z. HiCNN: a very deep convolutional neural network to better enhance the resolution of Hi-C data. Bioinformatics. 2019;35(21):4222–8. https://doi.org/10.1093/bioinformatics/btz251.

    Article  PubMed  PubMed Central  Google Scholar 

  97. Liu Q, Lv H, Jiang R. hicGAN infers super resolution Hi-C data with generative adversarial networks. Bioinformatics. 2019;35(14):i99–i107. https://doi.org/10.1093/bioinformatics/btz317.

    Article  PubMed  PubMed Central  Google Scholar 

  98. Lajoie BR, Dekker J, Kaplan N. The Hitchhiker’s guide to Hi-C analysis: practical guidelines. Methods. 2015;72:65–75. https://doi.org/10.1016/j.ymeth.2014.10.031.

    Article  PubMed  Google Scholar 

  99. Yaffe E, Tanay A. Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat Genet. 2011;43(11):1059–65. https://doi.org/10.1038/ng.947.

    Article  PubMed  Google Scholar 

  100. Hu M, Deng K, Selvaraj S, Qin Z, Ren B, Liu JS. HiCNorm: removing biases in Hi-C data via Poisson regression. Bioinformatics. 2012;28(23):3131–3. https://doi.org/10.1093/bioinformatics/bts570.

    Article  PubMed  PubMed Central  Google Scholar 

  101. Imakaev M, Fudenberg G, McCord RP, Naumova N, Goloborodko A, Lajoie BR, et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat Methods. 2012;9(10):999–1003. https://doi.org/10.1038/nmeth.2148.

    Article  PubMed  PubMed Central  Google Scholar 

  102. Li Y, Hu M, Shen Y. Gene regulation in the 3D genome. Hum Mol Genet. 2018;27(R2):R228–r33. https://doi.org/10.1093/hmg/ddy164.

    Article  PubMed  PubMed Central  Google Scholar 

  103. Yu M, Ren B. The three-dimensional Organization of Mammalian Genomes. Annu Rev Cell Dev Biol. 2017;33:265–89. https://doi.org/10.1146/annurev-cellbio-100616-060531.

    Article  PubMed  PubMed Central  Google Scholar 

  104. Crowley C, Yang Y, Qiu Y, Hu B, Won H, Ren B, et al. FIREcaller: an R package for detecting frequently interacting regions from Hi-C data. bioRxiv. 2019; 619288. https://doi.org/10.1101/619288.

  105. Schmitt AD, Hu M, Jung I, Xu Z, Qiu Y, Tan CL, et al. A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Rep. 2016;17(8):2042–59. https://doi.org/10.1016/j.celrep.2016.10.061.

    Article  PubMed  PubMed Central  Google Scholar 

  106. Rao Suhas SP, Huntley Miriam H, Durand Neva C, Stamenova Elena K, Bochkov Ivan D, Robinson James T, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159(7):1665–80. https://doi.org/10.1016/j.cell.2014.11.021.

    Article  PubMed  PubMed Central  Google Scholar 

  107. Kaul A, Bhattacharyya S, Ay F. Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2. Nat Protoc. 2020;15(3):991–1012. https://doi.org/10.1038/s41596-019-0273-0.

    Article  PubMed  PubMed Central  Google Scholar 

  108. Ay F, Bailey TL, Noble WS. Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts. Genome Res. 2014; https://doi.org/10.1101/gr.160374.113.

  109. Juric I, Yu M, Abnousi A, Raviram R, Fang R, Zhao Y, et al. MAPS: model-based analysis of long-range chromatin interactions from PLAC-seq and HiChIP experiments. PLoS Comput Biol. 2019;15(4):e1006982. https://doi.org/10.1371/journal.pcbi.1006982.

    Article  PubMed  PubMed Central  Google Scholar 

  110. Xu Z, Zhang G, Jin F, Chen M, Furey TS, Sullivan PF, et al. A hidden Markov random field-based Bayesian method for the detection of long-range chromosomal interactions in Hi-C data. Bioinformatics. 2016;32(5):650–6. https://doi.org/10.1093/bioinformatics/btv650.

    Article  PubMed  Google Scholar 

  111. Xu Z, Zhang G, Wu C, Li Y, Hu M. FastHiC: a fast and accurate algorithm to detect long-range chromosomal interactions from Hi-C data. Bioinformatics. 2016;32(17):2692–5. https://doi.org/10.1093/bioinformatics/btw240.

    Article  PubMed  PubMed Central  Google Scholar 

  112. Ay F, Bailey TL, Noble WS. Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts. Genome Res. 2014;24(6):999–1011. https://doi.org/10.1101/gr.160374.113.

    Article  PubMed  PubMed Central  Google Scholar 

  113. Lawrence CE, Reilly AA. An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins. 1990;7(1):41–51. https://doi.org/10.1002/prot.340070105.

    Article  PubMed  Google Scholar 

  114. Bailey TL, Williams N, Misleh C, Li WW. MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res. 2006;34(Web Server issue):W369–73. https://doi.org/10.1093/nar/gkl198.

    Article  PubMed  PubMed Central  Google Scholar 

  115. Moses AM, Chiang DY, Eisen MB. Phylogenetic motif detection by expectation-maximization on evolutionary mixtures. Pac Symp Biocomput. 2004:324–35. https://doi.org/10.1142/9789812704856_0031.

  116. Prakash A, Blanchette M, Sinha S, Tompa M. Motif discovery in heterogeneous sequence data. Pac Symp Biocomput. 2004:348–59. https://doi.org/10.1142/9789812704856_0033.

  117. Sinha S, Blanchette M, Tompa M. PhyME: a probabilistic algorithm for finding motifs in sets of orthologous sequences. BMC Bioinform. 2004;5:170. https://doi.org/10.1186/1471-2105-5-170.

    Article  Google Scholar 

  118. Alipanahi B, Delong A, Weirauch MT, Frey BJ. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol. 2015;33(8):831–8. https://doi.org/10.1038/nbt.3300.

    Article  PubMed  Google Scholar 

  119. Machanick P, Bailey TL. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics. 2011;27(12):1696–7.

    Article  PubMed  PubMed Central  Google Scholar 

  120. Foat BC, Morozov AV, Bussemaker HJ. Statistical mechanical modeling of genome-wide transcription factor occupancy data by MatrixREDUCE. Bioinformatics. 2006;22(14):e141–e9.

    Article  PubMed  Google Scholar 

  121. Quang D, Xie X. FactorNet: a deep learning framework for predicting cell type specific transcription factor binding from nucleotide-resolution sequential data. Methods. 2019;166:40–7. https://doi.org/10.1016/j.ymeth.2019.03.020.

    Article  PubMed  PubMed Central  Google Scholar 

  122. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods. 2015;12(10):931–4. https://doi.org/10.1038/nmeth.3547.

    Article  PubMed  PubMed Central  Google Scholar 

  123. Ritchie GR, Dunham I, Zeggini E, Flicek P. Functional annotation of noncoding sequence variants. Nat Methods. 2014;11(3):294–6. https://doi.org/10.1038/nmeth.2832.

    Article  PubMed  PubMed Central  Google Scholar 

  124. Wang M, Tai C, Weinan E, Wei L. DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants. Nucleic Acids Res. 2018;46(11):e69. https://doi.org/10.1093/nar/gky215.

    Article  PubMed  PubMed Central  Google Scholar 

  125. Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM III, et al. Comprehensive integration of single-cell data. Cell. 2019;177(7):1888–1902.e21.

    Article  PubMed  PubMed Central  Google Scholar 

  126. Adey AC. Integration of single-cell genomics datasets. Cell. 2019;177(7):1677–9.

    Article  PubMed  Google Scholar 

  127. Welch JD, Kozareva V, Ferreira A, Vanderburg C, Martin C, Macosko EZ. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell. 2019;177:1873–1887.e17.

    Article  PubMed  PubMed Central  Google Scholar 

  128. Li G, Yang Y, Van Buren E, Li Y. Dropout imputation and batch effect correction for single-cell RNA sequencing data. J Bio-X Res. 2019;2(4):169–77.

  129. Bengio Y. Learning deep architectures for AI. Foundations and Trends® in Machine Learning. 2009;2(1):1–127.

  130. Zhang X, Zhao J, LeCun Y. Character-level convolutional networks for text classification. Adv Neural Inform Process Syst. 2015:649–57.

  131. Deng Y, Bao F, Dai Q, Wu LF, Altschuler SJ. Scalable analysis of cell-type composition from single-cell transcriptomics using deep recurrent learning. Nat Methods. 2019;16(4):311–4.

  132. Lopez R, Regier J, Cole MB, Jordan MI, Yosef N. Deep generative modeling for single-cell transcriptomics. Nat Methods. 2018;15(12):1053–8.

  133. Van Dijk D, Sharma R, Nainys J, Yim K, Kathail P, Carr AJ, et al. Recovering gene interactions from single-cell data using data diffusion. Cell. 2018;174(3):716–729.e27.

  134. Eraslan G, Simon LM, Mircea M, Mueller NS, Theis FJ. Single-cell RNA-seq denoising using a deep count autoencoder. Nat Commun. 2019;10(1):1–14.

  135. Way GP, Greene CS. Bayesian deep learning for single-cell analysis. Nat Methods. 2018;15(12):1009–10.

  136. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial nets. Adv Neural Inform Process Syst. 2014;3:2672–80.

  137. Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19(1):15.

  138. Satija R, Farrell JA, Gennert D, Schier AF, Regev A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015;33(5):495–502.

  139. Trapnell C, Cacchiarelli D, Grimsby J, Pokharel P, Li S, Morse M, et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol. 2014;32(4):381.

  140. Kharchenko PV, Silberstein L, Scadden DT. Bayesian approach to single-cell differential expression analysis. Nat Methods. 2014;11(7):740.

  141. Finak G, McDavid A, Yajima M, Deng J, Gersuk V, Shalek AK, et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 2015;16(1):278.

  142. Zheng GX, Terry JM, Belgrader P, Ryvkin P, Bent ZW, Wilson R, et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun. 2017;8(1):1–12.

  143. Lun AT, McCarthy DJ, Marioni JC. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Research. 2016;5:2122.

  144. Chen W-P, Chang S-H, Tang C-Y, Liou M-L, Tsai S-JJ, Lin Y-L. Composition analysis and feature selection of the oral microbiota associated with periodontal disease. Biomed Res Int. 2018

  145. Nakano Y, Suzuki N, Kuwata F. Predicting oral malodour based on the microbiota in saliva samples using a deep learning approach. BMC Oral Health. 2018;18(1):128.

  146. Hsieh C-H, Chen W-M, Hsieh Y-S, Fan Y-C, Yang PE, Kang S-T, et al. A novel multi-gene detection platform for the analysis of miRNA expression. Sci Rep. 2018;8(1):1–9.

  147. Saxena D, Caufield PW, Li Y, Brown S, Song J, Norman R. Genetic classification of severe early childhood caries by use of subtracted DNA fragments from Streptococcus mutans. J Clin Microbiol. 2008;46(9):2868–73.

  148. Carnielli CM, Macedo CCS, De Rossi T, Granato DC, Rivera C, Domingues RR, et al. Combining discovery and targeted proteomics reveals a prognostic signature in oral cancer. Nat Commun. 2018;9(1):1–17.

  149. Torres PJ, Thompson J, McLean JS, Kelley ST, Edlund A. Discovery of a novel periodontal disease-associated bacterium. Microb Ecol. 2019;77(1):267–76.

  150. Vapnik V. The nature of statistical learning theory. Berlin: Springer Science & Business Media; 2000.

  151. Kramer MA. Nonlinear principal component analysis using autoassociative neural networks. AIChE J. 1991;37(2):233–43.

  152. Oh M, Zhang L. DeepMicro: deep representation learning for disease prediction based on microbiome data. Sci Rep. 2020;10(1):1–9.

  153. Reiman D, Metwally A, Dai Y, Sun J. PopPhy-CNN: a phylogenetic tree embedded architecture for convolutional neural networks to predict host phenotype from metagenomic data. IEEE J Biomed Health Inform. 2020;24(10):2993–3001.

Acknowledgments

This work was supported by grants from the National Institutes of Health (NIH), National Institute of Dental and Craniofacial Research, R03-DE028983 to DW and HC, U01-DE025046 to KD and HC, NIH R01 GM105785, R01 HL129132, and R01 HL146500 to YL, and NLM T15-LM012500 to MP.

Author information

Corresponding author

Correspondence to Di Wu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Wu, D. et al. (2021). Machine Learning and Deep Learning in Genetics and Genomics. In: Ko, CC., Shen, D., Wang, L. (eds) Machine Learning in Dentistry. Springer, Cham. https://doi.org/10.1007/978-3-030-71881-7_13

  • DOI: https://doi.org/10.1007/978-3-030-71881-7_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-71880-0

  • Online ISBN: 978-3-030-71881-7

  • eBook Packages: Medicine, Medicine (R0)
