Abstract
Molecular approaches exploit structural properties built deep into DNA by millions of years of evolution on Earth to encode or extract significant features from raw datasets, with the aim of extreme dimensionality reduction and efficient solutions. This chapter first describes that deep structure and then leverages it to develop several variations on this theme. The resulting methods apply naturally to genomic data but, perhaps surprisingly, work just as well with ordinary abiotic data. Two major families of such techniques are reviewed: genomic and pmeric coordinate systems for dimensionality reduction and data analysis.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Garzon, M., Mainali, S. (2022). Molecular Computing Approaches. In: Garzon, M., Yang, CC., Venugopal, D., Kumar, N., Jana, K., Deng, LY. (eds) Dimensionality Reduction in Data Science. Springer, Cham. https://doi.org/10.1007/978-3-031-05371-9_7
DOI: https://doi.org/10.1007/978-3-031-05371-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05370-2
Online ISBN: 978-3-031-05371-9
eBook Packages: Mathematics and Statistics (R0)