Abstract
Identifying gene coding regions in DNA sequences with digital signal processing techniques based on the so-called 3-base periodicity is an emerging problem in bioinformatics. The signal-to-noise ratio (SNR) of a DNA sequence is computed after mapping the symbolic DNA sequence into numerical sequences. Typical mapping schemes, such as the Voss, Z-curve, and tetrahedron representations, have been used to construct algorithms for detecting gene coding regions. In this paper, we propose an extended definition of SNR that has lower computational cost and wider applicability than the original definition. Furthermore, we analyze the SNRs of different mapping schemes and derive the general relationship between the Voss-based SNR and the SNR of its general affine transformations. We conclude that the SNRs of the Z-curve and tetrahedron maps are likewise linearly proportional to that of the Voss map. This conclusion is instructive for the design of other affine transformations and is significant for understanding the role of the symbolic-to-numerical mapping in the detection of gene coding regions.
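As an illustration of the quantities named in the abstract, the following Python sketch computes an SNR at the 3-base-periodicity frequency for the Voss map and for a Z-curve map. It is not the extended definition proposed in the paper, which the abstract does not spell out. It assumes a commonly used convention: the SNR is the total power at frequency index N/3 divided by the mean total power over indices 1 to N-1 (the zero-frequency term is excluded). The function names (`voss_map`, `z_curve_map`, `dft_power`, `snr`), the non-cumulative form of the Z-curve components, and the restriction to sequences over A, C, G, T whose length is a multiple of 3 are all assumptions made for this example.

```python
import cmath

BASES = "ACGT"


def voss_map(seq):
    """Voss map: four binary indicator sequences, one per base (A, C, G, T)."""
    return [[1.0 if s == b else 0.0 for s in seq] for b in BASES]


def z_curve_map(seq):
    """Assumed non-cumulative Z-curve components, built as an affine
    (here purely linear) combination of the Voss indicator sequences."""
    a, c, g, t = voss_map(seq)
    rows = list(zip(a, c, g, t))
    x = [ai + gi - ci - ti for ai, ci, gi, ti in rows]  # purine vs pyrimidine
    y = [ai + ci - gi - ti for ai, ci, gi, ti in rows]  # amino vs keto
    z = [ai + ti - ci - gi for ai, ci, gi, ti in rows]  # weak vs strong bond
    return [x, y, z]


def dft_power(x, k):
    """Squared magnitude of the k-th DFT coefficient of the sequence x."""
    n_len = len(x)
    coeff = sum(v * cmath.exp(-2j * cmath.pi * k * n / n_len)
                for n, v in enumerate(x))
    return abs(coeff) ** 2


def snr(channels):
    """Assumed SNR: power at k = N/3 over the mean power for k = 1..N-1."""
    n_len = len(channels[0])
    if n_len < 3 or n_len % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    # spectrum[i] holds the total power at frequency index k = i + 1
    spectrum = [sum(dft_power(ch, k) for ch in channels)
                for k in range(1, n_len)]
    noise = sum(spectrum) / len(spectrum)
    return spectrum[n_len // 3 - 1] / noise


if __name__ == "__main__":
    seq = "ATGGCCTTAGCATGCAAGTCA"  # arbitrary 21-base example sequence
    print(snr(voss_map(seq)), snr(z_curve_map(seq)))
```

For the perfectly periodic sequence `"ATG" * 10`, this convention gives an SNR of 14.5 for both maps. The two values coincide because the four Voss indicators sum to one at every position, so away from zero frequency the Z-curve power spectrum is exactly four times the Voss power spectrum and the factor cancels in the ratio. Under a different noise convention (for example, one that includes the zero-frequency term) the two SNRs would differ.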
Additional information
The present study was supported in part by National Basic Research Program (2011CBA00800) of China.
About this article
Cite this article
Shao, J., Yan, X. & Shao, S. SNR of DNA sequences mapped by general affine transformations of the indicator sequences. J. Math. Biol. 67, 433–451 (2013). https://doi.org/10.1007/s00285-012-0564-3
DOI: https://doi.org/10.1007/s00285-012-0564-3
Keywords
- Gene coding regions (exons)
- 3-Base periodicity
- Voss mapping
- Signal to noise ratio (SNR)
- Affine transformations