Abstract
DNA sequences are essential to biological research, so they are often shared between researchers and stored digitally for future use. As these sequences grow in volume, so does the need to encode them efficiently, which saves space for more sequences. In addition, a better coding method corresponds to a better model of the sequence, which can yield new insights into DNA structure. In this paper, we present an algorithm capable of improving the encoding results of algorithms that depend on low-order finite-context models to encode DNA sequences. To do so, we implemented a variable-order finite-context model, supported by a predictive function. The proposed algorithm allows three finite-context models to be used at once without requiring side information in the encoded sequence. Currently, the proposed method shows small improvements in encoding results when compared with finite-context models of the same order. However, we also present results showing that there is room for further improvement in the use of variable-order finite-context models for DNA sequence coding.
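The abstract does not specify the predictive function, so the following is only a minimal sketch of how three finite-context models might be combined without side information. It assumes that, for each base, the model with the lowest exponentially decayed past code length is selected; because that choice depends only on symbols already coded, a decoder could repeat it exactly. The names `FiniteContextModel` and `estimate_bits`, the orders `(2, 4, 8)`, the `decay` factor, and the estimator parameter `alpha` are all illustrative assumptions and are not taken from the paper. The returned value is the ideal code length an arithmetic coder would approach, not an actual bitstream.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class FiniteContextModel:
    """Order-k model: counts of each symbol seen after each length-k context."""

    def __init__(self, order, alpha=1.0):
        self.order = order
        self.alpha = alpha  # estimator parameter (assumed value, not from the paper)
        self.counts = defaultdict(lambda: [0] * len(ALPHABET))

    def _key(self, history):
        # Last `order` symbols; shorter histories are used as-is at the start.
        return history[-self.order:] if self.order > 0 else ""

    def prob(self, history, symbol):
        counts = self.counts.get(self._key(history), [0] * len(ALPHABET))
        n = counts[ALPHABET.index(symbol)]
        return (n + self.alpha) / (sum(counts) + self.alpha * len(ALPHABET))

    def update(self, history, symbol):
        self.counts[self._key(history)][ALPHABET.index(symbol)] += 1


def estimate_bits(sequence, orders=(2, 4, 8), decay=0.9):
    """Ideal code length (bits) when one of several models is picked per symbol.

    Selection rule (an assumption): use the model whose decayed past code
    length is smallest. Only past symbols are consulted, so no side
    information would be needed by a decoder.
    """
    models = [FiniteContextModel(k) for k in orders]
    recent_cost = [0.0] * len(models)
    total_bits = 0.0
    max_order = max(orders)
    for i, symbol in enumerate(sequence):
        history = sequence[max(0, i - max_order):i]
        chosen = min(range(len(models)), key=lambda m: recent_cost[m])
        total_bits += -math.log2(models[chosen].prob(history, symbol))
        # Update every model and its running cost after coding the symbol.
        for m, model in enumerate(models):
            bits = -math.log2(model.prob(history, symbol))
            recent_cost[m] = decay * recent_cost[m] + bits
            model.update(history, symbol)
    return total_bits
```

For example, `estimate_bits("ACGT" * 50) / 200` gives the average bits per base for a highly repetitive toy sequence; passing a single order such as `orders=(4,)` gives the fixed-order baseline for comparison.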
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Martins, D.A., Neves, A.J.R., Pinho, A.J. (2009). Variable Order Finite-Context Models in DNA Sequence Coding. In: Araujo, H., Mendonça, A.M., Pinho, A.J., Torres, M.I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2009. Lecture Notes in Computer Science, vol 5524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02172-5_59
DOI: https://doi.org/10.1007/978-3-642-02172-5_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02171-8
Online ISBN: 978-3-642-02172-5
eBook Packages: Computer Science, Computer Science (R0)