A quick tour on suffix arrays and compressed suffix arrays

https://doi.org/10.1016/j.tcs.2010.12.036
Under an Elsevier user license

Abstract

Suffix arrays are a key data structure for solving a range of problems on texts and sequences, from data compression and information retrieval to biological sequence analysis and pattern discovery. In their simplest version, they can just be seen as a permutation of the elements in {1, 2, …, n}, encoding the sorted sequence of suffixes from a given text of length n, under the lexicographic order. Yet, they are on a par with ubiquitous and sophisticated suffix trees. Over the years, many interesting combinatorial properties have been devised for this special class of permutations: for instance, they can implicitly encode extra information, and they are a well characterized subset of the n! permutations. This paper gives a short tutorial on suffix arrays and their compressed version to explore and review some of their algorithmic features, discussing the space issues related to their usage in text indexing, combinatorial pattern matching, and data compression.
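
As an illustration of the "permutation of {1, 2, …, n}" view described above, the following minimal Python sketch builds a suffix array by directly sorting the suffixes. The function name `suffix_array` and the example text "banana" are assumptions chosen for illustration and do not come from the paper; the naive sort shown here is far slower than the construction algorithms used in practice, and no end-of-text sentinel is appended.

```python
def suffix_array(text: str) -> list[int]:
    """Return the 1-based starting positions of the suffixes of `text`,
    listed in lexicographic order of the suffixes themselves.

    Naive illustration only: comparing whole suffixes costs O(n) each,
    so the sort takes O(n^2 log n) time in the worst case.
    """
    n = len(text)
    # Position i (1-based) stands for the suffix text[i-1:].
    return sorted(range(1, n + 1), key=lambda i: text[i - 1:])


sa = suffix_array("banana")
print(sa)  # [6, 4, 2, 1, 5, 3]

# The result is a permutation of {1, 2, ..., n}.
print(sorted(sa) == list(range(1, len("banana") + 1)))  # True
```

Here the suffixes of "banana" in sorted order are "a", "ana", "anana", "banana", "na", "nana", which start at positions 6, 4, 2, 1, 5, and 3 respectively.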

Keywords

Pattern matching
Suffix array
Suffix tree
Text indexing
Data compression
Space efficiency
Implicitness and succinctness

Work partially supported by MIUR of Italy.