Fast Mapping of Short Sequences with Mismatches, Insertions and Deletions Using Index Structures

Figure 5

The enhanced suffix array yields a tree structure of nested suffix intervals.

The enhanced suffix array for the sequence S := attcttcggc (left) and its suffix interval tree (right), equivalent to the suffix trie in Fig. 2, are shown. The array suf represents the lexicographical order of the suffixes of S$. In other words, S_suf[0], S_suf[1], …, S_suf[n] is the sequence of suffixes of S$ in ascending lexicographic order. The lcp-table lcp is an array of integers such that for each h, 1≤h≤n, lcp[h] is the length of the longest common prefix of S_suf[h−1] and S_suf[h]. A suffix interval [l‥r, h] denotes an interval in the suffix array with lcp[i]≥h for all i, l+1≤i≤r, i.e. all suffixes in the interval [l+1‥r] have a longest common prefix of length at least h. Additionally, requiring l = 0 or lcp[l]<h makes the suffix interval left maximal, and requiring r = n or lcp[r+1]<h makes it right maximal. The suffix interval [0‥10, 0] spans the whole suffix array and is equivalent to the root of a suffix interval tree. This interval contains five subintervals, one for each character in S$, with h = 1. Equivalently, the root node of the suffix interval tree has five children. Note that two children, labeled by 0 and 11, are singletons. The child nodes of singletons are not explicitly shown here.
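
The definitions in this caption can be illustrated with a minimal Python sketch. It is not the implementation used in the article: the construction is deliberately naive, the function names (build_suffix_array, build_lcp, split_interval) are hypothetical, and it assumes that the sentinel $ sorts before every letter (as it does in ASCII), which may differ from the ordering drawn in the figure.

```python
def build_suffix_array(text):
    """Return suf: start positions of all suffixes of text in
    ascending lexicographic order (naive sort, for illustration only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_lcp(text, suf):
    """Return lcp, where lcp[h] is the length of the longest common
    prefix of the suffixes at suf[h-1] and suf[h]; lcp[0] is set to 0
    by convention (an assumption of this sketch)."""
    lcp = [0] * len(suf)
    for h in range(1, len(suf)):
        a, b = text[suf[h - 1]:], text[suf[h]:]
        k = 0
        while k < min(len(a), len(b)) and a[k] == b[k]:
            k += 1
        lcp[h] = k
    return lcp


def split_interval(lcp, l, r, h):
    """Split [l..r] into maximal subintervals [a..b] such that
    lcp[i] >= h for all i with a+1 <= i <= b."""
    intervals = []
    start = l
    for i in range(l + 1, r + 1):
        if lcp[i] < h:  # suffixes i-1 and i share fewer than h characters
            intervals.append((start, i - 1))
            start = i
    intervals.append((start, r))
    return intervals


text = "attcttcggc" + "$"          # S$ with the sentinel appended
suf = build_suffix_array(text)
lcp = build_lcp(text, suf)
n = len(text) - 1                   # last index of the suffix array
root_children = split_interval(lcp, 0, n, 1)
```

Under the stated ordering assumption, root_children holds five intervals, one per distinct character of S$, and two of them are singletons, matching the count of children of the root described in the caption.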

doi: https://doi.org/10.1371/journal.pcbi.1000502.g005