Abstract
The suffix array, one of the most important data structures for string processing, must be augmented with the longest-common-prefix (LCP) array in numerous applications. We describe the first external memory algorithm for constructing the LCP array given the suffix array as input. Previously, the only way to compute the LCP array for data larger than RAM was to use a suffix array construction algorithm with complex modifications that produce the LCP array as a by-product. Compared to the best prior method, our algorithm needs much less disk space (by more than a factor of three) and is significantly faster. Furthermore, our algorithm can be combined with any suffix array construction algorithm, including better ones developed in the future.
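To make the problem statement concrete, the following is a minimal in-memory sketch of what "constructing the LCP array given the suffix array as input" means. It is not the external memory algorithm described in this paper; it uses the classic linear-time approach of Kasai et al., which assumes the text, the suffix array, and the output all fit in RAM. The function names `suffix_array` and `lcp_array` are illustrative choices, and the suffix array here is built by naive sorting purely so the example is self-contained.

```python
def suffix_array(text):
    """Naive suffix array by sorting suffixes (illustration only, not efficient)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """LCP array from text and suffix array, in the style of Kasai et al.

    lcp[r] is the length of the longest common prefix of the suffixes
    starting at sa[r - 1] and sa[r]; lcp[0] is defined as 0.
    """
    n = len(text)

    # rank[i] = position of suffix i in the suffix array (inverse of sa).
    rank = [0] * n
    for r, pos in enumerate(sa):
        rank[pos] = r

    lcp = [0] * n
    h = 0  # number of characters already known to match
    for i in range(n):  # process suffixes in text order
        r = rank[i]
        if r > 0:
            j = sa[r - 1]  # lexicographic predecessor of suffix i
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[r] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp


if __name__ == "__main__":
    text = "banana"
    sa = suffix_array(text)
    print(sa)                   # [5, 3, 1, 0, 4, 2]
    print(lcp_array(text, sa))  # [0, 1, 3, 0, 0, 2]
```

The sketch accesses `text`, `sa`, and `rank` at effectively random positions, which is cheap in RAM but costly on disk; that access pattern is one reason an in-memory method like this does not carry over directly to inputs larger than RAM.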
This research is partially supported by the Academy of Finland through grant 118653 (ALGODAN).
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Kärkkäinen, J., Kempa, D. (2014). LCP Array Construction in External Memory. In: Gudmundsson, J., Katajainen, J. (eds) Experimental Algorithms. SEA 2014. Lecture Notes in Computer Science, vol 8504. Springer, Cham. https://doi.org/10.1007/978-3-319-07959-2_35
DOI: https://doi.org/10.1007/978-3-319-07959-2_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07958-5
Online ISBN: 978-3-319-07959-2
eBook Packages: Computer Science (R0)