research-article
Artifacts Available / v1.1

Text Indexing for Long Patterns: Anchors are All you Need

Published: 01 May 2023

Abstract

In many real-world database systems, a large fraction of the data is represented by strings: sequences of letters over some alphabet. This is because strings can easily encode data arising from different sources. It is often crucial to represent such string datasets in a compact form while also enabling fast pattern matching queries. This is the classic text indexing problem. The four absolute measures anyone should pay attention to when designing or implementing a text index are: (i) index space; (ii) query time; (iii) construction space; and (iv) construction time. Unfortunately, most (if not all) widely-used indexes (e.g., suffix tree, suffix array, or their compressed counterparts) are not optimized for all four measures simultaneously, as it is difficult to have the best of all four worlds. Here, we take an important step in this direction by showing that text indexing with locally consistent anchors (lc-anchors) offers remarkably good performance in all four measures when we have at hand a lower bound ℓ on the length of the queried patterns, which is arguably a reasonable assumption in practical applications. Specifically, we improve on the construction of the index proposed by Loukides and Pissis, which is based on bidirectional string anchors (bd-anchors), a new type of lc-anchors, by: (i) designing an average-case linear-time algorithm to compute bd-anchors; and (ii) developing a semi-external-memory implementation to construct the index in small space using near-optimal work. We then present an extensive experimental evaluation, based on the four measures, using real benchmark datasets. The results show that, for long patterns, the index constructed using our improved algorithms compares favorably to all classic indexes: (compressed) suffix tree; (compressed) suffix array; and the FM-index.
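The abstract does not define bd-anchors, so the following is only a minimal sketch under an assumed definition taken from the bd-anchors literature: the anchor of a window of length ℓ is the leftmost starting position of its lexicographically minimum rotation, and the anchors of a text are those positions collected over all windows of length ℓ. The function names `min_rotation` and `bd_anchors` are hypothetical, positions are 0-based, and the code is a naive quadratic illustration, not the average-case linear-time algorithm the paper describes.

```python
def min_rotation(x: str) -> int:
    """Leftmost 0-based start of a lexicographically minimum rotation of x."""
    n = len(x)
    doubled = x + x  # every rotation of x is a length-n substring of x + x
    best = 0
    for j in range(1, n):
        # Strict comparison keeps the leftmost position on ties.
        if doubled[j:j + n] < doubled[best:best + n]:
            best = j
    return best


def bd_anchors(text: str, ell: int) -> list[int]:
    """Sorted 0-based text positions selected as anchors over all windows of length ell.

    Assumed definition (not stated in the abstract): each window contributes
    the position of its minimum rotation; duplicates are merged.
    """
    anchors = set()
    for i in range(len(text) - ell + 1):
        anchors.add(i + min_rotation(text[i:i + ell]))
    return sorted(anchors)


if __name__ == "__main__":
    print(bd_anchors("aabaaabcbda", 5))  # [3, 4, 5, 10]
```

In this small example, seven overlapping windows yield only four distinct anchors, because neighbouring windows tend to agree on the same position. This local consistency is what lets an index store far fewer positions than the text length when patterns are known to have length at least ℓ.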

References

  1. James Abello, Adam L. Buchsbaum, and Jeffery R. Westbrook. 2002. A Functional Approach to External Graph Algorithms. Algorithmica 32, 3 (2002), 437--458.
  2. Alberto Apostolico, Maxime Crochemore, Martin Farach-Colton, Zvi Galil, and S. Muthukrishnan. 2016. 40 years of suffix trees. Commun. ACM 59, 4 (2016), 66--73.
  3. Mozhdeh Ariannezhad, Ali Montazeralghaem, Hamed Zamani, and Azadeh Shakery. 2017. Improving Retrieval Performance for Verbose Queries via Axiomatic Analysis of Term Discrimination Heuristic. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, August 7--11, 2017, Noriko Kando, Tetsuya Sakai, Hideo Joho, Hang Li, Arjen P. de Vries, and Ryen W. White (Eds.). ACM, 1201--1204.
  4. Jérémy Barbay, Francisco Claude, Travis Gagie, Gonzalo Navarro, and Yakov Nekrich. 2014. Efficient Fully-Compressed Sequence Representations. Algorithmica 69, 1 (2014), 232--268.
  5. Djamal Belazzougui. 2014. Linear time construction of compressed text indices in compact space. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014, David B. Shmoys (Ed.). ACM, 148--193.
  6. Djamal Belazzougui, Fabio Cunial, Juha Kärkkäinen, and Veli Mäkinen. 2020. Linear-time String Indexing and Analysis in Small Space. ACM Trans. Algorithms 16, 2 (2020), 17:1--17:54.
  7. Djamal Belazzougui and Gonzalo Navarro. 2015. Optimal Lower and Upper Bounds for Representing Sequences. ACM Trans. Algorithms 11, 4 (2015), 31:1--31:21.
  8. Djamal Belazzougui and Simon J. Puglisi. 2016. Range Predecessor and Lempel-Ziv Parsing. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10--12, 2016, Robert Krauthgamer (Ed.). SIAM, 2053--2071.
  9. Stav Ben-Nun, Shay Golan, Tomasz Kociumaka, and Matan Kraus. 2020. Time-Space Tradeoffs for Finding a Long Common Substring. In 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, June 17--19, 2020, Copenhagen, Denmark (LIPIcs), Inge Li Gørtz and Oren Weimann (Eds.), Vol. 161. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 5:1--5:14.
  10. Michael Bendersky and W. Bruce Croft. 2008. Discovering key concepts in verbose queries. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2008, Singapore, July 20--24, 2008, Sung-Hyon Myaeng, Douglas W. Oard, Fabrizio Sebastiani, Tat-Seng Chua, and Mun-Kew Leong (Eds.). ACM, 491--498.
  11. Nico Bertram, Jonas Ellert, and Johannes Fischer. 2021. Lyndon Words Accelerate Suffix Sorting, See [77], 15:1--15:13.
  12. Timo Bingmann, Johannes Fischer, and Vitaly Osipov. 2016. Inducing Suffix and LCP Arrays in External Memory. ACM J. Exp. Algorithmics 21, 1 (2016), 2.3:1--2.3:27.
  13. Or Birenzwige, Shay Golan, and Ely Porat. 2020. Locally Consistent Parsing for Text Indexing in Small Space. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5--8, 2020, Shuchi Chawla (Ed.). SIAM, 607--626.
  14. Peter A. Boncz, Thomas Neumann, and Viktor Leis. 2020. FSST: Fast Random Access String Compression. Proc. VLDB Endow. 13, 11 (2020), 2649--2661.
  15. Stefan Burkhardt and Juha Kärkkäinen. 2003. Fast Lightweight Suffix Array Construction and Checking. In Combinatorial Pattern Matching, 14th Annual Symposium, CPM 2003, Morelia, Michoacán, Mexico, June 25--27, 2003, Proceedings (Lecture Notes in Computer Science), Ricardo A. Baeza-Yates, Edgar Chávez, and Maxime Crochemore (Eds.), Vol. 2676. Springer, 55--69.
  16. Timothy M. Chan, Kasper Green Larsen, and Mihai Patrascu. 2011. Orthogonal range searching on the RAM, revisited. In Proceedings of the 27th ACM Symposium on Computational Geometry, Paris, France, June 13--15, 2011, Ferran Hurtado and Marc J. van Kreveld (Eds.). ACM, 1--10.
  17. Panagiotis Charalampopoulos, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. 2018. Linear-Time Algorithm for Long LCF with k Mismatches. In Annual Symposium on Combinatorial Pattern Matching, CPM 2018, July 2--4, 2018 - Qingdao, China (LIPIcs), Gonzalo Navarro, David Sankoff, and Binhai Zhu (Eds.), Vol. 105. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 23:1--23:16.
  18. Panagiotis Charalampopoulos, Tomasz Kociumaka, Solon P. Pissis, and Jakub Radoszewski. 2021. Faster Algorithms for Longest Common Substring, See [77], 30:1--30:17.
  19. Panagiotis Charalampopoulos, Solon P. Pissis, and Jakub Radoszewski. 2022. Longest Palindromic Substring in Sublinear Time. In 33rd Annual Symposium on Combinatorial Pattern Matching, CPM 2022, June 27--29, 2022, Prague, Czech Republic (LIPIcs), Hideo Bannai and Jan Holub (Eds.), Vol. 223. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 20:1--20:9.
  20. Ferdinando Cicalese, Ely Porat, and Ugo Vaccaro (Eds.). 2015. Combinatorial Pattern Matching - 26th Annual Symposium, CPM 2015, Ischia Island, Italy, June 29 - July 1, 2015, Proceedings. Lecture Notes in Computer Science, Vol. 9133. Springer.
  21. Francisco Claude, Gonzalo Navarro, Hannu Peltola, Leena Salmela, and Jorma Tarhio. 2012. String matching with alphabet sampling. J. Discrete Algorithms 11 (2012), 37--50.
  22. Richard Cole, Tsvi Kopelowitz, and Moshe Lewenstein. 2015. Suffix Trays and Suffix Trists: Structures for Faster Text Indexing. Algorithmica 72, 2 (2015), 450--466.
  23. Maxime Crochemore, Christophe Hancart, and Thierry Lecroq. 2007. Algorithms on strings. Cambridge University Press.
  24. Patrick Dinklage, Johannes Fischer, and Alexander Herlez. 2021. Engineering Predecessor Data Structures for Dynamic Integer Sets. In 19th International Symposium on Experimental Algorithms, SEA 2021, June 7--9, 2021, Nice, France (LIPIcs), David Coudert and Emanuele Natale (Eds.), Vol. 190. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 7:1--7:19.
  25. Patrick Dinklage, Johannes Fischer, Alexander Herlez, Tomasz Kociumaka, and Florian Kurpicz. 2020. Practical Performance of Space Efficient Data Structures for Longest Common Extensions, See [43], 39:1--39:20.
  26. Martin Farach. 1997. Optimal Suffix Tree Construction with Large Alphabets. In 38th Annual Symposium on Foundations of Computer Science, FOCS '97, Miami Beach, Florida, USA, October 19--22, 1997. IEEE Computer Society, 137--143.
  27. Paolo Ferragina, Rodrigo González, Gonzalo Navarro, and Rossano Venturini. 2008. Compressed text indexes: From theory to practice. ACM J. Exp. Algorithmics 13 (2008).
  28. Paolo Ferragina and Giovanni Manzini. 2005. Indexing compressed text. J. ACM 52, 4 (2005), 552--581.
  29. Paolo Ferragina, Giovanni Manzini, Veli Mäkinen, and Gonzalo Navarro. 2004. An Alphabet-Friendly FM-Index. In String Processing and Information Retrieval, 11th International Conference, SPIRE 2004, Padova, Italy, October 5--8, 2004, Proceedings (Lecture Notes in Computer Science), Alberto Apostolico and Massimo Melucci (Eds.), Vol. 3246. Springer, 150--160.
  30. Johannes Fischer and Pawel Gawrychowski. 2015. Alphabet-Dependent String Searching with Wexponential Search Trees, See [20], 160--171.
  31. Johannes Fischer and Volker Heun. 2011. Space-Efficient Preprocessing Schemes for Range Minimum Queries on Static Arrays. SIAM J. Comput. 40, 2 (2011), 465--492.
  32. Gianni Franceschini and S. Muthukrishnan. 2007. In-Place Suffix Sorting. In Automata, Languages and Programming, 34th International Colloquium, ICALP 2007, Wroclaw, Poland, July 9--13, 2007, Proceedings (Lecture Notes in Computer Science), Lars Arge, Christian Cachin, Tomasz Jurdzinski, and Andrzej Tarlecki (Eds.), Vol. 4596. Springer, 533--545.
  33. Michael L. Fredman, János Komlós, and Endre Szemerédi. 1984. Storing a Sparse Table with O(1) Worst Case Access Time. J. ACM 31, 3 (1984), 538--544.
  34. Michael L. Fredman and Dan E. Willard. 1990. BLASTING through the Information Theoretic Barrier with FUSION TREES. In Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, May 13--17, 1990, Baltimore, Maryland, USA, Harriet Ortiz (Ed.). ACM, 1--7.
  35. Travis Gagie, Gonzalo Navarro, and Nicola Prezza. 2020. Fully Functional Suffix Trees and Optimal Text Searching in BWT-Runs Bounded Space. J. ACM 67, 1 (2020), 2:1--2:54.
  36. Younan Gao, Meng He, and Yakov Nekrich. 2020. Fast Preprocessing for Optimal Orthogonal Range Reporting and Range Successor with Applications to Text Indexing, See [43], 54:1--54:18.
  37. Pawel Gawrychowski and Tomasz Kociumaka. 2017. Sparse Suffix Tree Construction in Optimal Time and Space. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16--19, Philip N. Klein (Ed.). SIAM, 425--439.
  38. Simon Gog, Timo Beller, Alistair Moffat, and Matthias Petri. 2014. From Theory to Practice: Plug and Play with Succinct Data Structures. In Experimental Algorithms - 13th International Symposium, SEA 2014, Copenhagen, Denmark, June 29 - July 1, 2014. Proceedings (Lecture Notes in Computer Science), Joachim Gudmundsson and Jyrki Katajainen (Eds.), Vol. 8504. Springer, 326--337.
  39. Simon Gog, Juha Kärkkäinen, Dominik Kempa, Matthias Petri, and Simon J. Puglisi. 2019. Fixed Block Compression Boosting in FM-Indexes: Theory and Practice. Algorithmica 81, 4 (2019), 1370--1391.
  40. Simon Gog, Alistair Moffat, and Matthias Petri. 2017. CSA++: Fast Pattern Search for Large Alphabets. In Proceedings of the Ninteenth Workshop on Algorithm Engineering and Experiments, ALENEX 2017, Barcelona, Spain, Hotel Porta Fira, January 17--18, 2017, Sándor P. Fekete and Vijaya Ramachandran (Eds.). SIAM, 73--82.
  41. Keisuke Goto. 2019. Optimal Time and Space Construction of Suffix Arrays and LCP Arrays for Integer Alphabets. In Prague Stringology Conference 2019, Prague, Czech Republic, August 26--28, 2019, Jan Holub and Jan Zdárek (Eds.). Czech Technical University in Prague, Faculty of Information Technology, Department of Theoretical Computer Science, 111--125. http://www.stringology.org/event/2019/p11.html
  42. Szymon Grabowski and Marcin Raniszewski. 2017. Sampled suffix array with minimizers. Softw. Pract. Exp. 47, 11 (2017), 1755--1771.
  43. Fabrizio Grandoni, Grzegorz Herman, and Peter Sanders (Eds.). 2020. 28th Annual European Symposium on Algorithms, ESA 2020, September 7--9, 2020, Pisa, Italy (Virtual Conference). LIPIcs, Vol. 173. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. https://www.dagstuhl.de/dagpub/978-3-95977-162-7
  44. Roberto Grossi and Jeffrey Scott Vitter. 2005. Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM J. Comput. 35, 2 (2005), 378--407.
  45. Manish Gupta and Michael Bendersky. 2015. Information Retrieval with Verbose Queries. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, August 9--13, 2015, Ricardo Baeza-Yates, Mounia Lalmas, Alistair Moffat, and Berthier A. Ribeiro-Neto (Eds.). ACM, 1121--1124.
  46. Dan Gusfield. 1997. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press.
  47. Monika Rauch Henzinger. 2006. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In SIGIR 2006: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, Washington, USA, August 6--11, 2006, Efthimis N. Efthimiadis, Susan T. Dumais, David Hawking, and Kalervo Järvelin (Eds.). ACM, 284--291.
  48. Wing-Kai Hon, Kunihiko Sadakane, and Wing-Kin Sung. 2009. Breaking a Time-and-Space Barrier in Constructing Full-Text Indices. SIAM J. Comput. 38, 6 (2009), 2162--2178.
  49. Tomohiro I, Juha Kärkkäinen, and Dominik Kempa. 2014. Faster Sparse Suffix Sorting. In 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014), STACS 2014, March 5--8, 2014, Lyon, France (LIPIcs), Ernst W. Mayr and Natacha Portier (Eds.), Vol. 25. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 386--396.
  50. Chirag Jain, Arang Rhie, Nancy Hansen, Sergey Koren, and Adam M. Phillippy. 2022. Long-read mapping to repetitive reference sequences using Winnowmap2. Nat Methods 19 (2022), 705--710.
  51. Jiaojiao Jiang, Steve Versteeg, Jun Han, Md. Arafat Hossain, Jean-Guy Schneider, Christopher Leckie, and Zeinab Farahmandpour. 2019. P-Gram: Positional N-Gram for the Clustering of Machine-Generated Messages. IEEE Access 7 (2019), 88504--88516.
  52. Juha Kärkkäinen and Dominik Kempa. 2016. LCP Array Construction in External Memory. ACM J. Exp. Algorithmics 21, 1 (2016), 1.7:1--1.7:22.
  53. Juha Kärkkäinen and Dominik Kempa. 2016. LCP Array Construction Using O(sort(n)) (or Less) I/Os. In String Processing and Information Retrieval - 23rd International Symposium, SPIRE 2016, Beppu, Japan, October 18--20, 2016, Proceedings (Lecture Notes in Computer Science), Shunsuke Inenaga, Kunihiko Sadakane, and Tetsuya Sakai (Eds.), Vol. 9954. 204--217.
  54. Juha Kärkkäinen and Dominik Kempa. 2019. Better External Memory LCP Array Construction. ACM J. Exp. Algorithmics 24, 1 (2019), 1.3:1--1.3:27.
  55. Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi. 2015. Parallel External Memory Suffix Sorting, See [20], 329--342.
  56. Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi, and Bella Zhukova. 2017. Engineering External Memory Induced Suffix Sorting. In Proceedings of the Ninteenth Workshop on Algorithm Engineering and Experiments, ALENEX 2017, Barcelona, Spain, Hotel Porta Fira, January 17--18, 2017, Sándor P. Fekete and Vijaya Ramachandran (Eds.). SIAM, 98--108.
  57. Juha Kärkkäinen, Peter Sanders, and Stefan Burkhardt. 2006. Linear work suffix array construction. J. ACM 53, 6 (2006), 918--936.
  58. Toru Kasai, Gunho Lee, Hiroki Arimura, Setsuo Arikawa, and Kunsoo Park. 2001. Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications. In Combinatorial Pattern Matching, 12th Annual Symposium, CPM 2001 Jerusalem, Israel, July 1--4, 2001 Proceedings (Lecture Notes in Computer Science), Amihood Amir and Gad M. Landau (Eds.), Vol. 2089. Springer, 181--192.
  59. Dominik Kempa and Tomasz Kociumaka. 2019. String synchronizing sets: sublinear-time BWT construction and optimal LCE data structure. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23--26, 2019, Moses Charikar and Edith Cohen (Eds.). ACM, 756--767.
  60. Dominik Kempa and Tomasz Kociumaka. 2023. Breaking the O(n)-Barrier in the Construction of Compressed Suffix Arrays and Suffix Trees. In Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, Florence, Italy, January 22--25, 2023, Nikhil Bansal and Viswanath Nagarajan (Eds.). SIAM, 5122--5202.
  61. Tomasz Kociumaka. 2016. Minimal Suffix and Rotation of a Substring in Optimal Time. In 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016, June 27--29, 2016, Tel Aviv, Israel (LIPIcs), Roberto Grossi and Moshe Lewenstein (Eds.), Vol. 54. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 28:1--28:12.
  62. Stefan Kurtz. 1999. Reducing the space requirement of suffix trees. Softw. Pract. Exp. 29, 13 (1999), 1149--1171.
  63. Ben Langmead, Cole Trapnell, Mihai Pop, and Steven L. Salzberg. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10, 3 (2009), R25.
  64. Heng Li and Richard Durbin. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinform. 25, 14 (2009), 1754--1760.
  65. Ruiqiang Li, Chang Yu, Yingrui Li, Tak Wah Lam, Siu-Ming Yiu, Karsten Kristiansen, and Jun Wang. 2009. SOAP2: an improved ultrafast tool for short read alignment. Bioinform. 25, 15 (2009), 1966--1967.
  66. Zhize Li, Jian Li, and Hongwei Huo. 2022. Optimal in-place suffix sorting. Inf. Comput. 285, Part (2022), 104818.
  67. Glennis A. Logsdon, Mitchell R. Vollger, and Evan E. Eichler. 2020. Long-read human genome sequencing and its applications. Nat. Rev. Genet. 21, 10 (2020), 597--614.
  68. Grigorios Loukides and Solon P. Pissis. 2021. Bidirectional String Anchors: A New String Sampling Mechanism, See [77], 64:1--64:21.
  69. Grigorios Loukides, Solon P. Pissis, and Michelle Sweering. 2023. Bidirectional String Anchors for Improved Text Indexing and Top-K Similarity Search. IEEE Trans. Knowl. Data Eng. (2023).
  70. Mamoru Maekawa. 1985. A Square Root N Algorithm for Mutual Exclusion in Decentralized Systems. ACM Trans. Comput. Syst. 3, 2 (1985), 145--159.
  71. Veli Mäkinen and Gonzalo Navarro. 2006. Position-Restricted Substring Searching. In LATIN 2006: Theoretical Informatics, 7th Latin American Symposium, Valdivia, Chile, March 20--24, 2006, Proceedings (Lecture Notes in Computer Science), José R. Correa, Alejandro Hevia, and Marcos A. Kiwi (Eds.), Vol. 3887. Springer, 703--714.
  72. Udi Manber and Eugene W. Myers. 1993. Suffix Arrays: A New Method for On-Line String Searches. SIAM J. Comput. 22, 5 (1993), 935--948.
  73. Olena Medelyan and Ian H. Witten. 2006. Thesaurus based automatic keyphrase indexing. In ACM/IEEE Joint Conference on Digital Libraries, JCDL 2006, Chapel Hill, NC, USA, June 11--15, 2006, Proceedings, Gary Marchionini, Michael L. Nelson, and Catherine C. Marshall (Eds.). ACM, 296--297.
  74. Donald R. Morrison. 1968. PATRICIA - Practical Algorithm To Retrieve Information Coded in Alphanumeric. J. ACM 15, 4 (1968), 514--534.
  75. Ingo Müller, Cornelius Ratsch, and Franz Färber. 2014. Adaptive String Dictionary Compression in In-Memory Column-Store Database Systems. In Proceedings of the 17th International Conference on Extending Database Technology, EDBT 2014, Athens, Greece, March 24--28, 2014, Sihem Amer-Yahia, Vassilis Christophides, Anastasios Kementsietsidis, Minos N. Garofalakis, Stratos Idreos, and Vincent Leroy (Eds.). OpenProceedings.org, 283--294.
  76. J. Ian Munro, Gonzalo Navarro, and Yakov Nekrich. 2017. Space-Efficient Construction of Compressed Indexes in Deterministic Linear Time. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16--19. 408--424.
  77. Petra Mutzel, Rasmus Pagh, and Grzegorz Herman (Eds.). 2021. 29th Annual European Symposium on Algorithms, ESA 2021, September 6--8, 2021, Lisbon, Portugal (Virtual Conference). LIPIcs, Vol. 204. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. https://www.dagstuhl.de/dagpub/978-3-95977-204-4
  78. Gonzalo Navarro. 2016. Compact Data Structures - A Practical Approach. Cambridge University Press. http://www.cambridge.org/de/academic/subjects/computer-science/algorithmics-complexity-computer-algebra-and-computational-g/compact-data-structures-practical-approach?format=HB
  79. Gonzalo Navarro and Yakov Nekrich. 2017. Time-Optimal Top-k Document Retrieval. SIAM J. Comput. 46, 1 (2017), 80--113.
  80. Enno Ohlebusch, Johannes Fischer, and Simon Gog. 2010. CST++. In String Processing and Information Retrieval - 17th International Symposium, SPIRE 2010, Los Cabos, Mexico, October 11--13, 2010. Proceedings (Lecture Notes in Computer Science), Edgar Chávez and Stefano Lonardi (Eds.), Vol. 6393. Springer, 322--333.
  81. Nicola Prezza. 2021. Optimal Substring Equality Queries with Applications to Sparse Text Indexing. ACM Trans. Algorithms 17, 1 (2021), 7:1--7:23.
  82. Michael Roberts, Wayne Hayes, Brian R. Hunt, Stephen M. Mount, and James A. Yorke. 2004. Reducing storage requirements for biological sequence comparison. Bioinform. 20, 18 (2004), 3363--3369.
  83. Patricia Rodriguez-Tomé, Peter Stoehr, Graham Cameron, and Tomas P. Flores. 1996. The European Bioinformatics Institute (EBI) databases. Nucleic Acids Res. 24, 1 (1996), 6--12.
  84. Kunihiko Sadakane. 2007. Compressed Suffix Trees with Full Functionality. Theory Comput. Syst. 41, 4 (2007), 589--607.
  85. Saul Schleimer, Daniel Shawcross Wilkerson, and Alexander Aiken. 2003. Winnowing: Local Algorithms for Document Fingerprinting. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, California, USA, June 9--12, 2003, Alon Y. Halevy, Zachary G. Ives, and AnHai Doan (Eds.). ACM, 76--85.
  86. Kazutoshi Umemoto, Ruihua Song, Jian-Yun Nie, Xing Xie, Katsumi Tanaka, and Yong Rui. 2017. Search by Screenshots for Universal Article Clipping in Mobile Apps. ACM Trans. Inf. Syst. 35, 4 (2017), 34:1--34:29.
  87. Jeffrey Scott Vitter. 2006. Algorithms and Data Structures for External Memory. Found. Trends Theor. Comput. Sci. 2, 4 (2006), 305--474.
  88. Adrian Vogelsgesang, Michael Haubenschild, Jan Finis, Alfons Kemper, Viktor Leis, Tobias Mühlbauer, Thomas Neumann, and Manuel Then. 2018. Get Real: How Benchmarks Fail to Represent the Real World. In Proceedings of the 7th International Workshop on Testing Database Systems, DBTest@SIGMOD 2018, Houston, TX, USA, June 15, 2018, Alexander Böhm and Tilmann Rabl (Eds.). ACM, 1:1--1:6.
  89. Peter Weiner. 1973. Linear Pattern Matching Algorithms. In 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15--17, 1973. IEEE Computer Society, 1--11.
  90. Aaron M. Wenger et al. 2019. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37 (2019), 1155--1162.
  91. Hongyu Zheng, Carl Kingsford, and Guillaume Marçais. 2020. Improved design and analysis of practical minimizers. Bioinform. 36, Supplement-1 (2020), i119--i127.
