Succinct Text Indexing with Wildcards

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5721)

Abstract

A succinct text index uses space proportional to the size of the text itself, say, 2n log σ bits for a text of n characters over an alphabet of size σ. In the past few years, several exciting results have led to succinct indexes that support efficient pattern matching. In this paper we present the first succinct index for a text that contains wildcards. The space complexity of our index is (3 + o(1))n log σ + O(ℓ log n) bits, where ℓ is the number of wildcard groups in the text. Such an index finds applications in indexing genomic sequences that contain single-nucleotide polymorphisms (SNPs), which can be modeled as wildcards.
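To make the problem setting concrete, the following is a minimal Python sketch of what matching against a text with wildcards means, written as a naive scan rather than as the succinct index of the paper. The wildcard symbol `*`, the function names, and the reading of a "wildcard group" as a maximal run of consecutive wildcards are assumptions made for illustration. The last helper evaluates only the leading 3n log σ term of the space bound and ignores the o(1) factor and the O(ℓ log n) term.

```python
import math
import re

WILDCARD = "*"  # hypothetical wildcard symbol, chosen for this sketch


def wildcard_groups(text):
    """Count maximal runs of consecutive wildcards (assumed meaning of a group)."""
    return len(re.findall(re.escape(WILDCARD) + "+", text))


def naive_occurrences(text, pattern):
    """Return every start position where pattern matches text.

    A wildcard in the text matches any pattern character. This is a
    brute-force O(n * m) scan, not the index described in the abstract.
    """
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if all(t == WILDCARD or t == p for t, p in zip(text[i:i + m], pattern))
    ]


def leading_space_bits(n, sigma):
    """Leading term 3 * n * log2(sigma) of the stated space bound, in bits."""
    return 3 * n * math.log2(sigma)


text = "AC*GT**ACA"  # a DNA-like text with two wildcard groups
print(wildcard_groups(text))
print(naive_occurrences(text, "CA"))
print(leading_space_bits(len(text), 4))
```

For a DNA alphabet (σ = 4), each character costs 2 bits, so the leading term amounts to about 6 bits per text character under these assumptions.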

In the course of deriving the above result, we also obtain an alternative succinct index of a set of d patterns for dictionary matching. Compared with the succinct index in the literature, the new index doubles the size (precisely, from n log σ to 2n log σ bits, where n is the total length of all patterns), yet it reduces the matching time to O(m log σ + m log d + occ), where m is the length of the query text. It is worth mentioning that the time complexity no longer depends on the total dictionary size.
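As a reference point for the dictionary matching problem, the sketch below reports every occurrence of every pattern in a query text by brute force. It is only an illustration of the input and output of the problem, not the index of the paper: the function name and the sample dictionary are hypothetical, and its running time grows with the total dictionary size, which is exactly the dependence the stated bound avoids. Here occ is taken to be the number of reported (position, pattern) pairs.

```python
def dictionary_matches(patterns, query):
    """Return (position, pattern_index) pairs for all pattern occurrences in query.

    Brute force: every pattern is tried at every position, so the cost
    depends on the total dictionary size, unlike the bound in the abstract.
    """
    hits = []
    for i in range(len(query)):
        for k, p in enumerate(patterns):
            if query.startswith(p, i):
                hits.append((i, k))
    return hits


patterns = ["he", "she", "his", "hers"]  # d = 4 patterns, total length n = 12
hits = dictionary_matches(patterns, "ushers")
print(hits)       # pairs of (start position, index into patterns)
print(len(hits))  # occ, the number of occurrences reported
```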

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tam, A., Wu, E., Lam, TW., Yiu, SM. (2009). Succinct Text Indexing with Wildcards. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds) String Processing and Information Retrieval. SPIRE 2009. Lecture Notes in Computer Science, vol 5721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03784-9_5

  • DOI: https://doi.org/10.1007/978-3-642-03784-9_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03783-2

  • Online ISBN: 978-3-642-03784-9

  • eBook Packages: Computer Science, Computer Science (R0)
