Skip to main content

Alphabet Permutation for Differentially Encoding Text

  • Conference paper
Book cover String Processing and Information Retrieval (SPIRE 2004)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 3246))

Included in the following conference series:

  • 707 Accesses

Abstract

One degree of freedom not usually exploited in developing high-performance text-processing algorithms is the encoding of the underlying atomic character set. Here we consider a text compression method where the specific character set collating-sequence employed in encoding the text has a big impact on performance. We demonstrate that permuting the standard character collating-sequences yields a small win on Asian-language texts over gzip. We also show improved compression with our method for English texts, although not by enough to beat standard methods. However, we also design a class of artificial languages on which our method clearly beats gzip, often by an order of magnitude.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Chapin, B., Tate, S.: Higher compression from the burrows-wheeler transform by modified sorting. In: IEEE Data Compression Conference (1998)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Landau, G.M., Levi, O., Skiena, S. (2004). Alphabet Permutation for Differentially Encoding Text. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_32

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-30213-1_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23210-0

  • Online ISBN: 978-3-540-30213-1

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics