DOI: 10.1145/2370816.2370870
research-article

Base-delta-immediate compression: practical data compression for on-chip caches

Published: 19 September 2012

ABSTRACT

Cache compression is a promising technique to increase on-chip cache capacity and to decrease on-chip and off-chip bandwidth usage. Unfortunately, directly applying well-known compression algorithms (usually implemented in software) leads to high hardware complexity and unacceptable compression/decompression latencies, which in turn can hurt performance. Hence, there is a need for a simple yet efficient compression technique that can effectively compress common in-cache data patterns and has minimal effect on cache access latency.

In this paper, we introduce a new compression algorithm called Base-Delta-Immediate (BΔI) compression, a practical technique for compressing data in on-chip caches. The key idea is that, for many cache lines, the values within the line have a low dynamic range, i.e., the differences between values stored within the cache line are small. As a result, a cache line can be represented using a base value and an array of differences whose combined size is much smaller than the original cache line (we call this the base+delta encoding). Moreover, many cache lines intersperse such base+delta values with small values; our BΔI technique efficiently incorporates such immediate values into its encoding.
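The base+delta encoding with immediates can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' hardware design. The sketch assumes the following:

- The line has already been split into integer values.
- One delta width is tried per call.
- The first value that is not itself a small immediate becomes the explicit base. This is an assumed heuristic.
- Immediates are modeled as deltas from an implicit zero base, with a per-value flag recording which base applies.

The names `BDIEncoding`, `fits_signed`, `bdi_compress`, and `bdi_decompress` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BDIEncoding:
    """Hypothetical container for one compressed cache line."""
    base: int              # explicit base (0 if every value was an immediate)
    delta_bytes: int       # width of every stored delta, in bytes
    deltas: List[int]      # one signed delta per original value
    from_zero: List[bool]  # True if the delta is relative to the implicit zero base


def fits_signed(x: int, nbytes: int) -> bool:
    """Return True if x fits in an nbytes-wide two's-complement integer."""
    bits = 8 * nbytes
    return -(1 << (bits - 1)) <= x < (1 << (bits - 1))


def bdi_compress(values: List[int], delta_bytes: int) -> Optional[BDIEncoding]:
    """Try to encode a line as one explicit base plus an implicit zero base.

    Returns None when some value fits neither base with this delta width.
    """
    base: Optional[int] = None
    deltas: List[int] = []
    from_zero: List[bool] = []
    for v in values:
        if fits_signed(v, delta_bytes):
            # Small value: store it as an immediate, i.e. a delta from zero.
            deltas.append(v)
            from_zero.append(True)
            continue
        if base is None:
            # Assumed heuristic: the first non-immediate value becomes the base.
            base = v
        d = v - base
        if not fits_signed(d, delta_bytes):
            return None  # incompressible with this delta width
        deltas.append(d)
        from_zero.append(False)
    return BDIEncoding(base if base is not None else 0, delta_bytes, deltas, from_zero)


def bdi_decompress(enc: BDIEncoding) -> List[int]:
    """Rebuild the original values by adding each delta to zero or to the base."""
    return [d if z else enc.base + d for d, z in zip(enc.deltas, enc.from_zero)]


# Usage: pointer-like values near one address, interspersed with small integers.
line = [0x7F001000, 0, 0x7F001008, 5, 0x7F001010, 0x7F001018, 1, 0x7F001020]
enc = bdi_compress(line, delta_bytes=1)
assert enc is not None and bdi_decompress(enc) == line
```

The sequential loop is only for readability. It says nothing about how the checks would be organized in hardware.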

Compared to prior cache compression approaches, our studies show that BΔI strikes a sweet spot in the tradeoff between compression ratio, compression/decompression latencies, and hardware complexity. Our results show that BΔI compression improves performance for both single-core workloads (8.1% improvement) and multi-core workloads (9.5% and 11.2% improvement for two- and four-core workloads, respectively). For many applications, BΔI provides the performance benefit of doubling the cache size of the baseline system, effectively increasing average cache capacity by 1.53X.
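A worked size calculation makes the compression ratio of plain base+delta encoding concrete. This is an illustrative sketch under the following assumptions:

- The line holds $n$ values of $W$ bytes each.
- The encoding stores one $W$-byte base and $\Delta$-byte deltas.
- Metadata is ignored, such as an encoding identifier or the per-value flag needed to mark immediates.

The example numbers are chosen for illustration and are not taken from the paper's evaluation.

```latex
\begin{aligned}
S_{\text{orig}} &= nW, \qquad S_{\text{comp}} = W + n\Delta, \qquad
r = \frac{S_{\text{orig}}}{S_{\text{comp}}} = \frac{nW}{W + n\Delta} \\[4pt]
n = 8,\; W = 4,\; \Delta = 1:\quad
S_{\text{orig}} &= 32\ \text{bytes},\quad S_{\text{comp}} = 4 + 8 = 12\ \text{bytes},\quad
r = \tfrac{32}{12} \approx 2.67
\end{aligned}
```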


        Published in

        PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques
        September 2012
        512 pages
        ISBN: 9781450311823
        DOI: 10.1145/2370816

        Copyright © 2012 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 121 of 471 submissions, 26%
