ABSTRACT
L2 cache memories are being adopted in embedded systems for higher performance, but their large size increases energy consumption. We propose a low-energy, low-area L2 cache architecture that performs as well as a conventional L2 cache architecture while using 53% less area and roughly 40% less energy. The architecture consists of an L2 cache and a small auxiliary cache called the residue cache; lines in both are half the size of a conventional L2 cache line. A conventional line that compresses well is stored entirely in the L2 cache, while a poorly compressed line is split across the L2 and residue caches. Although many conventional L2 cache lines are therefore not fully captured by the residue cache, most accesses to them still do not incur misses, because not all of a line's words are needed immediately; we term such accesses partial hits. The residue cache architecture consumes far less energy and area than conventional L2 cache architectures, and can be combined synergistically with other schemes such as line distillation and zero-content augmented (ZCA) caches. The residue cache architecture is also shown to perform well on a 4-way superscalar processor typical of high-performance systems.
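The organization described above can be sketched in a few lines of Python. This is a minimal toy model: the zero-word compressor, the line sizes, and all names are illustrative assumptions, not the paper's actual compression scheme or hardware design. It shows how a well-compressed line fits entirely in a half-sized L2 line, and how an access to the L2-resident half of a poorly compressed line becomes a partial hit even when the overflow half never made it into the residue cache.

```python
# Toy model of the residue-cache organization: L2 and residue lines are
# half the size of a conventional line. Everything here (the zero-word
# "compressor", sizes, names) is an illustrative assumption.

WORDS_PER_LINE = 8            # conventional L2 line size, in words
HALF = WORDS_PER_LINE // 2    # capacity of one half-sized L2 (or residue) line


def compress(line):
    """Stand-in compressor: elide zero words, keep a presence mask."""
    mask = tuple(w != 0 for w in line)
    payload = [w for w in line if w != 0]
    return mask, payload


class ResidueCacheModel:
    def __init__(self):
        self.l2 = {}       # addr -> ("full", mask, payload) | ("half", words)
        self.residue = {}  # addr -> overflow half of a poorly compressed line

    def fill(self, addr, line, residue_available=True):
        mask, payload = compress(line)
        if len(payload) <= HALF:
            # Well compressed: the whole line fits in one half-sized L2 line.
            self.l2[addr] = ("full", mask, payload)
        else:
            # Poorly compressed: first half in L2, overflow half in the
            # residue cache (if it has room for this line).
            self.l2[addr] = ("half", list(line[:HALF]))
            if residue_available:
                self.residue[addr] = list(line[HALF:])

    def read(self, addr, word):
        entry = self.l2.get(addr)
        if entry is None:
            return "miss"
        if entry[0] == "full":
            return "hit"            # whole line recoverable from L2 alone
        if word < HALF:
            return "partial hit"    # needed word is in the L2 half; no miss
        return "hit" if addr in self.residue else "miss"


m = ResidueCacheModel()
m.fill(0x100, [1, 0, 0, 2, 0, 0, 0, 3])                           # 3 non-zero words: fits
m.fill(0x200, [5, 6, 7, 8, 9, 0, 0, 0], residue_available=False)  # overflow half lost
print(m.read(0x100, 6))  # hit: whole line sits in the half-sized L2 entry
print(m.read(0x200, 2))  # partial hit: word 2 lives in the L2 half
print(m.read(0x200, 4))  # miss: word 4 was in the missing overflow half
```

The partial-hit case is the key point: even though line 0x200 is only half captured, an access that needs a word from its L2-resident half does not miss.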
REFERENCES
- CACTI 6.5. http://www.hpl.hp.com/research/cacti/.
- MIPS32 74K. http://www.mips.com/products/cores/32-64-bit-cores/mips32-74k/.
- SPEC CPU2000 benchmarks. http://www.specbench.org/osg/cpu2000.
- B. Abali, H. Franke, X. Shen, D. E. Poff, and T. B. Smith. Performance of hardware compressed main memory. In HPCA, 2001.
- A.-R. Adl-Tabatabai, A. M. Ghuloum, and S. O. Kanaujia. Compression in cache design. In ICS, pages 190--201, 2007.
- A. R. Alameldeen and D. A. Wood. Frequent pattern compression: A significance-based compression scheme for L2 caches. Technical Report 1500, University of Wisconsin, Madison, Apr. 2004.
- A. R. Alameldeen and D. A. Wood. Adaptive cache compression for high-performance processors. In ISCA, pages 212--223, 2004.
- D. H. Albonesi. Selective cache ways: On-demand cache resource allocation. J. Instruction-Level Parallelism, 2, 2000.
- ARM. Cortex-A processors. http://www.arm.com/products/processors/cortex-a/.
- D. Brooks and M. Martonosi. Dynamically exploiting narrow width operands to improve processor power and performance. In HPCA, pages 13--22, 1999.
- D. C. Burger and T. M. Austin. The SimpleScalar tool set, version 2.0. Technical Report CS-TR-1997-1342, University of Wisconsin, Madison, June 1997.
- X. Chen, L. Yang, R. P. Dick, L. Shang, and H. Lekatsas. C-Pack: A high-performance microprocessor cache compression algorithm. IEEE Trans. VLSI Syst., 18(8):1196--1208, 2010.
- J. Dusser, T. Piquet, and A. Seznec. Zero-content augmented caches. In ICS, pages 46--55, Yorktown Heights, NY, USA, June 2009.
- J. H. Edmondson. Internal organization of the Alpha 21164, a 300-MHz 64-bit quad-issue CMOS RISC microprocessor. Digital Technical Journal, 7(1):119--135, Winter 1995.
- M. Ekman and P. Stenstrom. A robust main-memory compression scheme. ACM SIGARCH Computer Architecture News, 33, 2005.
- K. Flautner, N. S. Kim, S. Martin, D. Blaauw, and T. Mudge. Drowsy caches: Simple techniques for reducing leakage power. In ISCA, Anchorage, AK, 2002.
- K. Ghose and M. B. Kamble. Reducing power in superscalar processor caches using subbanking, multiple line buffers and bit-line segmentation. In ISLPED, pages 70--75, 1999.
- M. Goudarzi and T. Ishihara. SRAM leakage reduction by row/column redundancy under random within-die delay variation. IEEE Trans. VLSI Syst., 18(12):1660--1671, 2010.
- M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. MiBench: A free, commercially representative embedded benchmark suite. In Proceedings of the IEEE 4th Annual Workshop on Workload Characterization, Dec. 2001.
- E. G. Hallnor and S. K. Reinhardt. A fully associative software-managed cache design. In ISCA, pages 107--116, 2000.
- E. G. Hallnor and S. K. Reinhardt. A unified compressed memory hierarchy. In HPCA, pages 201--212, 2005.
- J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach, 4th ed. Morgan Kaufmann Publishers Inc., Sept. 2006.
- Intel. Intel Atom processor. http://www.intel.com/technology/atom.
- S. Kaxiras, Z. Hu, and M. Martonosi. Cache decay: Exploiting generational behavior to reduce cache leakage power. In ISCA, pages 240--251, 2001.
- D. Kroft. Retrospective: Lockup-free instruction fetch/prefetch cache organization. In 25 Years ISCA: Retrospectives and Reprints, pages 20--21, 1998.
- J.-S. Lee, W.-K. Hong, and S.-D. Kim. An on-chip cache compression technique to reduce decompression overhead and design complexity. Journal of Systems Architecture, 46(15):1365--1382, 2000.
- G. Memik, G. Reinman, and W. H. Mangione-Smith. Just say no: Benefits of early cache miss determination. In HPCA, pages 307--316, 2003.
- P. Pujara and A. Aggarwal. Restrictive compression techniques to increase level 1 cache capacity. In ICCD, pages 327--333, 2005.
- P. Pujara and A. Aggarwal. Increasing the cache efficiency by eliminating noise. In HPCA, pages 145--154, 2006.
- P. Pujara and A. Aggarwal. Increasing cache capacity through word filtering. In ICS, pages 222--231, Seattle, WA, USA, June 2007.
- M. K. Qureshi, M. A. Suleman, and Y. N. Patt. Line distillation: Increasing cache capacity by filtering unused words in cache lines. In HPCA, pages 250--259, 2007.
- L. Villa, M. Zhang, and K. Asanovic. Dynamic zero compression for cache energy reduction. In MICRO, pages 214--220, 2000.
- P. R. Wilson, S. F. Kaplan, and Y. Smaragdakis. The case for compressed caching in virtual memory systems. In Proceedings of the USENIX 1999 Annual Technical Conference, pages 101--116, 1999.
- J. Yang and R. Gupta. Energy efficient frequent value data cache design. In MICRO, pages 197--207, 2002.
- J. Yang, Y. Zhang, and R. Gupta. Frequent value compression in data caches. In MICRO, pages 258--265, 2000.
Index Terms
- Residue cache: a low-energy low-area L2 cache architecture via compression and partial hits