DOI: 10.1145/1669112.1669126
research-article

Improving cache lifetime reliability at ultra-low voltages

Published: 12 December 2009

ABSTRACT

Voltage scaling is one of the most effective mechanisms to reduce microprocessor power consumption. However, the increased severity of manufacturing-induced parameter variations at lower voltages limits voltage scaling to a minimum voltage, Vccmin, below which a processor cannot operate reliably. Memory cell failures in large memory structures (e.g., caches) typically determine the Vccmin for the whole processor. Memory failures can be persistent (i.e., failures at time zero which cause yield loss) or non-persistent (e.g., soft errors or erratic bit failures). Both types of failures increase as supply voltage decreases and both need to be addressed to achieve reliable operation at low voltages.
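A back-of-the-envelope model helps show why large arrays tend to set Vccmin. The sketch below assumes every cell fails independently with the same probability at a given voltage, which is a simplification for illustration and not the paper's failure model. The function name and the numeric values are hypothetical.

```python
import math

def prob_any_cell_fails(p_cell: float, n_cells: int) -> float:
    """P(at least one of n independent cells fails) = 1 - (1 - p)^n."""
    if p_cell >= 1.0:
        return 1.0
    # log1p/expm1 preserve precision when p_cell is very small
    return -math.expm1(n_cells * math.log1p(-p_cell))

# Hypothetical per-cell failure probability, identical for both arrays
p = 1e-9
small = prob_any_cell_fails(p, 32 * 1024 * 8)        # 32 KB array
large = prob_any_cell_fails(p, 2 * 1024 * 1024 * 8)  # 2 MB array
```

The 2 MB array has 64 times as many cells as the 32 KB array, so at the same per-cell failure probability it is far more likely to contain at least one failing cell.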

In this paper, we propose a novel adaptive technique to improve cache lifetime reliability and enable low voltage operation. This technique, multi-bit segmented ECC (MS-ECC), addresses both persistent and non-persistent failures. Like previous work on mitigating persistent failures, MS-ECC trades off cache capacity for lower voltages. However, unlike previous schemes, MS-ECC does not rely on testing to identify and isolate defective bits, and therefore enables error tolerance for non-persistent failures like erratic bits and soft errors at low voltages. Furthermore, MS-ECC's design can allow the operating system to adaptively change the cache size and ECC capability to adjust to system operating conditions. Compared to current designs with single-bit correction, the most aggressive implementation of MS-ECC enables a 30% reduction in supply voltage, reducing power by 71% and energy per instruction by 42%.
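The intuition behind segmenting a line and correcting several bits per segment can be sketched with a simple binomial model. This assumes independent bit failures and ignores failures in the check bits themselves, so it is a toy illustration, not the paper's evaluation method. The segment sizes, correction strengths, and failure probability are hypothetical and do not reflect the paper's configuration.

```python
from math import comb

def segment_ok(p_bit: float, bits: int, t: int) -> float:
    """P(a segment of `bits` bits has at most t failing bits), binomial model."""
    return sum(comb(bits, k) * p_bit**k * (1 - p_bit)**(bits - k)
               for k in range(t + 1))

def line_fail(p_bit: float, n_segments: int, bits_per_segment: int, t: int) -> float:
    """P(at least one segment has more failing bits than its code can correct)."""
    return 1.0 - segment_ok(p_bit, bits_per_segment, t) ** n_segments

p = 1e-3  # hypothetical per-bit failure probability at low voltage

# One 512-bit line protected by single-bit correction over the whole line
single = line_fail(p, n_segments=1, bits_per_segment=512, t=1)

# Same data split into eight 64-bit segments, each correcting up to 4 bits
segmented = line_fail(p, n_segments=8, bits_per_segment=64, t=4)
```

In this toy setting the segmented configuration is uncorrectable far less often than the single-bit-correction line. The price is the extra storage for check bits, which corresponds to the cache capacity that is traded away.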

      Published in

      MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
      December 2009
      601 pages
      ISBN: 9781605587981
      DOI: 10.1145/1669112

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 484 of 2,242 submissions, 22%
