research-article

Write Back Energy Optimization for STT-MRAM-based Last-level Cache with Data Pattern Characterization

Published: 22 May 2020

Abstract

Traditional memory technologies face severe challenges in meeting the ever-increasing power and memory bandwidth requirements of high-performance computing and big-data analysis. Several emerging memory technologies are promising replacements for SRAM or DRAM. Among them, STT-MRAM can replace SRAM as the last-level cache (LLC), but it suffers from high write energy and latency. In this article, we investigate the data patterns written from the SRAM-based upper-level cache to the STT-MRAM-based LLC to explore the potential for write energy reduction. Depending on the data layout within a cache line, redundant bits can be identified and eliminated from write-back operations to save STT-MRAM write energy. We also propose a dynamic profiling method to accommodate different application characteristics. Extensive simulation results show that write energy is reduced by 37.05% to 38.89% with static profiling and by 19.76% to 34.29% with dynamic profiling.
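
The idea of dropping redundant bits from a write-back can be illustrated with a small sketch. This is not the article's profiling scheme, which the abstract does not specify. It assumes a 64-byte cache line split into 64-bit little-endian words, and a hypothetical pattern rule: an all-zero word needs no data bits written, and a word whose upper bits are only the sign extension of its low 8, 16, or 32 bits needs only those low bits. The names `significant_bits` and `bits_to_write` are illustrative, and the metadata needed to record which pattern was used is not modeled.

```python
from typing import List

LINE_BYTES = 64  # assumed cache-line size
WORD_BYTES = 8   # assumed word granularity (64-bit words)
WORD_BITS = WORD_BYTES * 8
WORD_MASK = (1 << WORD_BITS) - 1


def split_words(line: bytes) -> List[int]:
    """Split one cache line into unsigned 64-bit little-endian words."""
    if len(line) != LINE_BYTES:
        raise ValueError(f"expected {LINE_BYTES} bytes, got {len(line)}")
    return [
        int.from_bytes(line[i:i + WORD_BYTES], "little")
        for i in range(0, LINE_BYTES, WORD_BYTES)
    ]


def significant_bits(word: int) -> int:
    """Bits to write for one word under the hypothetical pattern rule.

    0 for an all-zero word, 8/16/32 when the word equals the sign
    extension of its low 8/16/32 bits, otherwise the full 64 bits.
    """
    if word == 0:
        return 0
    for width in (8, 16, 32):
        low_mask = (1 << width) - 1
        low = word & low_mask
        sign = low >> (width - 1)
        # Sign-extend the low field back to 64 bits.
        extended = low | (WORD_MASK ^ low_mask) if sign else low
        if extended == word:
            return width
    return WORD_BITS


def bits_to_write(line: bytes) -> int:
    """Total data bits written for a line after dropping redundant bits."""
    return sum(significant_bits(w) for w in split_words(line))


if __name__ == "__main__":
    # A line holding eight small integers (value 1 in every word).
    line = (1).to_bytes(WORD_BYTES, "little") * 8
    written = bits_to_write(line)
    print(f"{written} of {LINE_BYTES * 8} data bits written")
```

In this sketch a line of eight small integers writes 64 of its 512 data bits, while a line of incompressible values writes all 512. The fraction saved is specific to this toy rule and says nothing about the percentages reported in the abstract.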



    • Published in

      ACM Journal on Emerging Technologies in Computing Systems, Volume 16, Issue 3
      Special Issue on Nanoelectronic Device, Circuit, and Architecture Design, Part 1 and Regular Papers
      July 2020
      214 pages
      ISSN: 1550-4832
      EISSN: 1550-4840
      DOI: 10.1145/3399633
      • Editor: Ramesh Karri

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 22 May 2020
      • Online AM: 7 May 2020
      • Accepted: 1 February 2020
      • Revised: 1 November 2019
      • Received: 1 May 2019

      Qualifiers

      • research-article
      • Research
      • Refereed
