Abstract
Traditional memory technologies face severe challenges in meeting the ever-increasing power and memory bandwidth requirements of high-performance computing and big-data analysis. Several emerging memory technologies are promising replacements for SRAM or DRAM. Among them, STT-MRAM can replace SRAM as the last-level cache (LLC), but it suffers from high write energy and latency. In this article, we investigate the data patterns written from the SRAM-based upper-level cache to the STT-MRAM-based LLC to explore the potential for reducing write energy. Depending on the data layout within a cache line, redundant bits can be identified and eliminated from write-back operations to save STT-MRAM write energy. We also propose a dynamic profiling method to accommodate different application characteristics. Extensive simulation results show that write energy is reduced by 37.05% to 38.89% with static profiling and by 19.76% to 34.29% with dynamic profiling.
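As a rough illustration of how redundant bits might be identified in a write-back, the sketch below treats a cache line as a sequence of 32-bit words and counts only the bits that are not sign-extension padding. This is not the article's characterization scheme, which the abstract does not spell out: the 64-byte line size, the 32-bit word granularity, the 8/16/32-bit width classes, and the function names are all assumptions made for this example, and the metadata needed to record each word's width is ignored.

```python
import struct

LINE_BYTES = 64  # assumed cache line size
WORD_BYTES = 4   # assumed word granularity (32-bit words)


def significant_bits(word: int) -> int:
    """Return the narrowest assumed width (8, 16, or 32 bits) whose
    sign extension reproduces the given unsigned 32-bit word."""
    # Reinterpret the unsigned word as a signed 32-bit value.
    signed = word - (1 << 32) if word & 0x80000000 else word
    for width in (8, 16):
        if -(1 << (width - 1)) <= signed < (1 << (width - 1)):
            return width
    return 32


def bits_to_write(line: bytes) -> int:
    """Count the bits that would be written if sign-extension bits were skipped."""
    if len(line) != LINE_BYTES:
        raise ValueError("expected a %d-byte cache line" % LINE_BYTES)
    words = struct.unpack("<%dI" % (LINE_BYTES // WORD_BYTES), line)
    return sum(significant_bits(w) for w in words)


def redundant_fraction(line: bytes) -> float:
    """Fraction of the line's bits treated as redundant under this model."""
    return 1 - bits_to_write(line) / (LINE_BYTES * 8)


if __name__ == "__main__":
    # A line holding sixteen small integers: every word fits in 8 bits.
    small_values = struct.pack("<16i", *range(-8, 8))
    print(bits_to_write(small_values), redundant_fraction(small_values))
```

Under this simplified model, a line of small integers needs only a quarter of its bits written. A real design would also have to store the per-word width information and account for how STT-MRAM write energy depends on the cells actually switched.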
Index Terms
- Write Back Energy Optimization for STT-MRAM-based Last-level Cache with Data Pattern Characterization