Ultra-Efficient Processing In-Memory for Data Intensive Applications

ABSTRACT
Recent years have witnessed rapid growth in the Internet of Things (IoT). This network of billions of devices generates and exchanges huge amounts of data. Limited cache capacity and memory bandwidth make transferring and processing such data on traditional CPUs and GPUs highly inefficient, in terms of both energy consumption and delay. However, many IoT applications are statistical at heart and can tolerate some inaccuracy in their computation. This lets designers reduce processing complexity by approximating results to a desired accuracy. In this paper, we propose an ultra-efficient approximate processing in-memory architecture, called APIM, which exploits the analog characteristics of non-volatile memories to support addition and multiplication inside the crossbar memory while storing the data. The proposed design eliminates the overhead of transferring data to the processor by virtually bringing the processor inside the memory. APIM dynamically configures the precision of computation for each application in order to tune the level of accuracy at runtime. Our experimental evaluation on six general OpenCL applications shows that the proposed design achieves up to 20x performance improvement and a 480x improvement in energy-delay product while ensuring acceptable quality of service. In exact mode, it achieves 28x energy savings and a 4.8x speedup compared to state-of-the-art GPU cores.
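The idea of trading precision for efficiency can be illustrated in software, independent of the hardware design. The sketch below is a hypothetical software model (not the APIM circuit itself): it emulates configurable-precision multiplication by masking off the low-order bits of each operand before multiplying, so that a tunable `precision` parameter controls the accuracy/complexity trade-off, in the spirit of the runtime-configurable precision the abstract describes. The function name and parameters are illustrative assumptions, not part of the paper.

```python
def approx_multiply(a: int, b: int, bitwidth: int = 32, precision: int = 16) -> int:
    """Multiply a and b after zeroing each operand's low-order bits,
    keeping only the top `precision` bits of `bitwidth`-bit inputs.
    precision == bitwidth reproduces exact multiplication."""
    drop = bitwidth - precision
    # Mask that keeps the `precision` most significant bits of a bitwidth-bit value.
    mask = ((1 << bitwidth) - 1) & ~((1 << drop) - 1)
    return (a & mask) * (b & mask)

# Example: a 20-of-32-bit approximate product stays within a fraction of
# a percent of the exact result for large operands.
exact = 1000003 * 999983
approx = approx_multiply(1000003, 999983, bitwidth=32, precision=20)
relative_error = abs(exact - approx) / exact
```

Because only low-order bits are discarded, the relative error is bounded by roughly 2^(1-precision) per operand, which is why statistical workloads can tolerate aggressive truncation while still meeting a quality-of-service target.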