Ultra-Efficient Processing In-Memory for Data Intensive Applications

ABSTRACT
Recent years have witnessed rapid growth in the Internet of Things (IoT). This network of billions of devices generates and exchanges huge amounts of data. Limited cache capacity and memory bandwidth make transferring and processing such data on traditional CPUs and GPUs highly inefficient, in terms of both energy consumption and delay. However, many IoT applications are statistical at heart and can tolerate some inaccuracy in their computation. This lets designers reduce processing complexity by approximating results to a desired accuracy. In this paper, we propose an ultra-efficient approximate processing in-memory architecture, called APIM, which exploits the analog characteristics of non-volatile memories to support addition and multiplication inside the crossbar memory while storing the data. The proposed design eliminates the overhead of transferring data to the processor by virtually bringing the processor inside the memory. APIM dynamically configures the precision of computation for each application in order to tune the level of accuracy at runtime. Our experimental evaluation on six general OpenCL applications shows that the proposed design achieves up to 20x performance improvement and a 480x improvement in energy-delay product while ensuring acceptable quality of service. In exact mode, it achieves 28x energy savings and a 4.8x speedup compared to state-of-the-art GPU cores.
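The idea of trading precision for efficiency can be illustrated in software, independent of the hardware design. The sketch below is a hypothetical software model (not the APIM circuit itself): it emulates configurable-precision multiplication by masking off the low-order bits of each operand before multiplying, so that a tunable `precision` parameter controls the accuracy/complexity trade-off, in the spirit of the runtime-configurable precision the abstract describes. The function name and parameters are illustrative assumptions, not part of the paper.

```python
def approx_multiply(a: int, b: int, bitwidth: int = 32, precision: int = 16) -> int:
    """Multiply a and b after zeroing each operand's low-order bits,
    keeping only the top `precision` bits of `bitwidth`-bit inputs.
    precision == bitwidth reproduces exact multiplication."""
    drop = bitwidth - precision
    # Mask that keeps the `precision` most significant bits of a bitwidth-bit value.
    mask = ((1 << bitwidth) - 1) & ~((1 << drop) - 1)
    return (a & mask) * (b & mask)

# Example: a 20-of-32-bit approximate product stays within a fraction of
# a percent of the exact result for large operands.
exact = 1000003 * 999983
approx = approx_multiply(1000003, 999983, bitwidth=32, precision=20)
relative_error = abs(exact - approx) / exact
```

Because only low-order bits are discarded, the relative error is bounded by roughly 2^(1-precision) per operand, which is why statistical workloads can tolerate aggressive truncation while still meeting a quality-of-service target.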