ABSTRACT
Machine learning and data analytics applications increasingly suffer from the high latency and energy consumption of conventional von Neumann architectures. Recently, several in-memory and near-memory systems have been proposed to overcome this von Neumann bottleneck. Platforms based on content-addressable memories (CAMs) are particularly interesting due to their efficient support for the search-based operations that form the foundation of many applications, including K-nearest neighbors (KNN), hyperdimensional computing (HDC), recommender systems, and one-shot learning, among others. Today, these platforms are designed by hand and can only be programmed with low-level code, accessible only to hardware experts. In this paper, we introduce C4CAM, the first compiler framework to quickly explore CAM configurations and seamlessly generate code from high-level TorchScript programs. C4CAM employs a hierarchy of abstractions that progressively lowers programs, allowing code transformations at the most suitable abstraction level. Depending on their type and underlying technology, CAM arrays exhibit varying latency and power profiles. Our framework allows analyzing the impact of such differences on system-level performance and energy consumption, and thus supports designers in selecting appropriate designs for a given application.
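The core primitive these CAM platforms accelerate is an associative search: a query word is compared against all stored rows in parallel, and the best-matching row's index is returned. A minimal software sketch of this operation (a Hamming-distance best match, as used in KNN- and HDC-style workloads) is shown below; the function name and data layout are illustrative only and are not part of C4CAM's actual interface.

```python
def cam_best_match(rows, query):
    """Return the index of the stored word with minimum Hamming
    distance to `query`, emulating in software the parallel
    match-and-reduce that a CAM array performs in hardware."""
    def hamming(a, b):
        # Number of bit positions in which the two words differ.
        return sum(x != y for x, y in zip(a, b))
    return min(range(len(rows)), key=lambda i: hamming(rows[i], query))

stored = [
    [1, 0, 1, 1],  # distance 4 from the query
    [0, 1, 1, 0],  # distance 1
    [0, 1, 0, 0],  # distance 0 (exact match)
]
print(cam_best_match(stored, [0, 1, 0, 0]))  # prints 2
```

In a hardware CAM, the per-row comparisons happen simultaneously inside the array, which is what makes these search-heavy workloads attractive targets for in-memory acceleration.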