ABSTRACT
Machine learning and data analytics applications increasingly suffer from the high latency and energy consumption of conventional von Neumann architectures. Recently, several in-memory and near-memory systems have been proposed to overcome this von Neumann bottleneck. Platforms based on content-addressable memories (CAMs) are particularly interesting due to their efficient support for the search-based operations that form the foundation of many applications, including K-nearest neighbors (KNN), hyperdimensional computing (HDC), recommender systems, and one-shot learning, among others. Today, these platforms are designed by hand and can only be programmed with low-level code, accessible only to hardware experts. In this paper, we introduce C4CAM, the first compiler framework to quickly explore CAM configurations and seamlessly generate code from high-level TorchScript programs. C4CAM employs a hierarchy of abstractions that progressively lowers programs, allowing code transformations at the most suitable abstraction level. Depending on their type and underlying technology, CAM arrays exhibit varying latency and power profiles. Our framework allows analyzing the impact of such differences on system-level performance and energy consumption, and thus supports designers in selecting appropriate designs for a given application.
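The core primitive these CAM platforms accelerate is an associative search: a query word is compared against all stored rows in parallel, and the best-matching row's index is returned. A minimal software sketch of this operation (a Hamming-distance best match, as used in KNN- and HDC-style workloads) is shown below; the function name and data layout are illustrative only and are not part of C4CAM's actual interface.

```python
def cam_best_match(rows, query):
    """Return the index of the stored word with minimum Hamming
    distance to `query`, emulating in software the parallel
    match-and-reduce that a CAM array performs in hardware."""
    def hamming(a, b):
        # Number of bit positions in which the two words differ.
        return sum(x != y for x, y in zip(a, b))
    return min(range(len(rows)), key=lambda i: hamming(rows[i], query))

stored = [
    [1, 0, 1, 1],  # distance 4 from the query
    [0, 1, 1, 0],  # distance 1
    [0, 1, 0, 0],  # distance 0 (exact match)
]
print(cam_best_match(stored, [0, 1, 0, 0]))  # prints 2
```

In a hardware CAM, the per-row comparisons happen simultaneously inside the array, which is what makes these search-heavy workloads attractive targets for in-memory acceleration.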