DOI: 10.1145/3620666.3651386
Research article | Open Access

C4CAM: A Compiler for CAM-based In-memory Accelerators

Published: 27 April 2024

ABSTRACT

Machine learning and data analytics applications increasingly suffer from the high latency and energy consumption of conventional von Neumann architectures. Recently, several in-memory and near-memory systems have been proposed to overcome this von Neumann bottleneck. Platforms based on content-addressable memories (CAMs) are particularly interesting due to their efficient support for the search-based operations that underpin many applications, including K-nearest neighbors (KNN), hyperdimensional computing (HDC), recommender systems, and one-shot learning, among others. Today, these platforms are designed by hand and can only be programmed with low-level code, accessible only to hardware experts. In this paper, we introduce C4CAM, the first compiler framework to quickly explore CAM configurations and seamlessly generate code from high-level TorchScript code. C4CAM employs a hierarchy of abstractions that progressively lowers programs, allowing code transformations at the most suitable abstraction level. Depending on their type and underlying technology, CAM arrays exhibit varying latency and power profiles. Our framework enables analyzing the impact of such differences on system-level performance and energy consumption, and thus supports designers in selecting appropriate designs for a given application.
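To make the search primitives concrete, the sketch below models in plain Python the two CAM query modes the abstract alludes to: a ternary exact match with don't-care bits, and an approximate best match by minimum Hamming distance (the 1-NN primitive behind KNN and HDC). The function names are hypothetical illustrations, not C4CAM's actual API, and real CAM hardware evaluates all stored rows in parallel; the sequential scan here only emulates that behavior.

```python
def tcam_exact_match(rows, query):
    """Ternary CAM lookup: stored rows may contain 'X' (don't-care) bits.
    Returns the indices of all rows matching the query; a hardware TCAM
    produces these match lines in a single parallel search cycle."""
    return [i for i, row in enumerate(rows)
            if all(r == 'X' or r == q for r, q in zip(row, query))]

def cam_best_match(rows, query):
    """Approximate-match CAM: return the index of the stored row with the
    minimum Hamming distance to the query -- the nearest-neighbor search
    that KNN and HDC accelerators map onto CAM arrays."""
    return min(range(len(rows)),
               key=lambda i: sum(r != q for r, q in zip(rows[i], query)))

# Exact search with a wildcard bit in row 0.
print(tcam_exact_match(["10X1", "1101", "0000"], "1011"))  # -> [0]
# Best match: row 2 differs from the query in only one bit.
print(cam_best_match(["1101", "0000", "1010"], "1011"))    # -> 2
```

The key point the example illustrates is that both query modes share the same stored array and differ only in the match semantics, which is why latency and energy vary with the CAM type and why a compiler must pick the abstraction level at which to exploit each mode.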


Published in

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3
April 2024, 1106 pages
ISBN: 9798400703867
DOI: 10.1145/3620666
Publisher: Association for Computing Machinery, New York, NY, United States
This work is licensed under a Creative Commons Attribution 4.0 International License.
Overall acceptance rate: 535 of 2,713 submissions (20%)