ABSTRACT
Hardware specialization is seen as a promising avenue for improving computing efficiency, with reconfigurable devices as excellent deployment platforms for application-specific architectures. One approach to hardware specialization is to extend the popular RISC-V Instruction Set Architecture (ISA), for which extensions targeting domains such as Edge Artificial Intelligence (AI) are already appearing. However, to use the custom instructions while maintaining a high (e.g., C/C++) abstraction level, the assembler and compiler must be modified. Alternatively, a software developer with expert knowledge of the hardware modifications in the RISC-V core can manually insert inline assembly.
In this paper, we consider a RISC-V core with a vectorization and streaming engine that supports the Unlimited Vector Extension (UVE), and propose an approach to automatically transform annotated C loops into UVE-compatible code via automatic insertion of inline assembly. We rely on Clava, a source-to-source transformation tool, to perform sophisticated code analysis and transformations via scripts. We use pragmas to identify code sections amenable to vectorization and/or streaming, and use Clava to automatically insert inline UVE instructions, avoiding extensive modifications of existing compiler projects.
We produce UVE binaries that are functionally correct when compared to handwritten versions with inline assembly, and that execute an equal, and sometimes smaller, number of instructions for a set of six benchmarks from the Polybench suite. These initial results are evidence that this kind of translation is feasible, and we believe future work can target more complex transformations or other ISA extensions, accelerating the adoption of hardware/software co-design flows for generic application cases.
Using Source-to-Source to Target RISC-V Custom Extensions: UVE Case-Study