Streamlining the OpenMP Programming Model on Ultra-Low-Power Multi-core MCUs

Montagna, Fabio; Tagliavini, Giuseppe; Rossi, Davide; Garofalo, Angelo; Benini, Luca

doi:10.1007/978-3-030-81682-7_11

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12800)

Included in the following conference series:

International Conference on Architecture of Computing Systems

Abstract

High-level programming models aim at exploiting hardware parallelism and reducing software development costs. However, their adoption on ultra-low-power multi-core microcontroller (MCU) platforms requires minimizing the overheads of work-sharing constructs on fine-grained parallel regions. This work tackles this challenge by proposing OMP-SPMD, a streamlined approach to parallel computing that enables the OpenMP syntax for the Single-Program Multiple-Data (SPMD) paradigm. To assess the performance improvement, we compare our solution with two alternatives: a baseline implementation of the OpenMP runtime based on the fork-join paradigm (OMP-base) and a version leveraging hardware-specific optimizations (OMP-opt). We benchmarked these libraries on a Parallel Ultra-Low Power (PULP) MCU, showing that hardware-specific optimizations improve OMP-base performance by up to 69%, while OMP-SPMD yields an additional improvement of up to 178%.
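To make the contrast between the fork-join and SPMD styles concrete, the following is a minimal sketch in C. It is not the OMP-SPMD runtime described in the chapter, whose interface is not given in the abstract. The helper names (`core_id`, `num_cores`, `spmd_chunk`, `scale_fork_join`, `scale_spmd`) are hypothetical and exist only for illustration; the only library calls are the standard OpenMP routines `omp_get_thread_num` and `omp_get_num_threads`. The sketch assumes a static block partition of the iteration space, and it falls back to a single core when compiled without OpenMP support.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define N 64

/* Hypothetical helper: id of the calling core (0 when OpenMP is disabled). */
static int core_id(void) {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Hypothetical helper: cores in the current team (1 when OpenMP is disabled). */
static int num_cores(void) {
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}

/* Assumed static block partition: core `id` owns the half-open range
 * [*start, *end) of the n iterations; trailing cores may get an empty range. */
static void spmd_chunk(int id, int cores, int n, int *start, int *end) {
    assert(cores > 0 && id >= 0 && id < cores);
    int chunk = (n + cores - 1) / cores;   /* ceiling division */
    *start = id * chunk;
    if (*start > n) *start = n;
    *end = *start + chunk;
    if (*end > n) *end = n;
}

/* Fork-join style: the work-sharing construct asks the runtime to
 * distribute the loop iterations among the cores. */
static void scale_fork_join(int *v, int n, int k) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        v[i] *= k;
}

/* SPMD style: every core runs the same body and selects its own
 * slice of the data from its id, with no work-sharing construct. */
static void scale_spmd(int *v, int n, int k) {
    #pragma omp parallel
    {
        int start, end;
        spmd_chunk(core_id(), num_cores(), n, &start, &end);
        for (int i = start; i < end; i++)
            v[i] *= k;
    }
}
```

Both functions produce the same result on the same input. The difference is where the iteration-to-core mapping is decided: by the runtime in the first case, and by plain index arithmetic inside the parallel region in the second.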

Acknowledgments

This work has been partially supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement numbers 732631 (OPRECOMP), 863337 (WiPLASH), and 857191 (IOTWINS).

Author information

Authors and Affiliations

University of Bologna, Bologna, Italy
Fabio Montagna, Giuseppe Tagliavini, Davide Rossi, Angelo Garofalo & Luca Benini
ETH Zürich, Zürich, Switzerland
Luca Benini

Corresponding author

Correspondence to Giuseppe Tagliavini.

Editor information

Editors and Affiliations

Technische Universität Darmstadt, Darmstadt, Germany
Christian Hochberger
Karlsruhe Institute of Technology, Karlsruhe, Germany
Lars Bauer
Otto-von-Guericke University Magdeburg, Magdeburg, Germany
Thilo Pionteck

About this paper

Cite this paper

Montagna, F., Tagliavini, G., Rossi, D., Garofalo, A., Benini, L. (2021). Streamlining the OpenMP Programming Model on Ultra-Low-Power Multi-core MCUs. In: Hochberger, C., Bauer, L., Pionteck, T. (eds) Architecture of Computing Systems. ARCS 2021. Lecture Notes in Computer Science, vol 12800. Springer, Cham. https://doi.org/10.1007/978-3-030-81682-7_11

DOI: https://doi.org/10.1007/978-3-030-81682-7_11
Published: 15 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81681-0
Online ISBN: 978-3-030-81682-7
eBook Packages: Computer Science, Computer Science (R0)
