DOI: 10.1145/2968455.2968506
research-article

LOCUS: low-power customizable many-core architecture for wearables

Published: 01 October 2016

ABSTRACT

Demanding application requirements, such as real-time response, are pushing wearable devices to adopt more power-efficient processors inside the SoC (system-on-chip). However, existing wearable devices are not well suited to such challenging applications because of their poor performance, while conventional high-performance many-core architectures are not appropriate either because of the stringent power budget in this domain. We propose LOCUS, a low-power, customizable, many-core processor for next-generation wearable devices. LOCUS combines customizable processor cores with a customizable network on a message-passing architecture to deliver very competitive performance/watt: an average 3.1x improvement over the quad-core ARM processors used in state-of-the-art wearable devices. A combination of full-system simulation with representative applications from the wearable domain and RTL synthesis of the architecture shows that a 16-core LOCUS achieves an average 1.52x performance/watt improvement over a conventional 16-core shared-memory many-core architecture.

    • Published in

      CASES '16: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems
      October 2016
      187 pages
      ISBN: 9781450344821
      DOI: 10.1145/2968455

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 52 of 230 submissions, 23%
