research-article

LOCUS: low-power customizable many-core architecture for wearables

Authors:
Cheng Tan, National University of Singapore
Aditi Kulkarni, National University of Singapore
Vanchinathan Venkataramani, National University of Singapore
Manupa Karunaratne, National University of Singapore
Tulika Mitra, National University of Singapore
Li-Shiuan Peh, Massachusetts Institute of Technology

CASES '16: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, October 2016, Article No.: 11, Pages 1–10
https://doi.org/10.1145/2968455.2968506

Published: 01 October 2016

ABSTRACT

Application requirements, such as real-time response, are pushing wearable devices to leverage more powerful processors inside the SoC (system-on-chip). However, existing wearable devices are not well suited for such challenging applications due to poor performance, while conventional powerful many-core architectures are not appropriate either due to the stringent power budget in this domain. We propose LOCUS, a low-power, customizable, many-core processor for next-generation wearable devices. LOCUS combines customizable processor cores with a customizable network on a message-passing architecture to deliver very competitive performance/watt: an average 3.1x improvement compared to the quad-core ARM processors used in state-of-the-art wearable devices. A combination of full-system simulation with representative applications from the wearable domain and RTL synthesis of the architecture shows that a 16-core LOCUS achieves an average 1.52x performance/watt improvement over a conventional 16-core shared-memory many-core architecture.


Published in

CASES '16: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems
October 2016
187 pages
ISBN: 9781450344821
DOI: 10.1145/2968455

Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 52 of 230 submissions, 23%