ABSTRACT
Application requirements, such as real-time response, are pushing wearable devices to leverage more powerful processors inside the SoC (system on chip). However, existing wearable devices are not well suited to such challenging applications because of their poor performance, while conventional powerful many-core architectures are not appropriate either, given the stringent power budget in this domain. We propose LOCUS, a low-power, customizable, many-core processor for next-generation wearable devices. LOCUS combines customizable processor cores with a customizable network on a message-passing architecture to deliver very competitive performance/watt: an average 3.1x improvement over the quad-core ARM processors used in state-of-the-art wearable devices. A combination of full-system simulation with representative applications from the wearable domain and RTL synthesis of the architecture shows that a 16-core LOCUS achieves an average 1.52x performance/watt improvement over a conventional 16-core shared-memory many-core architecture.
- LOCUS: low-power customizable many-core architecture for wearables