DOI: 10.1145/3194554.3194601
Research article

SCALENet: A SCalable Low power AccELerator for Real-time Embedded Deep Neural Networks

Published: 30 May 2018

ABSTRACT

As deep learning networks mature and improve classification performance, a significant challenge is their deployment in embedded settings. Modern network topologies, such as convolutional neural networks, can be very deep and impose considerable complexity that is often not feasible in resource-bound, real-time systems. Processing these networks requires high levels of parallelization, maximized data throughput, and support for different network types, while minimizing power and resource consumption. In response to these requirements, this paper presents a low power FPGA-based neural network accelerator named SCALENet: a SCalable Low power AccELerator for real-time deep neural Networks. Key features include optimization for power with a coarse- and fine-grain scheduler, implementation flexibility with hardware-only or hardware/software co-design, and acceleration of both fully connected and convolutional layers. The experimental results evaluate SCALENet against two different neural network applications: image processing and biomedical seizure detection. Eight image processing networks, trained on the CIFAR-10 and ImageNet datasets, are implemented with SCALENet on Arty A7 and ZedBoard™ FPGA platforms. The highest improvement came with the Inception network on the ImageNet dataset, with a 22x increase in throughput and a 13x decrease in energy consumption compared to the ARM processor implementation. We then implement SCALENet for time series EEG seizure detection using both a Direct Convolution and an FFT Convolution method to show its design versatility, achieving a 99.7% reduction in execution time and a 97.9% improvement in energy consumption compared to the ARM. Finally, we demonstrate the ability to achieve parity with or exceed the energy efficiency of NVIDIA GPUs when evaluated against a Jetson TK1 embedded GPU System on Chip (SoC), with a 4x power savings in a power envelope of 2.07 Watts.
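The Direct Convolution and FFT Convolution methods contrasted in the abstract both compute the same result; the FFT method exploits the convolution theorem (convolution in time equals pointwise multiplication in frequency) to reduce asymptotic cost from O(NM) to O(N log N). A minimal sketch of the two methods on a 1-D signal, using NumPy rather than SCALENet's hardware implementation, with a hypothetical EEG-like signal and filter kernel:

```python
import numpy as np

def direct_conv(x, h):
    # Direct method: O(N*M) sliding-window (full) convolution.
    return np.convolve(x, h)

def fft_conv(x, h):
    # FFT method: zero-pad both operands to the full output length,
    # multiply spectra pointwise, and inverse-transform.
    n = len(x) + len(h) - 1
    X = np.fft.rfft(x, n)
    H = np.fft.rfft(h, n)
    return np.fft.irfft(X * H, n)

# Hypothetical inputs standing in for an EEG window and a learned kernel.
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
h = rng.standard_normal(16)

# Both methods agree to floating-point tolerance.
assert np.allclose(direct_conv(x, h), fft_conv(x, h))
```

For long signals and large kernels the FFT path wins; for very short kernels the direct path's lower constant factors can still be faster, which is one reason an accelerator may support both.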


          • Published in

            cover image ACM Conferences
            GLSVLSI '18: Proceedings of the 2018 on Great Lakes Symposium on VLSI
            May 2018
            533 pages
ISBN: 9781450357241
DOI: 10.1145/3194554

            Copyright © 2018 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States


Acceptance Rates

GLSVLSI '18 paper acceptance rate: 48 of 197 submissions, 24%. Overall acceptance rate: 312 of 1,156 submissions, 27%.
