DOI: 10.1145/3194554.3194601
Research article

SCALENet: A SCalable Low power AccELerator for Real-time Embedded Deep Neural Networks

Published: 30 May 2018

ABSTRACT

As deep learning networks mature and improve classification performance, a significant challenge is their deployment in embedded settings. Modern network topologies, such as convolutional neural networks, can be very deep and impose considerable complexity that is often not feasible in resource-bound, real-time systems. Processing these networks requires high levels of parallelization, maximized data throughput, and support for different network types, while minimizing power and resource consumption. In response to these requirements, this paper presents a low power FPGA-based neural network accelerator named SCALENet: a SCalable Low power AccELerator for real-time deep neural Networks. Key features include optimization for power with a coarse- and fine-grain scheduler, implementation flexibility with hardware-only or hardware/software co-design, and acceleration of both fully connected and convolutional layers. The experimental results evaluate SCALENet against two different neural network applications: image processing and biomedical seizure detection. Eight image processing networks, trained on the CIFAR-10 and ImageNet datasets, are implemented with SCALENet on Arty A7 and ZedBoard™ FPGA platforms. The highest improvement came with the Inception network on the ImageNet dataset, with a 22x increase in throughput and a 13x decrease in energy consumption compared to the ARM processor implementation. We then implement SCALENet for time series EEG seizure detection using both a Direct Convolution and an FFT Convolution method to show its design versatility, achieving a 99.7% reduction in execution time and a 97.9% improvement in energy consumption compared to the ARM. Finally, we demonstrate the ability to achieve parity with or exceed the energy efficiency of NVIDIA GPUs when evaluated against a Jetson TK1 embedded GPU System on Chip (SoC), with a 4x power savings in a power envelope of 2.07 Watts.
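The Direct Convolution and FFT Convolution methods contrasted in the abstract both compute the same result; the FFT method exploits the convolution theorem (convolution in time equals pointwise multiplication in frequency) to reduce asymptotic cost from O(NM) to O(N log N). A minimal sketch of the two methods on a 1-D signal, using NumPy rather than SCALENet's hardware implementation, with a hypothetical EEG-like signal and filter kernel:

```python
import numpy as np

def direct_conv(x, h):
    # Direct method: O(N*M) sliding-window (full) convolution.
    return np.convolve(x, h)

def fft_conv(x, h):
    # FFT method: zero-pad both operands to the full output length,
    # multiply spectra pointwise, and inverse-transform.
    n = len(x) + len(h) - 1
    X = np.fft.rfft(x, n)
    H = np.fft.rfft(h, n)
    return np.fft.irfft(X * H, n)

# Hypothetical inputs standing in for an EEG window and a learned kernel.
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
h = rng.standard_normal(16)

# Both methods agree to floating-point tolerance.
assert np.allclose(direct_conv(x, h), fft_conv(x, h))
```

For long signals and large kernels the FFT path wins; for very short kernels the direct path's lower constant factors can still be faster, which is one reason an accelerator may support both.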


          • Published in

            cover image ACM Conferences
            GLSVLSI '18: Proceedings of the 2018 on Great Lakes Symposium on VLSI
            May 2018
            533 pages
ISBN: 9781450357241
DOI: 10.1145/3194554

            Copyright © 2018 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States


Acceptance Rates

GLSVLSI '18 paper acceptance rate: 48 of 197 submissions, 24%. Overall acceptance rate: 312 of 1,156 submissions, 27%.
