Abstract
The computational complexity of neural networks for large-scale or real-time applications necessitates hardware acceleration. Most approaches assume that the network architecture and parameters are unknown at design time, so that a single accelerator can serve many applications. This article demonstrates, for the case where the neural network architecture and ternary weight values are known a priori, that extremely high-throughput implementations of neural network inference can be achieved by customizing the datapath and routing to remove unnecessary computations and data movement. This approach is ideally suited to FPGAs: specializing the implementation to a trained network improves efficiency, while the reconfigurability of the FPGA retains generality. A VGG-style network with ternary weights and fixed-point activations is implemented for the CIFAR-10 dataset on Amazon's AWS F1 instance. This article demonstrates how to remove 90% of the operations in convolutional layers by exploiting sparsity and compile-time optimizations. The hardware implementation achieves 90.9 ± 0.1% accuracy and 122k frames per second, with a latency of only 29 µs, which is the fastest CNN inference implementation reported so far on an FPGA.
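The key idea above can be illustrated with a small sketch. With ternary weights in {-1, 0, +1}, a dot product requires no multiplications at all: each +1 weight becomes an addition, each -1 a subtraction, and each 0 weight is removed entirely, which in a fully unrolled hardware design means no logic is generated for it. The function and sparsity level below are hypothetical stand-ins for the paper's convolutional layers, assuming roughly 90% of weights are zero:

```python
import numpy as np

rng = np.random.default_rng(0)

def ternary_matvec(W, x):
    """Matrix-vector product with ternary W using only add/sub.

    Returns the result and the number of add/sub operations actually
    performed; zero weights are skipped, mimicking an unrolled datapath
    where they consume no hardware.
    """
    y = np.zeros(W.shape[0], dtype=x.dtype)
    ops = 0
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            if W[i, j] == 1:
                y[i] += x[j]
                ops += 1
            elif W[i, j] == -1:
                y[i] -= x[j]
                ops += 1
            # W[i, j] == 0: known at compile time, nothing is generated
    return y, ops

# A weight matrix that is ~90% zero, matching the sparsity in the paper.
W = rng.choice([-1, 0, 1], size=(64, 128), p=[0.05, 0.90, 0.05])
x = rng.standard_normal(128).astype(np.float32)

y, ops = ternary_matvec(W, x)
dense_ops = 2 * W.size  # one multiply + one add per weight in a dense MAC
print(f"add/sub ops: {ops} vs dense MAC ops: {dense_ops} "
      f"({1 - ops / dense_ops:.0%} removed)")
```

Because the weights are fixed before synthesis, this pruning happens at compile time rather than at runtime, which is what distinguishes the unrolled approach from accelerators that must handle arbitrary weights.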
Index Terms
- Unrolling Ternary Neural Networks