
High-Efficiency Convolutional Ternary Neural Networks with Custom Adder Trees and Weight Compression

Published: 12 December 2018

Abstract

Although inference with artificial neural networks (ANNs) was until quite recently considered essentially compute bound, the emergence of deep neural networks, coupled with the evolution of integration technology, has turned inference into a memory-bound problem. Given this observation, many recent works have focused on minimizing memory accesses, either by enforcing and exploiting sparsity in the weights or by representing activations and weights with few bits, so that ANN inference becomes practical on embedded devices. In this work, we detail an architecture dedicated to inference using ternary {−1, 0, 1} weights and activations. The architecture is configurable at design time, offering a range of throughput vs. power trade-offs to choose from. It is also generic, in the sense that it uses information drawn from the target technology (memory geometries and costs, number of available cuts, etc.) to best adapt to the FPGA resources. This allows it to achieve up to 5.2k frames per second per Watt for classification on a VC709 board while using approximately half of the FPGA resources.
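
To make the core idea concrete, the following is a minimal software sketch of a ternary dot product, the operation that custom adder trees accumulate in hardware. The 2-bit trit encoding and the function names here are illustrative assumptions, not the paper's actual encoding or RTL: the point is that with weights and activations restricted to {−1, 0, +1}, every partial product is itself a trit, so the multipliers degenerate into sign/zero selection feeding a plain adder tree.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 2-bit encoding for a ternary value in {-1, 0, +1}:
 * 00 -> 0, 01 -> +1, 11 -> -1. The paper's compressed weight
 * representation may differ; this is only for illustration. */
static inline int decode_trit(uint8_t t) {
    return (t & 1) ? ((t & 2) ? -1 : 1) : 0;
}

/* Ternary dot product: the quantity one adder-tree column accumulates.
 * Since w[i] and a[i] are both in {-1, 0, +1}, each product is also in
 * {-1, 0, +1}; in hardware the multiply is a mux, not a multiplier. */
int ternary_dot(const uint8_t *w, const uint8_t *a, int n) {
    int acc = 0;
    for (int i = 0; i < n; i++) {
        acc += decode_trit(w[i]) * decode_trit(a[i]);
    }
    return acc;
}

int main(void) {
    /* w = {+1, -1, 0, +1}, a = {+1, +1, -1, -1}
     * expected: 1 - 1 + 0 - 1 = -1 */
    uint8_t w[] = {1, 3, 0, 1};
    uint8_t a[] = {1, 1, 3, 3};
    printf("%d\n", ternary_dot(w, a, 4));
    return 0;
}
```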



Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 11, Issue 3
Special Issue on Deep Learning on FPGAs
September 2018, 187 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3299999
Editor: Steve Wilton

Copyright © 2018 ACM

© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 12 December 2018
• Accepted: 1 August 2018
• Revised: 1 July 2018
• Received: 1 November 2017


Qualifiers

• research-article
• Research
• Refereed
