ABSTRACT
Multiplication is arguably the most cost-dominant operation in modern deep neural networks (DNNs), limiting their achievable efficiency and thus their wider deployment in resource-constrained applications. To tackle this limitation, pioneering works have developed handcrafted multiplication-free DNNs, which require expert knowledge and time-consuming manual iteration, calling for fast development tools. To this end, we propose a Neural Architecture Search and Acceleration framework dubbed NASA, which enables automated development of multiplication-reduced DNNs and integrates a dedicated multiplication-reduced accelerator to boost their achievable efficiency. Specifically, NASA adopts neural architecture search (NAS) spaces that augment a state-of-the-art space with hardware-inspired multiplication-free operators, such as shift and adder, armed with a novel progressive pretrain strategy (PGP) together with customized training recipes, to automatically search for optimal multiplication-reduced DNNs. On top of that, NASA develops a dedicated accelerator, which advocates a chunk-based template and an auto-mapper tailored to the DNNs delivered by NASA's search engine, to better leverage their algorithmic properties for boosting hardware efficiency. Experimental results and ablation studies consistently validate the advantages of NASA's algorithm-hardware co-design framework in terms of achievable accuracy and efficiency tradeoffs. Codes are available at https://github.com/shihuihong214/NASA.
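To illustrate the kind of multiplication-free operators the search space draws on, the sketch below emulates, in plain Python, the two arithmetic primitives named in the abstract: a shift operator, where each weight is constrained to a signed power of two so every multiply reduces to a bit-shift, and an AdderNet-style adder operator, which replaces the inner product with a negative L1 distance computed from additions and subtractions only. This is a minimal pedagogical sketch, not the paper's implementation; the function names `shift_mac` and `adder_mac` are our own.

```python
def shift_mac(x, exps, signs):
    """Multiply-accumulate with shift-type weights.

    Each weight is a signed power of two (sign * 2**exp), so on hardware
    the per-element multiply becomes a bit-shift plus a sign flip.
    Here the shift is emulated with 2 ** e for clarity.
    """
    return sum(s * xi * (2 ** e) for xi, e, s in zip(x, exps, signs))


def adder_mac(x, w):
    """AdderNet-style multiply-free accumulate.

    The inner product is replaced by the negative L1 distance between
    inputs and weights, which needs only additions and subtractions.
    """
    return -sum(abs(xi - wi) for xi, wi in zip(x, w))
```

A conventional convolution kernel would apply one of these per output position; a hybrid network mixes them with ordinary multiplicative layers, which is precisely the operator-assignment choice that NASA's search automates.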
NASA: Neural Architecture Search and Acceleration for Hardware Inspired Hybrid Networks