ABSTRACT
In this paper, an energy-efficient and high-speed comparator-based processing-in-memory accelerator (CMP-PIM) is proposed to efficiently execute a novel hardware-oriented comparator-based deep neural network called CMPNET. Inspired by the local binary pattern feature extraction method combined with depthwise separable convolution, we first modify the existing Convolutional Neural Network (CNN) algorithm by replacing the computationally intensive multiplications in convolution layers with more efficient and less complex comparison and addition operations. Then, we propose CMP-PIM, which employs parallel computational memory sub-arrays based on SOT-MRAM as its fundamental processing units. We compare the performance of the CMP-PIM accelerator against recent CNN accelerator designs on several datasets. With comparable inference accuracy on the SVHN dataset, CMP-PIM achieves ∼94× and 3× better energy efficiency than CNN and Local Binary CNN (LBCNN) baselines, respectively. In addition, it achieves a 4.3× speed-up over the CNN baseline with an identical network configuration.