Spiking Deep Convolutional Neural Networks for Energy-Efficient Object Recognition

Cao, Yongqiang; Chen, Yang; Khosla, Deepak

doi:10.1007/s11263-014-0788-3

Published: 23 November 2014

Volume 113, pages 54–66, (2015)

International Journal of Computer Vision

Abstract

Deep-learning neural networks such as convolutional neural networks (CNNs) have shown great potential as a solution for difficult vision problems, such as object recognition. Spiking neural network (SNN)-based architectures have shown great potential for realizing ultra-low power consumption using spike-based neuromorphic hardware. This work describes a novel approach for converting a deep CNN into an SNN, which enables mapping CNNs to spike-based hardware architectures. Our approach first tailors the CNN architecture to fit the requirements of an SNN, then trains the tailored CNN in the same way as one would train a conventional CNN, and finally applies the learned network weights to an SNN architecture derived from the tailored CNN. We evaluate the resulting SNN on the publicly available Defense Advanced Research Projects Agency (DARPA) Neovision2 Tower and CIFAR-10 datasets and show object recognition accuracy similar to that of the original CNN. Our SNN implementation is amenable to direct mapping to spike-based neuromorphic hardware, such as that being developed under the DARPA SyNAPSE program. Our hardware mapping analysis suggests that the SNN implementation on such spike-based hardware is two orders of magnitude more energy-efficient than the original CNN implementation on off-the-shelf FPGA-based hardware.
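
The weight-transfer step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the neuron model, so the non-leaky integrate-and-fire neuron, the reset-by-subtraction rule, the unit threshold, and all names and values below (`if_firing_rates`, `weights`, `inputs`) are assumptions chosen for illustration. The sketch reuses one weight matrix in a ReLU layer and in a layer of spiking neurons, then compares the ReLU activations with the spike rates.

```python
import numpy as np

def relu(x):
    """Rectified linear activation, standing in for the CNN-side nonlinearity."""
    return np.maximum(x, 0.0)

def if_firing_rates(drive, steps=1000, threshold=1.0):
    """Simulate non-leaky integrate-and-fire neurons under a constant input.

    Assumptions (not taken from the paper): each neuron adds `drive` to its
    membrane potential every time step, fires when the potential reaches
    `threshold`, and is reset by subtracting the threshold.
    Returns the fraction of time steps in which each neuron fired.
    """
    drive = np.asarray(drive, dtype=float)
    v = np.zeros_like(drive)        # membrane potentials
    counts = np.zeros_like(drive)   # spikes emitted per neuron
    for _ in range(steps):
        v += drive                  # integrate the input
        fired = v >= threshold      # neurons that spike in this step
        counts += fired
        v[fired] -= threshold       # reset by subtraction (assumption)
    return counts / steps

# Hypothetical "trained" weights and input activations, shared by both models.
weights = np.array([[0.5, -0.25],
                    [-1.0, 0.5]])
inputs = np.array([1.0, 0.5])

drive = weights @ inputs            # weighted input to each neuron
cnn_out = relu(drive)               # activation of the conventional layer
snn_out = if_firing_rates(drive)    # spike rate of the spiking layer

print("ReLU activations:", cnn_out)
print("IF spike rates:  ", snn_out)
```

Under these assumptions the spike rate tracks the ReLU output to within one spike over the simulated window, as long as the drive stays between zero and the threshold. A negative drive never reaches the threshold and produces no spikes, and a drive above the threshold saturates at one spike per step, which hints at why the network would need tailoring before its weights are reused.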

Acknowledgments

This work was partially supported by the Defense Advanced Research Projects Agency Cognitive Technology Threat Warning System (CT2WS) and SyNAPSE programs (contracts W31P4Q-08-C-0264 and HR0011-09-C-0001). The views expressed in this document are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government. We would like to thank Dr. Clement Farabet of New York University for providing the initial CNN structure on which the CNN model outlined in Fig. 1 is based; and the anonymous reviewers for their invaluable comments and recommendations that led to this revised manuscript.

Author information

Authors and Affiliations

HRL Laboratories, LLC, 3011 Malibu Canyon Road, Malibu, CA, 90265-4797, USA
Yongqiang Cao, Yang Chen & Deepak Khosla

Corresponding author

Correspondence to Yang Chen.

Additional information

Communicated by Marc’Aurelio Ranzato, Geoffrey E. Hinton, and Yann LeCun.

About this article

Cite this article

Cao, Y., Chen, Y. & Khosla, D. Spiking Deep Convolutional Neural Networks for Energy-Efficient Object Recognition. Int J Comput Vis 113, 54–66 (2015). https://doi.org/10.1007/s11263-014-0788-3

Received: 09 February 2014
Accepted: 10 November 2014
Published: 23 November 2014
Issue Date: May 2015
DOI: https://doi.org/10.1007/s11263-014-0788-3
