Unsupervised Video Hashing via Deep Neural Network

Ma, Chao; Gu, Yun; Gong, Chen; Yang, Jie; Feng, Deying

doi:10.1007/s11063-018-9812-x

Published: 17 March 2018

Volume 47, pages 877–890, (2018)

Neural Processing Letters

Chao Ma¹, Yun Gu¹, Chen Gong², Jie Yang¹ & Deying Feng³


Abstract

Hashing is a common solution for content-based multimedia retrieval: it encodes high-dimensional feature vectors into short binary codes. Previous work has focused mainly on image hashing. However, these methods cannot be applied directly to video hashing, because videos contain not only spatial structure within each frame but also temporal correlation between successive frames. Some researchers handle this by encoding extracted key frames, but these frame-based methods are time-consuming in real applications. Others characterize a video by averaging the spatial features of its frames, so that existing hashing methods can be adopted. Unfortunately, this kind of “video” feature ignores the correlation between frames and may lose temporal information. Therefore, in this paper, we propose a novel unsupervised video hashing framework via deep neural network, which performs video hashing by incorporating the temporal structure as well as the conventional spatial structure. Specifically, the spatial features of videos are obtained with a convolutional neural network (CNN), and the temporal features are established via long short-term memory (LSTM). A time series pooling strategy is then employed to obtain a single feature vector for each video. The resulting spatio-temporal feature can be applied to many existing unsupervised hashing methods. Experimental results on two real datasets indicate that, by employing the spatio-temporal features, our hashing method significantly improves the performance of existing methods that deploy only the spatial features, and obtains higher mean average precision than state-of-the-art video hashing methods.
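
The last two stages described in the abstract (pooling a per-frame feature sequence into one vector per video, then handing that vector to an unsupervised hashing method) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: random arrays stand in for the CNN/LSTM frame features, mean pooling stands in for the time series pooling strategy (whose exact form the abstract does not state), and a simple PCA-then-sign hash stands in for "existing unsupervised hashing methods". All function names are hypothetical.

```python
import numpy as np

def pool_over_time(frame_features):
    """Collapse a (num_frames, dim) sequence into one (dim,) vector.
    Mean pooling is an assumed stand-in for the paper's pooling strategy."""
    return np.asarray(frame_features, dtype=np.float64).mean(axis=0)

def fit_pca_hash(video_features, num_bits):
    """Learn a mean vector and the top `num_bits` principal directions.
    PCA hashing is an assumed stand-in for any unsupervised hashing method."""
    mean = video_features.mean(axis=0)
    centered = video_features - mean
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:num_bits].T

def encode(video_features, mean, projection):
    """Binarize projected features: 1 where the projection is positive, else 0."""
    return ((video_features - mean) @ projection > 0).astype(np.uint8)

def hamming_distances(query_code, database_codes):
    """Count differing bits between one code and every row of a code matrix."""
    return np.count_nonzero(database_codes != query_code, axis=1)

rng = np.random.default_rng(0)
# Stand-in data: 50 videos, each with 10 to 29 frames of 64-dimensional features.
videos = [rng.normal(size=(rng.integers(10, 30), 64)) for _ in range(50)]
pooled = np.stack([pool_over_time(v) for v in videos])   # shape (50, 64)
mean, projection = fit_pca_hash(pooled, num_bits=16)
codes = encode(pooled, mean, projection)                 # shape (50, 16)
distances = hamming_distances(codes[0], codes)           # distance of video 0 to all
```

Because every video is reduced to one fixed-length vector before hashing, videos of different lengths yield codes of the same length, and retrieval reduces to ranking database codes by Hamming distance to the query code.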



Acknowledgements

This research is partly supported by NSFC, China (Nos. 61572315, 6151101179, 61603171, 61602246), 973 Plan, China (No. 2015CB856004), Committee of Science and Technology, Shanghai, China (No. 17JC1403000), NSF of Jiangsu Province (No. BK20171430), and the “Six Talent Peak” Project of Jiangsu Province of China (No. DZXX-027).

Author information

Authors and Affiliations

Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
Chao Ma, Yun Gu & Jie Yang
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Chen Gong
Liaocheng University, Liaocheng, China
Deying Feng


Corresponding author

Correspondence to Jie Yang.


About this article

Cite this article

Ma, C., Gu, Y., Gong, C. et al. Unsupervised Video Hashing via Deep Neural Network. Neural Process Lett 47, 877–890 (2018). https://doi.org/10.1007/s11063-018-9812-x


Issue Date: June 2018
DOI: https://doi.org/10.1007/s11063-018-9812-x
