ConvNets-based action recognition from skeleton motion maps

Chen, Yanfang; Wang, Liwei; Li, Chuankun; Hou, Yonghong; Li, Wanqing

doi:10.1007/s11042-019-08261-1

Published: 07 November 2019

Multimedia Tools and Applications, Volume 79, pages 1707–1725 (2020)

Yanfang Chen¹,
Liwei Wang¹,
Chuankun Li¹ (ORCID: orcid.org/0000-0001-9427-8780),
Yonghong Hou¹ &
Wanqing Li²

Abstract

With the advance of deep learning, deep-learning-based action recognition has become an important research topic in computer vision. A skeleton sequence is often encoded into an image, as in Joint Trajectory Maps (JTM), so that Convolutional Neural Networks (ConvNets) can be applied more effectively. However, this encoding method cannot effectively capture long-term temporal information. To solve this problem, this paper presents an effective method for encoding the spatial-temporal information of skeleton sequences into color texture images, referred to as Temporal Pyramid Skeleton Motion Maps (TPSMMs); ConvNets are then applied to learn discriminative features from the TPSMMs for human action recognition. The TPSMMs not only capture short-term temporal information but also embed the long-term dynamics over the duration of an action. The proposed method has been evaluated and achieves state-of-the-art results on the widely used UTD-MHAD, MSRC-12 Kinect Gesture and SYSU-3D datasets.
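
The abstract does not give the exact TPSMM construction, so the following is only a rough sketch of the general idea of a temporal pyramid over a skeleton sequence, not the authors' method. It assumes (hypothetically) that level k of the pyramid splits the sequence into 2^k segments, that joints are given as 2D coordinates normalised to [0, 1], and that time is encoded by a simple red-to-blue colour ramp. The function names `temporal_pyramid_segments`, `motion_map` and `pyramid_motion_maps` are illustrative and do not come from the paper.

```python
import numpy as np

def temporal_pyramid_segments(num_frames, levels):
    """Frame ranges per pyramid level; level k has 2**k segments (assumed scheme)."""
    pyramid = []
    for k in range(levels):
        bounds = np.linspace(0, num_frames, 2 ** k + 1).astype(int)
        pyramid.append([(int(bounds[i]), int(bounds[i + 1])) for i in range(2 ** k)])
    return pyramid

def motion_map(joints_xy, size=64):
    """Accumulate joint displacement into a size x size x 3 colour image.

    joints_xy: array of shape (T, J, 2), coordinates normalised to [0, 1].
    The colour ramp (red early, blue late) is a hypothetical choice.
    """
    img = np.zeros((size, size, 3), dtype=np.float32)
    num_frames = joints_xy.shape[0]
    if num_frames < 2:
        return img  # no motion can be measured from fewer than two frames
    for t in range(1, num_frames):
        w = t / (num_frames - 1)  # temporal position in (0, 1]
        colour = np.array([1.0 - w, 0.0, w], dtype=np.float32)
        # Per-joint displacement magnitude between consecutive frames.
        disp = np.linalg.norm(joints_xy[t] - joints_xy[t - 1], axis=1)
        # Map normalised coordinates to pixel indices.
        px = np.clip(np.round(joints_xy[t] * (size - 1)).astype(int), 0, size - 1)
        for (x, y), d in zip(px, disp):
            img[y, x] += d * colour
    return img

def pyramid_motion_maps(joints_xy, levels=3, size=64):
    """One motion map per segment at every pyramid level."""
    segments = temporal_pyramid_segments(joints_xy.shape[0], levels)
    return [[motion_map(joints_xy[s:e], size) for s, e in level] for level in segments]
```

In this sketch the single map at level 0 summarises the whole action (long-term dynamics), while the maps at finer levels cover shorter spans; each resulting image could then be fed to a ConvNet classifier.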

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61571325 and in part by the Key Projects in the Tianjin Science and Technology Pillar Program under Grant No. 16ZXHLGX00190.

Author information

Authors and Affiliations

School of Electronic and Information Engineering, Tianjin University, Tianjin, People’s Republic of China
Yanfang Chen, Liwei Wang, Chuankun Li & Yonghong Hou
Advanced Multimedia Research Lab, University of Wollongong, Wollongong, Australia
Wanqing Li

Corresponding author

Correspondence to Chuankun Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, Y., Wang, L., Li, C. et al. ConvNets-based action recognition from skeleton motion maps. Multimed Tools Appl 79, 1707–1725 (2020). https://doi.org/10.1007/s11042-019-08261-1

Received: 16 December 2018
Revised: 14 July 2019
Accepted: 16 September 2019
Published: 07 November 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s11042-019-08261-1
