
ConvNets-based action recognition from skeleton motion maps

Multimedia Tools and Applications

Abstract

With the advance of deep learning, deep-learning-based action recognition has become an important research topic in computer vision. Skeleton sequences are often encoded into images, for example as Joint Trajectory Maps (JTM), so that Convolutional Neural Networks (ConvNets) can be exploited. However, such encodings cannot effectively capture long-term temporal information. To address this problem, this paper presents an effective method, referred to as Temporal Pyramid Skeleton Motion Maps (TPSMMs), that encodes the spatial-temporal information of skeleton sequences into color texture images. ConvNets are then applied to learn discriminative features from the TPSMMs for human action recognition. TPSMMs capture not only short-term temporal information but also the long-term dynamics over the whole duration of an action. The proposed method has been evaluated on the widely used UTD-MHAD, MSRC-12 Kinect Gesture and SYSU-3D datasets, where it achieved state-of-the-art results.
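The abstract does not specify how TPSMMs are constructed, so the following is only a generic sketch of the temporal-pyramid idea it describes, not the authors' method. It assumes a skeleton sequence of shape `(T, J, 3)` with x and y already normalized to [0, 1], a pyramid in which level `l` splits the sequence into `2**l` equal segments, and a per-segment "motion map" that accumulates joint displacement magnitudes at the projected joint positions, with color encoding temporal order. The function names `motion_map` and `temporal_pyramid_maps` are hypothetical. Coarse levels summarize the whole action (long-term dynamics) and fine levels keep short-term motion, which is the trade-off the abstract attributes to TPSMMs.

```python
import numpy as np

def motion_map(segment: np.ndarray, height: int = 64, width: int = 64) -> np.ndarray:
    """Hypothetical motion map: accumulate joint displacements of one segment into an RGB image.

    segment: (T, J, 3) joint coordinates, x and y assumed normalized to [0, 1].
    Returns a (height, width, 3) float image scaled to [0, 1].
    """
    img = np.zeros((height, width, 3), dtype=np.float64)
    T = segment.shape[0]
    for t in range(1, T):
        disp = segment[t] - segment[t - 1]               # (J, 3) per-joint motion
        speed = np.linalg.norm(disp, axis=1)             # (J,) motion magnitude
        cols = np.clip(np.round(segment[t, :, 0] * (width - 1)).astype(int), 0, width - 1)
        rows = np.clip(np.round(segment[t, :, 1] * (height - 1)).astype(int), 0, height - 1)
        frac = t / max(T - 1, 1)                         # temporal position within the segment
        color = np.array([frac, 1.0 - frac, 0.5])        # assumed color coding of time
        np.add.at(img, (rows, cols), speed[:, None] * color)  # handles repeated pixels
    peak = img.max()
    return img / peak if peak > 0 else img

def temporal_pyramid_maps(seq: np.ndarray, levels: int = 3,
                          height: int = 64, width: int = 64) -> list:
    """Hypothetical pyramid: level l splits the sequence into 2**l equal segments."""
    T = seq.shape[0]
    maps = []
    for level in range(levels):
        bounds = np.linspace(0, T, 2 ** level + 1).astype(int)
        for start, end in zip(bounds[:-1], bounds[1:]):
            maps.append(motion_map(seq[start:end], height, width))
    return maps

# Usage on a synthetic random-walk skeleton: 40 frames, 20 joints.
rng = np.random.default_rng(0)
seq = np.cumsum(rng.normal(scale=0.01, size=(40, 20, 3)), axis=0) + 0.5
maps = temporal_pyramid_maps(seq, levels=3)
print(len(maps), maps[0].shape)  # 7 (64, 64, 3)
```

With three levels the sketch yields 1 + 2 + 4 = 7 images per sequence; in a ConvNet pipeline such images would typically be fed to one network per map (or stacked) and the class scores fused, but that step is likewise an assumption here.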


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7


References

  1. Chaudhry R, Ofli F, Kurillo G, Bajcsy R, Vidal R (2013) Bio-inspired dynamic 3d discriminative skeletal features for human action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp 471–478

  2. Chen C, Jafari R, Kehtarnavaz N (2015) UTD-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. In: 2015 IEEE International Conference on Image Processing (ICIP), pp 168–172

  3. Chollet F (2015) Keras. https://github.com/fchollet/keras

  4. Du Y, Wang W, Wang L (2015) Hierarchical recurrent neural network for skeleton based action recognition. In: CVPR, pp 1110–1118

  5. Du Y, Fu Y, Wang L (2016) Skeleton based action recognition with convolutional neural network. In: Proc. Asian Conference on Pattern Recognition (IAPR), pp 579–583

  6. Fothergill S, Mentis HM, Nowozin S, Kohli P (2012) Instructing people for training gestural interactive systems. In: ACM Conference on Computer-Human Interaction (ACM HCI), pp 1737–1746

  7. Gowayyed MA, Torki M, Hussein ME, El-Saban M (2013) Histogram of oriented displacements (HOD): Describing trajectories of human joints for action recognition. In: IJCAI, pp 1351–1357

  8. Hou Y, Li Z, Wang P, Li W (2016) Skeleton optical spectra-based action recognition using convolutional neural networks. IEEE Trans Circ Syst Video Technol 28(3):807–811


  9. Hu J-F, Zheng W-S, Lai J, Zhang J (2015) Jointly learning heterogeneous features for RGB-D activity recognition. In: CVPR, pp 5344–5352

  10. Hu J-F, Zheng W-S, Ma L, Wang G, Lai J (2016) Real-time rgb-d activity prediction by soft regression. In: ECCV, pp 280–296

  11. Huang G, Liu Z, van der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4700–4708

  12. Hussein ME, Torki M, Gowayyed MA, El-Saban M (2013) Human action recognition using a temporal hierarchy of covariance descriptors on 3D joint locations. In: IJCAI, pp 2466–2472

  13. Hussein ME, Torki M, Gowayyed MA, El-Saban M (2013) Human action recognition using a temporal hierarchy of covariance descriptors on 3d joint locations. In: International Joint Conference on Artificial Intelligence, pp 639–44

  14. Ke Q, Bennamoun M, An S, Sohel F, Boussaid F (2017) A new representation of skeleton sequences for 3d action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3288–3297

  15. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Proc. Annual Conference on Neural Information Processing Systems (NIPS), pp 1106–1114

  16. Lee I, Kim D, Kang S, Lee S (2017) Ensemble deep learning for skeleton-based action recognition using temporal sliding lstm networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1012–1020

  17. Li W, Zhang Z, Liu Z (2010) Action recognition based on a bag of 3D points. In: CVPRW, pp 9–14

  18. Li C, Hou Y, Wang P, Li W (2017) Joint distance maps based action recognition with convolutional neural networks. IEEE Signal Processing Letters, pp 624–628


  19. Liu J, Shahroudy A, Xu D, Wang G (2016) Spatio-temporal LSTM with trust gates for 3D human action recognition. In: Proceedings of European Conference on Computer Vision, pp 816–833


  20. Liu J, Akhtar N, Mian A (2017) Skepxels: Spatio-temporal image representation of human skeleton joints for action recognition. arXiv:1711.05941

  21. Liu J, Shahroudy A, Xu D, Kot AC, Wang G (2017) Skeleton-based action recognition using spatio-temporal lstm network with trust gates. IEEE Trans Pattern Anal Mach Intell 40(12):3007–3021


  22. Liu J, Wang G, Hu P, Duan LY, Kot AC (2017) Global context-aware attention lstm networks for 3d action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 3671–3680

  23. Liu M, Liu H, Chen C (2017) Enhanced skeleton visualization for view invariant human action recognition. Pattern Recogn, pp 346–362

  24. Lu X, Chen C-C, Aggarwal JK (2012) View invariant human action recognition using histograms of 3D joints. In: CVPRW, pp 20–27

  25. Lu C, Jia J, Tang C-K (2014) Range-sample depth feature for action recognition. In: CVPR, pp 772–779

  26. Ofli F, Chaudhry R, Kurillo G, Vidal R, Bajcsy R (2012) Sequence of the most informative joints (SMIJ): A new representation for human skeletal action recognition. In: Computer Vision and Pattern Recognition Workshops, pp 24–38

  27. Oreifej O, Liu Z (2013) HON4D: Histogram of oriented 4D normals for activity recognition from depth sequences. In: CVPR, pp 716–723

  28. Sainath TN, Vinyals O, Senior A, Sak H (2015) Convolutional, Long short-term memory, fully connected deep neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp 4580–4584

  29. Shahroudy A, Liu J, Ng T-T, Wang G (2016) NTU RGB+D: A large scale dataset for 3D human activity analysis. In: CVPR, pp 1010–1019

  30. Tang Y, Yi T, Lu J, Li P, Zhou J (2018) Deep progressive reinforcement learning for skeleton-based action recognition. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 5323–5332

  31. Veeriah V, Zhuang N, Qi G-J (2015) Differential recurrent neural networks for action recognition. In: ICCV, pp 4041–4049

  32. Vemulapalli R, Arrate F, Chellappa R (2014) Human action recognition by representing 3d skeletons as points in a lie group. In: CVPR, pp 588–595

  33. Wang J, Liu Z, Wu Y, Yuan J (2012) Mining actionlet ensemble for action recognition with depth cameras. In: CVPR, pp 1290–1297

  34. Wang P, Li W, Ogunbona P, Gao Z, Zhang H (2014) Mining mid-level features for action recognition based on effective skeleton representation. In: DICTA, pp 1–8

  35. Wang P, Li W, Gao Z, Tang C, Zhang J, Ogunbona PO (2015) Convnets-based action recognition from depth maps through virtual cameras and pseudocoloring. In: ACM MM, pp 1119–1122

  36. Wang P, Li W, Gao Z, Zhang J, Tang C, Ogunbona P (2016) Action recognition from depth maps using deep convolutional neural networks. IEEE Trans Hum-Mach Syst 46(4):498–509


  37. Wang P, Li Z, Hou Y, Li W (2016) Action recognition based on joint trajectory maps using convolutional neural networks. In: ACM MM, pp 102–106

  38. Wang P, Li Z, Hou Y, Li W (2016) Action recognition based on joint trajectory maps using convolutional neural networks. Proceedings of the 24th ACM international conference on Multimedia. ACM, pp 102–106

  39. Xie C, Li C, Zhang B, Chen C, Han J, Zou C, Liu J (2018) Memory attention networks for skeleton-based action recognition. arXiv:1804.08254

  40. Xu Y, Qi Z, Zhang D (2011) Combine crossing matching scores with conventional matching scores for bimodal biometrics and face and palmprint recognition experiments. Neurocomputing 74(18):3946–3952


  41. Xu Y, Zhu X, Li Z, Liu G, Lu Y, Liu H (2013) Using the original and symmetrical face training samples to perform representation based two-step face recognition. Pattern Recogn 46(4):1151–1158


  42. Yang X, Tian Y (2014) Super normal vector for activity recognition using depth sequences. In: CVPR, pp 804–811

  43. Yang X, Zhang C, Tian Y (2012) Recognizing actions using depth motion maps-based histograms of oriented gradients. In: ACM MM, pp 1057–1060

  44. Yang S, Yuan C, Hu W, Ding X (2014) A hierarchical model based on latent dirichlet allocation for action recognition. In: 2014 22nd International Conference on Pattern Recognition (ICPR). IEEE, pp 2613–2618

  45. Zhang P, Lan C, Xing J, Zeng W, Xue J, Zheng N (2017) View adaptive recurrent neural networks for high performance human action recognition from skeleton data. In: IEEE International Conference on Computer Vision, pp 2136–2145

  46. Zhang S, Liu X, Xiao J (2017) On geometric features for skeleton-based action recognition using multilayer lstm networks. In: Applications of Computer Vision, pp 148–157

  47. Zhou L, Li W, Zhang Y, Ogunbona P, Nguyen DT, Zhang H (2014) Discriminative key pose extraction using extended lc-ksvd for action recognition. In: DICTA. IEEE

  48. Zhu W, Lan C, Xing J, Zeng W, Li Y, Li S, Xie X (2016) Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks. In: Proc. AAAI Conference on Artificial Intelligence, pp 3697–3704

  49. Zhu W, Lan C, Xing J, Zeng W, Li Y, Shen L, Xie X (2016) Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks. In: AAAI


Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61571325 and in part by the Key Projects in the Tianjin Science and Technology Pillar Program under Grant No. 16ZXHLGX00190.

Author information


Correspondence to Chuankun Li.



About this article


Cite this article

Chen, Y., Wang, L., Li, C. et al. ConvNets-based action recognition from skeleton motion maps. Multimed Tools Appl 79, 1707–1725 (2020). https://doi.org/10.1007/s11042-019-08261-1



  • DOI: https://doi.org/10.1007/s11042-019-08261-1
