Abstract
Optical flow is widely used in human action recognition. However, the influence of complex backgrounds on optical flow often leads to low recognition accuracy. To address this issue, this paper proposes an optical flow-based, physical feature-driven action recognition framework. We first compute the original dense optical flow field. Then, to reduce the computational burden, we develop a joint action-relevance measure that eliminates pseudo-optical flow arising from complex backgrounds. A more stable flow field is then obtained through local spatial–temporal thermal diffusion processing. On this basis, we design a feature descriptor that takes the divergence, curl, and gradient features of the flow field into consideration. Finally, we encode the descriptors with Fisher vectors for classification. Experiments on the HMDB51, KTH, and UCF101 datasets show that the proposed framework accurately recognizes actions in complex backgrounds and outperforms existing methods.
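The divergence, curl, and gradient features mentioned in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical example (not the authors' implementation) of extracting these physical quantities from a dense optical flow field stored as an H×W×2 array, using finite differences via `numpy.gradient`:

```python
import numpy as np

def flow_physical_features(flow):
    """Compute divergence, curl, and gradient magnitude of a dense
    optical flow field `flow` of shape (H, W, 2), where flow[..., 0]
    is the horizontal component u and flow[..., 1] is the vertical
    component v. Illustrative sketch only; the paper's descriptor may
    aggregate these quantities differently.
    """
    u, v = flow[..., 0], flow[..., 1]
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x)
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    divergence = du_dx + dv_dy          # local expansion/contraction of motion
    curl = dv_dx - du_dy                # local rotation of motion
    grad_mag = np.sqrt(du_dx**2 + du_dy**2 + dv_dx**2 + dv_dy**2)
    return divergence, curl, grad_mag
```

For a purely radial flow such as u = x, v = y, this yields a constant divergence of 2 and zero curl, matching the analytic values.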
Xia, L., Ma, W. Human action recognition using high-order feature of optical flows. J Supercomput 77, 14230–14251 (2021). https://doi.org/10.1007/s11227-021-03827-z