Evaluation of semi-supervised learning method on action recognition

Multimedia Tools and Applications

Abstract

Action recognition is one of the most difficult problems in computer vision and multimedia, since both spatial information and spatiotemporal semantics must be taken into account. Noisy and weakly annotated data make the task even harder. Recently, many new approaches beyond traditional features and classifiers have made action recognition more promising. However, no existing work compares different combinations of pooling and semi-supervised learning methods under the same experimental setting. We therefore apply different combinations of pooling and semi-supervised learning methods to both synthetic and realistic action recognition datasets to see which combination or method performs better. Our experiments support the following conclusions. First, Second Order Pooling (Carreira et al. 2012) performs worse overall than the traditional Bag of Words (Schmid and Mohr 1997; Dance et al. 2004) on some datasets, but it is a good way to speed up the coding stage of video classification with little sacrifice in performance. Second, the Semi-supervised Hierarchical Regression Algorithm (MLHR) and Manifold Regularized Least Square Regression (MRLS) (Belkin et al. 2006) are better than some supervised learning methods (χ²-SVM, SVM-2K (Farquhar et al. 2006)) on real-world action recognition problems, where little annotated information is available. Third, for the KTH, UCF50 and HMDB datasets, late fusion does not necessarily improve performance. In comparison, MLHR, SVM-2K and Multi-kernel Learning are more natural ways to deal with multi-feature problems.
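To make the first conclusion more concrete, the following is a minimal sketch of why second-order pooling can be cheaper in the coding stage than Bag of Words. It is not the authors' implementation: the function names, array shapes and the hard-assignment variant of Bag of Words are assumptions made for illustration, and the sketch omits the normalization steps described by Carreira et al. (2012). Bag of Words must compare every local descriptor against every codeword, whereas second-order average pooling needs no codebook at all.

```python
import numpy as np

def bag_of_words(descriptors, codebook):
    """Hard-assignment Bag of Words (hypothetical helper).

    descriptors: (n, d) local features from one video.
    codebook:    (k, d) codewords, e.g. from k-means (assumed given).
    Returns a length-k histogram that sums to 1.
    """
    # Squared Euclidean distance from every descriptor to every codeword:
    # this n x k comparison is the costly coding stage.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dist2 = (diffs ** 2).sum(axis=2)
    nearest = dist2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def second_order_avg_pool(descriptors):
    """Simplified second-order average pooling (hypothetical helper).

    Averages the outer products of the descriptors and keeps the upper
    triangle of the symmetric d x d result. No codebook is required.
    """
    n, d = descriptors.shape
    pooled = descriptors.T @ descriptors / n
    rows, cols = np.triu_indices(d)
    return pooled[rows, cols]
```

With `n` descriptors of dimension `d` and `k` codewords, the Bag of Words assignment costs on the order of `n * k * d` operations, while the pooled matrix costs on the order of `n * d * d`. For small `d` and large codebooks this is the kind of saving the abstract refers to, although the resulting vector has `d * (d + 1) / 2` entries rather than `k`.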

References

  1. Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7:2399–2434

  2. Carreira J, Caseiro R, Batista J, Sminchisescu C (2012) Semantic segmentation with second-order pooling. In: ECCV

  3. Chen M, Hauptmann A (2009) MoSIFT: recognizing human actions in surveillance videos

  4. Dance C, Willamowski J, Fan L, Bray C, Csurka G (2004) Visual categorization with bags of keypoints. In: ECCV SLCV workshop

  5. Farquhar JDR, Meng H, Szedmak S, Hardoon DR, Shawe-Taylor J (2006) Two view learning: SVM-2K, theory and practice. In: Advances in neural information processing systems. MIT Press

  6. Han Y, Xu Z, Ma Z, Huang Z (2013) Image classification with manifold learning for out-of-sample data. Signal Process 93(8):2169–2177

  7. Han Y, Yang Y, Ma Z, Shen H, Sebe N, Zhou X (2014) Image attribute adaptation. IEEE Trans Multimed (IEEE T-MM). doi:10.1109/TMM.2014.2306092

  8. Han Y, Zhang J, Xu Z, Yu S (2013) Discriminative multi-task feature selection. In: AAAI

  9. Hotelling H (1936) Relations between two sets of variates. Biometrika 28(3):321–377

  10. Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T (2011) HMDB: a large video database for human motion recognition. In: ICCV

  11. Lan Z, Bao L, Yu S, Liu W, Hauptmann A (2012) Double fusion for multimedia event detection. In: ACM MM

  12. Lew M, Sebe N, Djeraba C, Jain R (2006) Content-based multimedia information retrieval: state-of-the-art and challenges. ACM Trans Multimed Comput Commun Appl 2(1):1–19

  13. Ma Z, Nie F, Yang Y, Uijlings J, Sebe N, Hauptmann AG (2012) Discriminating joint feature analysis for multimedia data understanding. IEEE Trans Multimed (TMM) 14(6):1662–1672

  14. Ma Z, Yang Y, Cai Y, Sebe N, Hauptmann A (2012) Transfer knowledge adaptation for ad hoc multimedia event detection with few exemplars. In: ACM MM

  15. Reddy K, Shah M (2012) Recognizing 50 human action categories of web videos. In: MVAP

  16. Schmid C, Mohr R (1997) Local grayvalue invariants for image retrieval. In: TPAMI

  17. Schuldt C, Laptev I, Caputo B (2004) Recognizing human actions: a local svm approach. In: ICPR

  18. Snoek C, Worring M, Smeulders A (2005) Early versus late fusion in semantic video analysis. In: ACM MM

  19. Sonnenburg S, Rätsch G, Schäfer C, Schölkopf B (2006) Large scale multiple kernel learning. J Mach Learn Res 7:1531–1565

  20. Vinokourov A, Shawe-Taylor J, Cristianini N (2002) Inferring a semantic representation of text via cross-language correlation analysis

  21. Wang H, Kläser A, Schmid C, Liu C (2011) Action recognition by dense trajectories. In: CVPR

  22. Xu Z, Yang Y, Tsang I, Sebe N, Hauptmann A (2013) Feature weighting via optimal thresholding for video analysis. In: ICCV

  23. Yan R (2006) Probabilistic latent query analysis for combining multiple retrieval sources. In: Proceedings of the 29th international ACM SIGIR conference. ACM Press, pp 324–331

  24. Yan Y, Xu Z, Liu G, Ma Z, Sebe N (2013) Glocal structural feature selection with sparsity for multimedia data understanding. In: ACM MM

  25. Yang Y, Ma Z, Hauptmann A, Sebe N (2013) Feature selection for multimedia analysis by sharing information among multiple tasks. IEEE Trans Multimedia 15(3):321–377

  26. Yang Y, Nie F, Xu D, Luo J, Zhuang Y, Pan Y (2012) A multimedia retrieval framework based on semi-supervised ranking and relevance feedback. IEEE Trans Pattern Anal Machine Intell 34(4):723–742

  27. Yang Y, Song J, Huang Z, Ma Z, Sebe N, Hauptmann A (2013) Multi-feature fusion via hierarchical regression for multimedia analysis. IEEE Trans Multimedia 15(3):572–581

  28. Yang Y, Xu D, Nie F, Luo J, Zhuang Y (2009) Ranking with local regression and global alignment for cross media retrieval. In: ACM MM

  29. Yang Y, Zhuang Y, Wu F, Pan Y (2008) Harmonizing hierarchical manifolds for multimedia document semantics understanding and cross-media retrieval. IEEE Trans Multimed 10(3):437–446

  30. Zhan Y, Sun J, Niu D, Mao Q, Fan J (2014) A semi-supervised incremental learning method based on adaptive probabilistic hypergraph for video semantic detection. Multimed Tools Appl

  31. Zhou D, Bousquet O, Lal TN, Weston J, Schölkopf B (2004) Learning with local and global consistency. In: Advances in neural information processing systems, vol 16. MIT Press, pp 321–328

Corresponding author

Correspondence to Haoquan Shen.

Cite this article

Shen, H., Yan, Y., Xu, S. et al. Evaluation of semi-supervised learning method on action recognition. Multimed Tools Appl 74, 523–542 (2015). https://doi.org/10.1007/s11042-014-1936-z
