
Quasi-Online Detection of Take and Release Actions from Egocentric Videos

  • Conference paper
Image Analysis and Processing – ICIAP 2023 (ICIAP 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14234)


Abstract

In this paper, we consider the problem of detecting object take and release actions from untrimmed egocentric videos in an industrial domain. Rather than requiring that actions be recognized as soon as they are observed, in an online fashion, we propose a quasi-online formulation in which take and release actions are recognized shortly after they are observed, while keeping latency low. We contribute a problem formulation, an evaluation protocol, and a baseline approach that relies on state-of-the-art components. Experiments on ENIGMA, a newly collected dataset of untrimmed egocentric videos of human-object interactions in an industrial scenario, and on THUMOS’14 show that the proposed approach achieves promising performance on quasi-online take/release action recognition and outperforms methods for online detection of action start on THUMOS’14 by +8.64% when an average latency of 2.19 s is allowed. Code and supplementary material are available at https://github.com/fpv-iplab/Quasi-Online-Detection-Take-Release.


Notes

  1. https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html

  2. For detailed statistics regarding the dataset, please refer to the supplementary material.

  3. See the supplementary material for a study on the influence of the different parameters on THUMOS.
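Note 1 points to SciPy's `find_peaks`, which the baseline uses to locate action instants in streams of per-frame scores. As a minimal sketch of this idea (the score values, threshold, and minimum-separation settings below are illustrative assumptions, not the paper's configuration), candidate take/release instants can be extracted as local maxima of a confidence signal:

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical per-frame confidence scores for a "take" action
# (illustrative values, not taken from the paper).
scores = np.array([0.1, 0.2, 0.8, 0.9, 0.7, 0.2, 0.1, 0.3, 0.85, 0.4])

# Detect local maxima above a confidence threshold, enforcing a
# minimum separation (in frames) between detected action instants.
peaks, properties = find_peaks(scores, height=0.5, distance=3)

print(peaks)  # frame indices of candidate action instants -> [3 8]
```

Because a peak can only be confirmed once a few subsequent frames have been observed, this kind of detection is inherently quasi-online: decisions arrive shortly after the action, at the cost of a small latency.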


Acknowledgements

This research has been supported by Next Vision s.r.l., by the project MISE - PON I&C 2014-2020 - Progetto ENIGMA - Prog n. F/190050/02/X44 - CUP: B61B19000520008, and by Research Program Pia.ce.ri. 2020/2022 Linea 2 - University of Catania.

Author information


Corresponding author

Correspondence to Rosario Scavo.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 48 KB)


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Scavo, R., Ragusa, F., Farinella, G.M., Furnari, A. (2023). Quasi-Online Detection of Take and Release Actions from Egocentric Videos. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing – ICIAP 2023. ICIAP 2023. Lecture Notes in Computer Science, vol 14234. Springer, Cham. https://doi.org/10.1007/978-3-031-43153-1_2


  • DOI: https://doi.org/10.1007/978-3-031-43153-1_2


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43152-4

  • Online ISBN: 978-3-031-43153-1

