short-paper

Spatio-Temporal Activity Detection and Recognition in Untrimmed Surveillance Videos

Authors:
Konstantinos Gkountakos, Information Technologies Institute - Centre for Research and Technology Hellas, Thermi, Thessaloniki, Greece
Despoina Touska, Information Technologies Institute - Centre for Research and Technology Hellas, Thermi, Thessaloniki, Greece
Konstantinos Ioannidis, Information Technologies Institute - Centre for Research and Technology Hellas, Thermi, Thessaloniki, Greece
Theodora Tsikrika, Information Technologies Institute - Centre for Research and Technology Hellas, Thermi, Thessaloniki, Greece
Stefanos Vrochidis, Information Technologies Institute - Centre for Research and Technology Hellas, Thermi, Thessaloniki, Greece
Ioannis Kompatsiaris, Information Technologies Institute - Centre for Research and Technology Hellas, Thermi, Thessaloniki, Greece

ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval, August 2021, Pages 451–455
https://doi.org/10.1145/3460426.3463591

Published: 01 September 2021

ABSTRACT

This work presents a spatio-temporal activity detection and recognition framework for untrimmed surveillance videos. The framework consists of a three-step pipeline: object detection, tracking, and activity recognition. It relies on the YOLOv4 architecture for object detection and on Euclidean distance for tracking, while the activity recognizer uses a 3D convolutional deep learning architecture that operates on spatio-temporal boundaries and treats recognition as a multi-label classification problem. Evaluation experiments on the VIRAT dataset show accurate detection of temporal boundaries and recognition of activities in untrimmed videos, with the multi-label formulation performing better than multi-class activity recognition.

References

George Awad, Asad A. Butt, Keith Curtis, Yooyoung Lee, Jonathan Fiscus, Afzal Godil, Andrew Delgado, Jesse Zhang, Eliot Godard, Lukas Diduch, Jeffrey Liu, Alan F. Smeaton, Yvette Graham, Gareth J. F. Jones, Wessel Kraaij, and Georges Quénot. 2020. TRECVID 2020: Comprehensive campaign for evaluating video retrieval tasks across multiple application domains. In Proceedings of TRECVID 2020. NIST, 100 Bureau Drive, Gaithersburg, MD 20899, USA.
Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. 2020. YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020).
Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem, and Juan Carlos Niebles. 2015. ActivityNet: A large-scale video benchmark for human activity understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 961–970.
Joao Carreira and Andrew Zisserman. 2017. Quo vadis, action recognition? A new model and the Kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 6299–6308.
Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. 2015. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2625–2634.
Jiyang Gao, Zhenheng Yang, Kan Chen, Chen Sun, and Ram Nevatia. 2017. TURN TAP: Temporal unit regression network for temporal action proposals. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, 3628–3636.
Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh. 2018. Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 6546–6555.
Yu-Gang Jiang, Jingen Liu, A. Roshan Zamir, George Toderici, Ivan Laptev, Mubarak Shah, and Rahul Sukthankar. 2014. THUMOS challenge: Action recognition with a large number of classes.
Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. 2014. Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1725–1732.
Tianwei Lin, Xu Zhao, Haisheng Su, Chongjing Wang, and Ming Yang. 2018. BSN: Boundary sensitive network for temporal action proposal generation. In Proceedings of the European Conference on Computer Vision (ECCV). Springer, 3–19.
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. 2016. SSD: Single shot multibox detector. In European Conference on Computer Vision. Springer, 21–37.
Wenhe Liu, Guoliang Kang, Po-Yao Huang, Xiaojun Chang, Yijun Qian, Junwei Liang, Liangke Gui, Jing Wen, and Peng Chen. 2020. Argus: Efficient activity detection system for extended video analysis. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops. IEEE, 126–133.
Sangmin Oh, Anthony Hoogs, Amitha Perera, Naresh Cuntoor, Chia-Chih Chen, Jong Taek Lee, Saurajit Mukherjee, JK Aggarwal, Hyungtae Lee, Larry Davis, et al. 2011. A large-scale benchmark dataset for event recognition in surveillance video. In CVPR 2011. IEEE, 3153–3160.
Aayush Jung Rana, Praveen Tirupattur, Mamshad Nayeem Rizve, Kevin Duarte, Ugur Demir, Yogesh Singh Rawat, and Mubarak Shah. 2019. An online system for real-time activity detection in untrimmed surveillance videos. In TRECVID. NIST.
Joseph Redmon and Ali Farhadi. 2018. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018).
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2016. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 6 (2016), 1137–1149.
Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Volume 1. NIPS, 568–576.
Mingxing Tan, Ruoming Pang, and Quoc V. Le. 2020. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 10781–10790.
Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. 2015. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, 4489–4497.
Huifen Xia and Yongzhao Zhan. 2020. A survey on temporal action localization. IEEE Access 8 (2020), 70477–70487.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Activity recognition and understanding

Published in
ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval
August 2021
715 pages
ISBN: 9781450384636
DOI: 10.1145/3460426
General Chairs: Wen-Huang Cheng (National Yang Ming Chiao Tung University, Taiwan), Mohan Kankanhalli (National University of Singapore, Singapore), Meng Wang (Hefei University of Technology, China)
Program Chairs: Wei-Ta Chu (National Cheng Kung University, Taiwan), Jiaying Liu (Peking University, China), Marcel Worring (University of Amsterdam, Netherlands)
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
3D-convolutional neural networks
activity detection
activity recognition
spatiotemporal boundaries detection
Acceptance Rates
Overall Acceptance Rate: 254 of 830 submissions, 31%
Article Metrics
- Total Citations: 2
- Total Downloads: 171
- Downloads (last 12 months): 30
- Downloads (last 6 weeks): 0
