A comprehensive solution for detecting events in complex surveillance videos

Zhu, Yandong; Zhou, Kaihui; Wang, Menglai; Zhao, Yanyun; Zhao, Zhicheng

doi:10.1007/s11042-018-6163-6

Published: 08 June 2018

Volume 78, pages 817–838, (2019)

Multimedia Tools and Applications

Yandong Zhu ORCID: orcid.org/0000-0002-7234-1694¹,
Kaihui Zhou¹,
Menglai Wang¹,
Yanyun Zhao² &
Zhicheng Zhao²


Abstract

Event detection has long been a fundamental problem in the computer vision community. Various datasets for recognizing human events and activities, such as UCF101 and HMDB51, have been proposed to help develop better models and methods. These datasets share the property that either predefined scripts are provided or the images are largely actor-oriented with little background noise. Surveillance event detection has completely different properties, so solutions that are effective on these datasets are not suitable for it. Event detection in complex surveillance video is a much more difficult task with several challenges: heavy occlusion between pedestrians, low image resolution and uncontrolled scene conditions. The TRECVID-SED evaluation, which targets event detection in a highly crowded airport, is well known for its difficulty. To deal with event detection in realistic scenes such as TRECVID-SED, we introduce a comprehensive solution framework based on pedestrian detection, deep key-pose detection and trajectory analysis. Specifically, instead of detecting the whole body of a person, we detect the head-shoulder region of each pedestrian. This addresses the heavy occlusion between pedestrians in complex scenes. We also propose a trajectory-based event detection method to better focus on the key actors of events. For events with discriminative poses, we model event detection as key-pose detection, taking advantage of Faster R-CNN. The presented framework achieved the best result in the TRECVID-SED 2016 evaluation.
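The abstract does not say how per-frame head-shoulder detections are linked into the trajectories that the trajectory-based method relies on. The Python sketch below illustrates one common way to do this under stated assumptions: frame-to-frame association by box overlap (IoU), solved as a minimum-cost assignment with the Hungarian method (Kuhn 1955, listed in the references). It is not the authors' tracker. The 1 − IoU cost, the 0.3 acceptance threshold and the helper names (`iou`, `hungarian`, `min_cost_pairs`, `associate`) are all hypothetical. Track termination, appearance cues and long-occlusion handling are deliberately omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a head-shoulder box as (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hungarian(cost: List[List[float]]) -> List[int]:
    """Kuhn-Munkres with potentials for an n x m cost matrix with n <= m.
    Returns assign[i] = column matched to row i, minimizing total cost."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # row / column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: row matched to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:  # grow a shortest augmenting path that starts at row i
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: path complete
                break
        while j0:  # flip matched/unmatched edges along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assign[p[j] - 1] = j - 1
    return assign


def min_cost_pairs(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Optimal (row, col) pairs for any rectangular matrix (transposes if needed)."""
    n, m = len(cost), len(cost[0])
    if n <= m:
        return list(enumerate(hungarian(cost)))
    transposed = [[cost[i][j] for i in range(n)] for j in range(m)]
    return [(r, c) for c, r in enumerate(hungarian(transposed))]


def associate(tracks: Dict[int, List[Box]], detections: List[Box],
              next_id: int, min_iou: float = 0.3) -> int:
    """Extend `tracks` in place with one frame of detections and return the
    next unused track id. Unmatched tracks are simply not extended."""
    ids = list(tracks)
    matched = set()
    if ids and detections:
        # cost = 1 - IoU between each track's last box and each new detection
        cost = [[1.0 - iou(tracks[t][-1], d) for d in detections] for t in ids]
        for r, c in min_cost_pairs(cost):
            if 1.0 - cost[r][c] >= min_iou:  # reject weak overlaps
                tracks[ids[r]].append(detections[c])
                matched.add(c)
    for c, box in enumerate(detections):
        if c not in matched:  # an unmatched detection starts a new trajectory
            tracks[next_id] = [box]
            next_id += 1
    return next_id


if __name__ == "__main__":
    tracks: Dict[int, List[Box]] = {}
    next_id = 0
    frames = [
        [(0, 0, 10, 10), (50, 50, 60, 60)],
        [(1, 0, 11, 10), (51, 51, 61, 61)],
        [(52, 52, 62, 62), (2, 1, 12, 11), (100, 100, 110, 110)],
    ]
    for dets in frames:
        next_id = associate(tracks, dets, next_id)
    print({tid: len(boxes) for tid, boxes in tracks.items()})  # {0: 3, 1: 3, 2: 1}
```

In the toy run, the two slowly moving boxes each grow into a three-frame trajectory even though their detection order changes in the last frame. The box that appears at (100, 100) starts a new track. Solving the assignment globally rather than greedily matters most in crowded scenes, where one detection may overlap several existing tracks.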

References

Ben Amor B, Su J, Srivastava A (2016) Action recognition using rate-invariant analysis of skeletal shape trajectories. IEEE Trans Pattern Anal Mach Intell 38(1):1–13
Bell S, Zitnick CL, Bala K, Girshick R (2015) Inside-Outside Net: detecting objects in context with skip pooling and recurrent neural networks. arXiv preprint, pp 1–24
Cai Z et al (2016) A unified multi-scale deep convolutional neural network for fast object detection. European Conference on Computer Vision. Springer International Publishing
Huang C, Wu B, Nevatia R (2008) Robust object tracking by hierarchical association of detection responses. European Conference on Computer Vision. Springer Berlin Heidelberg
Chang X et al (2016) Semantic pooling for complex event analysis in untrimmed videos. IEEE Trans Pattern Anal Mach Intell
Chang X et al (2016) Bi-level semantic representation analysis for multimedia event detection. IEEE Trans Cybern
Chen Q et al (2015) Part-based deep network for pedestrian detection in surveillance videos. Visual Communications and Image Processing (VCIP 2015). IEEE
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol 1. IEEE
Felzenszwalb PF et al (2010) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645
Gidaris S, Komodakis N (2015) Object detection via a multi-region and semantic segmentation-aware CNN model. Proc IEEE Int Conf Comput Vis
Girshick R (2015) Fast R-CNN. Proc IEEE Int Conf Comput Vis
Girshick R et al (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. Proc IEEE Conf Comput Vis Pattern Recognit
Horn BKP, Schunck BG (1981) Determining optical flow. Artif Intell 17(1–3):185–203
UCF101 dataset. http://crcv.ucf.edu/data/UCF101.php
TRECVID Multimedia Event Detection evaluation track. https://www.nist.gov/itl/iad/mig/trecvid-multimedia-event-detection-evaluation-track
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
Kuhn HW (1955) The Hungarian method for the assignment problem. Naval Res Logist Q 2:83–97
Le D, Phan S, Miyao Y, Satoh S et al (2016) @ TRECVID
Lenz P, Geiger A, Urtasun R (2015) FollowMe: efficient online min-cost flow tracking with bounded memory and computation. Proc IEEE Int Conf Comput Vis
Dai J, Li Y, He K, Sun J (2016) R-FCN: object detection via region-based fully convolutional networks. Adv Neural Inf Process Syst
Liang J, Huang P, Jiang L, Lan Z, Chen J, Hauptmann A et al (2016) @ TRECVID: Multimedia event detection, ad-hoc video search, surveillance event detection
Liu L et al (2016) Learning spatio-temporal representations for action recognition: a genetic programming approach. IEEE Trans Cybern 46(1):158–170
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
Ng AY, Jordan MI (2002) On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. Adv Neural Inf Process Syst 2:841–848
Peng X, Wang L, Wang X et al (2016) Bag of visual words and fusion methods for action recognition: comprehensive study and good practice. Comput Vis Image Underst 150:109–125
Prince SJD (2012) Computer vision: models, learning, and inference. Cambridge University Press
Redmon J et al (2016) You only look once: unified, real-time object detection. Proc IEEE Conf Comput Vis Pattern Recognit
Ren S et al (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Adv Neural Inf Process Syst
Russakovsky O, Deng J et al (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. Adv Neural Inf Process Syst
Solera F, Calderara S, Cucchiara R (2015) Learning to divide and conquer for online multi-target tracking. Proc IEEE Int Conf Comput Vis
Wang H et al (2011) Action recognition by dense trajectories. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011). IEEE
Wang H et al (2013) Dense trajectories and motion boundary descriptors for action recognition. Int J Comput Vis 103(1):60–79
Wang et al (2016) Tracklet association by online target-specific metric learning and coherent dynamics estimation. IEEE Trans Pattern Anal Mach Intell
Wu J, Zhang Y, Lin W (2016) Good practices for learning to recognize actions using FV and VLAD. IEEE Trans Cybern 46(12):2978–2990
Yang P, Xiong J, Xie D, Pu S (2016) HRI Team @ TRECVID: Surveillance event detection
Yu S, Jiang L (2015) CMU Informedia @ TRECVID. Proc TRECVID 2015 Workshop
Zach C, Pock T, Bischof H (2007) A duality based approach for realtime TV-L1 optical flow. Pattern Recognition, pp 214–223
Zha Z-J et al (2013) Detecting group activities with multi-camera context. IEEE Trans Circuits Syst Video Technol 23(5):856–869
Zhang L, Li Y, Nevatia R (2008) Global data association for multi-object tracking using network flows. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008). IEEE
Zhang S et al (2015) Multi-target tracking by learning local-to-global trajectory models. Pattern Recogn 48(2):580–590
Zhang X et al (2016) Deep fusion of multiple semantic cues for complex event recognition. IEEE Trans Image Process 25(3):1033–1046
Zhang D, Han J, Jiang L, Ye S, Chang X (2017) Revealing event saliency in unconstrained video collection. IEEE Trans Image Process 26(4):1746–1758

Author information

Authors and Affiliations

School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China
Yandong Zhu, Kaihui Zhou & Menglai Wang
Beijing Key Laboratory of Network System and Network Culture, Beijing University of Posts and Telecommunications, Beijing, China
Yanyun Zhao & Zhicheng Zhao

Corresponding author

Correspondence to Yandong Zhu.

Additional information

This work is supported by the Key Laboratory of Forensic Marks, Ministry of Public Security, Beijing, China, and the Chinese National Natural Science Foundation (61532018, 61372169, 61471049).

About this article

Cite this article

Zhu, Y., Zhou, K., Wang, M. et al. A comprehensive solution for detecting events in complex surveillance videos. Multimed Tools Appl 78, 817–838 (2019). https://doi.org/10.1007/s11042-018-6163-6

Received: 28 April 2017
Revised: 14 March 2018
Accepted: 18 May 2018
Published: 08 June 2018
Issue Date: January 2019
DOI: https://doi.org/10.1007/s11042-018-6163-6
