Video anomaly detection using CycleGan based on skeleton features

doi:10.1016/j.jvcir.2022.103508

Journal of Visual Communication and Image Representation

Volume 85, May 2022, 103508

https://doi.org/10.1016/j.jvcir.2022.103508

Abstract

Anomaly detection is a challenging task in intelligent video surveillance. It aims to identify anomalous events by monitoring the video captured by visual sensors. The main difficulty is that the definition of an anomaly is ambiguous. In recent years, most anomaly detection methods have used a two-stage learning strategy, i.e., feature extraction followed by model building. In this paper, based on the idea of reconstruction, we propose an end-to-end anomaly detection framework using cycle-consistent adversarial networks (CycleGAN). Dynamic skeleton features are used as network constraints to mitigate the inaccurate feature extraction of a single generative adversarial network. In the training phase, only normal video frames and the corresponding skeleton features are used to train the generator and discriminator. In the testing phase, anomalous behaviors, which yield high reconstruction errors, are identified using a manually set threshold. To the best of our knowledge, this is the first time CycleGAN has been used for video anomaly detection. Experimental results on challenging datasets show that our method can accurately detect anomalous behaviors in videos collected by video surveillance systems and is comparable to the current state-of-the-art methods.

Introduction

Video anomaly detection [1], [2], [3], [4] has received growing research attention around the world because of its significance in real-world applications such as intelligent visual monitoring, traffic detection, and autopilot. However, it still faces several challenging problems.

The ambiguous definition of human-perceivable abnormal behavior hinders the optimization of anomaly detection. The same behavior can be recognized as normal or abnormal in different scenarios. For example, ‘running’ is normal behavior on a sports field, while it should be detected as abnormal activity on a highway. At the same time, there is an imbalance between positive and negative samples in the training dataset, because abnormal activities occur with a much lower probability than normal activities in reality.

Many previous works [5], [6], [7] used pixel-based features or motion features extracted from video frames to train the network. However, such features contain much redundant information, which makes model training more difficult. In addition, pixel-based features are sensitive to noise, which can obscure important and useful information. Human skeleton features, which carry rich semantic information and a strong sense of structure, are employed to alleviate the problem. However, skeleton feature extraction algorithms are not always accurate. Training a model on skeleton features alone [5], [8], that is, depending on them entirely, therefore leads to inaccurate detection results.

To solve the problems mentioned above, we propose an anomaly detection method based on the idea of reconstruction. Only positive (normal) samples are fed into the network. CycleGAN [9] is adopted to set up a mapping from the original frame to the extracted skeleton features. Using the cycle-consistency loss, two generators and two discriminators are trained to minimize the error in reconstructing the video frames and the error in reconstructing the pose estimation map. In the testing phase, a manually set threshold is used to filter out abnormal behavior.
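For reference, the cycle-consistency term can be written in the form given in the original CycleGAN formulation [9]. The notation here is an assumption made for illustration and does not come from this paper: X denotes the domain of video frames, Y the domain of pose estimation maps, G maps X to Y, and F maps Y to X. The exact loss and weighting used in this work may differ.

```latex
% Cycle-consistency loss in the form of the original CycleGAN formulation [9].
% Assumed notation: x is a video frame, y is a pose estimation map,
% G : X -> Y and F : Y -> X are the two generators.
\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]
```

The first term penalizes the error in reconstructing a frame after a round trip through the skeleton domain, and the second term penalizes the error in reconstructing the pose estimation map.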

The contributions of our work are as follows.

1. We use only positive samples to train the model, and screen out abnormal behavior with a preset reconstruction-error threshold, alleviating the problems of the ambiguous definition of abnormality and the imbalance between positive and negative samples.

2. CycleGAN is applied to anomaly detection in a reconstructive manner, preserving the details of video frames and dynamic skeleton features. To the best of our knowledge, this is the first time CycleGAN has been used in video anomaly detection.

3. Skeleton features, which have a strong ability to describe human movements, are taken as the constraint for reconstruction, avoiding the disadvantages of total dependency on skeleton features.
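The screening step in the first contribution can be illustrated with a minimal sketch. It assumes the per-frame reconstruction error is a mean squared error and uses toy arrays in place of real frames and generator outputs; the function names, the error measure, and the threshold value are hypothetical and are not taken from the paper.

```python
import numpy as np

def reconstruction_errors(frames, reconstructions):
    """Per-frame mean squared error (an assumed error measure).

    Both inputs have shape (num_frames, height, width, channels).
    """
    frames = np.asarray(frames, dtype=np.float64)
    reconstructions = np.asarray(reconstructions, dtype=np.float64)
    diff = frames - reconstructions
    # Flatten each frame and average the squared differences.
    return (diff ** 2).reshape(len(frames), -1).mean(axis=1)

def flag_anomalies(errors, threshold):
    """Boolean mask that is True where the error exceeds the preset threshold."""
    return np.asarray(errors) > threshold

# Toy data standing in for test frames and their reconstructions.
rng = np.random.default_rng(0)
frames = rng.random((5, 8, 8, 3))
recon = frames.copy()
recon[3] += 0.5  # simulate one poorly reconstructed (abnormal) frame

errors = reconstruction_errors(frames, recon)
mask = flag_anomalies(errors, threshold=0.1)  # hypothetical threshold
```

In this toy setup only the frame at index 3 is flagged, because a model trained on normal samples alone is expected to reconstruct unseen abnormal frames poorly.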

The paper is organized as follows. Section 2 reviews previous work on anomaly detection. Section 3 describes the proposed method in detail. Section 4 presents the experimental results and analysis. Section 5 concludes the paper.

Section snippets

Video anomaly detection

Traditional anomaly detection methods are usually based on hand-crafted features and mainly include two steps, one for feature extraction and another for model establishment. Low-level trajectory features [10], [11] are typically used in early works. Although they are simple to extract, they fail when faced with complex scenarios. To overcome the disadvantages of low-level trajectory features, low-level spatial–temporal features, such as Histograms of Oriented Gradients (HOG) and Histograms of Oriented

Method

In this section, we describe our anomaly detection method in detail. In the training phase, dynamic skeleton features are extracted, and then video frames containing only normal behavior and their dynamic skeleton features are fed into the model as the input for training. The generator and discriminator are trained simultaneously while the cycle-consistency loss is reduced. In the test phase, when the input is an abnormal frame, the reconstructed abnormal video frame is very different from the

Quantitative analysis

We compare the proposed abnormal behavior detection method based on dynamic skeleton features with existing state-of-the-art methods [5], [6], [16], [36], [37], [38], [39], [40], [41], [42], [43] on the CUHK Avenue dataset, and the AUC results are listed in Table 1. As can be seen from the table, the proposed method achieves an AUC of 87.8%, which surpasses all the mainstream methods compared. We attribute this to its compact features and its ability to represent human body movements accurately.
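As an illustration of the metric, the sketch below computes an AUC from per-frame anomaly scores and ground-truth labels using the pairwise (rank-based) definition. It assumes frame-level evaluation, which this excerpt does not state explicitly; the function name, scores, and labels are made-up toy values and not results from the paper.

```python
import numpy as np

def frame_level_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: anomaly score per frame (higher means more anomalous).
    labels: 1 for an anomalous frame, 0 for a normal frame.
    """
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need both normal and anomalous frames")
    # Fraction of (anomalous, normal) pairs ranked correctly; ties count half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Toy scores and labels, for illustration only.
scores = [0.1, 0.2, 0.8, 0.3, 0.9, 0.15]
labels = [0, 0, 1, 0, 1, 1]
auc = frame_level_auc(scores, labels)
```

Because the AUC depends only on how the scores rank anomalous frames against normal ones, it does not require choosing a specific detection threshold.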

Qualitative analysis

In Fig. 4, the

Conclusion

Dynamic skeleton features are compact and semantically rich, represent human motion and movement well, and have a low computational cost. We propose an anomaly detection algorithm based on dynamic skeleton features using the CycleGAN structure. To avoid the inaccuracy caused by the skeleton feature extraction algorithm, we use dynamic skeleton features as network constraints instead of relying on them completely. The extracted dynamic skeleton features and frames containing

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

Funding

This work was supported by the National Natural Science Foundation of Beijing [grant number L192036]; the National Natural Science Foundation of China [grant number 61701029]; and the Industry-University-Research Innovation Foundation of the Science and Technology Development Center of the Ministry of Education [grant number 2018A02012].

References (56)

Frederick Tung et al., Goal-based trajectory analysis for unusual behaviour detection in intelligent surveillance, Image Vis. Comput. (2011)
Dan Xu et al., Detecting anomalous events in videos by learning deep representations of appearance and motion, Comput. Vis. Image Underst. (2017)
Qianru Sun et al., Online growing neural gas for anomaly detection in changing surveillance scenes, Pattern Recognit. (2017)
Ying Zhang et al., Video anomaly detection based on locality sensitive hashing filters, Pattern Recognit. (2016)
Vijay Mahadevan, Weixin Li, Viral Bhalodia, Nuno Vasconcelos, Anomaly detection in crowded scenes, in: IEEE Conference...
Weixin Li et al., Anomaly detection and localization in crowded scenes, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
Kai-Wen Cheng, Yie-Tarng Chen, Wen-Hsien Fang, Video anomaly detection and localization using hierarchical feature...
Waqas Sultani, Chen Chen, Mubarak Shah, Real-world anomaly detection in surveillance videos, in: IEEE Conference on...
Mahmudul Hasan, Jonghyun Choi, Jan Neumann, Amit K. Roy-Chowdhury, Larry S. Davis, Learning temporal regularity in...
Wen Liu, Weixin Luo, Dongze Lian, Shenghua Gao, Future frame prediction for anomaly detection–a new baseline, in: IEEE...

Dan Xu et al., Learning deep representations of appearance and motion for anomalous event detection (2015)

Royston Rodrigues, Neha Bhargava, Rajbabu Velmurugan, Subhasis Chaudhuri, Multi-timescale Trajectory Prediction for...

Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A Efros, Unpaired image-to-image translation using cycle-consistent...

Shandong Wu et al., Chaotic invariants of Lagrangian particle trajectories for anomaly detection in crowded scenes

Dong Zhang et al., Semi-supervised adapted HMMs for unusual event detection

Jaechul Kim et al., Observe locally, infer globally: a space-time MRF for detecting abnormal activities with incremental updates

R. Rodrigues, N. Bhargava, R. Velmurugan, S. Chaudhuri, Multi-timescale Trajectory Prediction for Abnormal Human...

D. Gong, L. Liu, V. Le, B. Saha, M. R. Mansour, S. Venkatesh, A. van den Hengel, Memorizing Normality to Detect Anomaly:...

M. Sabokrou et al., Adversarially learned one-class classifier for novelty detection (2018)

M. Sabokrou et al., Deep end-to-end one-class classifier, IEEE Trans. Neural Netw. Learn. Syst. (2020)

M. Sabokrou, M. Pourreza, M. Fayyaz, R. Entezari, M. Fathy, J. Gall, E. Adeli, AVID: Adversarial Visual Irregularity...

S. Akcay, A. Atapour-Abarghouei, T. P. Breckon, GANomaly: Semi-Supervised Anomaly Detection Via Adversarial Training,...

T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, G. Langs, Unsupervised Anomaly Detection with...

M. Ahmadi et al., Generative adversarial irregularity detection in mammography images

M. Pourreza, B. Mohammadi, M. Khaki, S. Bouindour, M. Sabokrou, G2D: Generate to Detect Anomaly, in: 2021 IEEE Winter...

M. Sabokrou, M. Khalooei, E. Adeli, Self-Supervised Representation Learning via Neighborhood-Relational Encoding, in:...

Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang, Deep high-resolution representation learning for human pose estimation, in:...

Eldar Insafutdinov et al., DeeperCut: A deeper, stronger, and faster multi-person pose estimation model

^☆: This paper has been recommended for acceptance by Zicheng Liu.

Full length article: Video anomaly detection using CycleGan based on skeleton features☆