Full length article
Video anomaly detection using CycleGan based on skeleton features

https://doi.org/10.1016/j.jvcir.2022.103508

Abstract

Anomaly detection is a challenging task in intelligent video surveillance. It aims to identify anomalous events in video captured by visual sensors. The main difficulty is that the definition of an anomaly is ambiguous. In recent years, most anomaly detection methods have used a two-stage learning strategy, i.e., feature extraction followed by model building. In this paper, following the idea of reconstruction, we propose an end-to-end anomaly detection framework based on cycle-consistent adversarial networks (CycleGAN). Dynamic skeleton features are used as network constraints to alleviate the inaccuracy of feature extraction algorithms in a single generative adversarial network. In the training phase, only normal video frames and their corresponding skeleton features are used to train the generators and discriminators. In the testing phase, anomalous behaviors, which produce high reconstruction errors, are filtered out by a manually set threshold. To the best of our knowledge, this is the first time CycleGAN has been used for video anomaly detection. Experimental results on challenging datasets show that our method accurately detects anomalous behaviors in videos collected by surveillance systems and is comparable to current state-of-the-art methods.

Introduction

Video anomaly detection [1], [2], [3], [4] has received increasing research attention because of its significance in real-world applications such as intelligent visual monitoring, traffic detection, and autonomous driving. However, it still faces several challenging problems.

The ambiguous definition of human-perceivable abnormal behavior hinders the optimization of anomaly detection. The same behavior can be recognized as normal or abnormal in different scenarios. For example, ‘running’ is normal behavior on a sports field, but it should be detected as abnormal activity on a highway. At the same time, there is an imbalance between positive and negative samples in the training dataset, because abnormal activities occur with a much lower probability than normal activities in reality.

Many previous works [5], [6], [7] used pixel-based features or motion features extracted from video frames to train the network. However, pixel-based or motion features extracted from the original frames carry a large amount of redundant information, which makes model training more difficult. In addition, pixel-based features are sensitive to noise, which can obscure important and useful information. Human skeleton features, which have rich semantic information and a strong sense of structure, have been employed to alleviate this problem. However, algorithms for skeleton feature extraction are not always accurate. Training the model on skeleton features alone [5], [8], that is, depending on them entirely, therefore introduces inaccuracy into the detection results.

To solve the problems mentioned above, we propose a reconstruction-based anomaly detection method in which only positive (normal) samples are fed into the network. CycleGAN [9] is adopted to learn a mapping between the original frames and the extracted skeleton features. Using the cycle-consistency loss, two generators and two discriminators are trained to minimize the error in reconstructing the video frames and the error in reconstructing the pose estimation maps. In the testing phase, a manually set threshold is used to filter out abnormal behavior.
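For reference, the cycle-consistency objective of the original CycleGAN formulation [9] can be written as below. This is a sketch under assumed notation, not necessarily the exact loss used in this paper: here X is taken to be the domain of video frames, Y the domain of skeleton (pose) maps, G: X → Y and F: Y → X the two generators, D_X and D_Y the two discriminators, and λ a weighting factor whose value is not given in this excerpt.

```latex
\mathcal{L}_{\mathrm{cyc}}(G, F)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]

\mathcal{L}(G, F, D_X, D_Y)
  = \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
  + \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X)
  + \lambda \, \mathcal{L}_{\mathrm{cyc}}(G, F)
```

The first term of the cycle loss corresponds to the frame reconstruction error and the second to the pose-map reconstruction error mentioned above.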

The contributions of our work are as follows.

1. We use only positive samples to train the model and screen out abnormal behavior with a preset reconstruction error threshold, alleviating the problems of the ambiguous definition of abnormality and the imbalance between positive and negative samples.

2. CycleGAN is applied to anomaly detection in a reconstructive manner, preserving the details of video frames and dynamic skeleton features. To the best of our knowledge, this is the first time CycleGAN has been used in video anomaly detection.

3. Skeleton features, which describe human movement well, are used as a constraint for reconstruction, avoiding the disadvantages of depending on skeleton features entirely.
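The screening step in the first contribution can be illustrated with a minimal sketch. It assumes each frame is a flat list of pixel values, uses mean absolute error as the reconstruction error, and takes the trained frame-to-skeleton-to-frame cycle as an opaque `reconstruct` callable. All of these names, the error measure, and the threshold value are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]  # hypothetical representation: a flattened frame


def mean_abs_error(a: Frame, b: Frame) -> float:
    """Mean absolute difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of values")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def flag_anomalies(
    frames: List[Frame],
    reconstruct: Callable[[Frame], Frame],
    threshold: float,
) -> List[bool]:
    """Mark a frame as anomalous when its reconstruction error exceeds
    the manually chosen threshold."""
    return [mean_abs_error(f, reconstruct(f)) > threshold for f in frames]


if __name__ == "__main__":
    # Stand-in for a model trained on normal data only: it can reproduce
    # values up to 0.5 and clips anything larger (purely illustrative).
    def toy_reconstruct(frame: Frame) -> Frame:
        return [min(v, 0.5) for v in frame]

    frames = [[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]]
    print(flag_anomalies(frames, toy_reconstruct, threshold=0.1))
```

Because the model only ever sees normal samples, the threshold is the single place where "abnormal" has to be defined, which is how the approach sidesteps the class imbalance problem.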

The paper is organized as follows. Section 2 reviews previous work on anomaly detection. Section 3 describes the proposed method in detail. Section 4 presents the experimental results and analysis. Section 5 concludes the paper.

Section snippets

Video anomaly detection

Traditional anomaly detection methods are usually based on hand-crafted features and mainly involve two steps: feature extraction and model building. Low-level trajectory features [10], [11] were typically used in early works. Although simple to extract, they fail in complex scenarios. To overcome the disadvantages of low-level trajectory features, low-level spatial–temporal features, such as Histograms of Oriented Gradients (HOG) and Histograms of Oriented

Method

In this section, we describe our anomaly detection method in detail. In the training phase, dynamic skeleton features are extracted, and then video frames and dynamic skeleton features containing only normal behavior are fed into the model as training input. The generators and discriminators are trained simultaneously by minimizing the cycle-consistency loss. In the test phase, when the input is an abnormal frame, the reconstructed abnormal video frame is very different from the

Quantitative analysis

We compare the proposed abnormal behavior detection method based on dynamic skeleton features with existing state-of-the-art methods [5], [6], [16], [36], [37], [38], [39], [40], [41], [42], [43] on the CUHK Avenue dataset; the AUC results are listed in Table 1. As the table shows, the proposed method achieves an AUC of 87.8%, surpassing all the mainstream methods compared, which we attribute to its compact features and its accurate representation of human body movements.
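As a reminder of what the AUC figure measures, the sketch below computes the area under the ROC curve from per-frame anomaly scores and ground-truth labels, using the pairwise-ranking (Mann-Whitney) formulation. It assumes frame-level evaluation with label 1 for abnormal frames and higher scores meaning "more anomalous". The function name and the toy data are illustrative, and this excerpt does not state the exact evaluation protocol.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen
    abnormal frame scores higher than a randomly chosen normal frame,
    counting ties as one half."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one abnormal and one normal frame")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy reconstruction errors for four frames; the last two are abnormal.
    print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

AUC is independent of any particular threshold, so it summarizes how well the reconstruction error separates normal from abnormal frames before a manual threshold is chosen.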

Qualitative analysis

In Fig. 4, the

Conclusion

Dynamic skeleton features are compact and semantically rich, represent human motion well, and are cheap to compute. We propose an anomaly detection algorithm based on dynamic skeleton features using the CycleGAN structure. To avoid the inaccuracy caused by the skeleton feature extraction algorithm, we use dynamic skeleton features as network constraints instead of relying on them completely. The extracted dynamic skeleton features and frames containing

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

Funding

This work was supported by the National Natural Science Foundation of Beijing [grant number L192036]; the National Natural Science Foundation of China [grant number 61701029]; and the Industry-University-Research Innovation Foundation of the Science and Technology Development Center of the Ministry of Education [grant number 2018A02012].

References (56)

  • Dan Xu et al., Learning deep representations of appearance and motion for anomalous event detection (2015)
  • Royston Rodrigues, Neha Bhargava, Rajbabu Velmurugan, Subhasis Chaudhuri, Multi-timescale Trajectory Prediction for...
  • Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A Efros, Unpaired image-to-image translation using cycle-consistent...
  • Shandong Wu et al., Chaotic invariants of Lagrangian particle trajectories for anomaly detection in crowded scenes
  • Dong Zhang et al., Semi-supervised adapted HMMs for unusual event detection
  • Jaechul Kim et al., Observe locally, infer globally: a space-time MRF for detecting abnormal activities with incremental updates
  • R. Rodrigues, N. Bhargava, R. Velmurugan, S. Chaudhuri, Multi-timescale Trajectory Prediction for Abnormal Human...
  • D. Gong, L. Liu, V. Le, B. Saha, M. R. Mansour, S. Venkatesh, A. van den Hengel, Memorizing Normality to Detect Anomaly:...
  • M. Sabokrou et al., Adversarially learned one-class classifier for novelty detection (2018)
  • M. Sabokrou et al., Deep end-to-end one-class classifier, IEEE Trans. Neural Netw. Learn. Syst. (2020)
  • M. Sabokrou, M. Pourreza, M. Fayyaz, R. Entezari, M. Fathy, J. Gall, E. Adeli, AVID: Adversarial Visual Irregularity...
  • S. Akcay, A. Atapour-Abarghouei, T. P. Breckon, GANomaly: Semi-Supervised Anomaly Detection Via Adversarial Training,...
  • T. Schlegl, Philipp Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, G. Langs, Unsupervised Anomaly Detection with...
  • M. Ahmadi et al., Generative adversarial irregularity detection in mammography images
  • M. Pourreza, B. Mohammadi, M. Khaki, S. Bouindour, M. Sabokrou, G2D: Generate to Detect Anomaly, in: 2021 IEEE Winter...
  • M. Sabokrou, M. Khalooei, E. Adeli, Self-Supervised Representation Learning via Neighborhood-Relational Encoding, in:...
  • Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang, Deep high-resolution representation learning for human pose estimation, in:...
  • Eldar Insafutdinov et al., DeeperCut: A deeper, stronger, and faster multi-person pose estimation model


This paper has been recommended for acceptance by Zicheng Liu.
