
An overview of violence detection techniques: current challenges and future directions

Artificial Intelligence Review

Abstract

The Big Video Data generated in today's smart cities raises concerns about how such data can be put to purposeful use. Surveillance cameras are among the most prominent of the many sources contributing to these huge data volumes, which makes automated analysis difficult in terms of both computation and precision. Violence detection (VD), which falls broadly under the action and activity recognition domain, is used to analyze Big Video Data for anomalous human actions. The VD literature has traditionally relied on manually engineered features, although deep learning-based standalone models have since been developed for real-time VD analysis. This paper focuses on an overview of deep sequence learning approaches along with strategies for localizing the detected violence. The overview also examines the earlier image processing and machine learning-based VD literature and its possible advantages, such as efficiency, over current complex models. Furthermore, the datasets are discussed and the current models are analyzed, explaining their pros and cons, and future directions for the VD domain are derived from an in-depth analysis of previous methods.
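The abstract mentions strategies for localizing detected violence. As one minimal illustration of temporal localization only (not the authors' method), the Python sketch below assumes some classifier, for example a CNN-LSTM model, has already produced a violence probability per frame. It smooths those scores with a centred moving average, thresholds them, and returns runs of frames as segments. The function name `localize_violence` and the `window`, `threshold`, and `min_len` parameters are hypothetical choices made for this illustration.

```python
from typing import List, Optional, Tuple


def localize_violence(scores: List[float], window: int = 5,
                      threshold: float = 0.5,
                      min_len: int = 3) -> List[Tuple[int, int]]:
    """Group per-frame violence scores into (start, end) segments, end exclusive.

    Hypothetical post-processing step (an assumption, not from the paper):
    1. smooth scores with a centred moving average of `window` frames,
    2. mark frames whose smoothed score is >= `threshold`,
    3. keep runs of at least `min_len` consecutive marked frames.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    n = len(scores)
    half = window // 2

    # Centred moving average; the window is truncated at the clip borders.
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))

    # Scan for contiguous runs above the threshold.
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, s in enumerate(smoothed):
        if s >= threshold and start is None:
            start = i  # a candidate violent segment begins
        elif s < threshold and start is not None:
            if i - start >= min_len:  # drop runs that are too short
                segments.append((start, i))
            start = None
    # Close a segment that runs until the last frame.
    if start is not None and n - start >= min_len:
        segments.append((start, n))
    return segments
```

For example, `localize_violence([0.1] * 10 + [0.9] * 10 + [0.1] * 10)` returns `[(10, 20)]`. Smoothing suppresses isolated high-scoring frames, and `min_len` discards runs too short to be a plausible violent event; spatial localization (bounding boxes around the people involved) would need a separate detector and is not covered by this sketch.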





Author information


Correspondence to Nadia Mumtaz, Naveed Ejaz, Shahab S. Band or Neeraj Kumar.

Ethics declarations

Conflict of interest

The authors declare that they have no known conflict of interest.


Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mumtaz, N., Ejaz, N., Habib, S. et al. An overview of violence detection techniques: current challenges and future directions. Artif Intell Rev 56, 4641–4666 (2023). https://doi.org/10.1007/s10462-022-10285-3


  • DOI: https://doi.org/10.1007/s10462-022-10285-3
