Abstract
The big video data generated in today’s smart cities raises concerns about how it can be put to purposeful use. Surveillance cameras, among many other sources, are the most prominent contributors to these huge data volumes, which makes automated analysis difficult in terms of both computation and precision. Violence detection (VD), which falls broadly under the action and activity recognition domain, is used to analyze big video data for anomalous actions caused by humans. The VD literature was traditionally based on manually engineered features, although deep learning-based standalone models have since been developed for real-time VD analysis. This paper presents an overview of deep sequence learning approaches, along with strategies for localizing the detected violence. The overview also covers the earlier image processing and machine learning-based VD literature and its possible advantages over current complex models, such as efficiency. Furthermore, the datasets are discussed and the current models are analyzed, explaining their pros and cons, with future directions for the VD domain derived from an in-depth analysis of previous methods.
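To make the efficiency argument for manually engineered features concrete, the following is a minimal sketch of a hand-crafted motion cue, not a method from this paper or any specific cited work. It assumes grayscale clips stored as NumPy arrays of shape (T, H, W), summarizes each clip by the mean absolute change between consecutive frames, and applies a hypothetical fixed `threshold`. The helper names (`motion_energy`, `clip_feature`, `flag_clip`) and the threshold value are illustrative assumptions; practical hand-crafted VD pipelines typically rely on richer descriptors such as optical-flow statistics and a trained classifier.

```python
import numpy as np


def motion_energy(frames: np.ndarray) -> np.ndarray:
    """Mean absolute intensity change between consecutive frames.

    frames: grayscale clip of shape (T, H, W); returns shape (T - 1,).
    """
    # Cast first so that uint8 subtraction cannot wrap around.
    frames = frames.astype(np.float32)
    diffs = np.abs(np.diff(frames, axis=0))  # shape (T - 1, H, W)
    return diffs.mean(axis=(1, 2))


def clip_feature(frames: np.ndarray) -> np.ndarray:
    """Summarize a clip as [mean, std, max] of its motion-energy signal."""
    energy = motion_energy(frames)
    return np.array([energy.mean(), energy.std(), energy.max()])


def flag_clip(frames: np.ndarray, threshold: float = 20.0) -> bool:
    """Hypothetical rule: flag a clip whose peak motion energy exceeds threshold."""
    return bool(clip_feature(frames)[2] > threshold)


if __name__ == "__main__":
    static = np.zeros((8, 4, 4), dtype=np.uint8)     # no motion at all
    agitated = np.zeros((8, 4, 4), dtype=np.uint8)
    agitated[1::2] = 255                             # abrupt changes every frame
    print(flag_clip(static), flag_clip(agitated))
```

The whole computation is a single pass of frame differencing, so it runs in time linear in the number of pixels, which is the kind of cost advantage the abstract contrasts with deep sequence models; the trade-off is that such a cue cannot distinguish violence from other fast motion.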
Ethics declarations
Conflict of interest
The authors declare that they have no known conflict of interest.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mumtaz, N., Ejaz, N., Habib, S. et al. An overview of violence detection techniques: current challenges and future directions. Artif Intell Rev 56, 4641–4666 (2023). https://doi.org/10.1007/s10462-022-10285-3