Automatic handgun detection alarm in videos using deep learning
Introduction
Gun-related crime rates are very concerning in many places in the world, especially in countries where gun possession is legal or was legal for a period of time. The latest statistics reported by the United Nations Office on Drugs and Crime (UNODC) reveal that the number of crimes involving guns per 100,000 inhabitants is very high in many countries, e.g., 21.5 in Mexico, 4.7 in the United States and 1.6 in Belgium [19]. In addition, several psychological studies have demonstrated that simply having access to a gun drastically increases the probability of violent behavior [25].
One way to reduce this kind of violence is prevention through early detection, so that security agents or police officers can act. In particular, one innovative solution to this problem is to equip surveillance or control cameras with an accurate automatic handgun detection alarm system. Related studies address gun detection, but only on X-ray or millimeter-wave images and only with traditional machine learning methods [6], [7], [26], [27], [29].
In the last five years, deep learning in general, and Convolutional Neural Networks (CNNs) in particular, have outperformed classical machine learning methods in image classification, detection and segmentation in several applications [8], [13], [18], [22], [23], [30]. Classical methods require manually designed features, whereas deep CNN models automatically learn increasingly higher-level features from data [11], [17]. We aim to develop an accurate gun detector for videos using CNNs.
Properly training deep CNNs, which contain millions of parameters, requires very large datasets, on the order of millions of samples, as well as High Performance Computing (HPC) resources, e.g., multi-processor systems accelerated with GPUs. Transfer learning through fine-tuning is becoming a widely accepted way to overcome these constraints. It consists of reusing the knowledge learnt on one problem for another, related one [20]. How transfer learning should be applied to deep CNNs depends on the similarity between the original and the new problem and on the size of the new training set.
In general, fine-tuning the entire network, i.e., updating all the weights, is only used when the new dataset is large enough; otherwise the model may overfit, especially in the first layers of the network. Since these layers extract low-level features, e.g., edges and color, they do not change significantly and can be reused for several visual recognition tasks. The last layers of the CNN are gradually adjusted to the particularities of the problem to extract high-level features. In this work we used a VGG-16 based classification model pre-trained on the ImageNet dataset (around 1.28 million images over 1000 generic object classes) [24] and fine-tuned it on our own dataset of 3000 images of guns taken in a variety of contexts.
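The freezing strategy described above can be illustrated with simple bookkeeping. The sketch below is not the authors' training code: it lists VGG-16's sixteen weighted layers with their parameter counts (assuming the standard 224×224×3 input and 3×3 convolutions), replaces the 1000-way ImageNet output with a hypothetical `num_classes`-way output, and reports how many weights stay trainable when every block before `first_trainable_block` is frozen. The layer names and the helpers `vgg16_layers` and `split_params` are illustrative inventions.

```python
from typing import List, Tuple

# VGG-16 convolutional configuration: output channels per 3x3 conv layer, "M" = 2x2 max-pooling.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_layers(num_classes: int) -> List[Tuple[str, int, int]]:
    """Return (name, block, n_params) for each layer with learnable weights.

    Conv layers belong to blocks 1-5; the three fully connected layers are
    grouped as block 6. The last layer is sized for `num_classes` outputs."""
    layers = []
    in_ch, block, idx = 3, 1, 1
    for item in VGG16_CFG:
        if item == "M":  # pooling closes the current block
            block, idx = block + 1, 1
            continue
        # 3x3 kernel over in_ch channels plus one bias, for each output filter
        layers.append((f"block{block}_conv{idx}", block, (3 * 3 * in_ch + 1) * item))
        in_ch, idx = item, idx + 1
    # After five poolings a 224x224 input becomes a 7x7x512 feature map.
    sizes = [7 * 7 * 512, 4096, 4096, num_classes]
    for i, (fan_in, fan_out) in enumerate(zip(sizes[:-1], sizes[1:]), start=1):
        layers.append((f"fc{i}", 6, (fan_in + 1) * fan_out))
    return layers


def split_params(num_classes: int, first_trainable_block: int) -> Tuple[int, int]:
    """Freeze every block before `first_trainable_block`; return (trainable, frozen)."""
    trainable = frozen = 0
    for _name, block, n in vgg16_layers(num_classes):
        if block >= first_trainable_block:
            trainable += n
        else:
            frozen += n
    return trainable, frozen
```

For example, `split_params(2, 6)` freezes all thirteen convolutional layers and gives 119,554,050 trainable versus 14,714,688 frozen weights, which shows that even "last layers only" fine-tuning of VGG-16 still updates most of its parameters, because they sit in the fully connected head. The two-class output is an assumption chosen only for illustration.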
Using CNNs to automatically detect pistols in videos faces several challenges:
- Pistols can be held with one or two hands in different ways, so a large part of the pistol can be occluded.
- Designing a new dataset is a manual and time-consuming process.
- A labeled dataset cannot be reused across detection approaches, since they require different preprocessing and labeling operations and cannot learn from the same labeled database.
- An automatic pistol detection alarm must be activated in real time, and only when the system is confident that a pistol is present in the scene.
- Automatic detection alarm systems require accurate localization of the pistol in the monitored scene.
As far as we know, this work presents the first automatic gun detection alarm system that uses deep CNN based detection models. We focus on the type of handgun most used in crimes [31], the pistol, which includes revolvers, automatic and semi-automatic pistols, six-gun shooters, horse pistols and derringers. To guide the design of the new dataset and to find the best detector, we take the following steps:
- we reformulate the problem of automatic pistol detection alarm in videos as the problem of minimizing the number of false positives, where pistol represents the true class, and
- we evaluate and compare the VGG-16 based classifier under two different detection approaches, the sliding window and region proposals approaches.
Due to the particularities of each approach, we applied different optimizations in each case: we evaluated increasing the number of classes in the sliding window approach and designed a richer training dataset for the region proposals approach.
As this work focuses on near real-time solutions, we selected the most accurate and fastest detector and assessed its performance on seven videos with different characteristics. Then, we evaluated its suitability as an automatic pistol detection alarm system using a new metric, the Alarm Activation Time per Interval (AATpI), which measures the activation time for each scene containing guns.
The main contributions of this work are:
- Designing a new labeled database that enables the learning model to achieve high detection quality. Our experience in building the new dataset and detector can help guide the development of solutions to other problems.
- Finding the most appropriate CNN-based detector that achieves real-time pistol detection in videos.
- Introducing a new metric, AATpI, to assess the suitability of the proposed detector as an automatic detection alarm system.
From the experiments we found that the most promising results are obtained by a Faster R-CNN based model trained on our new database. The best performing model shows high potential even in low-quality YouTube videos and provides satisfactory results as an automatic alarm system: in 27 of the 30 scenes, it successfully activates the alarm, after five successive true positives, within less than 0.2 s.
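The alarm rule mentioned above (activation after five successive true positives) can be sketched as a streak counter over per-frame detections. This is an assumption-laden illustration, not the AATpI definition of Section 5.2: here the activation time is taken as the time from the first frame of a scene to the frame that completes the k-th consecutive detection, and the helper names and the 25 fps frame rate are hypothetical.

```python
from typing import List, Optional, Sequence


def alarm_activation_time(frames: Sequence[bool], fps: float, k: int = 5) -> Optional[float]:
    """Seconds from the first frame of a scene until the alarm fires, or None if it never fires.

    Assumed rule: the alarm fires on the frame that completes `k` successive
    frames with a pistol detection; time is measured between frame timestamps."""
    run = 0
    for i, detected in enumerate(frames):
        run = run + 1 if detected else 0  # any missed frame resets the streak
        if run == k:
            return i / fps
    return None


def scenes_alarmed_within(scenes: List[Sequence[bool]], fps: float,
                          limit_s: float, k: int = 5) -> int:
    """Count scenes whose alarm fires strictly before `limit_s` seconds."""
    count = 0
    for frames in scenes:
        t = alarm_activation_time(frames, fps, k)
        if t is not None and t < limit_s:
            count += 1
    return count
```

Under these assumptions, a scene detected in every frame at 25 fps triggers the alarm after 4/25 = 0.16 s, while a single missed frame early in the scene delays activation beyond 0.2 s. Requiring a streak rather than a single hit trades a few frames of latency for robustness against isolated false positives.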
This paper is organized as follows. Section 2 gives a brief analysis of the most closely related papers. Section 3 provides an overview of the CNN model used in this work. Section 4 describes the procedure we used to find the best detector, one that reaches high precision and a low false positive rate. Section 5 analyzes the performance of the built detector on seven videos and introduces a new metric to assess its performance as an automatic detection alarm system. Finally, the conclusions are summarized in Section 6.
Related works
The problem of handgun detection in videos using deep learning is related in part to two broad research areas. The first addresses gun detection using classical methods and the second focuses on improving the performance of object detection using deep CNNs.
Deep learning model
VGG was the first runner-up in ILSVRC 2014 [24]. It was used to show that the depth of the network is critical to performance. The 16-layer variant, VGG-16, has about 138 million parameters distributed over 16 layers with learnable weights, namely thirteen convolutional layers and three fully connected layers, in addition to five max-pooling layers and a final softmax output layer; see Fig. 1 for an illustration. This model also uses dropout regularization in the fully connected layers and applies ReLU…
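As a worked check of the parameter count, assume the standard configuration from [24] (not restated in this section): a 224×224×3 input, 3×3 convolutions with one bias per filter, five 2×2 poolings that shrink the feature map to 7×7×512, and a 1000-way ImageNet output. With $C^{\text{in}}_l$ and $C^{\text{out}}_l$ the input and output channels of convolutional layer $l$, the learnable weights add up as follows:

```latex
\begin{aligned}
P_{\text{conv}} &= \sum_{l=1}^{13} \left(3 \cdot 3 \cdot C^{\text{in}}_l + 1\right) C^{\text{out}}_l = 14\,714\,688,\\
P_{\text{fc}} &= (7 \cdot 7 \cdot 512 + 1)\cdot 4096 + (4096 + 1)\cdot 4096 + (4096 + 1)\cdot 1000 = 123\,642\,856,\\
P_{\text{total}} &= P_{\text{conv}} + P_{\text{fc}} = 138\,357\,544 \approx 1.38 \times 10^{8}.
\end{aligned}
```

Almost 90% of the weights sit in the three fully connected layers. Adding the three extra convolutional layers of the 19-layer configuration (one with 256 channels, two with 512) brings the total to 143,667,240, which matches the 144 million reported for VGG-19 in [24].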
Database construction: towards an equilibrium between false positives and false negatives
Automatic pistol detection in videos requires not only minimizing the number of false positives but also achieving near real-time detection. We analyze the performance of the classifier in combination with two detection methods, the sliding window (Section 4.1) and region proposals (Section 4.2.1).
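To see why the two detection methods differ so much in cost, consider that a sliding window detector runs the classifier on a crop at every position and scale. The sketch below only enumerates window coordinates; the frame size, window sizes and half-window stride are hypothetical choices, not the settings of Section 4.1.

```python
from typing import Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def sliding_windows(img_w: int, img_h: int, win: int, stride: int) -> Iterator[Box]:
    """Yield every square window of side `win` that fits inside the image."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, win, win)


def count_windows(img_w: int, img_h: int, sizes: List[int], stride_frac: float = 0.5) -> int:
    """Total number of crops (one classifier pass each) over several window sizes."""
    total = 0
    for win in sizes:
        stride = max(1, int(win * stride_frac))  # stride proportional to window size
        total += sum(1 for _ in sliding_windows(img_w, img_h, win, stride))
    return total
```

For a 640×480 frame, `count_windows(640, 480, [224, 112])` gives 82 crops, i.e., 82 separate VGG-16 forward passes per frame even for this coarse two-scale grid. Region-proposal detectors such as Faster R-CNN instead compute the convolutional features once per image and share them across candidate regions.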
Due to the differences between these two approaches, different optimization models, based on databases with different characteristics (size and classes), are applied in each case. In the sliding…
Analysis of the detection in videos
In this section we explore the strengths and weaknesses of our model on seven low-quality YouTube videos. In particular, we first assess the quality of detection and localization (Section 5.1) and then analyze the suitability of our model as a pistol detection alarm system using a new metric (Section 5.2).
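Localization quality in object detection is commonly measured by the intersection over union (IoU) between a predicted box and the ground-truth box. Whether Section 5.1 uses exactly this criterion is an assumption; the sketch below implements plain IoU and the PASCAL VOC-style rule that a detection counts as correctly localized when IoU is at least 0.5. The helper names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so disjoint boxes have no intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_localized(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """A detection counts as correctly localized if IoU >= threshold."""
    return iou(pred, truth) >= threshold
```

Two 10×10 boxes offset by 5 pixels in each direction overlap in a 5×5 square, giving IoU = 25/175 ≈ 0.14, so such a detection would not count as a correct localization under the 0.5 rule.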
Conclusions and future work
This work presented a novel automatic pistol detection system for videos, appropriate for both surveillance and control purposes. We reformulated this detection problem as the problem of minimizing false positives and solved it by building the key training dataset, guided by the results of a VGG-16 based classifier, and then assessing the best classification model under two approaches, the sliding window approach and the region proposals approach. The most promising results have been obtained with Faster…
Acknowledgments
This work was partially supported by the Spanish Ministry of Science and Technology under the project TIN2014-57251-P. Siham Tabik was supported by the Ramón y Cajal Programme (RYC-2015-18136).
References (31)
- et al., A comparison of 3D interest point descriptors with application to airport baggage object detection in complex CT imagery, Pattern Recogn. (2013).
- et al., Deep learning for visual understanding: a review, Neurocomputing (2016).
- et al., Computational face reader based on facial attribute estimation, Neurocomputing (2017).
- et al., A computer vision based framework for visual gun detection using Harris interest point detector, Proc. Comput. Sci. (2015).
- et al., Exploiting the complementary strengths of multi-layer CNN features for image retrieval, Neurocomputing (2017).
- R. Al-Rfou, G. Alain, A. Almahairi, C. Angermueller, D. Bahdanau, N. Ballas, F. Bastien, J. Bayer, A. Belikov, A....
- F. Chollet, Keras: Theano-based deep learning library, Code: https://github.com/fchollet. Documentation:...
- et al., Histograms of oriented gradients for human detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) (2005).
- et al., Object detection with discriminatively trained part-based models, IEEE Trans. Pattern Anal. Mach. Intell. (2010).
- et al., Pictorial structures for object recognition, Int. J. Comput. Vis. (2005).
- Automatic image analysis process for the detection of concealed weapons, Proceedings of the 5th Annual Workshop on Cyber Security and Information Intelligence Research: Cyber Security and Information Intelligence Challenges and Strategies.
- Plant identification using deep neural networks via optimization of transfer learning parameters, Neurocomputing.
- Fast R-CNN, Proceedings of the IEEE International Conference on Computer Vision.
- Rich feature hierarchies for accurate object detection and semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Bag of words based surveillance system using support vector machines, Int. J. Secur. Appl.
Roberto Olmos received the M.Sc. degree in Computer Science in 2015 from the Meritorious Autonomous University of Puebla, Puebla, Mexico. He is currently a Ph.D. student in the Department of Computer Science and Artificial Intelligence, University of Granada. His research interests include object detection and supervised deep learning.
Siham Tabik received the B.Sc. degree in physics from University Mohammed V, Rabat, Morocco, in 1998 and the Ph.D. degree in Computer Science from the University of Almería, Almería, Spain. She is currently a Ramón y Cajal researcher at the University of Granada. Her research interests include the design of scalable algorithms, deep learning CNN models and object detection.
Francisco Herrera received his M.Sc. in Mathematics in 1988 and Ph.D. in Mathematics in 1991, both from the University of Granada, Spain. He is currently a Professor in the Department of Computer Science and Artificial Intelligence at the University of Granada.
He has supervised 40 Ph.D. students. He has published more than 300 journal papers that have received more than 51000 citations (Google Scholar, H-index 114). He is coauthor of the books “Genetic Fuzzy Systems” (World Scientific, 2001), “Data Preprocessing in Data Mining” (Springer, 2015), “The 2-tuple Linguistic Model. Computing with Words in Decision Making” (Springer, 2015), “Multilabel Classification. Problem analysis, metrics and techniques” (Springer, 2016) and “Multiple Instance Learning. Foundations and Algorithms” (Springer, 2016).
He currently acts as Editor in Chief of the international journals “Information Fusion” (Elsevier) and “Progress in Artificial Intelligence” (Springer). He is an editorial board member of a dozen journals.
He has received the following honors and awards: ECCAI Fellow 2009, IFSA Fellow 2013, the 2010 Spanish National Award on Computer Science ARITMEL to the “Spanish Engineer on Computer Science”, the International Cajastur “Mamdani” Prize for Soft Computing (Fourth Edition, 2010), the IEEE Transactions on Fuzzy Systems Outstanding 2008 and 2012 Paper Award (bestowed in 2011 and 2015, respectively), the 2011 Lotfi A. Zadeh Prize Best Paper Award of the International Fuzzy Systems Association, the 2013 AEPIA Award to a scientific career in Artificial Intelligence, the 2014 XV Andalucía Research Prize Maimónides and the 2017 Andalucía Medal (by the regional government of Andalucía). He has been selected as a Thomson Reuters Highly Cited Researcher http://highlycited.com/ in the fields of Computer Science and Engineering (2014 to present).
His current research interests include, among others, soft computing (including fuzzy modeling and evolutionary algorithms), information fusion, decision making, biometrics, data preprocessing, data science and big data.