Elsevier

Neurocomputing

Volume 275, 31 January 2018, Pages 66-72
Automatic handgun detection alarm in videos using deep learning

https://doi.org/10.1016/j.neucom.2017.05.012

Abstract

Current surveillance and control systems still require human supervision and intervention. This work presents a novel automatic handgun detection system for videos, appropriate for both surveillance and control purposes. We reformulate this detection problem as the problem of minimizing false positives and solve it by i) building the key training dataset guided by the results of a deep Convolutional Neural Network (CNN) classifier and ii) assessing the best classification model under two approaches, the sliding window approach and the region proposal approach. The most promising results are obtained by a Faster R-CNN based model trained on our new database. The best detector shows high potential even in low-quality YouTube videos and provides satisfactory results as an automatic alarm system. In 27 of 30 scenes, it successfully activates the alarm after five successive true positives within a time interval smaller than 0.2 s. We also define a new metric, Alarm Activation Time per Interval (AATpI), to assess the performance of a detection model as an automatic detection system in videos.

Introduction

Gun-related crime rates are very concerning in many places in the world, especially in countries where the possession of guns is legal or was legal for a period of time. The latest statistics reported by the United Nations Office on Drugs and Crime (UNODC) reveal that the number of crimes involving guns per 100,000 inhabitants is very high in many countries, e.g., 21.5 in Mexico, 4.7 in the United States and 1.6 in Belgium [19]. In addition, several psychological studies have demonstrated that merely having access to a gun drastically increases the probability of violent behavior [25].

One way to reduce this kind of violence is prevention via early detection, so that security agents or policemen can act. In particular, one innovative solution to this problem is to equip surveillance or control cameras with an accurate automatic handgun detection alert system. Related studies address the detection of guns, but only on X-ray or millimeter-wave images and only using traditional machine learning methods [6], [7], [26], [27], [29].

In the last five years, deep learning in general and Convolutional Neural Networks (CNNs) in particular have achieved results superior to those of the classical machine learning methods in image classification, detection and segmentation in several applications [8], [13], [18], [22], [23], [30]. Classical methods require manual intervention, whereas deep CNN models automatically discover increasingly higher-level features from data [11], [17]. We aim to develop an accurate gun detector for videos using CNNs.

A proper training of deep CNNs, which contain millions of parameters, requires very large datasets, in the order of millions of samples, as well as High Performance Computing (HPC) resources, e.g., multi-processor systems accelerated with GPUs. Transfer learning through fine-tuning is becoming a widely accepted alternative to overcome these constraints. It consists of re-utilizing the knowledge learnt from one problem to another related one [20]. Applying transfer learning to deep CNNs depends on the similarities between the original and new problem and also on the size of the new training set.

In general, fine-tuning the entire network, i.e., updating all the weights, is only used when the new dataset is large enough; otherwise the model could suffer from overfitting, especially among the first layers of the network. Since these layers extract low-level features, e.g., edges and color, they do not change significantly and can be reused for several visual recognition tasks. The last layers of the CNN are gradually adjusted to the particularities of the problem to extract high-level features. In this work we used a VGG-16 based classification model pre-trained on the ImageNet dataset (around 1.28 million images over 1000 generic object classes) [24] and fine-tuned on our own dataset of 3000 images of guns taken in a variety of contexts.
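The freezing mechanism described above can be illustrated with a deliberately tiny stand-in. The sketch below is not VGG-16 and not the training code used in this work: it assumes a hypothetical two-weight model, y = w_last * relu(w_first * x), where `w_first` plays the role of the early, reusable layers and `w_last` the role of the final layers that are adapted to the new task. All names and values are made up for illustration.

```python
# Toy illustration of fine-tuning with a frozen early layer (hypothetical model,
# not VGG-16): y = w_last * relu(w_first * x).

def relu(v):
    return v if v > 0.0 else 0.0

def predict(params, x):
    return params["w_last"] * relu(params["w_first"] * x)

def mse(params, data):
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)

def fine_tune(params, data, trainable, lr=0.01, epochs=200):
    """Gradient descent that updates only the parameters named in `trainable`."""
    params = dict(params)  # work on a copy; the pre-trained weights stay intact
    n = len(data)
    for _ in range(epochs):
        grads = {"w_first": 0.0, "w_last": 0.0}
        for x, y in data:
            hidden = relu(params["w_first"] * x)
            err = predict(params, x) - y
            grads["w_last"] += 2.0 * err * hidden / n
            # relu passes the gradient only where the unit is active
            active = 1.0 if params["w_first"] * x > 0.0 else 0.0
            grads["w_first"] += 2.0 * err * params["w_last"] * active * x / n
        for name in trainable:  # frozen parameters are simply never updated
            params[name] -= lr * grads[name]
    return params

pretrained = {"w_first": 1.0, "w_last": 0.5}     # stands in for pre-trained weights
new_task = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # small target dataset, y = 2x

tuned = fine_tune(pretrained, new_task, trainable=["w_last"])
print(tuned, mse(pretrained, new_task), mse(tuned, new_task))
```

Passing `trainable=["w_first", "w_last"]` instead would correspond to fine-tuning the entire network, which, as noted above, is usually reserved for larger target datasets.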

Using CNNs to automatically detect pistols in videos faces several challenges:

  • Pistols can be handled with one or two hands in different ways and thus a large part of the pistol can be occluded.

  • The process of designing a new dataset is manual and time consuming.

  • The labeled dataset cannot be re-utilized by different detection approaches since they require different preprocessing, different labeling operations and cannot learn from the same labeled databases.

  • An automatic pistol detection alarm must be activated in real time and only when the system is confident that a pistol is present in the scene.

  • Automatic detection alarm systems require an accurate location of the pistol in the monitored scene.

As far as we know, this work presents the first automatic gun detection alarm system that uses deep CNN-based detection models. We focus on the type of handgun most used in crimes [31], the pistol, which includes revolvers, automatic and semi-automatic pistols, six-gun shooters, horse pistols and derringers. To guide the design of the new dataset and to find the best detector we consider the following steps:

  • we reformulate the problem of automatic pistol detection alarm in videos into the problem of minimizing the number of false positives where pistol represents the true class and

  • we evaluate and compare the VGG-16 based classifier using two different detection approaches, the sliding window and region proposals approaches.

Due to the particularities of each approach, we applied different optimizations in each case. We evaluated increasing the number of classes in the sliding window approach and designing a richer training dataset for the region proposals approach.
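To make the cost of the sliding window approach concrete, the sketch below enumerates candidate boxes over a frame. The frame size, window sizes and stride are hypothetical values chosen for illustration, not the settings used in this work; the point is only that every box would need its own classifier evaluation.

```python
def sliding_windows(image_w, image_h, win_w, win_h, stride):
    """Yield (x, y, w, h) boxes scanning the image left to right, top to bottom."""
    for y in range(0, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)

def count_windows(image_w, image_h, window_sizes, stride):
    """Total number of boxes a classifier would have to score for one frame."""
    return sum(
        sum(1 for _ in sliding_windows(image_w, image_h, w, h, stride))
        for (w, h) in window_sizes
    )

# Hypothetical 640x360 frame, three square window sizes, stride of 32 pixels.
boxes = list(sliding_windows(640, 360, 128, 128, 32))
total = count_windows(640, 360, [(64, 64), (128, 128), (256, 256)], 32)
print(len(boxes), total)  # 136 378
```

Even this coarse grid yields hundreds of classifier calls per frame, whereas a region proposal method scores a smaller set of candidate regions selected by the model itself.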

As this work focuses on near real-time solutions, we selected the most accurate and fastest detector and assessed its performance on seven videos with different characteristics. Then, we evaluated its suitability as an automatic pistol detection alarm system using a new metric, the Alarm Activation Time per Interval (AATpI), which measures the activation time for each scene with guns.

The main contributions of this work are:

  • Designing a new labeled database that allows the learning model to achieve high detection quality. Our experience in building the new dataset and detector can help guide the development of solutions to other problems.

  • Finding the most appropriate CNN-based detector that achieves real-time pistol detection in videos.

  • Introducing a new metric, AATpI, to assess the suitability of the proposed detector as an automatic detection alarm system.

From the experiments we found that the most promising results are obtained by a Faster R-CNN based model trained on our new database. The best performing model shows high potential even in low-quality YouTube videos and provides satisfactory results as an automatic alarm system. In 27 of 30 scenes, it successfully activates the alarm, after five successive true positives, within a time interval smaller than 0.2 s.
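The alarm rule quoted above (activation after five successive true positives) can be sketched as follows. This is an illustrative reading, not the authors' implementation: the function name, the per-frame boolean input, the frame rates and the choice to measure time from the first frame of the scene are all assumptions.

```python
def alarm_activation_time(frame_hits, fps, required_run=5):
    """Return the seconds elapsed until the alarm fires, or None if it never does.

    frame_hits: per-frame booleans, True when the frame counts as a true positive.
    The alarm fires on the frame that completes `required_run` successive hits;
    time is measured from the first frame of the scene (an assumption).
    """
    run = 0
    for index, hit in enumerate(frame_hits):
        run = run + 1 if hit else 0  # any miss resets the streak
        if run == required_run:
            return (index + 1) / fps
    return None

# Hypothetical scene: a missed frame interrupts the first streak.
hits = [False, True, True, False, True, True, True, True, True, False]
print(alarm_activation_time(hits, fps=30))
print(alarm_activation_time([True, True, False, True, True], fps=25))  # None
```

Requiring a run of successive positives trades a short delay for robustness: an isolated false positive in a single frame cannot trigger the alarm on its own.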

This paper is organized as follows. Section 2 gives a brief analysis of the most closely related papers. Section 3 provides an overview of the CNN model used in this work. Section 4 describes the procedure we used to find the best detector, one that reaches good precision and a low false positive rate. Section 5 analyzes the performance of the resulting detector on seven videos and introduces a new metric to assess the performance of the detector as an automatic detection system. Finally, the conclusions are summarized in Section 6.

Related works

The problem of handgun detection in videos using deep learning is related in part to two broad research areas. The first addresses gun detection using classical methods and the second focuses on improving the performance of object detection using deep CNNs.

Deep learning model

VGG was the first runner-up in ILSVRC 2014 [24]. It was used to show that the depth of the network is critical to the performance. The VGG-16 architecture involves about 138 million parameters in 16 layers with learnable weights (thirteen convolutional layers and three fully connected layers), in addition to five max-pooling layers and one Softmax output layer; see Fig. 1 for an illustration. This model also uses dropout regularization in the fully-connected layer and applies ReLU
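The layer and parameter counts of this architecture can be checked with a few lines of arithmetic. The sketch below assumes the standard published VGG-16 configuration (3×3 convolutions with biases, a 224×224 RGB input, and fully connected layers of 4096, 4096 and 1000 units); none of these details are taken from this page.

```python
# Assumed standard VGG-16 layout: output channels of each 3x3 convolution,
# with "M" marking a 2x2 max-pooling layer that halves the spatial size.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
FC_SIZES = [4096, 4096, 1000]

def count_vgg16_parameters(input_channels=3, input_size=224):
    """Return (learnable parameters, layers with weights, pooling layers)."""
    channels, size = input_channels, input_size
    total, weight_layers, pool_layers = 0, 0, 0
    for item in VGG16_CONV:
        if item == "M":
            size //= 2
            pool_layers += 1
        else:
            total += 3 * 3 * channels * item + item  # kernel weights + biases
            channels = item
            weight_layers += 1
    features = channels * size * size  # flattened input to the first FC layer
    for width in FC_SIZES:
        total += features * width + width
        features = width
        weight_layers += 1
    return total, weight_layers, pool_layers

total_params, weight_layers, pool_layers = count_vgg16_parameters()
print(total_params, weight_layers, pool_layers)  # 138357544 16 5
```

Most of the parameters sit in the first fully connected layer (25088 × 4096 weights), which is one reason why reusing pre-trained weights is attractive when the target dataset is small.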

Database construction: towards an equilibrium between false positives and false negatives

Automatic pistol detection in videos requires not only minimizing the number of false positives but also reaching near real-time detection. We analyze the performance of the classifier in combination with two detection methods, the sliding window (Section 4.1) and the region proposals (Section 4.2.1).
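The balance between false positives and false negatives is commonly summarized with precision, recall and F1. The sketch below uses these standard definitions with hypothetical counts; the numbers are not results from this work.

```python
def detection_scores(true_pos, false_pos, false_neg):
    """Return (precision, recall, f1) from raw detection counts."""
    predicted = true_pos + false_pos  # everything the detector flagged
    actual = true_pos + false_neg     # every pistol actually present
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts: 90 correct detections, 10 false alarms, 30 missed pistols.
precision, recall, f1 = detection_scores(90, 10, 30)
print(precision, recall, round(f1, 4))
```

Fewer false positives raise precision, while fewer missed pistols raise recall; an alarm system that must avoid false alarms puts particular weight on the former.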

Due to the differences between these two approaches, different optimization models, based on databases with different characteristics, sizes and classes, are applied in each case. In the sliding

Analysis of the detection in videos

In this section we explore the strengths and weaknesses of our model on seven low-quality YouTube videos. In particular, we first assess the quality of the detection and localization (Section 5.1), then analyze the suitability of our model as a pistol detection alarm system using a new metric (Section 5.2).

Conclusions and future work

This work presented a novel automatic pistol detection system for videos, appropriate for both surveillance and control purposes. We reformulated this detection problem as the problem of minimizing false positives and solved it by building the key training dataset guided by the results of a VGG-16 based classifier, then assessing the best classification model under two approaches, the sliding window approach and the region proposal approach. The most promising results have been obtained with Faster

Acknowledgments

This work was partially supported by the Spanish Ministry of Science and Technology under the project TIN2014-57251-P. Siham Tabik was supported by the Ramón y Cajal Programme (RYC-2015-18136).

Roberto Olmos received the M.Sc. degree in Computer Science in 2015 from the Meritorious Autonomous University of Puebla, Puebla, Mexico. He is currently a Ph.D. student in the Department of Computer Science and Artificial Intelligence, University of Granada. His research interests include object detection and supervised deep learning.

References (31)

  • R. Gesick et al.

    Automatic image analysis process for the detection of concealed weapons

    Proceedings of the 5th Annual Workshop on Cyber Security and Information Intelligence Research: Cyber Security and Information Intelligence Challenges and Strategies

    (2009)
  • M.M. Ghazi et al.

    Plant identification using deep neural networks via optimization of transfer learning parameters

    Neurocomputing

    (2017)
  • R. Girshick

    Fast r-cnn

    Proceedings of the IEEE International Conference on Computer Vision

    (2015)
  • R. Girshick et al.

    Rich feature hierarchies for accurate object detection and semantic segmentation

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2014)
  • N.B. Halima et al.

    Bag of words based surveillance system using support vector machines

    Int. J. Secur. Appl.

    (2016)

    Siham Tabik received the B.Sc. degree in physics from University Mohammed V, Rabat, Morocco, in 1998 and the Ph.D. degree in Computer Science from the University of Almería, Almería, Spain. She is currently a Ramón y Cajal researcher at the University of Granada. Her research interests include the design of scalable algorithms, deep learning CNN models and object detection.

    Francisco Herrera received his M.Sc. in Mathematics in 1988 and Ph.D. in Mathematics in 1991, both from the University of Granada, Spain. He is currently a Professor in the Department of Computer Science and Artificial Intelligence at the University of Granada.

    He has been the supervisor of 40 Ph.D. students. He has published more than 300 journal papers that have received more than 51000 citations (Google Scholar, H-index 114). He is coauthor of the books “Genetic Fuzzy Systems” (World Scientific, 2001), “Data Preprocessing in Data Mining” (Springer, 2015), “The 2-tuple Linguistic Model. Computing with Words in Decision Making” (Springer, 2015), “Multilabel Classification. Problem analysis, metrics and techniques” (Springer, 2016) and “Multiple Instance Learning. Foundations and Algorithms” (Springer, 2016).

    He currently acts as Editor in Chief of the international journals “Information Fusion” (Elsevier) and “Progress in Artificial Intelligence” (Springer). He acts as an editorial board member of a dozen journals.

    He received the following honors and awards: ECCAI Fellow 2009, IFSA Fellow 2013, 2010 Spanish National Award on Computer Science ARITMEL to the “Spanish Engineer on Computer Science”, International Cajastur “Mamdani” Prize for Soft Computing (Fourth Edition, 2010), IEEE Transactions on Fuzzy Systems Outstanding 2008 and 2012 Paper Award (bestowed in 2011 and 2015 respectively), 2011 Lotfi A. Zadeh Prize Best Paper Award of the International Fuzzy Systems Association, 2013 AEPIA Award to a scientific career in Artificial Intelligence, 2014 XV Andalucía Research Prize Maimónides, and 2017 Andalucía Medal (by the regional government of Andalucía). He has been selected as a Thomson Reuters Highly Cited Researcher http://highlycited.com/ (in the fields of Computer Science and Engineering, respectively, 2014 to present).

    His current research interests include, among others, soft computing (including fuzzy modeling and evolutionary algorithms), information fusion, decision making, biometrics, data preprocessing, data science and big data.
