Elsevier

Information Sciences

Volume 369, 10 November 2016, Pages 548-563

A novel spatio-temporal saliency approach for robust dim moving target detection from airborne infrared image sequences

https://doi.org/10.1016/j.ins.2016.07.042

Highlights

  • A closed-form local adaptive contrast operation is proposed.

  • The motion consistency characteristic is explored for the first time.

  • Spatio-temporal saliency is specifically modeled for dim moving target detection.

  • The proposed approach can outperform existing approaches remarkably.

Abstract

Dim moving target detection from infrared image sequences, which lags behind the visual perception ability of humans, has attracted considerable interest from researchers due to its crucial role in airborne surveillance systems. This paper proposes a novel spatio-temporal saliency model to cope with the infrared dim moving target detection problem. Based on a closed-form solution derived from regularized feature reconstruction, a local adaptive contrast operation is proposed, whereby the spatial saliency map and the temporal saliency map can be calculated on the spatial domain and the temporal domain, respectively. To depict the motion consistency characteristic of the moving target, this paper also proposes a transmission operation to generate the trajectory prediction map. The fused result of the spatial saliency map, the temporal saliency map, and the trajectory prediction map is called the “spatio-temporal saliency map” in this paper, from which the target of interest can be easily segmented. A diverse test dataset comprising three infrared image sequences under different backgrounds was collected to evaluate the proposed model, and extensive experiments confirmed that the proposed spatio-temporal saliency model achieves much better detection performance than the state-of-the-art approaches.

Introduction

Automatic infrared small target detection plays an important role in infrared search and track systems [5], [12], [13], [19], [45]. For military applications, it is necessary to warn of an incoming target at a very long distance [18]. Because the infrared sensor is far from the target of interest, the imaging resolution is diminished and the target occupies only a very small area in the infrared image. Furthermore, the infrared radiant energy of the target decays greatly after long-distance propagation, which results in a very low signal-to-noise ratio (SNR) for the target [20]. The target is also often buried in background clutter because of the complex surrounding environment. In this situation, the target in the image appears to be “dim” [21], meaning that its size is very small and its SNR is very low. When both the imaging resolution and the SNR of the target are poor, the target may be perceptually invisible in a single image. Generally, successful local feature descriptors [1], [8], [9] in the natural image processing domain are good at encoding texture and shape features, but they are ill-suited to infrared small target detection because infrared images lack usable structure features and the projected infrared small target has no usable shape. In addition, although some super-resolution methods have been proposed to deal with the very low-resolution problem [17], they mainly aim at obtaining visually appealing results and are unsuitable for the subsequent detection task. Even under these extreme circumstances, the target still appears to be salient in the image sequence. Hence, the temporal context is the key to improving dim target detection performance. However, extracting temporal information from infrared sequences is not easy, as common feature descriptors [25], [27] and image matching approaches [28], [29] usually cannot be applied directly to motion compensation of infrared sequences. Compared to moving target detection from natural images [38], [43], the crucial difficulties of dim moving target detection from infrared images lie in the noisy background and the dim target. Although a number of studies [3], [7], [23], [39] have recently addressed dim moving target detection, robustly extracting a dim moving target remains a challenging problem.

Numerous infrared small moving target detection approaches have been proposed in the past few decades. They can be classified into two categories according to whether or not they utilize temporal information: 1) methods using a single image and 2) methods using multiple adjacent images. In the first category, only a single image is available for infrared small target detection. Generally, an infrared small target presents itself as a Gaussian light blob in the image, which is the crucial distinction between the infrared small target and the background clutter. The approaches based on this discrimination rule include the mathematical morphological-based methods [2], [30], [40], the LS-SVM-based approach [44], the facet-based methods [32], [41], [48], the image layering-based method [26], the local contrast-based method [5], the low rank recovery-based method [12], and the visual attention-based approaches [15], [32], [33]. Generally, these methods can efficiently extract the target when the received image is close to ideal. However, they do not work well when the SNR of the target is very low and the complexity of the background clutter is very high. In this situation, it is necessary to adopt temporal cues, as the methods in the second category do. In [3], [10], target detection and tracking were jointly considered, and [18] integrated a temporal detector, which noticeably improved the target detection rate at an acceptable false alarm rate. Deng and Zhu [7] proposed a local increment coding approach to suppress complex background clutter. Chen et al. [4] utilized the bi-dimensional empirical model to address the dim moving target detection problem. Sun et al. [39] utilized patches from the preceding or following images to reconstruct the current image, and the residual between the current image and the reconstructed image was then utilized to highlight the dim moving target. Based on the background and target dictionary, the likelihood that each patch belongs to the target can be solved by sparse representation on the spatio-temporal domain [22], [39]. Inspired by the visual attention process of humans, Li et al. [23] proposed a hierarchical approach to extract moving targets in which motion cues were utilized to generate candidate regions. In summary, although progress has been achieved in dim moving target detection, motion analysis needs to be exploited further because adequate utilization of the temporal information is the key to further improving moving target detection performance.

It is well known that the existing computational models related to dim moving target detection still lag far behind the visual perceptual ability of humans. The natural next step is to further pursue the imitation of the biological visual perception process to design infrared dim moving target detection algorithms. In the visual cortex, visual information is processed and organized along two different streams: 1) the ventral stream for appearance perception and 2) the dorsal stream for motion perception [11]. In the literature, the two-stream hypothesis has been widely employed in many computer vision tasks such as action recognition [37], scene recognition [36], and spatio-temporal saliency modeling [24]. Similar to these computer vision tasks [24], [36], [37], object detection can be solved by spatial saliency modeling, which aims to imitate the perception function of the primary areas of the ventral stream [16]. Moving object detection can be improved by spatio-temporal saliency modeling, which aims to imitate the joint perception function of the ventral stream and the dorsal stream [24], [35]. Several existing studies [15], [32], [33] made use of the capabilities of spatial saliency models to cope with the infrared small target detection problem. In order to comprehensively implement the motion analysis using image sequences, it is logical to design computational models in a spatio-temporal manner [14], [49]. Accordingly, infrared small moving target detection can be similarly solved through spatio-temporal saliency modeling. However, the existing spatio-temporal saliency models [24], [35] were established for high-resolution natural images and cannot be directly utilized in the infrared small moving target detection task as infrared images generally lack rich structure features. To the best of our knowledge, there are currently no spatio-temporal saliency models specifically designed for infrared dim moving target detection.

Generally, an infrared small moving target exhibits three characteristics: spatial singularity, temporal singularity, and motion consistency. The singularity characteristic can be interpreted as an isolation property, and the motion consistency characteristic means that a moving target appears consistently in the field of view for some time. More specifically, the spatial singularity characteristic reflects the isolation property in the spatial domain (i.e., in the current image). The temporal singularity characteristic depicts the isolation property in the temporal domain (i.e., in two adjacent images). The motion consistency characteristic reveals the history information of the infrared moving target, which is reflected in multiple historical images.

In order to narrow the gap between the unstable detection performance of computational methods and the perfect interpretation performance of humans, this paper proposes a novel spatio-temporal saliency model. More specifically, based on a closed-form solution derived from regularized feature reconstruction, we developed a local adaptive contrast operation whereby the spatial saliency map and the temporal saliency map can be calculated on the spatial domain and the temporal domain, respectively. To depict the motion consistency characteristic of the moving target, this paper also proposes a transmission operation to generate the trajectory prediction map. The motion continuity constraint allows the proposed transmission operation to be applied in a local domain, which makes the calculation process efficient. In addition, the proposed transmission operation works recursively, which makes it well suited to mining the history information. The spatial saliency map, the temporal saliency map, and the trajectory prediction map, which reflect the singularity characteristic in the spatial domain, the singularity characteristic in the temporal domain, and the motion consistency characteristic, respectively, are then fused into the “spatio-temporal saliency map,” from which the target can be easily segmented. Hence, the proposed spatio-temporal saliency model can specifically depict the spatial singularity characteristic, the temporal singularity characteristic, and the motion consistency characteristic of an infrared dim moving target.
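To make the idea of a closed-form contrast derived from regularized feature reconstruction more concrete, the following is a minimal sketch and not the paper's actual formulation, which is not given in this excerpt. It assumes the regularizer is a ridge (L2) penalty, so that the reconstruction weights have the closed form w = (DᵀD + λI)⁻¹Dᵀx, and it takes the reconstruction residual as the contrast score. The function name `reconstruction_contrast`, the parameter `lam`, and the toy patches are all hypothetical.

```python
import numpy as np


def reconstruction_contrast(center, neighbors, lam=1.0):
    """Residual norm after reconstructing `center` from `neighbors` (ridge regression).

    center:    (d,) feature vector of the central patch
    neighbors: (d, k) matrix whose columns are features of surrounding patches
    lam:       ridge regularization weight (hypothetical default)
    """
    k = neighbors.shape[1]
    # Closed-form ridge solution: w = (D^T D + lam * I)^-1 D^T x
    gram = neighbors.T @ neighbors + lam * np.eye(k)
    weights = np.linalg.solve(gram, neighbors.T @ center)
    residual = center - neighbors @ weights
    return float(np.linalg.norm(residual))


# Toy data (assumption): flattened 3x3 patches on a flat, slightly noisy background.
rng = np.random.default_rng(0)
background = np.full(9, 10.0)
neighbors = np.stack(
    [background + rng.normal(0.0, 0.1, 9) for _ in range(8)], axis=1
)

target = background.copy()
target[4] += 5.0  # a small bright blob at the patch center

clutter_score = reconstruction_contrast(background, neighbors)
target_score = reconstruction_contrast(target, neighbors)
```

A patch that resembles its surroundings is reconstructed well and receives a small score, whereas a patch containing an isolated blob leaves a large residual. Under the same assumption, the sketch could be applied to patches of the current image for a spatial map, or to patches of an adjacent image for a temporal map.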

A representative test set comprising three infrared sequences taken under different backgrounds (i.e., sky-sea, ground, and sky) was used to confirm the validity of the proposed approach. Compared with the state-of-the-art approaches, the proposed approach showed very promising results. As a whole, the contributions of this paper can be summarized as follows:

  • Based on regularized feature reconstruction, the proposed approach utilizes a closed-form local adaptive contrast operation, which suppresses background clutter better than the existing local contrast operations.

  • The proposed approach offers a simple but efficient transmission operation to capture the motion consistency characteristic of the moving target. Based on the existing literature, this is the first time that the motion consistency characteristic has been mined and utilized in infrared dim moving target detection.

  • The proposed novel spatio-temporal saliency model potentially has other applications, such as spatio-temporal interest point detection [46] and action recognition [34].
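The recursive, local transmission operation and the fusion step mentioned in the contributions can be pictured with a small toy example. This is only a sketch under stated assumptions: the excerpt does not define the transmission or fusion rules, so here the previous fused map is spread to a small neighborhood with a local maximum (a target moves at most `radius` pixels per frame), and the three maps are fused by an element-wise product. The helper names `local_max`, `update_trajectory`, and `fuse` are hypothetical.

```python
import numpy as np


def local_max(img, radius):
    """Maximum over a (2*radius+1)^2 window; assumes non-negative values."""
    padded = np.pad(img, radius, mode="constant", constant_values=0.0)
    h, w = img.shape
    out = np.zeros_like(img)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out


def update_trajectory(prev_state, radius=1):
    """Spread the previous evidence to where the target could move next."""
    return local_max(prev_state, radius)


def fuse(spatial, temporal, trajectory):
    """Hypothetical fusion rule: element-wise product rescaled to a peak of 1."""
    fused = spatial * temporal * trajectory
    peak = fused.max()
    return fused / peak if peak > 0 else fused


h = w = 9
state = np.ones((h, w))  # no prior evidence before the first frame
for t in range(5):
    spatial = np.full((h, w), 0.1)
    temporal = np.full((h, w), 0.1)
    # Dim target moving one pixel diagonally per frame.
    spatial[t + 1, t + 1] = temporal[t + 1, t + 1] = 0.6
    if t == 4:
        # Strong noise spike that appears in the last frame only.
        spatial[0, 8] = temporal[0, 8] = 0.9
    # Recursive update: only the previous fused map is kept as history.
    state = fuse(spatial, temporal, update_trajectory(state, radius=1))

naive = spatial * temporal  # last frame without any trajectory information
naive_peak = tuple(int(i) for i in np.unravel_index(naive.argmax(), naive.shape))
tracked_peak = tuple(int(i) for i in np.unravel_index(state.argmax(), state.shape))
```

In this toy run, the product of the spatial and temporal maps alone peaks at the one-frame noise spike at (0, 8), while the recursively fused map keeps its peak on the target at (5, 5), because the spike has no supporting history in its neighborhood.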

The remainder of this paper is organized as follows. Section 2 introduces the proposed local adaptive contrast operation. Section 3 introduces the proposed infrared dim moving target detection approach through spatio-temporal saliency modeling. Section 4 presents the experimental results, which include a comprehensive analysis of the proposed infrared dim moving target detection approach and a comparison of the proposed approach to the existing state-of-the-art approaches. Finally, Section 5 presents the conclusions of this paper.

Section snippets

The proposed local adaptive contrast operation

Before the complete spatio-temporal saliency method is introduced in Section 3, this section first discusses the saliency measure since it plays a key role in saliency modeling. More specifically, the traditional local contrast measure (LCM), which has been proposed and utilized in infrared small target enhancement in the past, is discussed. Due to the drawbacks of LCM, we then propose the feature distinction-based LCM and the feature reconstruction-based LCM.
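For orientation, the traditional LCM can be sketched as follows. This is a simplified single-scale version of the measure as it is commonly described for [5]: the maximum of a central cell is compared with the means of its eight surrounding cells, and the contrast is the minimum of L² / mᵢ over those cells. It is not the feature distinction-based or feature reconstruction-based variant proposed in this section; the function name, the cell size, and the toy image are assumptions.

```python
import numpy as np


def local_contrast_measure(img, cell=3, eps=1e-6):
    """Single-scale LCM: min over the 8 neighbor cells of L_max^2 / mean_i."""
    img = img.astype(float)
    h, w = img.shape
    half = cell // 2
    reach = cell + half  # distance from the window center to the window edge
    out = np.zeros_like(img)
    for y in range(reach, h - reach):
        for x in range(reach, w - reach):
            center = img[y - half:y + half + 1, x - half:x + half + 1]
            l_max = center.max()
            means = []
            for dy in (-cell, 0, cell):
                for dx in (-cell, 0, cell):
                    if dy == 0 and dx == 0:
                        continue  # skip the central cell itself
                    cy, cx = y + dy, x + dx
                    block = img[cy - half:cy + half + 1, cx - half:cx + half + 1]
                    means.append(block.mean())
            # eps guards against division by zero on all-black cells.
            out[y, x] = min(l_max ** 2 / (m + eps) for m in means)
    return out


# Toy image (assumption): flat background of 20 with one bright pixel of 40.
image = np.full((15, 15), 20.0)
image[7, 7] = 40.0
lcm_map = local_contrast_measure(image)
target_response = lcm_map[7, 7]
background_response = lcm_map[4, 4]
```

The response at the bright pixel is about 40² / 20 = 80, well above the background response, which illustrates how LCM enhances a small bright target. Because the measure depends only on a cell maximum and neighboring means, a single bright noise pixel is enhanced in the same way, which is one commonly noted limitation of this kind of measure.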

The proposed infrared dim moving target detection approach

In this section, the overall architecture of the proposed spatio-temporal saliency approach is introduced. Then, the spatial saliency map generation process, the temporal saliency map generation process, the trajectory prediction map generation process, and the fusion process for generating the final spatio-temporal saliency map are presented. Concluding this section, the infrared dim moving target detection approach, which is based on the proposed spatio-temporal saliency approach, is

Experimental results

In this section, a diverse evaluation dataset and the corresponding evaluation metrics are first introduced. Then, the crucial procedures and parameters of our proposed dim moving target detection approach are verified experimentally. Finally, the comprehensive comparison with existing state-of-the-art approaches is presented.

Conclusion

There is general consensus that the existing computational models for dim moving target detection continue to lag far behind the visual perceptual ability of humans. To address this problem, a novel spatio-temporal saliency approach was introduced in this paper. More specifically, based on the closed-form solution derived from regularized feature reconstruction, we proposed a local adaptive contrast operation, whereby the spatial saliency map and the temporal saliency map can be calculated on

Acknowledgements

This research was partially supported by the National Natural Science Foundation of China under grants 41322010, 41571434, 41371339, and 61273279, and by LIESMARS Special Research Funding. In addition, we are also grateful to the reviewers for their suggestions.

References (49)

  • F. Wang et al.

    Robust infrared target tracking based on particle filter with embedded saliency detection

    Information Sciences

    (2015)
  • C. Yang et al.

    Multiscale facet model for infrared small target detection

    Infrared Phys. Technol.

    (2014)
  • T. Ahonen et al.

    Face description with local binary patterns: application to face recognition

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2006)
  • U. Braga-Neto et al.

    Automatic target detection and tracking in forward-looking infrared image sequences using morphological connected operators

    J. Electron. Imaging

    (2004)
  • C. Chen et al.

    A local contrast method for small infrared target detection

    IEEE Trans. Geosci. Remote Sens.

    (2014)
  • M. Cheng et al.

    Global contrast based salient region detection

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2015)
  • L. Deng et al.

    Moving point target detection based on clutter suppression using spatiotemporal local increment coding

    Electron. Lett.

    (2015)
  • C. Ding et al.

    Multi-directional multi-level dual-cross patterns for robust face recognition

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2016)
  • C. Ding et al.

    A comprehensive survey on pose-invariant face recognition

    Comput. Vis. Pattern Recognit.

    (2015)
  • D. Essen et al.

    Hierarchical organization and functional streams in the visual cortex

    Trends Neurosci.

    (1983)
  • C. Gao et al.

    Infrared patch-image model for small target detection in a single image

    IEEE Trans. Image Process.

    (2013)
  • Y. Gu et al.

    A kernel-based nonparametric regression method for clutter removal in infrared small-target detection applications

    IEEE Geosci. Remote Sens. Lett.

    (2014)
  • J. Han et al.

    A robust infrared small target detection algorithm based on human visual system

    IEEE Geosci. Remote Sens. Lett.

    (2014)
  • L. Itti et al.

    A model of saliency-based visual attention for rapid scene analysis

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1998)