
Infrared Small Target Detection Based on Tensor Tree Decomposition and Self-Adaptive Local Prior

1 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
2 Key Laboratory of Computational Optical Imaging Technology, Chinese Academy of Sciences, Beijing 100094, China
3 School of Optoelectronics, University of Chinese Academy of Sciences, Beijing 100049, China
4 School of Artificial Intelligence, Xi’an Jiaotong University, Xi’an 710049, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Remote Sens. 2024, 16(6), 1108; https://doi.org/10.3390/rs16061108
Submission received: 17 February 2024 / Revised: 17 March 2024 / Accepted: 18 March 2024 / Published: 21 March 2024
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)

Abstract
Infrared small target detection plays a crucial role in both military and civilian systems. However, current detection methods face significant challenges in complex scenes, such as inaccurate background estimation, inability to distinguish targets from similar non-target points, and poor robustness across various scenes. To address these issues, this study presents a novel spatial–temporal tensor model for infrared small target detection. In our method, we introduce the tensor tree rank to capture global structure in a more balanced strategy, which helps achieve more accurate background estimation. Meanwhile, we design a novel self-adaptive local prior weight by evaluating the level of clutter and noise content in the image. It mitigates the imbalance between target enhancement and background suppression. Then, the spatial–temporal total variation (STTV) is used as a joint regularization term to help better remove noise and obtain better detection performance. Finally, the proposed model is efficiently solved by the alternating direction multiplier method (ADMM). Extensive experiments demonstrate that our method achieves superior detection performance when compared with other state-of-the-art methods in terms of target enhancement, background suppression, and robustness across various complex scenes. Furthermore, we conduct an ablation study to validate the effectiveness of each module in the proposed model.

1. Introduction

In comparison to active radar imaging, infrared imaging offers the benefits of enhanced portability and improved concealment. Meanwhile, compared with visible light systems, it boasts a range of advantages, such as exceptional anti-interference features and the ability to operate throughout the day [1]. Owing to these superior benefits, infrared dim and small target detection plays a significant role in military and civil applications, such as aerospace technology [2], security surveillance [3], and forest fire prevention [4]. However, due to the long detection distance, infrared targets usually occupy only a few pixels and lack shape information and textural features. In addition, infrared images in complex scenes often contain a variety of interferences (e.g., heavy clutter and prominent suspicious targets), resulting in a weak signal-to-clutter ratio (SCR) [5]. Therefore, infrared small target detection remains a challenging issue and has attracted widespread research interest.

1.1. Related Works

In general, infrared small target detection algorithms primarily include single-frame detection and sequential-frame detection [6]. For a long time, many single-frame detection approaches have been developed to address the challenges in infrared small target detection. Single-frame detection methods can be divided into four categories: (1) background consistency-based methods; (2) human visual system (HVS)-based methods; (3) deep learning (DL)-based methods; and (4) low-rank and sparse decomposition (LRSD)-based methods.
  • Background consistency-based methods achieve target enhancement and background suppression based on the assumption of background consistency. Typical methods include the Top-hat filter [7], Max–Mean and Max–Median filters [8], and the high-pass filter [9]. Hadhoud and Thomas [10] extended the LMS algorithm [11] and proposed a two-dimensional adaptive least mean square filter (TDLMS). Cao and Sun [12] utilized the maximum inter-class variance method to improve morphological filtering. Although these methods are capable of achieving fast detection speeds, they are unsuitable for application in complex scenes.
  • Contrast is the most crucial factor encoded in our visual system; HVS-based methods generally utilize visual saliency features to distinguish the target from the background. Chen et al. [13] proposed a local contrast method (LCM) to describe the difference between the target and its neighborhood. Inspired by LCM, many methods based on local contrast improvement have been proposed. Starting from the perspective of image patch difference, Wei et al. [14] presented a multiscale patch-based contrast measure (MPCM). Shi et al. [15] proposed a high-boost-based multiscale local contrast measure (HBMLCM). Han et al. [16] designed a multiscale tri-layer local contrast measure (TLLCM) to compute comprehensive contrast. Han et al. [17] improved the detection accuracy by utilizing the Laplacian filter and proposed a coarse-to-fine structure (MCFS) for infrared small moving target detection. However, when the image contains background edges and pixel-sized noises with high brightness (PNHB), such algorithms usually yield high false alarm rates.
  • With the development of artificial neural networks, DL-based methods have received extensive attention for their application in infrared target detection. Fan et al. [18] improved the convolutional neural network to extract infrared image features, aiming to improve detection accuracy and efficiency. Zhao et al. [19] designed a generative adversarial network (GAN) architecture, which models the detection problem as an image-to-image translation problem. In [20], a novel Dim2Clear network was proposed to solve the problem of noise interference. Recently, Ying et al. [21] developed a label evolution framework with single point supervision. Although DL-based methods can achieve good detection performance under training scenes, their generalization to practical applications remains a challenge.
  • In recent years, LRSD-based methods have achieved great success and can now effectively separate the low-rank background and the sparse target of an infrared image. Gao et al. [22] first proposed an infrared patch-image model (IPI) by constructing local patches. Consequently, infrared small target detection is transformed into an optimization problem. However, as the nuclear norm minimization (NNM) uses the same threshold to shrink singular values, an over-shrinkage problem may occur in complex backgrounds full of interference [23]. Furthermore, besides the target, edges and corners in the background are also considered as sparse components under the $\ell_1$-norm [24]. To handle the above problems, Dai et al. [25] constructed a non-negative infrared patch-image model (NIPPS) by adding a non-negative constraint to the sparse target. Wang et al. [26] introduced the total variation regularization that better removes the noise and proposed a total variation regularization and principal component pursuit model (TV-PCP). Zhang et al. [27] designed a nonconvex rank approximation minimization (NRAM) by utilizing the $\ell_{2,1}$-norm to constrain the remaining edges. Assuming that the background comes from multiple subspaces, the stable multi-subspace learning model (SMSL) [28] and the self-regularized weighted sparse model (SRWS) [29] were proposed to improve detection performance. In order to better extract the image structure information and meet the practical demand for fast detection speed, Dai and Wu [30] adopted the tensor structure and proposed a reweighted infrared patch-tensor model (RIPT). Zhang and Peng [31] combined the partial sum of the tensor nuclear norm (PSTNN) and the local prior to effectively improve detection efficiency. In [32], the tensor fibered nuclear norm based on the Log operator (LogTFNN) was used to nonconvexly approximate the tensor rank, which helps suppress background and noise. Zhang et al. [33] constructed a non-local block tensor and an adaptive compromising factor based on the image local entropy, and then proposed a self-adaptive and non-local patch-tensor model (ANLPT) for infrared small target detection.
Although the above LRSD-based single-frame detection methods have achieved good results, they ignore temporal information. Traditional sequential-frame detection methods, such as 3D matched filtering [34], dynamic programming algorithms [35], the spatiotemporal saliency approach [36], and trajectory consistency [37], face challenges in effectively separating the background from the target. In addition, these methods usually require prior knowledge, which is difficult to obtain in practical applications. In order to exploit the spatial–temporal information that is neglected in LRSD-based single-frame detection approaches, Sun et al. [38] stacked images from successive adjacent frames. Inspired by this, Zhang et al. [39] proposed a novel spatial–temporal tensor model with edge-corner awareness to further improve detection ability. Considering that the Laplace operator can approximate the tensor rank more accurately, Hu et al. [40] proposed a multi-frame spatial–temporal patch-tensor model (MFSTPT). Wang et al. [41] integrated the nonoverlapping patch spatial–temporal tensor model (NPSTT) and the tensor capped nuclear norm (TCNN) for detection results with low false alarms. Further, Liu et al. [42] designed a nonconvex tensor Tucker decomposition method, in which factor prior was used to obtain accurate background estimation and reduce computational complexity.

1.2. Motivation

Compared with background consistency-based approaches and HVS-based approaches, low-rank and sparse tensor decomposition (LRSTD)-based methods can better enhance small target features and suppress background clutter interference. Among these approaches, single-frame detection methods only consider a single frame when constructing the optimization model and struggle to achieve accurate results in challenging environments with dynamic change or heavy clutter. Considering the significance of combining contextual information in the spatial–temporal domain, this article primarily concentrates on sequential-frame infrared target detection. While currently available methods have achieved relatively good detection performance, there are still some issues that need to be addressed.
First, due to the complex multilinear structure of the tensor, the exact approximation of the background tensor rank is always a major difficulty. To improve the accuracy of background estimation, these LRSTD-based methods [30,31,43] focus on designing more accurate tensor rank constraints, such as the sum of nuclear norm (SNN), tensor nuclear norm (TNN), and tensor train nuclear norm. Nevertheless, it has been proven that SNN fails to accurately approximate the tensor rank [44]. According to the definition of TNN, it lacks flexibility and the ability to measure low-rankness from multiple modes [45]. Although tensor train rank has a well-balanced matricization scheme, it suffers from higher storage requirements [46]. In summary, the approximation of the background tensor rank still needs to be improved. Therefore, we apply tensor tree rank to separate target and background. Compared with the above strategies, tensor tree decomposition is a more balanced method that splits the modes of a tensor in a hierarchical way.
In addition to accurate background tensor estimation, the suppression of strong edges and corner points is key to achieving good detection performance. The local structure prior is often used to suppress interference. The RIPT only focuses on the edge structure information of the background, which may lead to false alarms. Likewise, the fixed prior weights used in PSTNN and MFSTPT cannot effectively suppress clutter in diverse scenes with different levels of interference. It is important to balance the enhancement of the target and the suppression of the interference from edges and corners in different scenes. To solve this problem, we propose a self-adaptive local prior method to adaptively suppress background clutter. Moreover, we use spatial–temporal total variation (STTV) to explore local smooth information. This strategy helps us to better remove the background noise. By combining tensor tree decomposition, self-adaptive local prior, and STTV, our method can accurately detect small targets. In the following sections, we refer to the proposed method as the TTALP-TV method. We present the results of qualitative and quantitative experiments to demonstrate that TTALP-TV surpasses other state-of-the-art methods in terms of target enhancement and background suppression in various complex scenes. Figure 1 presents the flowchart of our method. The main contributions of this article can be summarized as follows:
(1) In order to approximate the tensor rank function more flexibly and accurately, we introduce tensor tree decomposition to exploit spatial and temporal correlation through a hierarchical structure.
(2) The self-adaptive local prior is proposed as a target weight, which can not only better extract target information but also more effectively remove background clutter. Simultaneously, we impose an STTV regularization constraint on the background to preserve image details and reduce noise interference.
(3) We integrate the tensor tree rank, self-adaptive local prior, and STTV for infrared small target detection. An efficient optimization scheme using the alternating direction multiplier method (ADMM) is introduced to solve the proposed model.
The remaining sections of this article are organized as follows. Section 2 summarizes the notations and preliminaries of tensor tree decomposition. Section 3 introduces the TTALP-TV model and describes its optimization procedure in detail. In Section 4, we demonstrate the effectiveness of the proposed algorithm through extensive experiments and analyses. Finally, Section 5 concludes this article and discusses the future work.

2. Notations and Preliminaries

This section introduces the essential notations and preliminaries used in this research. In this paper, we use lowercase letters (e.g., $x$), boldface lowercase letters (e.g., $\mathbf{x}$), boldface capital letters (e.g., $\mathbf{X}$), and Euler script letters (e.g., $\mathcal{X}$) to represent scalars, vectors, matrices, and tensors, respectively. A $D$th-order tensor is denoted by $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_D}$, and $\mathbb{D} = \{1, \ldots, D\}$. The node-$q$ tensor–matrix multiplication is denoted by $\mathcal{Y} = \mathcal{X} \times_q \mathbf{U}$, where $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_D}$, $\mathbf{U} \in \mathbb{R}^{J \times I_q}$, $\mathcal{Y} \in \mathbb{R}^{I_1 \times \cdots \times I_{q-1} \times J \times I_{q+1} \times \cdots \times I_D}$, and $q$ is a node in the tensor tree format.
The specific explanations of the symbols used are given in Table 1.

Tensor Tree Network

Definition 1 (Tensor tree structure) [47].
For $D$th-order data, we define a binary tree $T$ with root $\mathbb{D}$ as its dimension tree. Each node $C_q \in T$, $q = 1, \ldots, Q$, possesses the following attributes:
1. A node with only one entry is a leaf, i.e., $C_p = \{d\}$. The set of all leaf nodes can be represented as:
$$F(T) = \left\{ C_p \in T \;\middle|\; C_p \text{ is a leaf node of } T \right\}, \quad p = 1, \ldots, P \qquad (1)$$
where $P$ is the number of leaves.
2. A node consisting of two disjoint successors is an interior node. The set of all interior nodes is denoted by:
$$E(T) = T \setminus F(T) \qquad (2)$$
and $Q - P$ represents the number of interior nodes.
3. The tree distance $h(C_q)$ is the distance between the node $C_q \in T$ and the root, with a maximum depth of $H$. At depth $h$, $P_h$ and $Q_h$ denote the number of leaves and the total number of nodes, respectively.
Definition 2 (Matricization) [47].
Given a node of dimension indices $C_q \subseteq \mathbb{D}$ and its complement $\bar{C}_q = \mathbb{D} \setminus C_q$, the matricization of a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_D}$ is defined as:
$$\mathbf{X}^{(q)} \in \mathbb{R}^{I_{C_q} \times I_{\bar{C}_q}} \qquad (3)$$
where $I_{C_q} = \prod_{c \in C_q} I_c$ and $I_{\bar{C}_q} = \prod_{\bar{c} \in \bar{C}_q} I_{\bar{c}}$.
Definition 3 (Tensor tree rank) [47].
Let $T$ be a dimension tree of a $D$th-order tensor; the tensor tree rank is the set of ranks of the matricizations for each node, in the form of:
$$\mathrm{rank}_{\mathrm{tree}} = \left\{ k_q \;\middle|\; k_q = \mathrm{rank}\!\left(\mathbf{X}^{(q)}\right),\; \forall\, C_q \in T \right\} \qquad (4)$$
Definition 4 (Tensor tree decomposition) [47].
Given $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_D}$, for every node $C_q \in T$, $\mathbf{X}^{(q)}$ can be written as:
$$\mathbf{X}^{(q)} = \mathbf{U}_q \mathbf{V}_q^T, \quad \mathbf{U}_q \in \mathbb{R}^{I_{C_q} \times k_{C_q}} \qquad (5)$$
where $k_{C_q}$ is the standard matrix rank of $\mathbf{X}^{(q)}$. For each $C_q \in E(T)$ with two disjoint successors $C_{q_1}$ and $C_{q_2}$, the column vectors $\mathbf{u}_q(:, l)$ of $\mathbf{U}_q$ can be expressed as:
$$\mathbf{u}_q(:, l) = \sum_{l_1=1}^{k_{q_1}} \sum_{l_2=1}^{k_{q_2}} G_q(l, l_1, l_2)\, \mathbf{u}_{q_1}(:, l_1) \otimes \mathbf{u}_{q_2}(:, l_2) \qquad (6)$$
where $G_q(l, l_1, l_2)$ is the coefficient of the linear combination. Figure 2 graphically illustrates the tensor tree decomposition of a 4th-order tensor, providing an intuitive understanding of its structure.
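For intuition, Definitions 2 and 3 can be reproduced in a few lines of NumPy. This is an illustrative sketch, not code from the paper: the helper names `matricize` and `tree_rank` are ours, the dimension tree is written out as an explicit list of index tuples, and Python modes are 0-based.

```python
import numpy as np

def matricize(X, modes):
    """Matricization X^(q) (Definition 2): rows indexed by the modes in C_q,
    columns by its complement."""
    modes = list(modes)
    comp = [d for d in range(X.ndim) if d not in modes]
    rows = int(np.prod([X.shape[d] for d in modes]))
    return np.transpose(X, modes + comp).reshape(rows, -1)

def tree_rank(X, dimension_tree):
    """Tensor tree rank (Definition 3): the matrix rank of X^(q) for every node C_q."""
    return {node: np.linalg.matrix_rank(matricize(X, node)) for node in dimension_tree}

# 4th-order tensor with the balanced dimension tree
# {0,1,2,3} -> {0,1}, {2,3} -> {0}, {1}, {2}, {3}
X = np.random.rand(4, 5, 6, 7)
tree = [(0, 1, 2, 3), (0, 1), (2, 3), (0,), (1,), (2,), (3,)]
ranks = tree_rank(X, tree)
```

Each matricization groups the node's modes into rows and the complementary modes into columns, so `matricize(X, (0, 1))` has shape $(4 \cdot 5) \times (6 \cdot 7)$.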

3. Methodology

3.1. Spatial–Temporal Infrared Tensor Model

According to the characteristic analysis in [22], an original infrared image can be linearly modeled as:
$$f_D = f_B + f_T + f_N \qquad (7)$$
where $f_B$, $f_T$, $f_D$, and $f_N$ denote the background image, target image, infrared image, and noise image, respectively. Equation (7) only considers spatial data and ignores the target’s motion in the temporal dimension, increasing the risk of missed detections or false alarms in some complex infrared scenes. Moreover, compared with the matrix-based methods, in the tensor domain we can explore the intrinsic relationships of the data from multiple perspectives and improve computational efficiency.
To ensure the comprehensive utilization of spatial and temporal information, we adopt the approach in [38] to construct a spatial–temporal image tensor. As shown in Figure 1, the input image tensor $\mathcal{D} \in \mathbb{R}^{n_1 \times n_2 \times L}$ is constructed by stacking $L$ consecutive frames in chronological order from the infrared sequence. Therefore, Equation (7) is written as follows:
$$\mathcal{D} = \mathcal{B} + \mathcal{T} + \mathcal{N} \qquad (8)$$
where $\mathcal{B}$, $\mathcal{T}$, $\mathcal{D}$, and $\mathcal{N}$ are the spatial–temporal tensor forms of $f_B$, $f_T$, $f_D$, and $f_N$, respectively. Figure 3 shows that the singular value distribution curves of the image tensor along mode 1, mode 2, and mode 3 rapidly decrease to zero. This indicates that the background tensor $\mathcal{B}$ is a low-rank tensor. Given that infrared small targets usually occupy only a few pixels in the entire image, it can be assumed that $\mathcal{T}$ is a sparse tensor. At the same time, it is commonly assumed that the noise in infrared images is additive Gaussian noise satisfying $\|\mathcal{N}\|_F < \delta$. Therefore, the mathematical formulation is as follows:
$$\min_{\mathcal{B}, \mathcal{T}} \ \mathrm{rank}(\mathcal{B}) + \lambda_1 \|\mathcal{T}\|_0 \quad \text{s.t.} \ \|\mathcal{D} - \mathcal{B} - \mathcal{T}\|_F < \delta \qquad (9)$$
where $\lambda_1$ is a positive regularization parameter balancing the trade-off between the target spatial–temporal tensor and the background spatial–temporal tensor. As the optimization of the $\ell_0$-norm is NP-hard, in practice it is usually substituted with the $\ell_1$-norm:
$$\min_{\mathcal{B}, \mathcal{T}} \ \mathrm{rank}(\mathcal{B}) + \lambda_1 \|\mathcal{T}\|_1 \quad \text{s.t.} \ \|\mathcal{D} - \mathcal{B} - \mathcal{T}\|_F < \delta \qquad (10)$$
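The spatial–temporal tensor construction described above can be sketched in NumPy. This is an illustrative fragment (the function names are ours); the mode-unfolding helper simply reproduces the singular-value inspection behind Figure 3.

```python
import numpy as np

def build_st_tensor(frames):
    """Stack L consecutive n1 x n2 frames into a spatial-temporal tensor
    D of shape n1 x n2 x L, in chronological order."""
    return np.stack(frames, axis=-1)

def mode_singular_values(D, mode):
    """Singular values of the mode-k unfolding, used to inspect the
    low-rankness of the background tensor (cf. Figure 3)."""
    unfolding = np.moveaxis(D, mode, 0).reshape(D.shape[mode], -1)
    return np.linalg.svd(unfolding, compute_uv=False)

# e.g., L = 5 frames from an infrared sequence
frames = [np.random.rand(64, 64) for _ in range(5)]
D = build_st_tensor(frames)          # D.shape == (64, 64, 5)
svals = mode_singular_values(D, 2)   # 5 singular values along the temporal mode
```

A rapidly decaying singular-value curve of each unfolding is what justifies modeling $\mathcal{B}$ as low-rank.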

3.2. Self-Adaptive Local Prior Information

In infrared images, the strong edges and corner points in the background exhibit sparsity similar to that of the target. This makes it difficult to completely distinguish them from the target when relying solely on global sparse features. Thus, it is essential to extract a local prior and incorporate it into the optimization function to reduce background residuals. For this reason, the structure tensor [48] was used to depict the local geometric structure of infrared images. For an original infrared image $D$, the classic linear structure tensor can be calculated as follows:
$$J_\rho = K_\rho * \left( \nabla D_\rho \otimes \nabla D_\rho \right) = \begin{bmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{bmatrix} = \begin{bmatrix} K_\rho * I_x^2 & K_\rho * (I_x I_y) \\ K_\rho * (I_x I_y) & K_\rho * I_y^2 \end{bmatrix} \qquad (11)$$
$$\lambda_1 = \frac{1}{2}\left( J_{11} + J_{22} + \sqrt{(J_{22} - J_{11})^2 + 4 J_{12}^2} \right), \quad \lambda_2 = \frac{1}{2}\left( J_{11} + J_{22} - \sqrt{(J_{22} - J_{11})^2 + 4 J_{12}^2} \right) \qquad (12)$$
where $K_\rho$ denotes the Gaussian kernel function with variance $\rho$, $*$ denotes the convolution operation, $\nabla$ denotes the gradient, and $\otimes$ denotes the Kronecker product. The difference between $\lambda_1$ and $\lambda_2$ reflects the image area to which the pixel belongs: in a flat region, $\lambda_1 \approx \lambda_2 \approx 0$; in a corner region, $\lambda_1 \geq \lambda_2 \gg 0$; and in an edge region, $\lambda_1 \gg \lambda_2 \approx 0$. The local prior information extracted in RIPT [30] is calculated as follows:
$$E(x, y) = \lambda_1 - \lambda_2 \qquad (13)$$
where $(x, y)$ represents the pixel position. However, as shown in row 3 of Figure 4, RIPT only captures the edge structure information of the background, which may result in background residuals and the loss of targets. Brown et al. [49] proposed the following corner-strength function to highlight target information:
$$C(x, y) = \frac{\det\!\left(ST(x, y)\right)}{\mathrm{tr}\!\left(ST(x, y)\right)} = \frac{\lambda_1 \lambda_2}{\lambda_1 + \lambda_2} \qquad (14)$$
where $ST(\cdot)$ denotes the structure tensor, and $\det(\cdot)$ and $\mathrm{tr}(\cdot)$ represent the determinant and trace of the matrix, respectively. The PSTNN model [31] utilizes the maximum eigenvalue as the background weight function, rather than Equation (13), and combines it with Equation (14) to calculate the prior weight:
$$W_p(x, y) = C \cdot E \qquad (15)$$
Row 4 of Figure 4 shows that the PSTNN suppresses the residual edge effect to some extent, but there is still room for improvement. In the MFSTPT model [40], a weighted geometric average strategy was developed to integrate the edge weighting from Equation (13) and the corner-point weighting from Equation (14), which can be expressed as follows:
$$W_c(x, y) = \sqrt[n]{C^p \cdot E^q} \qquad (16)$$
However, as demonstrated in row 5 of Figure 4, strong edges are still not constrained effectively, despite the improved acquisition of target information. Based on the above analyses, we believe that the previous methods do not fully exploit the pixel information contained in $\lambda_1$ and $\lambda_2$. Another problem is that the weight-stretching parameter artificially set in RIPT and MFSTPT cannot effectively balance the enhancement of the target and the suppression of the background. The underlying reason is the lack of consideration of the clutter information content in infrared images across different scenes. Therefore, a self-adaptive local prior method is proposed to address the above issues. Inspired by the Frangi filter [50], we utilize the ratio and the difference of the eigenvalues to highlight target information and suppress background interference:
$$R(x, y) = \frac{\lambda_1}{\lambda_2}, \quad \beta = \mathrm{mean}\!\left( \sqrt[3]{\lambda_1 - \lambda_2} \right) \qquad (17)$$
where $R(x, y)$ represents the statistical measure of edges and corner points, and $\lambda_1 > \lambda_2$. In edge and corner-point regions, a larger difference between the eigenvalues results in a higher $R$-value. In contrast, in flat regions, the similarity between the two eigenvalues leads to a lower $R$-value. In addition, the $\beta$-value reflects the level of background interference contained in the original image. In scenes with strong clutter, the $\beta$-value is larger; as the background clutter decreases, the $\beta$-value becomes smaller. This can be used to adaptively suppress edges and corners. Thus, the final self-adaptive prior weight is described as:
$$W_s(x, y) = \exp\!\left( -\frac{R^2}{2\beta} \right) \cdot \left( 1 - \exp\!\left( -\frac{\lambda_1^2 + \lambda_2^2}{2 c^2} \right) \right) \qquad (18)$$
where $c$ denotes half of the maximum of $\sqrt{\lambda_1^2 + \lambda_2^2}$. The last row in Figure 4 shows that the proposed self-adaptive weight effectively suppresses the residual effect of strong edges and bright corner points, while also highlighting the target information. It can be seen that the adaptive factor $\beta$ enhances suppression in scenes with strong clutter, resulting in a slight target shrinkage, but significantly reduces background residuals compared to other methods. Then, we construct the spatial–temporal tensor $\mathcal{W}_s$ and normalize it as follows:
$$\mathcal{W}_s = \frac{\mathcal{W}_s - w_{\min}}{w_{\max} - w_{\min}} \qquad (19)$$
where $w_{\max}$ and $w_{\min}$ denote the maximum and minimum values of $\mathcal{W}_s$, respectively. In order to accelerate the convergence speed and improve the computational efficiency, we use the reweighted scheme [51] to add a sparse weight:
$$\mathcal{W}_{sw}^{k+1} = \frac{c}{\left| \mathcal{T}^{k} \right| + \varepsilon} \qquad (20)$$
where $c$ denotes a non-negative constant, $\varepsilon$ represents a small positive number preventing the denominator from being 0, and $k$ is the number of iterations. Considering that the self-adaptive prior weight in Equation (19) can suppress edges and corner points, we obtain $\mathcal{W}_{rec}$ by taking the reciprocal of the corresponding elements in $\mathcal{W}_s$. Combined with the sparse weight in Equation (20), we build the final local prior tensor as follows:
$$\mathcal{W} = \mathcal{W}_{rec} \odot \mathcal{W}_{sw} \qquad (21)$$
where $\odot$ represents the Hadamard product.
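Putting Equations (11), (12), and (17)–(21) together, the prior extraction can be sketched as below. This is a NumPy/SciPy illustration under stated assumptions: the exact forms of $\beta$ and $c$ follow our reading of the text, the smoothing scale `rho` is an assumed parameter, and the helper names are ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def self_adaptive_prior(img, rho=1.5, eps=1e-12):
    """Self-adaptive prior weight W_s (Equations (17)-(19)); the forms of
    beta and c are our reading of the text and should be treated as assumptions."""
    Iy, Ix = np.gradient(img.astype(float))
    J11 = gaussian_filter(Ix * Ix, rho)            # structure tensor entries (Eq. (11))
    J12 = gaussian_filter(Ix * Iy, rho)
    J22 = gaussian_filter(Iy * Iy, rho)
    disc = np.sqrt((J22 - J11) ** 2 + 4 * J12 ** 2)
    lam1 = 0.5 * (J11 + J22 + disc)                # eigenvalues (Eq. (12)), lam1 >= lam2
    lam2 = 0.5 * (J11 + J22 - disc)
    R = lam1 / (lam2 + eps)                        # eigenvalue ratio (Eq. (17))
    beta = np.mean(np.cbrt(np.abs(lam1 - lam2))) + eps   # adaptive clutter factor (assumed form)
    c = 0.5 * np.sqrt(lam1 ** 2 + lam2 ** 2).max() + eps
    Ws = np.exp(-R ** 2 / (2 * beta)) * (1 - np.exp(-(lam1 ** 2 + lam2 ** 2) / (2 * c ** 2)))
    return (Ws - Ws.min()) / (Ws.max() - Ws.min() + eps)  # normalization (Eq. (19))

def local_prior_tensor(Ws_tensor, T, c=1.0, eps=1e-6):
    """Final prior tensor W = W_rec (.) W_sw (Equations (20)-(21)); eps also
    guards the reciprocal of the normalized weight against division by zero."""
    W_rec = 1.0 / (Ws_tensor + eps)                # reciprocal of the self-adaptive weight
    W_sw = c / (np.abs(T) + eps)                   # reweighted sparse weight
    return W_rec * W_sw                            # Hadamard product
```

In the full algorithm, `local_prior_tensor` would be re-evaluated at each iteration with the current target estimate $\mathcal{T}^{k}$.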

3.3. Spatial–Temporal Total Variation Regularization

In real-world infrared scenes, heavy noise can be a significant interference, causing false alarms in target detection. Fortunately, the TV model effectively reduces image noise while simultaneously preserving spatial piecewise smoothness. Introduced by Rudin [52], TV regularization can distinguish between areas with significant variations, such as edges and textures, and smooth areas with large amounts of noise. For a matrix $X \in \mathbb{R}^{I_1 \times I_2}$, the TV norm can be mathematically expressed as follows:
$$\|X\|_{TV} = \sum_{i=1}^{I_1-1} \sum_{j=1}^{I_2-1} \left( |X_{i+1,j} - X_{i,j}| + |X_{i,j+1} - X_{i,j}| \right) + \sum_{i=1}^{I_1-1} |X_{i+1,I_2} - X_{i,I_2}| + \sum_{j=1}^{I_2-1} |X_{I_1,j+1} - X_{I_1,j}| \qquad (22)$$
It can be seen from Equation (22) that the matrix-based TV framework only depicts the spatial continuity of the infrared targets and ignores the temporal continuity between successive frames. For the exploration of temporal coherence and spatio-temporal smoothing of small targets, the STTV can be obtained:
$$\|\mathcal{X}\|_{STTV} = \|D_h \mathcal{X}\|_1 + \|D_v \mathcal{X}\|_1 + \|D_z \mathcal{X}\|_1 \qquad (23)$$
$$D_h \mathcal{X} = \mathcal{X}_{i+1,j,k} - \mathcal{X}_{i,j,k} \qquad (24)$$
$$D_v \mathcal{X} = \mathcal{X}_{i,j+1,k} - \mathcal{X}_{i,j,k} \qquad (25)$$
$$D_z \mathcal{X} = \mathcal{X}_{i,j,k+1} - \mathcal{X}_{i,j,k} \qquad (26)$$
where $D_h$, $D_v$, and $D_z$ represent the horizontal, vertical, and temporal difference operators, respectively. This spatio-temporal form of TV can be seen as an effective regularization term, and it exhibits a degree of resilience against noise while preserving image details. Furthermore, it not only emphasizes the spatial smoothness of the local region in the image but also considers that the target remains temporally consistent among successive frames.
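Equations (23)–(26) amount to summing the absolute forward differences along the three axes, which can be sketched in a few lines of NumPy (an illustration; the function name is ours):

```python
import numpy as np

def sttv_norm(X):
    """Spatial-temporal total variation (Equation (23)): l1 norms of the
    horizontal, vertical, and temporal forward differences."""
    Dh = np.diff(X, axis=0)   # X[i+1, j, k] - X[i, j, k]
    Dv = np.diff(X, axis=1)   # X[i, j+1, k] - X[i, j, k]
    Dz = np.diff(X, axis=2)   # X[i, j, k+1] - X[i, j, k]
    return np.abs(Dh).sum() + np.abs(Dv).sum() + np.abs(Dz).sum()
```

Any constant tensor has zero STTV, while an isolated bright voxel contributes once per axis, which is why the term penalizes noise but tolerates a temporally consistent target.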

3.4. The Proposed TTALP-TV Model

In tensor robust principal component analysis (TRPCA) problems, the rank function is a nonconvex objective to solve. Therefore, the approximation of low-rank background tensor B in Formula (10) is a crucial issue. A recent study [53] shows that employing the tensor tree-based TRPCA method can measure low-rankness of each mode and reduce memory requirements. In this article, we leverage the advantages of tensor tree rank and present the following optimization model:
$$\min_{\mathcal{B}, \mathcal{T}} \ \mathbf{w}^T \mathrm{rank}_{\mathrm{tree}}(\mathcal{B}) + \lambda_1 \|\mathcal{T}\|_1 \quad \text{s.t.} \ \|\mathcal{D} - \mathcal{B} - \mathcal{T}\|_F < \delta \qquad (27)$$
where $\mathrm{rank}_{\mathrm{tree}}(\mathcal{B}) = [k_1, \ldots, k_Q]^T$ is the tensor tree rank and the weighting vector $\mathbf{w}$ satisfies $\sum_{q=1}^{Q} w_q = 1$. The direct minimization of the tensor tree rank in Formula (27) is NP-hard. As such, we can use the matrix nuclear norms as convex surrogates:
$$\min_{\mathcal{B}, \mathcal{T}} \ \sum_{q=1}^{Q} w_q \left\| \mathbf{B}^{(q)} \right\|_* + \lambda_1 \|\mathcal{T}\|_1 \quad \text{s.t.} \ \|\mathcal{D} - \mathcal{B} - \mathcal{T}\|_F < \delta \qquad (28)$$
Furthermore, we incorporate the local prior tensor and STTV regularization to obtain the prior information and suppress background noise, respectively. The proposed TTALP-TV model is as follows:
$$\min_{\mathcal{B}, \mathcal{T}} \ \sum_{q=1}^{Q} w_q \left\| \mathbf{B}^{(q)} \right\|_* + \lambda_1 \|\mathcal{W} \odot \mathcal{T}\|_1 + \lambda_2 \|\mathcal{B}\|_{STTV} \quad \text{s.t.} \ \|\mathcal{D} - \mathcal{B} - \mathcal{T}\|_F < \delta \qquad (29)$$
where λ 2 is a positive regularization parameter.

3.5. Optimization Procedure

The objective function (29) can be solved effectively using the ADMM method [54]. By introducing four auxiliary variables, $\mathcal{X}$, $\mathcal{Z}_1$, $\mathcal{Z}_2$, and $\mathcal{Z}_3$, we obtain the following model:
$$\min_{\mathcal{B}, \mathcal{T}} \ \sum_{q=1}^{Q} w_q \left\| \mathbf{X}^{(q)} \right\|_* + \lambda_1 \|\mathcal{W} \odot \mathcal{T}\|_1 + \lambda_2 \left( \|\mathcal{Z}_1\|_1 + \|\mathcal{Z}_2\|_1 + \|\mathcal{Z}_3\|_1 \right) \quad \text{s.t.} \ \|\mathcal{D} - \mathcal{B} - \mathcal{T}\|_F < \delta, \ \mathcal{X} = \mathcal{B}, \ \mathcal{Z}_1 = D_h \mathcal{B}, \ \mathcal{Z}_2 = D_v \mathcal{B}, \ \mathcal{Z}_3 = D_z \mathcal{B} \qquad (30)$$
Based on the inexact augmented Lagrangian multiplier (IALM) [55], Equation (30) is written as:
$$\begin{aligned} L_A(\mathcal{B}, \mathcal{T}, \mathcal{X}, \mathcal{Z}) = {} & \sum_{q=1}^{Q} w_q \left\| \mathbf{X}^{(q)} \right\|_* + \lambda_1 \|\mathcal{W} \odot \mathcal{T}\|_1 + \lambda_2 \left( \|\mathcal{Z}_1\|_1 + \|\mathcal{Z}_2\|_1 + \|\mathcal{Z}_3\|_1 \right) \\ & + \langle y_1, \mathcal{D} - \mathcal{B} - \mathcal{T} \rangle + \langle y_2, \mathcal{X} - \mathcal{B} \rangle + \langle y_3, \mathcal{Z}_1 - D_h \mathcal{B} \rangle + \langle y_4, \mathcal{Z}_2 - D_v \mathcal{B} \rangle + \langle y_5, \mathcal{Z}_3 - D_z \mathcal{B} \rangle \\ & + \frac{\mu}{2} \left( \|\mathcal{D} - \mathcal{B} - \mathcal{T}\|_F^2 + \|\mathcal{X} - \mathcal{B}\|_F^2 + \|\mathcal{Z}_1 - D_h \mathcal{B}\|_F^2 + \|\mathcal{Z}_2 - D_v \mathcal{B}\|_F^2 + \|\mathcal{Z}_3 - D_z \mathcal{B}\|_F^2 \right) \end{aligned} \qquad (31)$$
where $y_1, y_2, y_3, y_4$, and $y_5$ are the Lagrangian multipliers, and $\mu$ represents a positive penalty parameter. Using the ADMM framework, we can divide Equation (31) into the following subproblems:
(a) Updating $\mathcal{X}$ with the other variables fixed:
$$\mathcal{X} = \arg\min_{\mathcal{X}} \sum_{q=1}^{Q} w_q \left\| \mathbf{X}^{(q)} \right\|_* + \langle y_2, \mathcal{X} - \mathcal{B} \rangle + \frac{\mu}{2} \|\mathcal{X} - \mathcal{B}\|_F^2 = \arg\min_{\mathcal{X}} \sum_{q=1}^{Q} w_q \left\| \mathbf{X}^{(q)} \right\|_* + \frac{\mu}{2} \left\| \mathcal{X} - \mathcal{B} + \frac{y_2}{\mu} \right\|_F^2 \qquad (32)$$
Letting $\tau = \frac{w_q}{\mu}$ and $\mathcal{S} = \mathcal{B} - \frac{y_2}{\mu}$, Equation (32) can be rewritten as:
$$\mathcal{X} = \arg\min_{\mathcal{X}} \sum_{q=1}^{Q} \tau \left\| \mathbf{X}^{(q)} \right\|_* + \frac{1}{2} \left\| \mathbf{X}^{(q)} - \mathbf{S}^{(q)} \right\|_F^2 \qquad (33)$$
For each node $C_q \in T$, the solution of $\mathbf{X}^{(q)}$ can be obtained by singular value thresholding (SVT) [56]:
$$\mathbf{X}^{(q)} = \mathrm{SVT}_\tau\!\left( \mathbf{S}^{(q)} \right) = \mathbf{U}\, \mathrm{sth}_\tau(\boldsymbol{\Sigma})\, \mathbf{V}^H \qquad (34)$$
where $\mathrm{sth}_\tau(s) = \mathrm{sgn}(s) \max(|s| - \tau, 0)$ and $(\cdot)^H$ denotes the conjugate transpose.
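The SVT step in Equation (34) is standard and can be sketched in NumPy (an illustration, not code from the paper):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding (Equation (34)): shrink each singular
    value by tau and reconstruct; returns the result and the shrunk values."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)          # sth_tau applied to nonnegative s
    return U @ np.diag(s_shrunk) @ Vh, s_shrunk
```

Singular values at most $\tau$ are set to zero, so SVT simultaneously denoises and reduces the rank of each node matricization.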
According to the tensor tree structure of $\mathcal{X}$, $\mathbf{U}_q$ can be used to represent the updated node value instead of directly updating $\mathbf{X}^{(q)}$. Moreover, after updating the two successor nodes $C_{q_1}, C_{q_2} \subseteq C_q$, we can update the transfer tensor $G_q$ to represent $\mathbf{U}_q$ for each interior node $C_q \in T$, where $G_q$ is obtained by applying SVT to the new tensor $\mathcal{C} = \mathcal{S} \times_{q_1} \mathbf{U}_{q_1} \times_{q_2} \mathbf{U}_{q_2}$. In summary, we can utilize tensor tree decomposition to update $\mathcal{X}$, and the solution details are summarized in Algorithm 1.
Algorithm 1: The updating of $\mathcal{X}$ from leaves to root.
Input: $\mathcal{S}$, $\tau$
  for $p = 1, \ldots, P$ do
    $[\mathbf{U}_p, k_p] = \mathrm{SVT}_\tau(\mathbf{S}^{(p)})$
  end for
  $\mathcal{C}^{H-1} = \mathcal{S} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \cdots \times_P \mathbf{U}_P$
  for $h = H-1, \ldots, 0$ do
    for $q_h = 1, \ldots, Q_h$ do
      if the $q_h$-th node is an interior node then
        $[\hat{\mathbf{U}}_{q_h}, \hat{k}_{q_h}] = \mathrm{SVT}_{\tau_{q_h}}(\mathbf{C}^{(q_h)})$
        $G_{q_h} = \mathrm{reshape}(\hat{\mathbf{U}}_{q_h}, [\hat{k}_{q_h}, \hat{k}_{q_h,1}, \hat{k}_{q_h,2}])$
      end if
    end for
    $\mathcal{C}^{h-1} = \mathcal{C}^{h} \times_1 \hat{\mathbf{U}}_1 \times_2 \hat{\mathbf{U}}_2 \cdots \times_{(Q_h - P_h)} \hat{\mathbf{U}}_{(Q_h - P_h)}$
  end for
  $\mathcal{X}$ is constructed from $G_q$ and $\mathbf{U}_q$
Output: $\mathcal{X}$
(b) Updating $\mathcal{B}$ with the other variables fixed:
$$\mathcal{B} = \arg\min_{\mathcal{B}} \frac{\mu}{2} \left( \left\| \mathcal{D} - \mathcal{B} - \mathcal{T} + \frac{y_1}{\mu} \right\|_F^2 + \left\| \mathcal{X} - \mathcal{B} + \frac{y_2}{\mu} \right\|_F^2 + \left\| \mathcal{Z}_1 - D_h \mathcal{B} + \frac{y_3}{\mu} \right\|_F^2 + \left\| \mathcal{Z}_2 - D_v \mathcal{B} + \frac{y_4}{\mu} \right\|_F^2 + \left\| \mathcal{Z}_3 - D_z \mathcal{B} + \frac{y_5}{\mu} \right\|_F^2 \right) \qquad (35)$$
The closed-form solution of Equation (35) is expressed as follows:
$$\mathcal{B} = \mathcal{F}^{-1}\!\left[ \frac{\mathcal{F}\!\left( \mathcal{L} + \theta_1 + \theta_2 + \theta_3 \right)}{2 + \sum_{i \in \{h, v, z\}} \mathcal{F}(D_i)^H \mathcal{F}(D_i)} \right] \qquad (36)$$
where $\mathcal{L} = \mathcal{D} - \mathcal{T} + \mathcal{X} + \frac{y_1}{\mu} + \frac{y_2}{\mu}$, $\theta_1 = D_h^T\!\left( \mathcal{Z}_1 + \frac{y_3}{\mu} \right)$, $\theta_2 = D_v^T\!\left( \mathcal{Z}_2 + \frac{y_4}{\mu} \right)$, and $\theta_3 = D_z^T\!\left( \mathcal{Z}_3 + \frac{y_5}{\mu} \right)$. $\mathcal{F}$ and $\mathcal{F}^{-1}$ represent the $n$-dimensional fast Fourier transform operator and its inverse, respectively.
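The FFT-diagonalized solve in Equation (36) can be sketched as below. This is an illustrative sketch under the assumption of periodic boundary conditions (so that each difference operator $D_i$ is diagonalized by the 3-D FFT); the inputs `L` and `theta1`–`theta3` are assumed to be precomputed as defined in the text.

```python
import numpy as np

def solve_B(L, theta1, theta2, theta3):
    """FFT-based closed-form update of the background tensor B (Equation (36)),
    assuming periodic boundaries; variable names follow the text."""
    shape = L.shape

    def diff_transfer(axis):
        # Frequency response of the forward difference kernel along one axis;
        # |F(D_i)|^2 equals F(D_i)^H F(D_i) in the denominator of Eq. (36).
        k = np.zeros(shape)
        k[0, 0, 0] = -1.0
        idx = [0, 0, 0]
        idx[axis] = 1
        k[tuple(idx)] = 1.0
        return np.fft.fftn(k)

    denom = 2.0 + sum(np.abs(diff_transfer(a)) ** 2 for a in (0, 1, 2))
    numer = np.fft.fftn(L + theta1 + theta2 + theta3)
    return np.real(np.fft.ifftn(numer / denom))
```

Because every operator in Equation (35) becomes a pointwise multiplication in the Fourier domain, this update costs only a few FFTs per iteration instead of a large linear solve.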
(c) Updating $\mathcal{T}$ with the other variables fixed:
$$\mathcal{T} = \arg\min_{\mathcal{T}} \lambda_1 \|\mathcal{W} \odot \mathcal{T}\|_1 + \frac{\mu}{2} \left\| \mathcal{D} - \mathcal{B} - \mathcal{T} + \frac{y_1}{\mu} \right\|_F^2 \qquad (37)$$
Using the element-wise shrinkage approach [57], $\mathcal{T}$ is updated by:
$$\mathcal{T} = \mathrm{TH}_{\frac{\lambda_1 \mathcal{W}}{\mu}}\!\left( \mathcal{D} - \mathcal{B} + \frac{y_1}{\mu} \right) \qquad (38)$$
where $\mathrm{TH}(\cdot)$ denotes the element-wise shrinkage operator.
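The element-wise shrinkage operator used in Equations (38) and (40) is the standard soft-thresholding function; a minimal sketch (the threshold may be a scalar or an array, which covers the entry-wise weight $\lambda_1 \mathcal{W} / \mu$):

```python
import numpy as np

def soft_shrink(X, tau):
    """Element-wise shrinkage operator TH: shrink each entry of X toward
    zero by tau, setting entries with |X| <= tau to zero."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)
```
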
(d) Updating $\mathcal{Z}_1$, $\mathcal{Z}_2$, and $\mathcal{Z}_3$ with the other variables fixed:
$$\mathcal{Z}_1 = \arg\min_{\mathcal{Z}_1} \lambda_2 \|\mathcal{Z}_1\|_1 + \frac{\mu}{2} \left\| \mathcal{Z}_1 - D_h \mathcal{B} + \frac{y_3}{\mu} \right\|_F^2, \quad \mathcal{Z}_2 = \arg\min_{\mathcal{Z}_2} \lambda_2 \|\mathcal{Z}_2\|_1 + \frac{\mu}{2} \left\| \mathcal{Z}_2 - D_v \mathcal{B} + \frac{y_4}{\mu} \right\|_F^2, \quad \mathcal{Z}_3 = \arg\min_{\mathcal{Z}_3} \lambda_2 \|\mathcal{Z}_3\|_1 + \frac{\mu}{2} \left\| \mathcal{Z}_3 - D_z \mathcal{B} + \frac{y_5}{\mu} \right\|_F^2 \qquad (39)$$
Equation (39) can be solved by the element-wise shrinkage operator:
$$\mathcal{Z}_1 = \mathrm{TH}_{\frac{\lambda_2}{\mu}}\!\left( D_h \mathcal{B} - \frac{y_3}{\mu} \right), \quad \mathcal{Z}_2 = \mathrm{TH}_{\frac{\lambda_2}{\mu}}\!\left( D_v \mathcal{B} - \frac{y_4}{\mu} \right), \quad \mathcal{Z}_3 = \mathrm{TH}_{\frac{\lambda_2}{\mu}}\!\left( D_z \mathcal{B} - \frac{y_5}{\mu} \right) \qquad (40)$$
(e) Updating the Lagrangian multipliers $y_1, y_2, y_3, y_4$, and $y_5$ with the other variables fixed:
$$y_1 = y_1 + \mu(\mathcal{D} - \mathcal{B} - \mathcal{T}), \quad y_2 = y_2 + \mu(\mathcal{X} - \mathcal{B}), \quad y_3 = y_3 + \mu(\mathcal{Z}_1 - D_h \mathcal{B}), \quad y_4 = y_4 + \mu(\mathcal{Z}_2 - D_v \mathcal{B}), \quad y_5 = y_5 + \mu(\mathcal{Z}_3 - D_z \mathcal{B}) \qquad (41)$$
(f) Updating the penalty parameter $\mu$ by $\mu = \min(\rho\mu, \mu_{\max})$.
The complete process of the ADMM optimization method is given in Algorithm 2.
Algorithm 2: TTALP-TV algorithm
Input: The spatial–temporal tensor D R n 1 × n 2 × L ,   parameters   λ 1 ,   λ 2 ,   μ
Initialize:  B 0 = D ,   T 0 = Z i 0 = 0 ,   i = 1 ,   2 ,   3 ,   y i 0 = 0 ,   i = 1 ,   2 ,   3 ,   4 ,   5 ,   μ 0 = 5 e 3 ,   μ m a x = 1 e 6 ,   ρ = 1.2 ,   ζ = 1 e 6 ,   maximum   iteration   step   K = 100 .
While not converged do
1: Update $X^{k+1}$ by Algorithm 1
2: Update $B^{k+1}$ via Equation (36)
3: Update $T^{k+1}$ via Equation (38)
4: Update $W^{k+1}$ via Equation (21)
5: Update $Z_1^{k+1}$, $Z_2^{k+1}$, $Z_3^{k+1}$ via Equation (40)
6: Update Lagrangian multipliers $y_i^{k+1}$ $(i = 1, \ldots, 5)$ via Equation (41)
7: Update penalty parameter $\mu$ via $\mu = \min(\rho \mu, \mu_{max})$
8: Check the convergence condition $\left\| D - B^{k+1} - T^{k+1} \right\|_F^2 / \left\| D \right\|_F^2 \le \zeta$
9: Update $k = k + 1$
End while
Output: Background component $B$ and target component $T$.
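Algorithm 2 itself depends on the tensor tree machinery of Algorithm 1, but the ADMM pattern it follows (alternating closed-form updates, a multiplier step, penalty growth, and a relative-error stopping rule) can be illustrated with a much simpler 2-D low-rank-plus-sparse split. The sketch below is classical matrix RPCA, not the paper's model; all function names and parameter values are illustrative.

```python
import numpy as np

def rpca_admm(D, lam=None, rho=1.2, mu_max=1e7, tol=1e-7, iters=500):
    """Toy low-rank + sparse decomposition D = B + T via ADMM."""
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    B = np.zeros_like(D); T = np.zeros_like(D); Y = np.zeros_like(D)
    mu = 1.25 / np.linalg.norm(D, 2)
    shrink = lambda x, nu: np.sign(x) * np.maximum(np.abs(x) - nu, 0.0)
    for _ in range(iters):
        # B-update: singular value thresholding (closed form)
        U, s, Vt = np.linalg.svd(D - T + Y / mu, full_matrices=False)
        B = U @ np.diag(shrink(s, 1.0 / mu)) @ Vt
        # T-update: element-wise shrinkage (closed form)
        T = shrink(D - B + Y / mu, lam / mu)
        # multiplier update, penalty growth, and stopping rule
        Y = Y + mu * (D - B - T)
        mu = min(rho * mu, mu_max)
        if np.linalg.norm(D - B - T) / np.linalg.norm(D) < tol:
            break
    return B, T
```

The same skeleton carries over to Algorithm 2 by replacing the SVT step with the tensor tree update of Algorithm 1, the plain shrinkage with the weighted shrinkage of Equation (38), and adding the three TV auxiliary-variable updates.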

3.6. Steps of Detection Method

Figure 1 elaborates the whole process of the proposed TTALP-TV model, which is described as follows:
  • Self-adaptive local prior extraction. Given an infrared image, the self-adaptive prior weight $W_s$ is calculated by Equation (18).
  • Spatial–temporal tensor construction. The spatial–temporal infrared tensor $D \in \mathbb{R}^{n_1 \times n_2 \times L}$ and the local prior tensor $W \in \mathbb{R}^{n_1 \times n_2 \times L}$ are constructed by stacking L consecutive frames in chronological order from the original image sequence and the prior weight map, respectively.
  • Background and target separation. The spatial–temporal infrared tensor $D$ is decomposed into the background tensor $B$ and the target tensor $T$ through Algorithm 2.
  • Image reconstruction. Contrary to the construction process, the target image $f_T$ is reconstructed from $T$.
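The construction and reconstruction steps above are simple bookkeeping; a minimal sketch, with illustrative names and assuming equally sized grayscale frames:

```python
import numpy as np

def build_tensor(frames, t, L=3):
    """Stack L consecutive frames (t, ..., t+L-1) along a third mode to
    form the spatial-temporal tensor D of size n1 x n2 x L."""
    return np.stack(frames[t:t + L], axis=2)

def reconstruct_frame(T_tensor, k):
    """Inverse of the construction: the target image for the k-th frame
    in the window is the k-th frontal slice of the target tensor T."""
    return T_tensor[:, :, k]
```

In practice the window slides over the sequence, so each frame's target map can be taken from the window in which it appears.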

4. Experimental Results

In this section, we first discuss the datasets used in infrared target detection experiments. Then, we introduce evaluation metrics and analyze the effects of several important parameters on the TTALP-TV model. Finally, we evaluate the detection ability and robustness of the proposed algorithm and compare it with eight state-of-the-art methods in six complex scenes.

4.1. Experiment Data

The dataset used in the experiments consists of six infrared image sequences, including complex scenes such as sky, sea, clouds, mountains, and buildings. The infrared sequences 1, 3, 4, and 6 are from [58,59]. In order to carry out an objective assessment of TTALP-TV across diverse scenes, we simulated infrared sequences 2 and 5 using the strategy in [22]. As shown in Figure 5, the images are uniformly scaled to the same size to improve target visibility. Meanwhile, each small target is marked by a red rectangle and magnified in the bottom right corner of the image. It can be seen that, in most scenes, the targets occupy only a few pixels and lack shape information and texture features. Due to heavy clutter interference in complex scenes, it is difficult to distinguish the target from the background. The specific descriptions of the sequences are presented in Table 2. Additionally, the entire experimental framework was implemented in MATLAB R2020a on Windows 10, using an AMD Ryzen 7 5800H 3.20 GHz CPU with 16 GB of memory.

4.2. Evaluation Metrics and Baselines

We evaluate the detection performance of the TTALP-TV method using three evaluation metrics: the 3D receiver operating characteristic (3D ROC) [60], the signal-to-clutter ratio gain (SCRG), and the background suppression factor (BSF). The 3D ROC curve is parameterized by three quantities: the false alarm rate $F_a$, the detection probability $P_d$, and the threshold $\tau$. The $P_d$ evaluates the target detection capability, while the $F_a$ assesses the background suppression capability, as defined below:
$P_d = \frac{T_D}{A_T}$
where T D and A T denote the number of detected targets and the number of actual targets, respectively.
$F_a = \frac{F_D}{N_P}$
where $F_D$ and $N_P$ denote the number of false detections and the number of image pixels, respectively. Because ROC curves often intersect, we calculate the AUC values of three 2D ROC curves, $AUC_{(F_a, P_d)}$, $AUC_{(\tau, P_d)}$, and $AUC_{(\tau, F_a)}$, for a more accurate performance assessment. The values of $AUC_{(F_a, P_d)}$ and $AUC_{(\tau, P_d)}$ range from zero to one, where values closer to one indicate better target detection capability. Conversely, $AUC_{(\tau, F_a)}$ also ranges from zero to one, but values closer to zero represent a better ability to suppress background clutter. Therefore, the above three AUC values are combined to comprehensively evaluate the overall accuracy (OA) and the signal-to-noise probability ratio (SNPR), which are defined as follows:
$AUC_{OA} = AUC_{(F_a, P_d)} + AUC_{(\tau, P_d)} - AUC_{(\tau, F_a)}$
$AUC_{SNPR} = \frac{AUC_{(\tau, P_d)}}{AUC_{(\tau, F_a)}}$
where $AUC_{OA} \in (0, 2)$ and $AUC_{SNPR} \in (0, +\infty)$. Meanwhile, higher $AUC_{OA}$ and $AUC_{SNPR}$ values denote a stronger ability to detect targets and suppress background clutter, respectively.
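The AUC bookkeeping above can be sketched as follows. The function and helper names are illustrative, and the $P_d(\tau)$ and $F_a(\tau)$ curves are assumed to be sampled on a shared, ascending threshold grid:

```python
import numpy as np

def _trapz(y, x):
    # plain trapezoidal rule, kept local to avoid version-specific helpers
    return 0.5 * np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]))

def roc_aucs(tau, pd, fa):
    """Return the three 2-D AUCs plus the combined OA and SNPR scores
    from sampled Pd(tau) and Fa(tau) curves."""
    order = np.argsort(fa)                    # integrate Pd over Fa
    auc_fa_pd = _trapz(pd[order], fa[order])
    auc_tau_pd = _trapz(pd, tau)              # integrate Pd over tau
    auc_tau_fa = _trapz(fa, tau)              # integrate Fa over tau
    oa = auc_fa_pd + auc_tau_pd - auc_tau_fa
    snpr = auc_tau_pd / auc_tau_fa
    return auc_fa_pd, auc_tau_pd, auc_tau_fa, oa, snpr
```

Sorting by $F_a$ before integrating handles the fact that the false alarm rate decreases as the threshold sweeps upward.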
In addition, the SCRG and BSF can also be used to measure an algorithm’s ability to enhance the target and suppress the background, respectively. Both SCRG and BSF are calculated in the neighborhood of the target. As shown in Figure 6, if the target size is a × b , then ( a + 2 d ) × ( b + 2 d ) denotes the size of the target neighborhood. In the experiments of this paper, we follow [32] to set d = 65 .
The SCRG is the ratio between the SCR of the detection result and that of the original image, which is expressed as:
$SCRG = \frac{SCR_{out}}{SCR_{in}}$
where SCR reflects the degree of discrimination between the target and the background clutter in the image, which can be calculated as:
$SCR = \frac{\left| \bar{\mu}_0 - \bar{\mu}_1 \right|}{\sigma_1}$
where μ ¯ 0 denotes the target’s average gray value, μ ¯ 1 denotes that of the target neighborhood, and σ 1 denotes the gray standard deviation of the target neighborhood.
The BSF can evaluate the background suppression ability, which is defined as follows:
$BSF = \frac{\sigma_{in}}{\sigma_{out}}$
where σ o u t and σ i n represent the standard deviations of the target neighborhood in the detection result and the original image, respectively.
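A sketch of these neighborhood statistics, with illustrative function names; the target box and margin $d$ follow the layout of Figure 6:

```python
import numpy as np

def scr(img, box, d):
    """SCR = |mean(target) - mean(neighborhood)| / std(neighborhood),
    where the neighborhood is the (a+2d) x (b+2d) box minus the target."""
    r0, r1, c0, c1 = box                       # target rows/cols, half-open
    target = img[r0:r1, c0:c1]
    mask = np.zeros(img.shape, dtype=bool)
    mask[max(r0 - d, 0):r1 + d, max(c0 - d, 0):c1 + d] = True
    mask[r0:r1, c0:c1] = False                 # exclude the target itself
    neigh = img[mask]
    return abs(target.mean() - neigh.mean()) / neigh.std()

def scrg(scr_out, scr_in):
    """Gain of SCR from the original image to the detection result."""
    return scr_out / scr_in

def bsf(sigma_in, sigma_out):
    """Ratio of neighborhood standard deviations before/after detection."""
    return sigma_in / sigma_out
```

SCRG and BSF are then computed by evaluating these statistics on the original image and on the detection result within the same target neighborhood.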

4.3. Parameter Analyses

The settings of different parameters in the model have a great impact on the detection performance. Therefore, this section aims to explore the appropriate parameters for the TTALP-TV method in sequences 1–6. According to [61], we set λ 2 = 0.01 . Then, we detail the effects of L and H on the detection capability of our proposed method.

4.3.1. Adjacent Frames Number L

In the construction of the spatial–temporal tensor, the number of adjacent frames L determines how much temporal domain information is utilized. To investigate the influence of different L values on the detection performance of the TTALP-TV model, we adjust L from 2 to 6 with a step of 1. Figure 7 shows the analysis results for various L values using the 3D ROC. Increasing L incorporates more temporal information, which strengthens the low-rankness of the spatial–temporal tensor. At the same time, an excessively large number of adjacent frames introduces redundant and repetitive information, resulting in high false alarms. Figure 7 shows that L = 3 is the most suitable choice for the proposed model.

4.3.2. Tuning Parameter H

The trade-off parameter $\lambda_1$ controls the balance between the sparse target and the low-rank background in the framework. Following [62], we set $\lambda_1 = \frac{H}{\sqrt{\max(n_1, n_2) \cdot L}}$, where H is a crucial tuning parameter. We vary H from 4 to 12 with a step of 2. The 3D ROC analysis results for H are shown in Figure 8. It can be seen from Figure 8 that as H increases, false alarms decrease, which indicates that H assists in suppressing background residuals. Meanwhile, if H is too large (e.g., H = 12), some necessary information may be lost, degrading detection performance. Based on the 3D ROC analysis results shown in Figure 8, we set H = 10.

4.4. Ablation Study

In order to validate the effectiveness of the self-adaptive local prior and STTV regularization in the proposed TTALP-TV method, we conducted an ablation study, as shown in Figure 9. The TTALP-TV framework consists of three parts: the tensor tree-based spatiotemporal tensor model, the self-adaptive local prior tensor, and STTV regularization. As illustrated in Figure 9, we compare the 3D ROC analysis results of four versions of the TTALP-TV method in sequences 1–6: (1) the tensor tree-based spatiotemporal tensor model (TTSTT), (2) incorporating the self-adaptive local prior tensor into the tensor tree-based spatiotemporal tensor model (TTALP), (3) imposing the STTV regularization constraint on the background component in the tensor tree-based spatiotemporal tensor model (TTSTT-TV), and (4) integrating both the self-adaptive local prior tensor and STTV regularization into the tensor tree-based spatiotemporal tensor model (TTALP-TV). Figure 9 shows that leveraging the self-adaptive local prior does in fact improve target detection performance to a certain extent. Moreover, the STTV regularization constraint on the background helps better remove background clutter and noise while preserving image details. The results of the ablation experiments intuitively demonstrate the contribution of each module and provide guidance for further improvements to the optimization model.

4.5. Noise Robustness Validation of the Proposed TTALP-TV Method

Due to the influence of the real-world environment on the sensor, infrared images usually contain noise. Therefore, it is essential to evaluate the robustness of the TTALP-TV model to noise. To evaluate the noise robustness of TTALP-TV under different noise intensities, Gaussian white noise with σ = 5 and σ = 15 was added to the six scenes, respectively. The second and fourth rows of Figure 10 show the visual detection results for σ = 5 and σ = 15, respectively. Figure 10 shows that TTALP-TV can accurately detect targets and suppress noise of different intensities, demonstrating its robustness to noisy scenes.

4.6. Comparison with State-of-the-Art Methods

In order to assess the advantages of the TTALP-TV method, we compare it with eight representative baseline methods. These methods can be categorized into background consistency-based methods (Top-hat [7]), HVS-based methods (TLLCM [16]), LRSD-based single-frame detection methods (IPI [22], PSTNN [31], NTFRA [32], and ANLPT [33]) and LRSD-based sequential-frame detection methods (ASTTV-NTLA [63] and NFTDGSTV [42]). Table 3 lists the detailed parameter settings of these methods.

4.6.1. Visual Comparison

Figure 11 and Figure 12 show the detection results of the eight compared methods and our method in six infrared sequences. From Figure 11 and Figure 12, we can see that Top-hat leaves considerable clutter and noise residuals in its detection results. The main reason is that the structuring element size of Top-hat is fixed, so it cannot adapt to the dynamics of complex scenes. In contrast, TLLCM suppresses clutter to a certain extent but still leaves background residuals in complex scenes. Compared with the background consistency and HVS methods, the matrix-based LRSD method IPI produces fewer background residuals, but its background estimate remains gray. As can be seen from Figure 11 and Figure 12, the PSTNN and ANLPT methods achieve relatively better target detection performance (e.g., sequences 1, 4, and 6), but they are largely unable to suppress the background completely.
At the same time, we can see that NTFRA can better preserve targets and suppress background interference but fails in complex scenes with highlighted line edges (e.g., sequences 3–4). These single-frame detection methods effectively utilize spatial information to separate the target from the background. However, relying on intra-frame information alone results in low robustness to complex scenes with dynamic changes and heavy clutter. Therefore, many researchers have combined spatial–temporal information to improve detection ability and remove background interference. It can be seen from Figure 11 and Figure 12 that ASTTV-NTLA and NFTDGSTV present exceptional target detection and background suppression abilities in scenes with little clutter interference (e.g., sequence 2). However, when faced with complex scenes with high-brightness clutter and heavy noise (e.g., sequences 3, 4, and 6), their detection performance degrades significantly. In contrast, the proposed TTALP-TV method not only accurately extracts the target and preserves a relatively complete shape, but also largely suppresses strong edges and bright corner-point noise in complex scenes.

4.6.2. Quantitative Analysis

In addition to the qualitative analysis in Figure 11 and Figure 12, we adopt five evaluation metrics, 3D ROC, $AUC_{OA}$, $AUC_{SNPR}$, SCRG, and BSF, to compare the nine methods quantitatively. Figure 13 shows the 3D ROC curves of all comparison methods in complex and noisy scenes (sequences 1–6). In order to clearly depict the differences among the nine methods, a logarithmic scale is used for the false alarm rate axis. As shown in Figure 13, the proposed TTALP-TV method is closer to the top-right corner, indicating superior detection performance. The single-frame detection method ANLPT also achieves good detection performance in sequence 5. Meanwhile, the other sequential-frame detection methods, ASTTV-NTLA and NFTDGSTV, exhibit performance similar to our method in sequences 2 and 6 but fall short in the remaining sequences. To further assess which method performs best, we use $AUC_{OA}$ and $AUC_{SNPR}$ to evaluate target detection ability and background suppression ability, respectively. In each sequence, the highest value is highlighted in red, and the second highest value is marked in green. Table 4 and Table 5 show that our method achieves the highest $AUC_{OA}$ and $AUC_{SNPR}$ values.
In Table 6 and Table 7, the SCRG and BSF of nine methods in six sequences are displayed, with the highest and second highest values of SCRG and BSF marked in red and green, respectively. The results show that the ANLPT model achieves the highest SCRG values in sequence 5. On the other hand, the SCRG and BSF values of our model surpass other methods for more complex scenes (e.g., sequences 1–4 and 6). In summary, the above quantitative analyses demonstrate the effectiveness of our algorithm in both target enhancement and background suppression, particularly in complex scenes.

4.6.3. Running Time

In addition to the above evaluation metrics, computational efficiency is also a crucial factor for infrared target detection algorithms. Table 8 presents the average running time per frame of all comparison methods on the six sequences. It should be noted that the image size of sequences 1–4 is 256 × 256, while the image sizes of sequences 5 and 6 are 256 × 205 and 296 × 237, respectively. In general, a larger image size results in a longer running time. Based on Table 8, we find that Top-hat has the shortest time cost because it adopts a simple model architecture. It is worth noting that tensor-based algorithms are significantly faster than the matrix-based IPI algorithm. Among the tensor-based methods, the running time of sequential-frame detection methods (e.g., ASTTV-NTLA, NFTDGSTV) is longer than that of single-frame detection methods (e.g., PSTNN, NTFRA, ANLPT), mainly because sequential-frame methods require more time to process temporal domain information. From Table 8, it can be seen that the proposed method has a longer running time than ASTTV-NTLA and NFTDGSTV, because computing the self-adaptive prior in TTALP-TV adds time cost. Based on the qualitative and quantitative results shown in Figure 11, Figure 12 and Figure 13 and Table 4, Table 5, Table 6 and Table 7, our method achieves better detection performance than the compared methods. Therefore, the extra running time of our method is acceptable.

5. Conclusions

In this article, the TTALP-TV model is proposed for infrared small target detection in complex scenes. Based on the property that tensor tree decomposition can exploit the data structure in a more balanced way, we introduce the tensor tree rank to obtain a more accurate background estimation; it reduces storage costs and retains spatial and temporal correlation through a hierarchical method. In addition, a novel local prior weight is proposed to adaptively assign weights to targets, which helps to better distinguish targets from similar objects. Meanwhile, STTV is used as a joint regularization term to remove noise while preserving image details. The separation of target and background is thus converted into an optimization problem, and we provide an efficient ADMM-based framework for solving the proposed TTALP-TV model. Extensive experiments demonstrate that the proposed algorithm can not only accurately detect targets but also effectively suppress background clutter and noise in various complex scenes. However, the real-time performance of our method still needs to be improved due to the prior weight calculation in the model. In future work, we will focus on establishing more efficient mechanisms to further simplify the calculation and improve detection efficiency.

Author Contributions

Conceptualization, G.Z.; methodology, G.Z. and Z.D.; software, G.Z.; investigation, G.Z., Z.D., B.Z., W.Z. and J.L.; validation, G.Z.; writing—original draft preparation, G.Z.; writing—review and editing, G.Z., Q.L. and Z.T.; project administration, Q.L.; funding acquisition, Z.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Program Project of Science and Technology Innovation of the Chinese Academy of Sciences (no. KGFZD-135-20-03-02) and by the Innovation Foundation of the Key Laboratory of Computational Optical Imaging Technology, CAS (no. CXJJ-23S016).

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to time limitations.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Rawat, S.S.; Verma, S.K.; Kumar, Y. Review on recent development in infrared small target detection algorithms. Procedia Comput. Sci. 2020, 167, 2496–2505. [Google Scholar] [CrossRef]
  2. Xiao, S.; Peng, Z.; Li, F. Infrared Cirrus Detection Using Non-Convex Rank Surrogates for Spatial-Temporal Tensor. Remote Sens. 2023, 15, 2334. [Google Scholar] [CrossRef]
  3. Gao, J.; Wang, L.; Yu, J.; Pan, Z. Structure Tensor-Based Infrared Small Target Detection Method for a Double Linear Array Detector. Remote Sens. 2022, 14, 4785. [Google Scholar] [CrossRef]
  4. Pang, D.; Shan, T.; Ma, P.; Li, W.; Liu, S.; Tao, R. A Novel Spatiotemporal Saliency Method for Low-Altitude Slow Small Infrared Target Detection. IEEE Geosci. Remote Sens. Lett. 2022, 19, 7000705. [Google Scholar] [CrossRef]
  5. Du, J.; Lu, H.; Zhang, L.; Hu, M.; Chen, S.; Deng, Y.; Shen, X.; Zhang, Y. A Spatial-Temporal Feature-Based Detection Framework for Infrared Dim Small Target. IEEE Trans. Geosci. Remote Sens. 2022, 60, 3000412. [Google Scholar] [CrossRef]
  6. Eysa, R.; Hamdulla, A. Issues on Infrared Dim Small Target Detection and Tracking. In Proceedings of the 2019 International Conference on Smart Grid and Electrical Automation (ICSGEA), Xiangtan, China, 10–11 August 2019; pp. 452–456. [Google Scholar]
  7. Rivest, J.-F.; Fortin, R. Detection of dim targets in digital infrared imagery by morphological image processing. Opt. Eng. 1996, 35, 1886–1893. [Google Scholar] [CrossRef]
  8. Deshpande, S.D.; Er, M.H.; Venkateswarlu, R.; Chan, P. Max-mean and max-median filters for detection of small targets. In Proceedings of the Optics & Photonics, Denver, CO, USA, 4 October 1999. [Google Scholar]
  9. Yang, L.; Yang, J.; Yang, K. Adaptive detection for infrared small target under sea-sky complex background. Electron. Lett. 2004, 40, 1. [Google Scholar] [CrossRef]
  10. Hadhoud, M.M.; Thomas, D.W. The two-dimensional adaptive LMS (TDLMS) algorithm. IEEE Trans. Circuits Syst. 1988, 35, 485–494. [Google Scholar] [CrossRef]
  11. Widrow, B.; Glover, J.R.; McCool, J.M.; Kaunitz, J.; Williams, C.S.; Hearn, R.H.; Zeidler, J.R.; Dong, J.E.; Goodlin, R.C. Adaptive noise cancelling: Principles and applications. Proc. IEEE 1975, 63, 1692–1716. [Google Scholar] [CrossRef]
  12. Cao, M.; Sun, D. Infrared weak target detection based on improved morphological filtering. In Proceedings of the 2020 Chinese Control And Decision Conference (CCDC), Hefei, China, 22–24 August 2020; pp. 1808–1813. [Google Scholar]
  13. Chen, C.L.P.; Li, H.; Wei, Y.; Xia, T.; Tang, Y.Y. A Local Contrast Method for Small Infrared Target Detection. IEEE Trans. Geosci. Remote Sens. 2014, 52, 574–581. [Google Scholar] [CrossRef]
  14. Wei, Y.; You, X.; Li, H. Multiscale patch-based contrast measure for small infrared target detection. Pattern Recognit. 2016, 58, 216–226. [Google Scholar] [CrossRef]
  15. Shi, Y.; Wei, Y.; Yao, H.; Pan, D.; Xiao, G. High-Boost-Based Multiscale Local Contrast Measure for Infrared Small Target Detection. IEEE Geosci. Remote Sens. Lett. 2018, 15, 33–37. [Google Scholar] [CrossRef]
  16. Han, J.; Moradi, S.; Faramarzi, I.; Liu, C.; Zhang, H.; Zhao, Q. A Local Contrast Method for Infrared Small-Target Detection Utilizing a Tri-Layer Window. IEEE Geosci. Remote Sens. Lett. 2020, 17, 1822–1826. [Google Scholar] [CrossRef]
  17. Ma, Y.; Liu, Y.; Pan, Z.; Hu, Y. Method of Infrared Small Moving Target Detection Based on Coarse-to-Fine Structure in Complex Scenes. Remote Sens. 2023, 15, 1508. [Google Scholar] [CrossRef]
  18. Fan, Z.; Bi, D.; Xiong, L.; Ma, S.; He, L.; Ding, W. Dim infrared image enhancement based on convolutional neural network. Neurocomputing 2018, 272, 396–404. [Google Scholar] [CrossRef]
  19. Zhao, B.; Wang, C.; Fu, Q.; Han, Z. A Novel Pattern for Infrared Small Target Detection with Generative Adversarial Network. IEEE Trans. Geosci. Remote Sens. 2021, 59, 4481–4492. [Google Scholar] [CrossRef]
  20. Zhang, M.; Zhang, R.; Zhang, J.; Guo, J.; Li, Y.; Gao, X. Dim2Clear Network for Infrared Small Target Detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5001714. [Google Scholar] [CrossRef]
  21. Ying, X.; Liu, L.; Wang, Y.; Li, R.; Chen, N.; Lin, Z.; Sheng, W.; Zhou, S. Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection with Single Point Supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 15528–15538. [Google Scholar]
  22. Gao, C.; Meng, D.; Yang, Y.; Wang, Y.; Zhou, X.; Hauptmann, A.G. Infrared patch-image model for small target detection in a single image. IEEE Trans. Image Process. 2013, 22, 4996–5009. [Google Scholar] [CrossRef]
  23. Xie, Y.; Gu, S.; Liu, Y.; Zuo, W.; Zhang, W.; Zhang, L. Weighted Schatten p-norm minimization for image denoising and background subtraction. IEEE Trans. Image Process. 2016, 25, 4842–4857. [Google Scholar] [CrossRef]
  24. Dai, Y.; Wu, Y.; Song, Y. Infrared small target and background separation via column-wise weighted robust principal component analysis. Infrared Phys. Technol. 2016, 77, 421–430. [Google Scholar] [CrossRef]
  25. Dai, Y.; Wu, Y.; Song, Y.; Guo, J. Non-negative infrared patch-image model: Robust target-background separation via partial sum minimization of singular values. Infrared Phys. Technol. 2017, 81, 182–194. [Google Scholar] [CrossRef]
  26. Wang, X.; Peng, Z.; Kong, D.; Zhang, P.; He, Y. Infrared dim target detection based on total variation regularization and principal component pursuit. Image Vis. Comput. 2017, 63, 1–9. [Google Scholar] [CrossRef]
  27. Zhang, L.; Peng, L.; Zhang, T.; Cao, S.; Peng, Z. Infrared small target detection via non-convex rank approximation minimization joint l2,1 norm. Remote Sens. 2018, 10, 1821. [Google Scholar] [CrossRef]
  28. Wang, X.; Peng, Z.; Kong, D.; He, Y. Infrared dim and small target detection based on stable multisubspace learning in heterogeneous scene. IEEE Trans. Geosci. Remote Sens. 2017, 55, 5481–5493. [Google Scholar] [CrossRef]
  29. Zhang, T.; Peng, Z.; Wu, H.; He, Y.; Li, C.; Yang, C. Infrared small target detection via self-regularized weighted sparse model. Neurocomputing 2021, 420, 124–148. [Google Scholar] [CrossRef]
  30. Dai, Y.; Wu, Y. Reweighted infrared patch-tensor model with both nonlocal and local priors for single-frame small target detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 3752–3767. [Google Scholar] [CrossRef]
  31. Zhang, L.; Peng, Z. Infrared small target detection based on partial sum of the tensor nuclear norm. Remote Sens. 2019, 11, 382. [Google Scholar] [CrossRef]
  32. Kong, X.; Yang, C.; Cao, S.; Li, C.; Peng, Z. Infrared small target detection via nonconvex tensor fibered rank approximation. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5000321. [Google Scholar] [CrossRef]
  33. Zhang, Z.; Ding, C.; Gao, Z.; Xie, C. ANLPT: Self-Adaptive and Non-Local Patch-Tensor Model for Infrared Small Target Detection. Remote Sens. 2023, 15, 1021. [Google Scholar] [CrossRef]
  34. Reed, I.S.; Gagliardi, R.M.; Stotts, L.B. Optical moving target detection with 3-D matched filtering. IEEE Trans. Aerosp. Electron. Syst. 1988, 24, 327–336. [Google Scholar] [CrossRef]
  35. Tonissen, S.M.; Evans, R.J. Performance of dynamic programming techniques for track-before-detect. IEEE Trans. Aerosp. Electron. Syst. 1996, 32, 1440–1451. [Google Scholar] [CrossRef]
  36. Li, Y.; Zhang, Y.; Yu, J.-G.; Tan, Y.; Tian, J.; Ma, J. A novel spatio-temporal saliency approach for robust dim moving target detection from airborne infrared image sequences. Inf. Sci. 2016, 369, 548–563. [Google Scholar] [CrossRef]
  37. Zhao, F.; Wang, T.; Shao, S.; Zhang, E.; Lin, G. Infrared moving small-target detection via spatiotemporal consistency of trajectory points. IEEE Geosci. Remote Sens. Lett. 2019, 17, 122–126. [Google Scholar] [CrossRef]
  38. Sun, Y.; Yang, J.; Long, Y.; An, W. Infrared small target detection via spatial-temporal total variation regularization and weighted tensor nuclear norm. IEEE Access 2019, 7, 56667–56682. [Google Scholar] [CrossRef]
  39. Zhang, P.; Zhang, L.; Wang, X.; Shen, F.; Pu, T.; Fei, C. Edge and Corner Awareness-Based Spatial–Temporal Tensor Model for Infrared Small-Target Detection. IEEE Trans. Geosci. Remote Sens. 2021, 59, 10708–10724. [Google Scholar] [CrossRef]
  40. Hu, Y.; Ma, Y.; Pan, Z.; Liu, Y. Infrared Dim and Small Target Detection from Complex Scenes via Multi-Frame Spatial—Temporal Patch-Tensor Model. Remote Sens. 2022, 14, 2234. [Google Scholar] [CrossRef]
  41. Wang, G.; Tao, B.; Kong, X.; Peng, Z. Infrared Small Target Detection Using Nonoverlapping Patch Spatial–Temporal Tensor Factorization with Capped Nuclear Norm Regularization. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5001417. [Google Scholar] [CrossRef]
  42. Liu, T.; Yang, J.; Li, B.; Wang, Y.; An, W. Infrared Small Target Detection via Nonconvex Tensor Tucker Decomposition with Factor Prior. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5617317. [Google Scholar] [CrossRef]
  43. Wu, F.; Yu, H.; Liu, A.; Luo, J.; Peng, Z. Infrared Small Target Detection Using Spatiotemporal 4-D Tensor Train and Ring Unfolding. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5002922. [Google Scholar] [CrossRef]
  44. Romera-Paredes, B.; Pontil, M. A New Convex Relaxation for Tensor Completion. In Proceedings of the Advances in Neural Information Processing Systems, Harrahs and Harveys, Lake Tahoe, NV, USA, 5–10 December 2013; pp. 2967–2975. [Google Scholar]
  45. Yang, J.-H.; Zhao, X.-L.; Ji, T.-Y.; Ma, T.-H.; Huang, T.-Z. Low-rank tensor train for tensor robust principal component analysis. Appl. Math. Comput. 2020, 367, 124783. [Google Scholar] [CrossRef]
  46. Zhao, Q.; Zhou, G.; Xie, S.; Zhang, L.; Cichocki, A. Tensor ring decomposition. arXiv 2016, arXiv:1606.05535. [Google Scholar]
  47. Hackbusch, W.; Kühn, S. A new scheme for the tensor representation. J. Fourier Anal. Appl. 2009, 15, 706–722. [Google Scholar] [CrossRef]
  48. Bigun, J.; Granlund, G.H.; Wiklund, J. Multidimensional orientation estimation with applications to texture analysis and optical flow. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 13, 775–790. [Google Scholar] [CrossRef]
  49. Brown, M.; Szeliski, R.; Winder, S. Multi-image matching using multi-scale oriented patches. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 511, pp. 510–517. [Google Scholar]
  50. Frangi, A.F.; Niessen, W.J.; Vincken, K.L.; Viergever, M.A. Multiscale vessel enhancement filtering. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI’98, Cambridge, MA, USA, 11–13 October 1998; Wells, W.M., Colchester, A., Delp, S., Eds.; Springer: Berlin/Heidelberg, Germany, 1998; pp. 130–137. [Google Scholar]
  51. Candes, E.J.; Eldar, Y.C.; Strohmer, T.; Voroninski, V. Phase retrieval via matrix completion. SIAM Rev. 2015, 57, 225–251. [Google Scholar] [CrossRef]
  52. Rudin, L.I.; Osher, S.; Fatemi, E. Nonlinear total variation based noise removal algorithms. Phys. D Nonlinear Phenom. 1992, 60, 259–268. [Google Scholar] [CrossRef]
  53. Liu, Y.; Long, Z.; Zhu, C. Image Completion Using Low Tensor Tree Rank and Total Variation Minimization. IEEE Trans. Multimed. 2019, 21, 338–350. [Google Scholar] [CrossRef]
  54. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 2011, 3, 1–122. [Google Scholar]
  55. Lin, Z.; Chen, M.; Ma, Y. The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv 2010, arXiv:1009.5055. [Google Scholar]
  56. Cai, J.-F.; Candès, E.J.; Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 2010, 20, 1956–1982. [Google Scholar] [CrossRef]
  57. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar] [CrossRef]
  58. Hui, B.; Song, Z.; Fan, H.; Zhong, P.; Hu, W.; Zhang, X.; Lin, J.; Su, H.; Jin, W.; Zhang, Y.; et al. A Dataset for Infrared Image Dim-Small Aircraft Target Detection and Tracking under Ground/Air Background; China Scientific Data: Beijing, China, 2019. [Google Scholar] [CrossRef]
  59. Sun, X.; Guo, L.; Zhang, W.; Wang, Z.; Yu, Q. Small Aerial Target Detection for Airborne Infrared Detection Systems Using LightGBM and Trajectory Constraints. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 9959–9973. [Google Scholar] [CrossRef]
  60. Chang, C.-I. An effective evaluation tool for hyperspectral target detection: 3D receiver operating characteristic curve analysis. IEEE Trans. Geosci. Remote Sens. 2020, 59, 5131–5153. [Google Scholar] [CrossRef]
  61. Sun, L.; Zhan, T.; Wu, Z.; Jeon, B. A Novel 3D Anisotropic Total Variation Regularized Low Rank Method for Hyperspectral Image Mixed Denoising. ISPRS Int. J. Geo-Inf. 2018, 7, 412. [Google Scholar] [CrossRef]
  62. Lu, C.; Feng, J.; Chen, Y.; Liu, W.; Lin, Z.; Yan, S. Tensor robust principal component analysis with a new tensor nuclear norm. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 42, 925–938. [Google Scholar] [CrossRef]
  63. Liu, T.; Yang, J.; Li, B.; Xiao, C.; Sun, Y.; Wang, Y.; An, W. Nonconvex Tensor Low-Rank Approximation for Infrared Small Target Detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5614718. [Google Scholar] [CrossRef]
Figure 1. Flowchart of the proposed TTALP-TV model for infrared small target detection.
Figure 2. The diagram of tensor tree decomposition.
Figure 3. Singular value distribution curves of infrared spatial–temporal tensor along each mode.
Figure 4. Comparison of different local structure priors. Row 1 shows original infrared images. Rows 2 to 6 depict different local prior maps, obtained by Equation (13), RIPT, PSTNN, MFSTPT, and the proposed method, respectively. Columns (a–d) display the prior weights extracted using different calculation methods for four infrared image sequences.
Figure 5. Representative frames corresponding to the six infrared sequences used in the experiments.
Figure 6. Diagram of the target neighborhood.
Figure 7. Three-dimensional ROC curves corresponding to different parameters of L in the six sequences.
Figure 8. Three-dimensional ROC curves corresponding to different parameters of H in the six sequences.
Figure 9. Ablation results of the six sequences in 3D ROC curves.
Figure 10. Detection results of TTALP-TV under different noise intensities.
Figure 11. Detection results of nine methods in sequences 1–3. The red rectangles denote target areas, and the blue ellipses denote noise and background residuals.
Figure 12. Detection results of nine methods in sequences 4–6. The red rectangles denote target areas, and the blue ellipses denote noise and background residuals.
Figure 13. Three-dimensional ROC curves of nine methods in sequences 1–6.
Table 1. Mathematical symbols.

Notation | Explanation
X ∈ R^(I1 × I2 × I3) | Third-order tensor
X(i, j, k) / x_ijk | Its (i, j, k)-th element
⟨X, Y⟩ | The inner product of two tensors, Σ_{i1,i2,i3} x_{i1,i2,i3} y_{i1,i2,i3}
||X||_0 | The l0-norm, the number of non-zero elements in X
||X||_1 | The l1-norm, the sum of the absolute values of all elements in X
||X||_F | The Frobenius norm, the square root of the sum of the squares of all elements in X
||X||_* | The matrix nuclear norm, the sum of all singular values of X
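The norms in Table 1 can be checked numerically. A minimal NumPy sketch (the toy tensor X below is invented purely for illustration; the nuclear norm is applied to a mode-1 unfolding, since it is defined for matrices):

```python
import numpy as np

# Toy third-order tensor X in R^(2x2x2), chosen so every norm is easy to verify by hand.
X = np.array([[[1.0, 0.0], [-2.0, 0.0]],
              [[0.0, 3.0], [0.0, 0.0]]])

l0 = np.count_nonzero(X)            # l0-norm: number of non-zero elements -> 3
l1 = np.abs(X).sum()                # l1-norm: sum of absolute values -> 6
fro = np.sqrt((X ** 2).sum())       # Frobenius norm -> sqrt(14)

# Matrix nuclear norm of the mode-1 unfolding: sum of singular values.
X_unfold = X.reshape(2, 4)
nuc = np.linalg.svd(X_unfold, compute_uv=False).sum()   # -> 3 + sqrt(5)

print(l0, l1, fro, nuc)
```

For this X the unfolded rows are orthogonal, so the singular values (3 and √5) can be read off directly from the row norms.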
Table 2. Characteristics of the dataset.

Sequence | Frames | Image Size | Target Descriptions | Background Descriptions
1 | 120 | 256 × 256 | Slow-moving and small | Ground background with fierce clouds and noise
2 | 120 | 256 × 256 | Slow-moving and weak airplane | Ground background with sea and islands
3 | 120 | 256 × 256 | Fast-moving, small and dim | Ground background with bright buildings
4 | 120 | 256 × 256 | Fast-moving, small and regular shape | Ground background with reflective mountains
5 | 120 | 256 × 205 | Fast-moving, irregularly shaped aircraft | Ground background with reflective clouds
6 | 120 | 296 × 237 | Fast-moving, small and dim | Ground background with multilayer clouds
Table 3. Detailed parameters of nine methods.

Method | Parameters
Top-hat | Shape: disk, structure size: 5 × 5
TLLCM | Filtering windows: 3 × 3, 5 × 5, 7 × 7
IPI | Patch size: 50 × 50, step: 10, λ = 1/√(min(m, n)), ε = 10⁻⁷
PSTNN | Patch size: 40 × 40, step: 40, λ = 0.7/√(min(n1, n2)·n3), ε = 10⁻⁷
ASTTV-NTLA | L = 3, H = 6, λ_tv = 0.005, λ_s = H/√(max(M, N)·L), λ_3 = 100
NTFRA | Patch size: 40 × 40, step: 40, λ = 1/√(min(n1, n2)·n3), β = 0.05, μ = 200
ANLPT | Patch size: 50 × 50, step: 50, region: 10, channel: 3, μ = 10⁻³
NFTDGSTV | L = 3, H = 4, λ1 = 0.01, λ2 = H/√(max(M, N)·L), λs = 0.001
Proposed | L = 3, H = 10, λ1 = H/√(max(n1, n2)·L), λ2 = 0.01
Table 4. AUC_OA and AUC_SNPR of nine methods in sequences 1–3.

Method | Sequence 1 (AUC_OA / AUC_SNPR) | Sequence 2 (AUC_OA / AUC_SNPR) | Sequence 3 (AUC_OA / AUC_SNPR)
Top-hat | 1.9279 / 13.9053 | 1.9722 / 36.1500 | 1.6818 / 9.5483
TLLCM | 1.8827 / 115.6874 | 1.9228 / 158.0823 | 1.3038 / 39.1759
IPI | 1.8275 / 5.8048 | 1.9445 / 18.0709 | 1.7117 / 3.4998
PSTNN | 1.9945 / 187.9084 | 1.9943 / 180.1801 | 0.8804 / 39.4834
ASTTV-NTLA | 1.9943 / 182.1271 | 1.9947 / 197.3122 | 1.8817 / 8.4660
NTFRA | 1.9944 / 186.0783 | 1.9947 / 195.7341 | 0.4936 / 0.6221
ANLPT | 1.9946 / 193.5168 | 1.9943 / 181.1781 | 1.9942 / 178.3936
NFTDGSTV | 1.8983 / 10.8409 | 1.9762 / 42.2723 | 1.9065 / 12.3141
Proposed | 1.9948 / 198.0273 | 1.9948 / 198.0127 | 1.9944 / 183.0405
Table 5. AUC_OA and AUC_SNPR of nine methods in sequences 4–6.

Method | Sequence 4 (AUC_OA / AUC_SNPR) | Sequence 5 (AUC_OA / AUC_SNPR) | Sequence 6 (AUC_OA / AUC_SNPR)
Top-hat | 1.9456 / 18.4296 | 1.8556 / 31.0480 | 1.8923 / 14.3259
TLLCM | 1.8404 / 98.9758 | 1.4616 / 74.5013 | 1.4474 / 77.1831
IPI | 1.8934 / 9.3979 | 1.7779 / 4.8744 | 1.5130 / 2.0540
PSTNN | 1.9936 / 161.9100 | 1.1649 / 81.6425 | 1.7827 / 162.4838
ASTTV-NTLA | 1.8760 / 8.0798 | 1.9490 / 22.9404 | 1.9047 / 10.5164
NTFRA | 1.8710 / 47.7343 | 1.0384 / 37.7261 | 0.8706 / 45.0366
ANLPT | 1.9944 / 184.8268 | 1.9947 / 196.1699 | 1.9947 / 196.9744
NFTDGSTV | 1.9435 / 17.7605 | 1.7477 / 5.9553 | 1.7946 / 4.8739
Proposed | 1.9948 / 198.0272 | 1.9947 / 198.0339 | 1.9948 / 198.0076
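Following the 3D ROC analysis of Chang [60], AUC_OA and AUC_SNPR in Tables 4 and 5 are typically combined from the three component AUCs of the 3D ROC curve, AUC_(D,F), AUC_(D,τ), and AUC_(F,τ). A minimal sketch under that assumption (the helper names are ours, not from the paper):

```python
def auc_oa(auc_df: float, auc_dt: float, auc_ft: float) -> float:
    """Overall AUC: rewards detection (AUC_(D,F), AUC_(D,tau)) and
    penalizes false alarms (AUC_(F,tau)); the ideal value is 2.0."""
    return auc_df + auc_dt - auc_ft

def auc_snpr(auc_dt: float, auc_ft: float) -> float:
    """Signal-to-noise-probability ratio: how strongly target responses
    dominate false-alarm responses over the threshold sweep."""
    return auc_dt / auc_ft
```

For example, a detector with AUC_(D,F) = 1.0, AUC_(D,τ) = 0.99, and AUC_(F,τ) = 0.005 yields AUC_OA = 1.985 and AUC_SNPR = 198, which matches the scale of the values reported for the proposed method.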
Table 6. SCRG and BSF of nine methods in sequences 1–3.

Method | Sequence 1 (SCRG / BSF) | Sequence 2 (SCRG / BSF) | Sequence 3 (SCRG / BSF)
Top-hat | 17.87 / 1.46 | 1.48 / 1.17 | 0.86 / 0.89
TLLCM | 99.67 / 4.40 | 2.85 / 1.97 | 5.17 / 4.08
IPI | 131.12 / 12.07 | 2.49 / 2.12 | 7.57 / 4.79
PSTNN | 114.29 / 7.98 | 2.59 / 1.98 | 1.15 / 3.24
ASTTV-NTLA | 219.24 / 15.33 | 2.63 / 2.52 | 9.01 / 4.63
NTFRA | 81.11 / 6.20 | 2.34 / 2.27 | 0.16 / 1.56
ANLPT | 178.28 / 11.19 | 2.59 / 1.95 | 9.37 / 4.23
NFTDGSTV | 174.78 / 14.05 | 2.78 / 2.78 | 13.47 / 6.65
Proposed | 235.51 / 17.83 | 3.56 / 3.46 | 19.51 / 8.86
Table 7. SCRG and BSF of nine methods in sequences 4–6.

Method | Sequence 4 (SCRG / BSF) | Sequence 5 (SCRG / BSF) | Sequence 6 (SCRG / BSF)
Top-hat | 7.79 / 3.66 | 14.00 / 4.29 | 9.63 / 2.17
TLLCM | 20.94 / 6.33 | 32.75 / 10.46 | 27.26 / 6.69
IPI | 19.83 / 11.11 | 26.56 / 11.49 | 25.01 / 7.94
PSTNN | 25.85 / 12.63 | 29.74 / 17.10 | 62.06 / 10.56
ASTTV-NTLA | 18.18 / 10.45 | 36.50 / 15.77 | 71.84 / 14.40
NTFRA | 20.50 / 10.64 | 23.89 / 17.50 | 13.06 / 9.49
ANLPT | 22.97 / 10.82 | 41.95 / 16.01 | 77.48 / 14.15
NFTDGSTV | 23.18 / 12.18 | 29.53 / 12.56 | 55.45 / 12.09
Proposed | 27.39 / 13.88 | 40.79 / 17.83 | 96.11 / 18.30
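SCRG and BSF in Tables 6 and 7 are commonly computed from the local signal-to-clutter ratio SCR = |μ_t − μ_b| / σ_b in a target neighborhood (see Figure 6), with SCRG = SCR_out / SCR_in and BSF = σ_in / σ_out. The sketch below assumes those standard definitions; the exact neighborhood window is a convention the paper fixes elsewhere:

```python
import numpy as np

def local_scr(img: np.ndarray, t_mask: np.ndarray, b_mask: np.ndarray) -> float:
    """SCR of a target region against its local background neighborhood."""
    mu_t = img[t_mask].mean()
    mu_b = img[b_mask].mean()
    return abs(mu_t - mu_b) / img[b_mask].std()

def scrg(scr_out: float, scr_in: float) -> float:
    """SCR gain from input image to detection map (target enhancement)."""
    return scr_out / scr_in

def bsf(sigma_in: float, sigma_out: float) -> float:
    """Background suppression factor: clutter std before vs. after processing."""
    return sigma_in / sigma_out

# Toy example: one target pixel (value 5) over a background of [0, 2, 0, 2].
img = np.array([5.0, 0.0, 2.0, 0.0, 2.0])
t_mask = np.array([True, False, False, False, False])
print(local_scr(img, t_mask, ~t_mask))   # |5 - 1| / 1 = 4.0
```

Higher SCRG means the target stands out more after processing; higher BSF means the residual background is flatter, which is why the proposed method's large values in both columns indicate balanced enhancement and suppression.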
Table 8. Running time (s) of the nine methods.

Method | Sequence 1 | Sequence 2 | Sequence 3 | Sequence 4 | Sequence 5 | Sequence 6
Top-hat | 0.0958 | 0.0989 | 0.1008 | 0.1013 | 0.1098 | 0.1011
TLLCM | 1.1209 | 1.0979 | 1.1193 | 1.1235 | 0.8603 | 1.2962
IPI | 5.1073 | 4.7903 | 5.1192 | 5.8094 | 3.9534 | 5.9438
PSTNN | 0.3605 | 0.2402 | 0.3105 | 0.3168 | 0.2845 | 0.2813
ASTTV-NTLA | 2.0889 | 2.1002 | 2.0552 | 2.1045 | 1.4487 | 2.2719
NTFRA | 1.4493 | 1.3904 | 1.4145 | 1.4965 | 1.2829 | 1.6751
ANLPT | 1.5682 | 1.5437 | 1.4939 | 1.5961 | 1.2954 | 1.5260
NFTDGSTV | 1.9258 | 2.0223 | 1.9534 | 1.8297 | 1.8025 | 2.4437
Proposed | 2.3115 | 2.2845 | 2.3740 | 2.3182 | 1.7406 | 2.5226
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Zhang, G.; Ding, Z.; Lv, Q.; Zhu, B.; Zhang, W.; Li, J.; Tan, Z. Infrared Small Target Detection Based on Tensor Tree Decomposition and Self-Adaptive Local Prior. Remote Sens. 2024, 16, 1108. https://doi.org/10.3390/rs16061108