Kernel uncorrelated neighborhood discriminative embedding for feature extraction
Xuelian Yu, Xuegang Wang
Abstract
Feature extraction is a crucial step in pattern recognition. Recently, manifold learning algorithms have drawn much attention. Although their locality-preserving properties are significant, most manifold-based algorithms are of limited use for classification problems. First, they lack discriminant ability. Second, they fail to remove the redundancy among the extracted features. We present a new feature extraction method, called kernel uncorrelated neighborhood discriminative embedding (KUNDE), which integrates the abilities of manifold learning and pattern classification. The purpose of KUNDE is to preserve the within-class neighboring geometry while maximizing the between-class scatter. By optimizing an objective function in a kernel feature space, nonlinear features are extracted. Moreover, by imposing a simple uncorrelated constraint on the computation of the basis vectors, the features extracted by KUNDE are statistically uncorrelated and thus contain minimum redundancy. Experimental results on radar target recognition indicate the promising performance of the proposed method.

1. Introduction

In the past few years, several nonlinear manifold learning algorithms have been proposed to discover the nonlinear structure of a manifold by investigating the local geometry of samples, such as Isomap,1 locally linear embedding (LLE),2, 3 and Laplacian eigenmaps.4 These methods are defined only on the training data, and how to map new test data remains difficult. Therefore, they cannot be applied directly to recognition problems.

Recently, manifold-based linear algorithms such as neighborhood preserving embedding (NPE)5 have resolved this difficulty by finding a mapping defined on the whole data space, not just on the training data. However, they share a common inherent limitation: they deemphasize the discriminant information that is crucial for recognition tasks. In addition, the basis vectors of these methods are statistically correlated, so the extracted features contain considerable redundancy. As a result, the overlapping information can distort the distribution of the features and even degrade recognition performance.

In this work, a new feature extraction algorithm, called kernel uncorrelated neighborhood discriminative embedding (KUNDE), is proposed to address these problems. On the one hand, the method explicitly considers both the within-class neighboring information and the between-class scatter information, thereby emphasizing discriminant information. On the other hand, it obtains statistically uncorrelated features with minimum redundancy by imposing a simple uncorrelated constraint on the computation of the basis vectors. By mapping the input data into a high-dimensional feature space with the kernel technique, nonlinear features are extracted.

2. Kernel Uncorrelated Neighborhood Discriminative Embedding

Given a data set $X=[x_1,\ldots,x_N]$ in $\mathbb{R}^D$, suppose that each data point $x_i$ belongs to one of the $C$ classes $\omega_1,\ldots,\omega_C$, and that each class contains $n_c$ $(c=1,\ldots,C)$ samples. The data are mapped into an implicit high-dimensional feature space $F$ by a nonlinear mapping $\phi: x\in\mathbb{R}^D \mapsto \phi(x)\in F$. The problem is to find a transformation matrix $V$ that maps these points to new points $Y=[y_1,\ldots,y_N]=V^T\phi(X)$ in $\mathbb{R}^d$ $(d\ll D)$, where $y_i=V^T\phi(x_i)$.

The KUNDE algorithm proceeds as follows (a code sketch of the first three steps appears after the list):

  • 1. Construct the kernel matrix $K=\phi(X)^T\phi(X)$, whose elements are $K_{ij}=k(x_i,x_j)$, where $k$ is a kernel function satisfying $k(x_i,x_j)=\langle\phi(x_i),\phi(x_j)\rangle$.

  • 2. Compute the weight matrix $W$ by minimizing the reconstruction error $E(W)=\sum_i \|\phi(x_i)-\sum_j W_{ij}\phi(x_j)\|^2$, where $\sum_j W_{ij}=1$, and $W_{ij}\neq 0$ if $\phi(x_j)$ is one of the $n$ identical-label nearest neighbors of $\phi(x_i)$; otherwise, $W_{ij}=0$. An efficient way to minimize this error is described in Ref. 3.

  • 3. Compute the matrices $M$, $L$, and $G$ as follows: $M=(I-W)(I-W)^T$, $L=I-E$, and $G=I-(1/N)ee^T$, where $I$ is the identity matrix, $E_{ij}=1/n_c$ if $x_i$ and $x_j$ belong to the $c$th class and $E_{ij}=0$ otherwise, and $e=(1,\ldots,1)^T$.

  • 4. Solve the generalized eigenvalue problem $K(M+L)Ka_i=\lambda_i KGKa_i$, with $\lambda_1<\cdots<\lambda_d$, and form the matrix $A=[a_1,\ldots,a_d]$.

  • 5. For any data point $x$ in $\mathbb{R}^D$, the embedded feature in $\mathbb{R}^d$ is given by $y=V^T\phi(x)=A^T[k(x_1,x),\ldots,k(x_N,x)]^T$.
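To make steps 1 through 3 concrete, here is a minimal NumPy sketch. It is a sketch under stated assumptions, not the authors' implementation: the Gaussian kernel of Sec. 4 is assumed, rows of X are samples, labels is a 1-D integer array, and the function names and the small regularization constant in the weight solve are our own illustrative choices. Steps 4 and 5 are sketched at the end of Sec. 3.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Step 1: kernel matrix K with K_ij = k(x_i, x_j); the Gaussian kernel
    from Sec. 4 is assumed, and rows of X are samples."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / sigma ** 2)

def reconstruction_weights(K, labels, n_neighbors=5, reg=1e-6):
    """Step 2: weights W that reconstruct phi(x_i) from its n same-label
    nearest neighbors; all feature-space quantities follow from K alone."""
    N = K.shape[0]
    W = np.zeros((N, N))
    diag = np.diag(K)
    d2 = diag[:, None] + diag[None, :] - 2.0 * K   # ||phi(x_i) - phi(x_j)||^2
    for i in range(N):
        cand = np.where(labels == labels[i])[0]
        cand = cand[cand != i]                      # same-label candidates
        nbrs = cand[np.argsort(d2[i, cand])[:n_neighbors]]
        # local Gram matrix C_jk = <phi(x_i)-phi(x_j), phi(x_i)-phi(x_k)>
        C = K[i, i] - K[i, nbrs][:, None] - K[i, nbrs][None, :] + K[np.ix_(nbrs, nbrs)]
        C += reg * np.trace(C) * np.eye(len(nbrs))  # regularization, as in Ref. 3
        w = np.linalg.solve(C, np.ones(len(nbrs)))
        W[i, nbrs] = w / w.sum()                    # enforce sum_j W_ij = 1
    return W

def structure_matrices(W, labels):
    """Step 3: M = (I-W)(I-W)^T, L = I - E, G = I - (1/N) e e^T."""
    N = len(labels)
    I = np.eye(N)
    E = np.zeros((N, N))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        E[np.ix_(idx, idx)] = 1.0 / len(idx)        # E_ij = 1/n_c within class c
    return (I - W) @ (I - W).T, I - E, I - np.ones((N, N)) / N
```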

3. Theoretical Justification

In this section, we provide a theoretical analysis of the KUNDE algorithm. As mentioned earlier, the objective of KUNDE is to preserve the within-class neighboring geometry while maximizing the between-class scatter in the low-dimensional space.

First, we characterize the within-class geometry of each data point in the feature space $F$ by the linear coefficients that reconstruct the data point from its $n$ identical-label nearest neighbors. To preserve the within-class neighboring relations, the basic idea is that the same weights that reconstruct the point $\phi(x_i)$ in $F$ should also reconstruct its embedded counterpart $y_i$ in $\mathbb{R}^d$. Therefore, we should minimize the following cost function:

Eq. 1

$$J_1(V)=\sum_i \Big\|y_i-\sum_j W_{ij}y_j\Big\|^2=\|Y(I-W)\|^2=\mathrm{trace}\!\left[Y(I-W)(I-W)^TY^T\right]=\mathrm{trace}\!\left[V^T\phi(X)M\phi(X)^TV\right],$$

where $M=(I-W)(I-W)^T$.

Second, since the purpose of KUNDE is to solve classification problems, the embedded vectors from different classes should be far from each other. Hence, we propose to maximize the between-class scatter:

Eq. 2

$$J_2(V)=\mathrm{trace}\!\left(V^TS_b^{\phi}V\right),$$

where $S_b^{\phi}$ is the between-class scatter matrix in $F$. According to Ref. 6, the total scatter matrix $S_t^{\phi}$, the within-class scatter matrix $S_w^{\phi}$, and the between-class scatter matrix $S_b^{\phi}$ can be represented as follows:

Eq. 3

$$S_t^{\phi}=\sum_{i=1}^{N}\left(\phi(x_i)-m\right)\left(\phi(x_i)-m\right)^T=\phi(X)\left(I-\frac{ee^T}{N}\right)\phi(X)^T=\phi(X)G\phi(X)^T,$$
$$S_w^{\phi}=\sum_{c=1}^{C}\sum_{x\in\omega_c}\left[\phi(x)-m_c\right]\left[\phi(x)-m_c\right]^T=\phi(X)(I-E)\phi(X)^T=\phi(X)L\phi(X)^T,$$
$$S_b^{\phi}=S_t^{\phi}-S_w^{\phi}=\phi(X)(G-L)\phi(X)^T=\phi(X)B\phi(X)^T,$$

where $m$ is the mean of all mapped samples, $m_c$ is the mean of class $\omega_c$ in $F$, $G=I-(1/N)ee^T$, $L=I-E$, and $B=G-L$. Thus, Eq. 2 can be rewritten as:

Eq. 4

$$J_2(V)=\mathrm{trace}\!\left[V^T\phi(X)B\phi(X)^TV\right].$$
Combining Eqs. 1, 4, we should minimize the following objective function:

Eq. 5

$$J(V)=\mathrm{trace}\!\left[V^T\phi(X)M\phi(X)^TV\right]-\mathrm{trace}\!\left[V^T\phi(X)B\phi(X)^TV\right].$$

Now, we turn to the statistically uncorrelated constraint. Requiring that any two different components $y_i$ and $y_j$ $(j\neq i)$ of the extracted feature $y=V^T\phi(x)$ be uncorrelated means that:

Eq. 6

$$E\{[y_i-E(y_i)][y_j-E(y_j)]\}=v_i^TS_t^{\phi}v_j=0,$$
where $v_i$ and $v_j$ are two different columns of the matrix $V$. In addition, each $v_i$ should be normalized. Without loss of generality, let $v_i$ satisfy:

Eq. 7

$$v_i^TS_t^{\phi}v_i=1.$$
Then, from Eqs. 6, 7, we get:

Eq. 8

$$V^TS_t^{\phi}V=I.$$
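Under the kernel expansion introduced below (Eq. 11), this constraint becomes $A^TKGKA=I$, so the uncorrelatedness of the extracted features can be verified numerically once the projection matrix $A$ has been computed (a solver is sketched at the end of this section). A minimal check, with the tolerance chosen to absorb the small ridge used in that solver:

```python
import numpy as np

def check_uncorrelated(A, K, G, tol=1e-6):
    """Verify V^T S_t V = A^T K G K A ~ I, i.e., the features are uncorrelated."""
    C = A.T @ K @ G @ K @ A
    return np.allclose(C, np.eye(C.shape[0]), atol=tol)
```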

As a result, KUNDE can be formulated as the following constrained minimization problem:

Eq. 9

$$\min_{V^TS_t^{\phi}V=I}\ \mathrm{trace}\!\left[V^T\phi(X)M\phi(X)^TV\right]-\mathrm{trace}\!\left[V^T\phi(X)B\phi(X)^TV\right]=\min_{V^T\phi(X)G\phi(X)^TV=I}\ \mathrm{trace}\!\left[V^T\phi(X)M\phi(X)^TV\right]-\mathrm{trace}\!\left[I-V^T\phi(X)L\phi(X)^TV\right].$$
Further, Eq. 9 is equivalent to:

Eq. 10

$$\min_{V^T\phi(X)G\phi(X)^TV=I}\ \mathrm{trace}\!\left[V^T\phi(X)(M+L)\phi(X)^TV\right].$$
Since each column of $V$ should lie in the span of all training samples in $F$, there exist coefficients $\alpha_j$ $(j=1,\ldots,N)$ such that $v=\sum_{j=1}^{N}\alpha_j\phi(x_j)=\phi(X)a$, where $a=[\alpha_1,\ldots,\alpha_N]^T$. Therefore, Eq. 10 becomes:

Eq. 11

$$\min_{A^TKGKA=I}\ \mathrm{trace}\!\left[A^TK(M+L)KA\right],$$
where $K$ is the kernel matrix defined in Sec. 2.

Finally, the constrained minimization problem reduces to a generalized eigenvalue problem:

Eq. 12

$$K(M+L)Ka=\lambda KGKa.$$
The matrix $A$ is formed from the $d$ eigenvectors corresponding to the $d$ smallest eigenvalues of Eq. 12. Once $A$ is obtained, for any data point $x$ in the input space, the nonlinear feature is given by $y=V^T\phi(x)=A^T[k(x_1,x),\ldots,k(x_N,x)]^T$.
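A minimal sketch of this final step, continuing the NumPy sketch from Sec. 2: it assumes the matrices K, M, L, and G built there, uses scipy.linalg.eigh for the symmetric generalized eigenproblem, and adds a small ridge to KGK because G annihilates the all-ones vector and can make the right-hand matrix singular (the ridge is our own numerical safeguard, not part of the derivation).

```python
import numpy as np
from scipy.linalg import eigh

def kunde_projection(K, M, L, G, d, ridge=1e-8):
    """Solve K(M+L)K a = lambda K G K a (Eq. 12); keep the d smallest eigenpairs."""
    lhs = K @ (M + L) @ K
    rhs = K @ G @ K
    lhs = (lhs + lhs.T) / 2.0                       # enforce exact symmetry
    rhs = (rhs + rhs.T) / 2.0 + ridge * np.eye(K.shape[0])
    eigvals, eigvecs = eigh(lhs, rhs)               # eigenvalues in ascending order
    return eigvecs[:, :d]                           # A = [a_1, ..., a_d]

def kunde_embed(A, X_train, x, sigma=1.0):
    """y = A^T [k(x_1, x), ..., k(x_N, x)]^T for a new point x (Gaussian kernel)."""
    k_vec = np.exp(-((X_train - x) ** 2).sum(axis=1) / sigma ** 2)
    return A.T @ k_vec
```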

4. Experimental Results

In this section, experiments are performed on radar target recognition with measured and simulated range profiles, respectively. The performance of the KUNDE algorithm is compared with that of kernel NPE (KNPE),5 kernel principal component analysis (KPCA),7 and kernel Fisher discriminant analysis (KFDA).8 The Gaussian kernel $k(x_i,x_j)=\exp(-\|x_i-x_j\|^2/\sigma^2)$ is adopted, and the parameter $\sigma$ is simply set to 1. Since we focus only on feature extraction, the nearest-neighbor classifier with a Euclidean metric is employed for classification, for the sake of simplicity.
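The evaluation protocol can be sketched as follows, assuming the helpers from the earlier sketches; the range-profile arrays, the neighborhood size, and the reduced dimensionality are placeholders for illustration.

```python
import numpy as np

def unit_energy(P):
    """Preprocess: normalize each range profile (one per row) to unit energy."""
    return P / np.linalg.norm(P, axis=1, keepdims=True)

def nn_accuracy(Z_train, lab_train, Z_test, lab_test):
    """Nearest-neighbor classification with a Euclidean metric (rows are features)."""
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=-1)
    preds = lab_train[np.argmin(d2, axis=1)]
    return float(np.mean(preds == lab_test))

# Assumed end-to-end usage with the helpers sketched in Secs. 2 and 3:
#   Xn_tr, Xn_te = unit_energy(X_train), unit_energy(X_test)
#   K = gaussian_kernel(Xn_tr)
#   W = reconstruction_weights(K, lab_train, n_neighbors=5)   # n is illustrative
#   M, L, G = structure_matrices(W, lab_train)
#   A = kunde_projection(K, M, L, G, d=10)                    # d is illustrative
#   Z_tr = np.stack([kunde_embed(A, Xn_tr, x) for x in Xn_tr])
#   Z_te = np.stack([kunde_embed(A, Xn_tr, x) for x in Xn_te])
#   print(nn_accuracy(Z_tr, lab_train, Z_te, lab_test))
```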

4.1. Experiments on Measured Data

The measured data are from three flying airplanes: an An-26, a Yak-42, and a Cessna Citation S/II. Each airplane has 260 range profiles. In the experiments, each profile is preprocessed by energy normalization. Then, for each airplane, one third of the profiles are used for training and the rest for testing. Figure 1 plots the recognition rates obtained by each method versus the reduced dimensionality. Note that the feature dimensionality of KFDA is at most $C-1$, where $C$ is the number of classes.

Fig. 1

Recognition rate versus reduced dimensionality on measured range profiles.


From Fig. 1, it can be seen that KUNDE achieves the best recognition results at every feature dimensionality, with KFDA second best, while KNPE and KPCA perform relatively poorly since neither considers discriminant information. This indicates that KUNDE gains discriminative power over the other three methods by incorporating both the within-class neighboring information and the between-class scatter information, and that the statistically uncorrelated features are very helpful for improving recognition performance.

4.2. Experiments on Simulated Data

The simulated profiles are from six airplanes: Mirage, IDF, F-16, J-8II, Su-27, and E-2C; each airplane has 60 profiles. As before, each profile is normalized to unit energy, and 20 profiles per target are used for training, with the rest for testing. The recognition results are shown in Fig. 2. They suggest that KUNDE again outperforms all the other methods. In addition, as the feature dimensionality increases, KUNDE maintains a higher recognition rate, while KPCA and KNPE also give satisfactory results.

Fig. 2

Recognition rate versus reduced dimensionality on simulated range profiles.


5. Conclusions

A novel algorithm called kernel uncorrelated neighborhood discriminative embedding (KUNDE) has been presented for pattern recognition. KUNDE has two prominent characteristics. First, it integrates the within-class neighboring information with the between-class scatter information, which gives it powerful discriminative ability. Second, it extracts statistically uncorrelated features with minimum redundancy by introducing an uncorrelated constraint, which helps improve recognition performance. Experimental results on radar target recognition show that KUNDE achieves better recognition performance than all the other methods considered.

Acknowledgments

The authors would like to thank the anonymous reviewer and editors for their helpful comments and suggestions. This work is partially supported by the Key Project of the Chinese Ministry of Education (Grant No. 105150) and the foundation of the ATR Key Lab (Grant No. 51483010305DZ0207).

References

1. J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science 290(5500), 2319–2323 (2000). https://doi.org/10.1126/science.290.5500.2319

2. S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science 290(5500), 2323–2326 (2000). https://doi.org/10.1126/science.290.5500.2323

3. L. K. Saul and S. T. Roweis, "Think globally, fit locally: unsupervised learning of low dimensional manifolds," J. Mach. Learn. Res. 4, 119–155 (2003).

4. M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Comput. 15(6), 1373–1396 (2003).

5. X. F. He, D. Cai, S. C. Yan, and H. J. Zhang, "Neighborhood preserving embedding," (2005).

6. X. F. He, S. C. Yan, Y. X. Hu, P. Niyogi, and H. J. Zhang, "Face recognition using Laplacianfaces," IEEE Trans. Pattern Anal. Mach. Intell. 27(3), 328–340 (2005). https://doi.org/10.1109/TPAMI.2005.55

7. B. Schölkopf, A. Smola, and K. R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Comput. 10, 1299–1319 (1998). https://doi.org/10.1162/089976698300017467

8. Q. S. Liu, R. Huang, H. Q. Lu, and S. D. Ma, "Face recognition using kernel-based Fisher discriminant analysis," (2002).
©(2007) Society of Photo-Optical Instrumentation Engineers (SPIE)
Xuelian Yu and Xuegang Wang "Kernel uncorrelated neighborhood discriminative embedding for feature extraction," Optical Engineering 46(12), 120502 (1 December 2007). https://doi.org/10.1117/1.2821866