Subspace manifold learning with sample weights

https://doi.org/10.1016/j.imavis.2006.10.007

Abstract

Subspace manifold learning represents a popular class of techniques in statistical image analysis and object recognition. Recent research in the field has focused on nonlinear representations; locally linear embedding (LLE) is one such technique that has recently gained popularity. We present and apply a generalization of LLE that introduces sample weights. We demonstrate the application of the technique to face recognition, where a model exists to describe each face’s probability of occurrence. These probabilities are used as weights in the learning of the low-dimensional face manifold. Results of face recognition using this approach are compared against standard nonweighted LLE and PCA. A significant improvement in recognition rates is realized using weighted LLE on a data set where face occurrences follow the modeled distribution.

Introduction

The problems of dimensionality reduction and subspace learning are active research topics in machine learning and statistical image analysis [1]. In this context, the goal has often been related to mitigating the effects of the curse of dimensionality [2], compression (e.g., [3]) or the uncovering of latent variables (e.g., blind source separation [4], factor analysis [5]). Specialized dimensionality reduction techniques have also been developed for visualization of high-dimensional data (e.g., [6]).

Subspace learning techniques have also been successfully applied in machine vision, especially in the context of face recognition where they have gained considerable popularity [7], [8]. The strong interest in face recognition has been motivated by applications ranging from authentication and security to expression recognition and user interface design. While humans are highly adept at recognizing faces, this task remains a significant challenge for machines. Typically, a face recognition system is trained offline with a set of labeled images prior to being presented with a novel image to recognize. The challenge is to maximize the amount of relevant detail learned from the training data with minimum sensitivity to transformations such as pose and illumination.

Face images are especially suitable for subspace learning: faces are mostly symmetrical, contain many textured and smooth surfaces and have a fairly constant appearance, resulting in strong correlation. Faces normally appear upright, and in many applications a frontal view is available. For these reasons, most face recognition systems have used image-based representations, where faces are represented with characteristic images, and dimensionality reduction techniques are applied for increased storage and comparison efficiency. These techniques effectively learn a subspace of the space spanned by the original input, on which the face images in the training data (approximately) lie. By extension, novel face images are expected to lie close to this manifold.

Some face recognition systems are holistic and use a single model to represent the entire face (e.g., [9]) while others use geometrical configurations of local features (recognition by parts, e.g., [10]). Our empirical evaluation takes the former approach, but subspace learning also plays a role in parts-based systems, where it may be applied to each part.

Face recognition has been an extremely fertile research area, and has resulted in many specialized techniques. A complete review is beyond the scope of this paper, but Zhao et al. [8] survey many of the popular techniques in the field. Here, we limit our discussion to the main subspace-based approaches.

Kirby and Sirovich [3] first proposed a low-dimensional representation for face images, based on principal component analysis (PCA, [11]). PCA is a linear transformation into a lower-dimensional coordinate system that preserves maximum variance in the data, thus minimizing mean-square reconstruction error; it is computed as a truncation of the Karhunen–Loève rotation [12]. Turk and Pentland later applied this technique to face recognition and detection [13] and introduced the notion of face space, the space spanned by the transformation; the faceness of a novel image was measured by its distance to face space. The database consisted of low-dimensional PCA projections of the training data, and face recognition was performed by applying a nearest-neighbor classifier in the reduced subspace.
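
For concreteness, the following is a minimal sketch of this eigenface-style pipeline. It is our own illustration, not code from [13]; the function names, the SVD-based computation, and the parameter choices are assumptions.

```python
# Minimal eigenface-style recognition sketch (illustrative, not from [13]).
import numpy as np

def fit_pca(X, d):
    """X: n x p matrix of vectorized training face images; d: subspace dim."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data yields the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:d].T                        # p x d projection basis
    return mean, W

def recognize(x, mean, W, Y_train, labels):
    """Project a novel image and classify by nearest neighbor in face space."""
    y = (x - mean) @ W                  # d-dimensional projection
    # Distance to face space measures the 'faceness' of the input image.
    residual = np.linalg.norm((x - mean) - y @ W.T)
    nearest = np.argmin(np.linalg.norm(Y_train - y, axis=1))
    return labels[nearest], residual

# Usage sketch: mean, W = fit_pca(X, d=50); Y_train = (X - mean) @ W
```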

Turk and Pentland’s technique has been very influential, but PCA’s linear model is suboptimal for image data. Murase and Nayar [14] present an extension of Turk and Pentland’s technique that represents continuous appearance variations of objects using spline interpolation. The test image is first projected onto a low-dimensional space in order to identify the object. Once the object is identified, the image is projected onto a new subspace, defined specifically for that object. The resulting subspaces are nonlinear and appear as manifolds in high-dimensional space.

Other linear dimensionality reduction techniques that have been successfully applied to face recognition include independent component analysis (ICA, [15]) and linear discriminant analysis (LDA, [16]). ICA seeks components that are statistically independent (rather than de-correlated). It is argued to provide a more localized decomposition than PCA. LDA is a linear transformation related to the Fisher linear discriminant (FLD, [17]), that seeks to maximize class separability in the projection, based on known class labels [12].
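
For reference, the criterion that the Fisher linear discriminant maximizes can be written in its standard form (a textbook formulation, not reproduced from this paper):

```latex
J(\mathbf{w}) \;=\; \frac{\mathbf{w}^{\top} S_B \, \mathbf{w}}{\mathbf{w}^{\top} S_W \, \mathbf{w}},
```

where $S_B$ and $S_W$ denote the between-class and within-class scatter matrices, respectively. Maximizing $J$ yields projection directions along which the labeled classes are maximally separated relative to their within-class spread.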

As a least-squares technique, PCA suffers from sensitivity to outliers. Several approaches have been proposed to increase the robustness of PCA. De la Torre and Black propose using M-estimators [18]. Skočaj et al. [19] use a generalized version of PCA that introduces image and pixel weights, so that the influence of outliers in the data is diminished by assigning them lower weights. The weights control the learning of the subspace (the training phase) and are also used in the recognition classifier. Weighted subspace learning is also a central element of our technique, although our motivation for introducing the weights is different.

The space spanned by images of an object under different variations is highly nonlinear, and the application of linear dimensionality reduction techniques to image data results in suboptimal object recognition performance [20], [21], [22], [23]. Several of the recently proposed subspace learning techniques model the subspace manifold as a connected patchwork of locally linear surfaces. These models are especially well suited to manifolds where the intrinsic dimensionality varies in different areas of the manifold or where the locally intrinsically low-dimensional patches have a globally varying orientation. Also, in cases where the manifold is discontinuous, these techniques make it possible to model clusters separately. One popular technique from this category is locally linear embedding (LLE, [21]), which is particularly appealing due to its simplicity and the existence of a closed-form solution (discounting the computation of the eigenvectors). LLE computes the local structure around each input point based on its neighbors. LLE has been applied to face recognition, and the resultant face manifold has been shown to provide better classification opportunities than the face space produced by PCA. This suggests that for the purpose of face recognition, the local structure of the manifold is a better discriminant than the global Euclidean structure.
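
To make the procedure concrete, below is a compact NumPy rendering of standard (nonweighted) LLE as described by Roweis and Saul [21]. The choices of k, d, and the regularization constant are illustrative assumptions, not values from the paper.

```python
# Standard LLE sketch: local reconstruction weights, then a global embedding.
import numpy as np

def lle(X, k=10, d=2, reg=1e-3):
    """X: n x p data matrix. Returns an n x d embedding."""
    n = X.shape[0]
    # Step 1: k nearest neighbors of each point (index 0 is the point itself).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nbrs = np.argsort(dist, axis=1)[:, 1:k + 1]

    # Step 2: reconstruction weights minimizing ||x_i - sum_j w_ij x_j||^2
    # subject to sum_j w_ij = 1, solved via the local Gram matrix.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]               # k x p, neighbors centered on x_i
        C = Z @ Z.T                         # local Gram (covariance) matrix
        C += reg * np.trace(C) * np.eye(k)  # regularize for stability
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()         # enforce the sum-to-one constraint

    # Step 3: embedding from the bottom eigenvectors of M = (I - W)^T (I - W),
    # discarding the constant eigenvector associated with eigenvalue ~0.
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]
```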

In addition to controlling the effect of outliers, weighted input data can also be used to tune the learning of the manifold, so that data samples can be considered according to their reliability or significance. This can be useful in a face recognition system that is trained with a large number of appearances for each face, where a likelihood function can be defined for the occurrence of the different appearances in a novel image. In nonweighted subspace learning algorithms, the user needs to balance the need for a sufficient number of training samples with the risk that including uncommon and infrequent appearances in the training data may adversely affect the representation of the common appearances. Our approach extends a popular nonlinear manifold learning technique, namely LLE, to work with sample weights. If the probability of occurrence for different face appearances is known (or can be estimated), then our technique eliminates this dilemma, and effectively models the subspace based on the available weights.

We propose a locally linear dimensionality reduction technique based on locally linear embedding (LLE), where the input data are labeled with weights that bias the transformation to model certain parts of the input more faithfully than others. The remainder of this paper is organized as follows. This introductory section has motivated the need for sample weights in dimensionality reduction and for local learning of the manifold. Section 2 describes our technical approach and the algorithm used. Section 3 presents empirical results of our algorithm applied to face recognition. Section 4 summarizes our work. Finally, Section 5 suggests directions for future research.

Section snippets

Sample weights

Our technique extends LLE to allow for weighted samples in the training phase where the weights bias the transformation to favor certain input points over others. If input data represents observations from an unknown manifold, then weights may be used to represent the reliability of the observations.

Assigning weights to training samples can be useful in various scenarios. For example, in a face recognition application, higher weights may be assigned to areas of the face considered more significant for recognition. A sketch of one possible weighted formulation is given below.
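
One natural way to introduce sample weights is to scale each point's reconstruction error by its weight in the embedding cost, so that high-weight points dominate the modeling of their neighborhoods. The sketch below is our speculative illustration of such a scheme and is not necessarily the exact formulation given in Section 2 of the full paper.

```python
# Speculative weighted-LLE sketch: reconstruction errors are scaled by
# per-sample weights s_i, i.e., cost = sum_i s_i * ||y_i - sum_j w_ij y_j||^2.
import numpy as np

def weighted_lle(X, s, k=10, d=2, reg=1e-3):
    """X: n x p data matrix; s: length-n sample weights. Returns n x d."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nbrs = np.argsort(dist, axis=1)[:, 1:k + 1]

    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(k)
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()

    # The weighted cost leads to the quadratic form
    # M = (I - W)^T diag(s) (I - W), replacing the unweighted M.
    IW = np.eye(n) - W
    M = IW.T @ np.diag(s) @ IW
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]
```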

Empirical evaluation

We have designed a series of empirical tests to analyze the effectiveness of our method. We used a standard database of face images and added images of faces rotated in-plane by small angles. We chose in-plane rotation as the mode of variation since it can easily be generated synthetically and since faces normally appear more or less upright. The face images in the resulting database were assigned normally-distributed probabilities of occurrence with respect to the rotation angle, with a mean at the upright (unrotated) orientation.
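
The following sketch illustrates this data-set construction. The angle set and standard deviation are illustrative assumptions, not the settings used in our experiments.

```python
# Illustrative construction: in-plane rotations with Gaussian occurrence
# probabilities over the rotation angle, peaked at the upright pose.
import numpy as np
from scipy.ndimage import rotate

def build_rotated_set(images, angles=(-10, -5, 0, 5, 10), sigma=5.0):
    """images: list of 2-D arrays. Returns (samples, normalized weights)."""
    samples, weights = [], []
    for img in images:
        for a in angles:
            samples.append(rotate(img, a, reshape=False))
            # Gaussian probability of occurrence as a function of angle.
            weights.append(np.exp(-a**2 / (2 * sigma**2)))
    weights = np.asarray(weights)
    return np.asarray(samples), weights / weights.sum()
```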

Summary

We have presented a novel approach to weighted manifold learning, by extending the locally linear embedding (LLE) algorithm with sample weights. The weights influence the computation of the low dimensional embedding by biasing the modeling of the neighborhoods in favor of data points with higher weights. This technique may be used where input observations have associated measures of reliability, significance or probability of occurring in a test data point.

We tested our technique on face recognition, where weighted LLE yielded a significant improvement in recognition rates over nonweighted LLE and PCA on a data set in which face occurrences followed the modeled distribution.

Future work

The selection of neighbors at each point has a significant effect on the transformation (in both the weighted and nonweighted cases). To date, little research has been done on techniques for selecting these neighborhood graphs, and values for k or ε are often chosen empirically. The introduction of weights further complicates this issue. With weights, the neighborhood size is no longer dictated only by the locality of the features to be preserved by the transformation. Care must also be taken to ensure that each neighborhood contains points of sufficient total weight to support a reliable local reconstruction.

Acknowledgment

The authors gratefully acknowledge the help of Konstantinos Derpanis in reviewing this paper.

References

  • C. Jutten et al., Blind separation of sources, Signal Processing, 1991.
  • B. Draper et al., Recognizing faces with PCA and ICA, Computer Vision and Image Understanding, 2003.
  • E.J. Keogh et al., Dimensionality reduction for fast similarity search in large time series databases, Knowledge and Information Systems, 2001.
  • R. Bellman, Adaptive Control Processes: A Guided Tour, 1961.
  • M. Kirby et al., Low-dimensional procedure for the characterization of human faces, Optical Society of America, 1987.
  • L.L. Thurstone, Multiple factor analysis, Psychological Review, 1931.
  • C.M. Bishop et al., A hierarchical latent variable model for data visualization, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998.
  • M.-H. Yang et al., Detecting faces in images: a survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002.
  • W. Zhao et al., Face recognition: a literature survey, ACM Computing Surveys, 2003.
  • W. Zhao et al., Discriminant analysis of principal components for face recognition.
  • S.Z. Li, X. Hou, H. Zhang, Q. Cheng, Learning spatially localized, parts-based representation, in: IEEE Conference on…
  • I. Jolliffe, Principal Component Analysis, 1986.