doi:10.1016/j.neucom.2004.11.036
Copyright © 2005 Elsevier B.V. All rights reserved.
Tools for application-driven linear dimension reduction
a Department of Statistics, Florida State University, Tallahassee, FL 32306, USA
b Department of Computer Science, Florida State University, Tallahassee, FL 32306, USA
Received 22 March 2004; revised 26 August 2004; accepted 18 November 2004.
Communicated by S. Fiori.
Available online 13 June 2005.
Abstract
The simplicity and efficiency of linear transformations make them a popular tool for extracting features and reducing dimension before or during statistical analysis of large datasets. Example applications include image compression and reconstruction, discriminant analysis, pattern classification, and image or text retrieval. Linear transformations with natural orthogonality constraints can be represented as elements of Stiefel and Grassmann manifolds. We argue that there is no standard choice of transformation for dimension reduction: the choice is dictated by the application and the dataset, and it can be formulated as an optimization problem on these manifolds. We demonstrate this idea by deriving dimension-reducing transformations in several applications, including image-based recognition of objects and content-based retrieval of images.
Keywords: Stochastic optimization; Grassmann; Stiefel; Optimal feature selection; Sparse representations; Optimization on manifolds
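The formulation in the abstract can be illustrated with a small sketch. The code below is not the authors' algorithm: it is a greedy random search, assumed here only to show how a matrix with orthonormal columns (a point on a Stiefel manifold) can be perturbed, mapped back onto the manifold, and scored by an application-specific objective. The function names (`random_stiefel`, `retract`, `stochastic_search`), the QR-based retraction, and the toy objective (variance captured by the projection) are all hypothetical choices made for illustration.

```python
import numpy as np

def random_stiefel(n, d, rng):
    """Draw an n x d matrix with orthonormal columns (a point on a Stiefel manifold)."""
    q, _ = np.linalg.qr(rng.standard_normal((n, d)))
    return q

def retract(u, step):
    """Map the perturbed matrix u + step back to orthonormal columns via QR."""
    q, r = np.linalg.qr(u + step)
    # Flip column signs so that a zero step returns u itself.
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs

def stochastic_search(objective, u0, n_iter=200, sigma=0.05, seed=0):
    """Greedy random search: keep a random perturbation only if it improves the objective."""
    rng = np.random.default_rng(seed)
    u, best = u0, objective(u0)
    history = [best]
    for _ in range(n_iter):
        candidate = retract(u, sigma * rng.standard_normal(u.shape))
        value = objective(candidate)
        if value > best:
            u, best = candidate, value
        history.append(best)
    return u, history

# Toy data and objective (assumptions for this sketch only).
rng = np.random.default_rng(1)
data = rng.standard_normal((100, 10)) * np.linspace(3.0, 0.5, 10)
cov = np.cov(data, rowvar=False)

def captured_variance(u):
    """Variance of the data retained after projecting onto the columns of u."""
    return float(np.trace(u.T @ cov @ u))

u0 = random_stiefel(10, 3, rng)
u_opt, history = stochastic_search(captured_variance, u0)
```

The toy objective depends only on the subspace spanned by the columns of `u`, so it is in effect a function on a Grassmann manifold. An application-driven objective, such as recognition or retrieval performance, would be passed in place of `captured_variance` without changing the search loop.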
Fig. 1. Performance of Xsδ versus s for three random initial conditions.
Fig. 2. Temporal evolution of the optimization algorithm: (a) plots of retrieval precision (solid line) and the corresponding recall (dotted line); and (b) distance of Xsδ from X0.
Fig. 3. Evolution of sparseness performance (left panels in each row) on the ORL dataset starting at three different initial conditions. Middle panels show the corresponding recognition performance for the same process, and right panels plot the distance of Xsδ from X0 to show the effectiveness of the updating: (a) X0=UFastICA, (b) X0=UFDA and (c) X0=UPCA.
Fig. 4. Joint optimization over recognition and sparsity functions. Left panels plot the joint performance, and the middle panels display the corresponding sparsity and performance functions. The three rows show results from three different initial conditions as in Fig. 3. Here λ1=0.10 and λ2=0.90: (a) X0=UFastICA, (b) X0=UFDA and (c) X0=UPCA.
Fig. 5. The search process of a linear representation with better generalization: (a) Top: F(Xsδ) on the training set (solid line) and on a separate test set (dotted line) using the nearest-neighbor classifier; bottom: the difference between training and test performance (solid line). Here X0 is obtained using the FastICA algorithm [21]. (b) The distances between a training image and all the training ones using X0 (top) and X5000δ (bottom). The thick segment corresponds to the correct class. (c) As in (b) but for a test image from the same class.