doi:10.1016/S0031-3203(01)00245-X
Copyright © 2002 Pattern Recognition Society. Published by Elsevier Science B.V.
Feature selection toolbox
Department of Pattern Recognition, Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, 182 08, Prague 8, Czech Republic
Received 31 October 2001;
accepted 31 October 2001.
Available online 13 January 2002.
Abstract
A software package developed for feature selection in statistical pattern recognition is presented. The software tool includes several classical as well as new methods suitable for dimensionality reduction, classification and data representation. Examples of solved problems are given, together with observations on the behavior of criterion functions.
Author Keywords: Pattern recognition; Feature selection; Subset search; Search methods; Software toolbox
Fig. 1. Feature selection toolbox: Windows GUI workplace.
Fig. 2. Visual comparison of 2D projections of approximation models estimated by the approximation method on marble data (see text): (a) single mixture component, (b) 2 mixture components, and (c) 5 mixture components. Ellipses illustrate the equipotential component planes; component weights are not displayed.
Fig. 3. Performance of approximation-model-based methods on the speech data. The screenshot shows how FST stores numerical results. Different lines may be selected for graph display using specified colors and/or line thicknesses and shapes, as shown in Fig. 4.
Fig. 4. Performance of subset search methods as shown by the FST graphic output. The left picture compares sub-optimal methods, i.e. maximal achieved criterion values for subsets of 5 to 24 features. The right picture compares optimal methods, i.e. computational time needed to find optimal subsets of 1 to 29 features.
Fig. 5. Visual comparison of 2D subspaces found on 20-dimensional marble data by maximizing: (a) Bhattacharyya (the same subspace was found by Generalized Mahalanobis), (b) Divergence, (c) Patrick–Fisher distances. Results of mixture model methods using 5 components: (d) approximation method, (e) divergence method. Picture (f) shows a subspace unsuitable for discrimination (found by minimizing the Bhattacharyya distance).
Fig. 6. Visual comparison of 2D subspaces found on less separable 30-dimensional mammogram data by maximizing: (a) Bhattacharyya (the same subspace was found by Divergence), (b) Generalized Mahalanobis, (c) Patrick–Fisher distances. Results of mixture model methods using 5 components: (d) approximation method, (e) divergence method. Picture (f) shows a subspace unsuitable for discrimination (found by minimizing the Bhattacharyya distance).
Table 1. Error rates (%) of different classifiers with different parameters

The ‘Gauss’ column contains results of a Gaussian classifier. The other columns contain results obtained using the ‘approximation’ method (in this case the ‘divergence’ method yielded the same results). Results in the second row for each data set were obtained after a preliminary cluster-detection step used to initialise the ‘approximation’ method. ‘5c’ denotes a mixture of 5 components, etc.
Table 2. Criterion functions comparison on 2-class speech data

Single features are listed in increasing order of their individual criterion values, i.e. their “individual discriminative power”.
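The individual ordering described above can be illustrated with a minimal sketch. It assumes a two-class problem, models each feature in each class as a univariate Gaussian, and uses the Bhattacharyya distance as the criterion. The function names and the toy data are hypothetical and are not taken from FST; the criteria actually compared in the table may be computed differently.

```python
import math
from statistics import mean, variance


def bhattacharyya_1d(a, b):
    """Bhattacharyya distance between two 1-D samples, each modelled as a
    Gaussian with the sample mean and (unbiased) sample variance."""
    m1, m2 = mean(a), mean(b)
    v1, v2 = variance(a), variance(b)
    mean_term = 0.25 * (m1 - m2) ** 2 / (v1 + v2)
    var_term = 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2)))
    return mean_term + var_term


def rank_features(class_a, class_b):
    """Return (order, scores): feature indices in increasing order of
    individual criterion value, and the per-feature criterion values."""
    n_features = len(class_a[0])
    scores = []
    for j in range(n_features):
        col_a = [row[j] for row in class_a]
        col_b = [row[j] for row in class_b]
        scores.append(bhattacharyya_1d(col_a, col_b))
    order = sorted(range(n_features), key=lambda j: scores[j])
    return order, scores


# Hypothetical toy data: 3 samples per class, 3 features (illustration only).
class_a = [[0.0, 1.0, 5.0], [1.0, 1.2, 5.5], [2.0, 0.8, 4.5]]
class_b = [[0.5, 4.0, 5.1], [1.5, 4.2, 5.4], [2.5, 3.8, 4.6]]

order, scores = rank_features(class_a, class_b)
print(order)  # least discriminative feature first
```

Such a ranking scores each feature in isolation, so it ignores dependencies between features. Criteria evaluated on whole subsets can therefore select a different set of features.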
Table 3. Criterion functions comparison on 2-class speech data

The table shows subsets of seven features selected to maximize different criteria. In contrast, the last line shows a subset selected to minimize a criterion.