An improved branch and bound algorithm for feature selection

https://doi.org/10.1016/S0167-8655(03)00020-5

Abstract

Feature selection plays an important role in pattern classification. In this paper, we present an improved branch and bound algorithm for optimal feature subset selection. The algorithm searches a large solution tree efficiently by cutting paths that are guaranteed not to contain the optimal solution. Our experimental results demonstrate the effectiveness of the new algorithm.

Introduction

Feature selection plays an important role in pattern recognition applications. For example, in medical diagnosis, we need to evaluate various feature combinations and select an effective one for classification. Using a subset of the features also reduces the processing time required for classification.

Feature selection is the task of selecting a subset of m features from a larger set of n features so as to optimize the value of a criterion function J over all subsets of size m. Many feature selection algorithms have been proposed in the literature. Sequential forward selection (SFS) and sequential backward selection (SBS) (e.g., Fukunaga, 1992) are two widely used sequential methods. SFS first selects the best single feature and then adds one feature at a time, choosing the feature which, in combination with those already selected, maximizes the criterion function J; SBS starts with all input features and successively deletes one feature at a time. Both methods are computationally attractive. By dynamically controlling the number of forward and backtracking steps, floating sequential search methods (e.g., Pudil et al., 1994) have been shown to perform better than both SFS and SBS. However, none of these sequential methods is guaranteed to produce the feature subset that yields the global maximum of the criterion among all possible subsets, since “the best pair of features need not contain the best single feature” (e.g., Jain et al., 2000).
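To make the greedy nature of these methods concrete, the following is a minimal sketch of SFS in Python (our illustration, not code from the paper); the criterion function J is assumed to be supplied by the caller, for instance a class-separability measure, and all names are illustrative.

```python
from typing import Callable, FrozenSet

def sfs(n: int, m: int, J: Callable[[FrozenSet[int]], float]) -> FrozenSet[int]:
    """Sequential forward selection: greedily grow a subset of m features
    out of n, at each step adding the feature that, in combination with
    the already selected features, maximizes the criterion J."""
    selected: FrozenSet[int] = frozenset()
    remaining = set(range(n))
    while len(selected) < m:
        # Evaluate J for each one-feature extension and keep the best.
        best = max(remaining, key=lambda f: J(selected | {f}))
        selected = selected | {best}
        remaining.discard(best)
    return selected
```

SBS is the mirror image: it starts from all n features and repeatedly deletes the feature whose removal decreases J the least. Both are greedy, which is precisely why they can miss the globally best subset.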

In this paper, we are interested in selecting the feature subset that is best in terms of a given criterion function. The only optimal feature selection algorithms are exhaustive search and branch and bound (BB) (e.g., Narendra and Fukunaga, 1977). An exhaustive search finds the best solution by evaluating the criterion function J over all possible combinations of features, which quickly becomes impractical or impossible, since the number of feature combinations grows exponentially with the dimensionality. The BB algorithm is the optimal feature selection method that avoids this exhaustive search: it rejects many subsets that are guaranteed to be suboptimal, while still guaranteeing that the selected subset is the global optimum for the given criterion function. As a result, BB works for many practical problems for which an exhaustive search would be infeasible. A more efficient variant, BB+ (e.g., Yu and Yuan, 1993), outperforms the original BB algorithm by reducing redundant J evaluations. Most recently, Somol et al. (2000) proposed a fast branch and bound (FBB) algorithm with a prediction mechanism for J values.

In this paper, an improved BB algorithm for optimal feature selection is proposed that further reduces redundant J evaluations and searches for the optimal solution efficiently. Results show that the proposed algorithm is faster than the BB, BB+ and FBB algorithms.

The paper is organized into five sections. Section 2 describes BB and BB+ algorithms. The proposed feature selection algorithm is presented in Section 3. In Section 4, we give the experimental results. Finally, conclusions are drawn in Section 5.

The branch and bound algorithm

A brief description of the basic BB algorithm follows, as our feature selection method is built on modifications to it (e.g., Narendra and Fukunaga, 1977). We consider the problem of selecting the best set of m features out of n original features, i.e. the set that maximizes a criterion function J, where J satisfies the monotonicity condition: for two feature sets S1 and S2, if S1 is a subset of S2, then J(S1) ≤ J(S2). A large number of criterion functions, such as discriminant functions and distance measures (for example, the Bhattacharyya distance), satisfy this condition.
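As a rough illustration of how the monotonicity condition enables pruning, here is a simplified sketch in Python (our own rendering, not the paper's pseudocode): each node of the search tree retains a feature set, J of the retained set bounds the J value of every descendant from above, and a subtree is cut as soon as that bound falls to or below the best complete solution found so far.

```python
from typing import Callable, FrozenSet

def branch_and_bound(n: int, m: int,
                     J: Callable[[FrozenSet[int]], float]) -> FrozenSet[int]:
    """Select the m-feature subset maximizing a monotone criterion J.
    The tree removes one feature per level; since every descendant
    retains a subset of the current node's features, J(retained) is an
    upper bound on the whole subtree."""
    best_value = float("-inf")
    best_subset: FrozenSet[int] = frozenset()

    def search(retained: FrozenSet[int], next_feature: int) -> None:
        nonlocal best_value, best_subset
        if len(retained) == m:          # leaf: a complete candidate subset
            value = J(retained)
            if value > best_value:
                best_value, best_subset = value, retained
            return
        # Monotonicity: no descendant can score above J(retained),
        # so the subtree is cut if this bound cannot beat the best leaf.
        if J(retained) <= best_value:
            return
        # Branch by removing one more feature; the index ordering ensures
        # each subset is generated once, and the range bound leaves enough
        # removable features to still reach a subset of size m.
        need = len(retained) - m
        for f in range(next_feature, n - need + 1):
            search(retained - {f}, f + 1)

    search(frozenset(range(n)), 0)
    return best_subset
```

In the worst case the whole tree is still explored, but with a reasonably tight bound large parts of it are never visited, which is what makes BB practical where exhaustive search is not.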

The improved branch and bound algorithm

Both the BB and BB+ algorithms perform a “top–down” search with backtracking. We now introduce our improved branch and bound (IBB) algorithm, which employs top–down and right–left search strategies together with backtracking. IBB fully utilizes information gained from previous searches, which both the BB and BB+ algorithms ignore.

We consider an example. Assume that the J value obtained by removing node X in Fig. 1 (i.e. removing features (2,3), or retaining feature set (1,4,5) for the computation of J) is already known from an earlier stage of the search and does not exceed the current bound. By monotonicity, every node whose retained feature set is a subset of (1,4,5) can then be cut without evaluating J at all.
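The kind of bookkeeping this example points at can be sketched as follows (our reading of the idea, with hypothetical names, not the paper's actual data structures): once a retained set is known to score at or below the current bound, monotonicity gives the same guarantee for every subset of it, so later nodes can sometimes be cut without evaluating J at all.

```python
from typing import FrozenSet, List

class PrunedSets:
    """Hypothetical cache of retained-feature sets whose J value was
    found to be at or below the bound.  Because the bound only grows,
    any later node retaining a subset of a cached set is also below
    the bound (by monotonicity) and can be cut without computing J."""

    def __init__(self) -> None:
        self._cut: List[FrozenSet[int]] = []

    def record(self, retained: FrozenSet[int]) -> None:
        self._cut.append(retained)

    def can_cut(self, retained: FrozenSet[int]) -> bool:
        # 'retained <= s' tests subset containment between frozensets.
        return any(retained <= s for s in self._cut)
```

Wired into the search above, can_cut would be checked before computing J at a node, and record would be called whenever a node's J value fails to exceed the current bound; this is one way information gained from previous searches can be reused across subtrees.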

Experimental results

To evaluate the proposed IBB algorithm, we use three different data sets from the UCI repository (ftp.ics.uci.edu): two letter image recognition data sets (e.g., Frey and Slate, 1991) and one mammogram data set from the Wisconsin Diagnostic Breast Cancer database (e.g., Wolberg et al., 1994). The letter recognition data set consists of 20,000 samples from 26 classes (the 26 letters), with sixteen features extracted for each sample. Our first data set includes the two letters A and Z, with 789 As and 734 Zs.

Conclusions

One of the challenges in classification problems is to select which features to use, reducing the dimensionality of the feature space while retaining the information needed to discriminate between classes. In this paper, an IBB algorithm for optimal feature selection has been proposed. The algorithm searches a large solution tree efficiently by cutting paths that are guaranteed not to contain the optimal solution. The proposed IBB algorithm further reduces redundant J evaluations and, in our experiments, is faster than the BB, BB+ and FBB algorithms.

Acknowledgements

The authors would like to thank the reviewers for their valuable comments.
