Pattern Recognition Letters
Volume 24, Issues 9-10, June 2003, Pages 1215-1225
 
doi:10.1016/S0167-8655(02)00303-3
Copyright © 2002 Elsevier Science B.V. All rights reserved.

Feature subset selection using a new definition of classifiability

Ming Dong a and Ravi Kothari b (corresponding author)

a Computer Science Department, Wayne State University, Detroit, MI 48202, USA
b IBM-India Research Lab., Block I, Indian Institute of Technology, Hauz Khas, New Delhi 110016, India

Received 25 July 2001; 
revised 2 August 2002. 
Available online 10 December 2002.


Abstract

The performance of most practical classifiers improves when correlated or irrelevant features are removed. Machine-based classification is therefore often preceded by subset selection, a procedure that identifies the relevant features of a high-dimensional data set. At present, the most widely used subset selection technique is the so-called “wrapper” approach, in which a search algorithm identifies candidate subsets and the actual classifier is used as a “black box” to evaluate the fitness of each subset. Evaluating the fitness of a subset, however, requires cross-validation or another resampling-based procedure for error estimation, which necessitates constructing a large number of classifiers for each subset. This significant computational burden makes the wrapper approach impractical when a large number of features are present.
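The cost described above can be made concrete with a rough count of classifier trainings. The sketch below assumes k-fold cross-validation and two search strategies (exhaustive search and greedy forward selection run until every feature has been added); the abstract names neither, and the function names are hypothetical.

```python
def exhaustive_wrapper_fits(n_features, n_folds):
    """Classifier trainings if every non-empty feature subset is evaluated.

    Assumption: each subset is scored by k-fold cross-validation,
    which trains one classifier per fold.
    """
    n_subsets = 2 ** n_features - 1  # all non-empty subsets
    return n_subsets * n_folds


def forward_wrapper_fits(n_features, n_folds):
    """Classifier trainings for greedy forward selection run to completion.

    At each step every remaining feature is tried in turn, and each
    trial costs one k-fold cross-validation.
    """
    return sum(remaining * n_folds for remaining in range(n_features, 0, -1))


if __name__ == "__main__":
    print(exhaustive_wrapper_fits(20, 10))
    print(forward_wrapper_fits(50, 10))
```

Even the greedy search grows quadratically in the number of features, and every unit counted here is a full classifier training run.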

In this paper, we present an approach to subset selection based on a novel definition of the classifiability of a given data set. The classifiability measure we propose characterizes the relative ease with which labeled data can be classified. We use this definition to systematically add, at each step, the feature that leads to the largest increase in classifiability. The proposed approach does not require the construction of classifiers at each step and therefore does not suffer from as high a computational burden as the wrapper approach. Our results over several different data sets indicate that the results obtained are at least as good as those obtained with the wrapper approach.
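The greedy procedure described above can be sketched as follows. The abstract does not give the paper's classifiability measure, so the score used here (the fraction of points whose nearest neighbour carries the same label) is only a stand-in chosen for illustration. The names `nn_label_agreement` and `forward_select`, and the stopping rule, are likewise assumptions.

```python
import math


def nn_label_agreement(rows, labels, features):
    """Stand-in score, NOT the paper's classifiability definition.

    Returns the fraction of points whose nearest other point, measured
    on the chosen `features` only, has the same class label.
    """
    agree = 0
    for i, row_i in enumerate(rows):
        best_j, best_d = None, math.inf
        for j, row_j in enumerate(rows):
            if i == j:
                continue
            d = sum((row_i[f] - row_j[f]) ** 2 for f in features)
            if d < best_d:
                best_d, best_j = d, j
        if labels[best_j] == labels[i]:
            agree += 1
    return agree / len(rows)


def forward_select(rows, labels, score=nn_label_agreement, min_gain=1e-9):
    """Greedily add the feature that increases `score` the most.

    Stops when no remaining feature improves the score by `min_gain`
    (an assumed stopping rule; the abstract does not state one).
    """
    n_features = len(rows[0])
    selected, best_score = [], 0.0
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(score(rows, labels, selected + [f]), f) for f in candidates]
        # Highest score wins; ties go to the lowest feature index.
        top_score, top_f = max(scored, key=lambda t: (t[0], -t[1]))
        if selected and top_score - best_score < min_gain:
            break
        selected.append(top_f)
        best_score = top_score
    return selected, best_score


if __name__ == "__main__":
    # Feature 0 separates the two classes; feature 1 is unrelated to the label.
    rows = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0],
            [1.0, 5.2], [1.1, 1.1], [1.2, 9.1]]
    labels = [0, 0, 0, 1, 1, 1]
    print(forward_select(rows, labels))
```

No classifier is trained and no resampling is performed inside the loop: each candidate feature costs one evaluation of the score on the labeled data, which is the property the abstract contrasts with the wrapper approach.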

Author Keywords: Feature selection; Dimensionality reduction; Classification

Article Outline

1. Introduction
2. The wrapper approach to subset selection
3. A new definition of classifiability
4. Classifiability based subset selection
5. Experimental results
5.1. Simulation group I
5.2. Simulation group II
5.3. Simulation group III
5.4. Simulation group IV
6. Conclusions
References



 