Computational Statistics & Data Analysis
Volume 43, Issue 3, 28 July 2003, Pages 341-355
doi:10.1016/S0167-9473(02)00235-9
Copyright © 2002 Elsevier B.V. All rights reserved.

Resampling methods for variable selection in robust regression

James W. Wisnowski (a, corresponding author), James R. Simpson (b), Douglas C. Montgomery (c) and George C. Runger (c)

(a) Department of Mathematical Sciences, United States Air Force Academy, USAF/DFMS, 2354 Fairchild Dr., Suite 6D2A, CO 80840-6252, USA
(b) Department of Industrial and Manufacturing Engineering, Florida State University, Florida A&M University, Tallahassee, FL 32310-6046, USA
(c) Department of Industrial Engineering, Arizona State University, Tempe, AZ 85287-5906, USA

Received 1 September 2001; 
revised 1 July 2002; 
accepted 1 July 2002.
Available online 24 October 2002.


Abstract

With the inundation of large data sets requiring analysis and empirical model building, outliers have become commonplace. Fortunately, several standard statistical software packages now let practitioners easily fit data sets contaminated with outliers using robust regression estimators. However, little guidance is available for selecting the best subset of predictor variables when using these robust estimators. We first consider cross-validation and bootstrap resampling methods that have performed well for least-squares variable selection. These methods cannot be applied directly to contaminated data sets under a robust estimation scheme: the prediction errors, inflated by the outliers, are not reliable measures of how well the robust model fits the data.
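To make the last point concrete, here is a minimal Python sketch, not taken from the paper, that compares the ordinary mean squared prediction error with a trimmed version on a hypothetical set of held-out prediction errors, one of which is then replaced by an outlier. The helper names, the 20% trimming fraction, and the data are illustrative assumptions; a trimmed mean is just one possible outlier-resistant summary and is not necessarily the alternative estimate the authors propose.

```python
def mean_squared_prediction_error(errors):
    """Ordinary mean of squared prediction errors (the usual least-squares criterion)."""
    return sum(e * e for e in errors) / len(errors)


def trimmed_squared_prediction_error(errors, trim=0.2):
    """Mean of squared errors after dropping the largest `trim` fraction.

    Hypothetical outlier-resistant summary, used here only for illustration.
    """
    squared = sorted(e * e for e in errors)
    keep = len(squared) - int(trim * len(squared))
    return sum(squared[:keep]) / keep


# Hypothetical held-out prediction errors; the contaminated copy swaps one value for an outlier.
clean = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, 0.4, -0.1, 0.2]
contaminated = clean[:-1] + [15.0]

for name, errs in [("clean", clean), ("contaminated", contaminated)]:
    print(name, mean_squared_prediction_error(errs), trimmed_squared_prediction_error(errs))
```

With these numbers the single outlier raises the ordinary criterion from 0.089 to 22.585, while the trimmed version moves only from 0.06 to 0.075. Ranking candidate subsets by the ordinary criterion would therefore mostly reflect where the outliers happen to fall.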

We therefore propose new resampling variable selection methods that introduce alternative estimates of prediction error in the contaminated model. We demonstrate that, although robust estimation and resampling variable selection are both computationally intensive procedures, the two can be combined for superior results using modest computational resources. Monte Carlo simulation, organized as a designed experiment, is used to evaluate the proposed variable selection procedures against alternatives. The experimental factors include the percentage of outliers, outlier geometry, bootstrap sample size, number of bootstrap samples, and cross-validation assessment size. The results are summarized and recommendations for use are provided.
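The general idea of pairing a robust fit with resampling-based subset selection can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' procedure: Huber's M-estimator (fitted by iteratively reweighted least squares with a MAD scale) stands in for the robust estimator, K-fold cross-validation stands in for the resampling step (a bootstrap loop could be substituted), and a trimmed mean of squared held-out errors stands in for the alternative prediction-error estimate, which the abstract does not specify. All function names, tuning constants, and the simulated data are hypothetical.

```python
import itertools

import numpy as np


def huber_fit(X, y, c=1.345, n_iter=100, tol=1e-8):
    """Huber M-estimate of regression coefficients via iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares starting values
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust scale: median absolute deviation, rescaled for normal errors.
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        if scale <= 0:
            break
        u = np.abs(r) / scale
        # Huber weights: 1 for small standardized residuals, c/|u| for large ones.
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def trimmed_mse(errors, trim=0.2):
    """Mean squared error after discarding the largest `trim` fraction (hypothetical criterion)."""
    sq = np.sort(np.asarray(errors) ** 2)
    keep = len(sq) - int(trim * len(sq))
    return sq[:keep].mean()


def cv_select(X, y, k=5, trim=0.2, seed=0):
    """Score every non-empty predictor subset by k-fold CV with a robust fit; return the best."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    best_score, best_subset = np.inf, None
    for size in range(1, p + 1):
        for subset in itertools.combinations(range(p), size):
            cols = list(subset)
            held_out_errors = []
            for test_idx in folds:
                train_idx = np.setdiff1d(np.arange(n), test_idx)
                # Intercept column plus the candidate predictors.
                X_train = np.column_stack([np.ones(len(train_idx)), X[train_idx][:, cols]])
                X_test = np.column_stack([np.ones(len(test_idx)), X[test_idx][:, cols]])
                beta = huber_fit(X_train, y[train_idx])
                held_out_errors.append(y[test_idx] - X_test @ beta)
            score = trimmed_mse(np.concatenate(held_out_errors), trim)
            if score < best_score:
                best_score, best_subset = score, subset
    return best_subset, best_score


# Hypothetical example: x0 and x1 are active, x2 and x3 are noise, 10% of responses are outliers.
rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 4))
y = 2 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=n)
y[:10] += 20
subset, score = cv_select(X, y)
print("selected predictors:", subset, "trimmed CV error:", round(score, 3))
```

The exhaustive search needs k(2^p - 1) robust fits, so it is only practical for a small number of predictors p. This cost is why the efficiency of combining robust estimation with resampling matters; any real study would also vary the trimming fraction, fold count, or bootstrap settings rather than fixing them as done here.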

Author Keywords: Outliers; Robust regression; Variable selection; Bootstrap; Cross-validation

Article Outline

1. Introduction
2. Resampling measures of prediction error
2.1. Cross-validation techniques
2.2. Bootstrap estimators
3. A proposed criterion for variable selection
4. Variable selection in the presence of outliers
4.1. Variable selection with robust regression estimators
4.2. Modified Gunst and Mason data
4.3. A designed experiment for resampling methods with compound estimators
4.3.1. Planning the simulation experiment
4.3.2. Simulation results
5. Summary
References