Computational Statistics & Data Analysis
Volume 51, Issue 5, 1 February 2007, Pages 2461-2486
 
doi:10.1016/j.csda.2006.08.033
Copyright © 2006 Elsevier B.V. All rights reserved.

A practical approximation algorithm for the LMS line estimator

David M. Mount (a, corresponding author), Nathan S. Netanyahu (b, c), Kathleen Romanik (d), Ruth Silverman (c) and Angela Y. Wu (e)

(a) Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD, USA
(b) Department of Computer Science, Bar-Ilan University, Ramat-Gan, Israel
(c) Center for Automation Research, University of Maryland, College Park, MD, USA
(d) White Oak Technologies, Inc., Silver Spring, MD, USA
(e) Department of Computer Science, American University, Washington, DC, USA

Received 1 March 2005; accepted 25 August 2006. Available online 2 October 2006.


Abstract

The problem of fitting a straight line to a finite collection of points in the plane is an important problem in statistical estimation. Robust estimators are widely used because of their lack of sensitivity to outlying data points. The least median-of-squares (LMS) regression line estimator is among the best known robust estimators. Given a set of n points in the plane, it is defined to be the line that minimizes the median squared residual or, more generally, the line that minimizes the residual of any given quantile q, where 0 < q ≤ 1. This problem is equivalent to finding the strip defined by two parallel lines of minimum vertical separation that encloses at least half of the points.
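The strip formulation above can be made concrete with a small sketch. The Python below is an illustration only, not the paper's algorithm: the function names are hypothetical, the quantile is taken as the k-th smallest residual with k = ceil(q·n) (an assumed convention; the paper may define it differently), and absolute residuals are used in place of squared ones, which leaves the minimizing line unchanged. For a fixed slope, the narrowest strip covering k points is the shortest window of k consecutive values among the sorted offsets y − slope·x.

```python
import math


def quantile_residual(points, slope, intercept, q=0.5):
    """Return the k-th smallest absolute vertical residual, k = ceil(q * n).

    The rank convention k = ceil(q * n) is an assumption made for this sketch.
    """
    n = len(points)
    k = max(1, math.ceil(q * n))
    residuals = sorted(abs(y - (slope * x + intercept)) for x, y in points)
    return residuals[k - 1]


def best_strip_for_slope(points, slope, q=0.5):
    """Return (height, intercept) of the narrowest strip with the given slope
    that covers at least k = ceil(q * n) points. Assumes a non-empty input.

    The strip is measured vertically; its center line has the returned
    intercept, so the quantile residual of that line is height / 2.
    """
    n = len(points)
    k = max(1, math.ceil(q * n))
    # Each point lies on the line y = slope * x + offset for its own offset.
    offsets = sorted(y - slope * x for x, y in points)
    # Slide a window of k consecutive offsets and keep the shortest one.
    best_height, best_low = min(
        (offsets[i + k - 1] - offsets[i], offsets[i]) for i in range(n - k + 1)
    )
    return best_height, best_low + best_height / 2.0
```

For example, with four points on y = x and one distant outlier, `best_strip_for_slope(points, 1)` returns a strip of height 0 centered on y = x, and `quantile_residual` for that line is 0, which shows how the outlier is ignored.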

The best known exact algorithm for this problem runs in O(n^2) time. We consider two types of approximations: a residual approximation, which approximates the vertical height of the strip to within a given error bound ε_r ≥ 0, and a quantile approximation, which approximates the fraction of points that lie within the strip to within a given error bound ε_q ≥ 0. We present two randomized approximation algorithms for the LMS line estimator. The first is a conceptually simple quantile approximation algorithm, which, given fixed q and ε_q > 0, runs in O(n log n) time. The second is a practical algorithm, which can solve both types of approximation problems or be used as an exact algorithm. We prove that when used as a quantile approximation, this algorithm's expected running time is [formula not preserved in the source]. We present empirical evidence that the latter algorithm is quite efficient for a wide variety of input distributions, even when used as an exact algorithm.
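To give a feel for the exact-versus-approximate trade-off described above, here is a naive baseline in Python. It is explicitly not either of the paper's algorithms and does not reproduce their running times or error guarantees. It relies on the classical observation that an optimal LMS strip can be chosen with one bounding line through two data points, so candidate slopes can be drawn from point pairs (degenerate inputs are not handled carefully). All names, the `trials` and `seed` parameters, and the rank convention k = ceil(q·n) are assumptions of this sketch.

```python
import itertools
import math
import random


def strip_for_slope(points, slope, k):
    """Narrowest strip of the given slope covering k points.

    Returns (height, slope, intercept of the center line).
    """
    offsets = sorted(y - slope * x for x, y in points)
    height, low = min(
        (offsets[i + k - 1] - offsets[i], offsets[i])
        for i in range(len(offsets) - k + 1)
    )
    return height, slope, low + height / 2.0


def pair_slopes(points):
    """Yield the slope of every pair of points with distinct x-coordinates."""
    for (x1, y1), (x2, y2) in itertools.combinations(points, 2):
        if x1 != x2:
            yield (y2 - y1) / (x2 - x1)


def lms_brute_force(points, q=0.5):
    """Try every pair slope: roughly O(n^3 log n), far slower than O(n^2).

    Raises ValueError if no pair of points has distinct x-coordinates.
    """
    k = max(1, math.ceil(q * len(points)))
    return min(strip_for_slope(points, s, k) for s in pair_slopes(points))


def lms_sampled(points, q=0.5, trials=200, seed=0):
    """Heuristic: try only randomly sampled pair slopes (no error guarantee).

    Returns None if no sampled pair had distinct x-coordinates.
    """
    rng = random.Random(seed)
    k = max(1, math.ceil(q * len(points)))
    best = None
    for _ in range(trials):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair, no finite slope
        candidate = strip_for_slope(points, (y2 - y1) / (x2 - x1), k)
        if best is None or candidate < best:
            best = candidate
    return best
```

Because the sampled variant examines a subset of the slopes that the brute-force variant examines, its strip is never narrower than the brute-force result; how close it gets depends on the data and on the number of trials.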

Keywords: Least median-of-squares regression; Robust estimation; Line fitting; Approximation algorithms; Randomized algorithms; Line arrangements

Article Outline

1. Introduction
1.1. Exact algorithms for LMS
1.2. Approximating LMS
1.3. Summary of results
2. Computational methods
2.1. LMS and point-line duality
2.2. Line arrangements, levels, and LMS
2.3. Searching an arrangement
3. A randomized quantile approximation algorithm
4. A practical approach: slope decomposition
4.1. Upper bound
4.2. Pseudo-levels and the lower bound
4.3. Overall processing
5. Analysis of the slope-decomposition algorithm
6. Experimental results
6.1. Input size and quantile error factor
6.2. Quantile error and residual error factors
6.3. Inlier noise
6.4. Different distributions and actual error
7. Concluding remarks
References

















 