Data & Knowledge Engineering
Volume 61, Issue 3, June 2007, Pages 554-562
Advances on Natural Language Processing - NLDB 05
 
doi:10.1016/j.datak.2006.06.017
Copyright © 2006 Elsevier B.V. All rights reserved.

Regression analysis for massive datasets

Tsai-Hung Fan (a, corresponding author), Dennis K.J. Lin (b) and Kuang-Fu Cheng (a)

(a) Graduate Institute of Statistics, National Central University, 300 Jhongda Road, Jhongli 320, Taiwan, ROC
(b) Department of Supply Chain and Information Systems, The Pennsylvania State University, PA, United States

Received 27 June 2006; accepted 27 June 2006. Available online 24 July 2006.


Abstract

Over the past decades, we have witnessed a revolution in information technology. Routine collection of systematically generated data is now commonplace. Databases with hundreds of fields (variables) and billions of records (observations) are not unusual. This presents a difficulty for classical data analysis methods, mainly because of limited computer memory and high computational cost (in time, for example). In this paper, we propose an intelligent regression analysis methodology suitable for modeling massive datasets. The basic idea is to split the entire dataset into several blocks, apply classical regression techniques to the data in each block, and finally combine the resulting estimates via weighted averages. Theoretical justification of the proposed method is given, and its empirical performance in an extensive simulation study is discussed.

Keywords: Best linear unbiased estimator; Minimum variance; Optimal weight
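
The split, fit and combine idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the weights the authors use, so the choice below is an assumption: each block estimate is weighted by its information matrix X_k'X_k, normalised by the sum over blocks. With this particular weighting the combined estimate coincides with ordinary least squares on the full data, which makes the sketch easy to check. The function names (fit_block, combine_blocks, blockwise_regression) are hypothetical and not taken from the paper.

```python
import numpy as np


def fit_block(X, y):
    """Ordinary least squares on one block; returns (beta_k, X_k' X_k)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X.T @ X


def combine_blocks(block_fits):
    """Weighted average of block estimates.

    Assumed weighting (not taken from the paper): block k gets the matrix
    weight (sum_j X_j' X_j)^{-1} X_k' X_k.
    """
    gram_total = sum(gram for _, gram in block_fits)
    weighted = sum(gram @ beta for beta, gram in block_fits)
    return np.linalg.solve(gram_total, weighted)


def blockwise_regression(X, y, n_blocks):
    """Split the rows into n_blocks, fit each block, then combine."""
    fits = [
        fit_block(Xb, yb)
        for Xb, yb in zip(np.array_split(X, n_blocks),
                          np.array_split(y, n_blocks))
    ]
    return combine_blocks(fits)


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
n = 10_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + rng.normal(size=n)

beta_blocks = blockwise_regression(X, y, n_blocks=20)
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_blocks, beta_full))
```

In a genuinely massive setting the blocks would be read one at a time (from disk or a database) and only the running sums of X_k'X_k and X_k'X_k beta_k would be kept in memory; the in-memory split here is only for demonstration.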

Article Outline

1. Introduction
2. Statistical inference on large datasets
3. Regression analysis for massive datasets
4. The proposed weighted average procedure
5. Simulation study and example
6. Discussion and conclusion
Acknowledgements
References
Vitae
