Computational Statistics & Data Analysis
Volume 52, Issue 2, 15 October 2007, Pages 750-762

doi:10.1016/j.csda.2007.03.010
Copyright © 2007 Elsevier B.V. All rights reserved.

Sampling streaming data with replacement

Byung-Hoon Park (a, corresponding author), George Ostrouchov (a) and Nagiza F. Samatova (a)

(a) Computer Science and Mathematics Division, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831-6367, USA

Available online 15 March 2007.

Abstract

Simple random sampling is a widely accepted basis for estimation from a population. When data arrive as a stream, the total population size grows continuously and only one pass through the data is possible. Reservoir sampling is a method of maintaining a fixed-size random sample from streaming data. Reservoir sampling without replacement has been studied extensively, and several algorithms with sub-linear time complexity exist. Although reservoir sampling with replacement has been mentioned by some authors, it has received little study, and only linear-time algorithms exist. A with-replacement reservoir sampling algorithm of sub-linear time complexity is introduced. A thorough complexity analysis of several approaches to the with-replacement reservoir sampling problem is also provided.
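The paper's own algorithms are behind the paywall, so the following is not the authors' code. As a minimal sketch under our own assumptions, the textbook Algorithm R illustrates reservoir sampling without replacement, and a straightforward per-slot scheme (not necessarily identical to the paper's RSWR-naive) illustrates the with-replacement setting: each of the k reservoir slots independently holds a uniform draw from the n items seen so far, because slot contents are replaced by item n with probability 1/n. The per-slot scheme spends k random draws on every item, which is the linear cost the abstract refers to. The function names and the use of Python's random.Random are illustrative choices.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_without_replacement(stream: Iterable[T], k: int,
                                  rng: random.Random) -> List[T]:
    """Textbook Algorithm R: a uniform size-k sample without replacement."""
    reservoir: List[T] = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            reservoir.append(item)      # fill the reservoir with the first k items
        else:
            j = rng.randrange(n)        # uniform index in 0..n-1
            if j < k:
                reservoir[j] = item     # item n enters with probability k/n
    return reservoir


def reservoir_with_replacement_naive(stream: Iterable[T], k: int,
                                     rng: random.Random) -> List[T]:
    """Hypothetical per-slot scheme: each slot is an independent uniform draw
    from the items seen so far (k random numbers per item, so linear time)."""
    reservoir: List[T] = []
    for n, item in enumerate(stream, start=1):
        if n == 1:
            reservoir = [item] * k      # probability 1/1 = 1 for every slot
            continue
        for slot in range(k):
            if rng.random() < 1.0 / n:  # replace this slot with probability 1/n
                reservoir[slot] = item
    return reservoir
```

For example, reservoir_with_replacement_naive(range(1000), 5, random.Random(42)) returns five items from the stream, possibly with repeats, whereas Algorithm R never repeats an item.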

Keywords: Data stream mining; Random sampling with replacement; Reservoir sampling

Article Outline

1. Introduction
2. Notation and definitions
3. Reservoir sampling without replacement (RSXR)
4. Reservoir sampling with replacement (RSWR)
4.1. Two implementations of RSWR: RSWR-naive and RSWR-batch
4.2. Formal proofs for RSWR-naive and RSWR-batch
5. Faster sampling by skipping elements
6. Performance evaluation
6.1. Expected CPU runtime
6.2. Empirical study
7. Conclusions
Acknowledgements
References
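Section 5 of the outline concerns faster sampling by skipping elements. The sketch below is one way to apply skipping to the per-slot with-replacement scheme; it is derived here independently and is not taken from the paper. After n items, the probability that no slot is replaced by any of items n+1..t is the product of (1 - 1/j)^k for j = n+1..t, which telescopes to (n/t)^k, so the index of the next item that replaces at least one slot can be drawn by inversion. At that item, the first replaced slot follows a geometric law truncated to k, and each later slot is replaced independently with probability 1/t. All function names are hypothetical.

```python
import math
import random
from itertools import islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def _first_replaced_slot(k: int, p: float, rng: random.Random) -> int:
    """0-based index of the first replaced slot, conditioned on at least one of
    k independent Bernoulli(p) slot trials succeeding (geometric truncated to k)."""
    log_q = math.log1p(-p)                # log(1 - p), accurate for small p
    total = -math.expm1(k * log_q)        # P(at least one success) = 1 - (1 - p)**k
    u = rng.random()                      # u in [0, 1)
    # Smallest i in 1..k with (1 - (1 - p)**i) / total >= u.
    i = math.ceil(math.log1p(-u * total) / log_q)
    return min(max(i, 1), k) - 1


def reservoir_with_replacement_skip(stream: Iterable[T], k: int,
                                    rng: random.Random) -> List[T]:
    """Hypothetical skip-based variant of the per-slot scheme: jump straight to
    the next item that replaces at least one slot (not the paper's code)."""
    if k <= 0:
        return []
    it = iter(stream)
    try:
        first = next(it)
    except StopIteration:
        return []
    reservoir = [first] * k               # item 1 fills every slot
    n = 1                                 # number of items consumed so far
    while True:
        # P(no replacement among items n+1..t) = (n/t)**k; invert with u in (0, 1].
        u = 1.0 - rng.random()
        t = math.floor(n / u ** (1.0 / k)) + 1
        # Discard items n+1..t-1 and fetch item t, if the stream is that long.
        nxt = list(islice(it, t - n - 1, t - n))
        if not nxt:
            return reservoir              # stream ended before the next replacement
        n = t
        p = 1.0 / n
        # Item t takes the first chosen slot, then each later slot with probability p.
        first_slot = _first_replaced_slot(k, p, rng)
        reservoir[first_slot] = nxt[0]
        for slot in range(first_slot + 1, k):
            if rng.random() < p:
                reservoir[slot] = nxt[0]
```

The iterator still reads every element, but random numbers are drawn only at replacement events, whose expected count over N items is bounded by roughly k·ln N. The per-slot loop after each jump costs O(k); a batch step that chooses all replaced slots at once could reduce that further.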





 