ABSTRACT
Database systems need to convert queries into efficient execution plans. As recent research has shown, correctly estimating the cardinalities of subqueries is an important factor in the efficiency of the resulting plans [7, 8]. Many algorithms have been proposed in the literature that utilize a random sample to estimate cardinalities [6, 9, 13]. Thus, some modern database systems choose to store a materialized, uniformly random sample for each relation [3, 6]. Such samples are built and refreshed when statistics are gathered, by loading uniformly random tuples of the relation from disk using random I/O.
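A uniform sample of fixed size can also be maintained in a single pass over a relation or stream of unknown length using reservoir sampling (Vitter's Algorithm R, cited in the references below). A minimal single-threaded sketch in Python, where the function name and signature are illustrative:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Maintain a uniform random sample of size k over a stream
    of unknown length (Vitter's Algorithm R)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # uniform in [0, i]
            if j < k:
                reservoir[j] = item
    return reservoir
```

After processing item i, every item seen so far resides in the reservoir with equal probability k/(i+1), which is exactly the uniformity property the estimators above rely on. The scalable many-core variants discussed in this paper parallelize this basic loop.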
- Mohammed Al-Kateb, Byung Suk Lee, and Xiaoyang Sean Wang. 2007. Adaptive-Size Reservoir Sampling over Data Streams. In SSDBM. IEEE Computer Society, 22.
- Gustavo Alonso. 2013. Hardware killed the software star. In ICDE. IEEE Computer Society, 1--4.
- Surajit Chaudhuri, Eric Christensen, Goetz Graefe, Vivek R. Narasayya, and Michael J. Zwilling. 1999. Self-Tuning Technology in Microsoft SQL Server. IEEE Data Eng. Bull., Vol. 22, 2 (1999), 20--26.
- Michael Greenwald. 1999. Non-blocking Synchronization and System Design. Technical Report. Stanford, CA, USA.
- Viktor Leis, Peter A. Boncz, Alfons Kemper, and Thomas Neumann. 2014. Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age. In SIGMOD Conference. ACM, 743--754.
- Viktor Leis, Bernhard Radke, Andrey Gubichev, Alfons Kemper, and Thomas Neumann. 2017. Cardinality Estimation Done Right: Index-Based Join Sampling. In CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research, Chaminade, CA, USA, January 8--11, 2017, Online Proceedings.
- Viktor Leis, Bernhard Radke, Andrey Gubichev, Atanas Mirchev, Peter A. Boncz, Alfons Kemper, and Thomas Neumann. 2018. Query optimization through the looking glass, and what we found running the Join Order Benchmark. VLDB J., Vol. 27, 5 (2018), 643--668.
- Guido Moerkotte, Thomas Neumann, and Gabriele Steidl. 2009. Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors. PVLDB, Vol. 2, 1 (2009), 982--993.
- Magnus Müller, Guido Moerkotte, and Oliver Kolb. 2018. Improved Selectivity Estimation by Combining Knowledge from Sampling and Synopses. PVLDB, Vol. 11, 9 (2018), 1016--1028.
- Peter Sanders, Sebastian Lamm, Lorenz Hübschle-Schneider, Emanuel Schrade, and Carsten Dachsbacher. 2018. Efficient Parallel Random Sampling - Vectorized, Cache-Efficient, and Online. ACM Trans. Math. Softw., Vol. 44, 3 (2018), 29:1--29:14.
- Herb Sutter. 2005. The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software. Dr. Dobb's Journal, Vol. 30, 3 (2005), 202--210. http://www.gotw.ca/publications/concurrency-ddj.htm
- Jeffrey Scott Vitter. 1985. Random Sampling with a Reservoir. ACM Trans. Math. Softw., Vol. 11, 1 (1985), 37--57.
- Wentao Wu, Yun Chi, Shenghuo Zhu, Jun'ichi Tatemura, Hakan Hacigümüs, and Jeffrey F. Naughton. 2013. Predicting query execution time: Are optimizer cost models really unusable?. In 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, April 8--12, 2013. 1081--1092.
Index Terms
- Scalable Reservoir Sampling on Many-Core CPUs