ABSTRACT
Database systems need to convert queries into efficient execution plans. As recent research has shown, correctly estimating the cardinalities of subqueries is an important factor in the efficiency of the resulting plans [7, 8]. Many algorithms have been proposed in the literature that utilize a random sample to estimate cardinalities [6, 9, 13]. Thus, some modern database systems choose to store a materialized, uniformly random sample for each relation [3, 6]. Such samples are built and refreshed when statistics are gathered, by loading uniformly random tuples of the relation from disk using random I/O.
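A uniform sample of fixed size can also be maintained in a single pass over a relation or stream of unknown length using reservoir sampling (Vitter's Algorithm R, cited in the references below). A minimal single-threaded sketch in Python, where the function name and signature are illustrative:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Maintain a uniform random sample of size k over a stream
    of unknown length (Vitter's Algorithm R)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # uniform in [0, i]
            if j < k:
                reservoir[j] = item
    return reservoir
```

After processing item i, every item seen so far resides in the reservoir with equal probability k/(i+1), which is exactly the uniformity property the estimators above rely on. The scalable many-core variants discussed in this paper parallelize this basic loop.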
- Mohammed Al-Kateb, Byung Suk Lee, and Xiaoyang Sean Wang. 2007. Adaptive-Size Reservoir Sampling over Data Streams. In SSDBM. IEEE Computer Society, 22.
- Gustavo Alonso. 2013. Hardware killed the software star. In ICDE. IEEE Computer Society, 1--4.
- Surajit Chaudhuri, Eric Christensen, Goetz Graefe, Vivek R. Narasayya, and Michael J. Zwilling. 1999. Self-Tuning Technology in Microsoft SQL Server. IEEE Data Eng. Bull., Vol. 22, 2 (1999), 20--26.
- Michael Greenwald. 1999. Non-blocking Synchronization and System Design. Technical Report. Stanford, CA, USA.
- Viktor Leis, Peter A. Boncz, Alfons Kemper, and Thomas Neumann. 2014. Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age. In SIGMOD Conference. ACM, 743--754.
- Viktor Leis, Bernhard Radke, Andrey Gubichev, Alfons Kemper, and Thomas Neumann. 2017. Cardinality Estimation Done Right: Index-Based Join Sampling. In CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research, Chaminade, CA, USA, January 8--11, 2017, Online Proceedings.
- Viktor Leis, Bernhard Radke, Andrey Gubichev, Atanas Mirchev, Peter A. Boncz, Alfons Kemper, and Thomas Neumann. 2018. Query optimization through the looking glass, and what we found running the Join Order Benchmark. VLDB J., Vol. 27, 5 (2018), 643--668.
- Guido Moerkotte, Thomas Neumann, and Gabriele Steidl. 2009. Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors. PVLDB, Vol. 2, 1 (2009), 982--993.
- Magnus Müller, Guido Moerkotte, and Oliver Kolb. 2018. Improved Selectivity Estimation by Combining Knowledge from Sampling and Synopses. PVLDB, Vol. 11, 9 (2018), 1016--1028.
- Peter Sanders, Sebastian Lamm, Lorenz Hübschle-Schneider, Emanuel Schrade, and Carsten Dachsbacher. 2018. Efficient Parallel Random Sampling - Vectorized, Cache-Efficient, and Online. ACM Trans. Math. Softw., Vol. 44, 3 (2018), 29:1--29:14.
- Herb Sutter. 2005. The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software. Dr. Dobb's Journal, Vol. 30, 3 (2005), 202--210. http://www.gotw.ca/publications/concurrency-ddj.htm
- Jeffrey Scott Vitter. 1985. Random Sampling with a Reservoir. ACM Trans. Math. Softw., Vol. 11, 1 (1985), 37--57.
- Wentao Wu, Yun Chi, Shenghuo Zhu, Jun'ichi Tatemura, Hakan Hacigümüs, and Jeffrey F. Naughton. 2013. Predicting query execution time: Are optimizer cost models really unusable?. In 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, April 8--12, 2013. 1081--1092.
Index Terms
- Scalable Reservoir Sampling on Many-Core CPUs