DOI: 10.1145/3299869.3300096 (extended abstract)

Scalable Reservoir Sampling on Many-Core CPUs

Published: 25 June 2019

ABSTRACT

Database systems need to be able to convert queries into efficient execution plans. As recent research has shown, correctly estimating the cardinalities of subqueries is an important factor in the efficiency of the resulting plans [7, 8]. Many algorithms have been proposed in the literature that use a random sample to estimate cardinalities [6, 9, 13]. Thus, some modern database systems choose to store a materialized uniformly random sample for each of their relations [3, 6]. Such samples are built and refreshed when statistics are gathered, by loading uniformly random tuples from the relation on disk using random I/O.
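As background for the title and reference [12]: reservoir sampling draws a uniformly random sample of k tuples in a single sequential pass over a relation, with no random I/O. The following Python sketch implements the classic sequential Algorithm R as presented by Vitter [12]. It is a minimal single-threaded illustration, not the scalable many-core method of this paper, and the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Hypothetical helper: uniform random sample of up to k items (Algorithm R).

    Every item of the stream ends up in the result with probability k / n,
    where n is the stream length (or with certainty if n <= k).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng if rng is not None else random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill phase: the first k items go straight into the reservoir.
            reservoir.append(item)
        else:
            # Replacement phase: draw j uniformly from [0, i]; the item is kept
            # with probability k / (i + 1) and overwrites a uniformly chosen slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

For example, `reservoir_sample(range(1_000_000), 100, random.Random(42))` returns 100 distinct values, each of which had the same 100/1,000,000 chance of being selected. As written, every iteration may update the single shared reservoir, so the loop is inherently sequential.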

References

  1. Mohammed Al-Kateb, Byung Suk Lee, and Xiaoyang Sean Wang. 2007. Adaptive-Size Reservoir Sampling over Data Streams. In SSDBM. IEEE Computer Society, 22.
  2. Gustavo Alonso. 2013. Hardware killed the software star. In ICDE. IEEE Computer Society, 1--4.
  3. Surajit Chaudhuri, Eric Christensen, Goetz Graefe, Vivek R. Narasayya, and Michael J. Zwilling. 1999. Self-Tuning Technology in Microsoft SQL Server. IEEE Data Eng. Bull., Vol. 22, 2 (1999), 20--26.
  4. Michael Greenwald. 1999. Non-blocking Synchronization and System Design. Technical Report. Stanford, CA, USA.
  5. Viktor Leis, Peter A. Boncz, Alfons Kemper, and Thomas Neumann. 2014. Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age. In SIGMOD Conference. ACM, 743--754.
  6. Viktor Leis, Bernhard Radke, Andrey Gubichev, Alfons Kemper, and Thomas Neumann. 2017. Cardinality Estimation Done Right: Index-Based Join Sampling. In CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research, Chaminade, CA, USA, January 8--11, 2017, Online Proceedings.
  7. Viktor Leis, Bernhard Radke, Andrey Gubichev, Atanas Mirchev, Peter A. Boncz, Alfons Kemper, and Thomas Neumann. 2018. Query optimization through the looking glass, and what we found running the Join Order Benchmark. VLDB J., Vol. 27, 5 (2018), 643--668.
  8. Guido Moerkotte, Thomas Neumann, and Gabriele Steidl. 2009. Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors. PVLDB, Vol. 2, 1 (2009), 982--993.
  9. Magnus Müller, Guido Moerkotte, and Oliver Kolb. 2018. Improved Selectivity Estimation by Combining Knowledge from Sampling and Synopses. PVLDB, Vol. 11, 9 (2018), 1016--1028.
  10. Peter Sanders, Sebastian Lamm, Lorenz Hübschle-Schneider, Emanuel Schrade, and Carsten Dachsbacher. 2018. Efficient Parallel Random Sampling - Vectorized, Cache-Efficient, and Online. ACM Trans. Math. Softw., Vol. 44, 3 (2018), 29:1--29:14.
  11. Herb Sutter. 2005. The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software. Dr. Dobb's Journal, Vol. 30, 3 (2005), 202--210. http://www.gotw.ca/publications/concurrency-ddj.htm
  12. Jeffrey Scott Vitter. 1985. Random Sampling with a Reservoir. ACM Trans. Math. Softw., Vol. 11, 1 (1985), 37--57.
  13. Wentao Wu, Yun Chi, Shenghuo Zhu, Jun'ichi Tatemura, Hakan Hacigümüs, and Jeffrey F. Naughton. 2013. Predicting query execution time: Are optimizer cost models really unusable? In 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, April 8--12, 2013. 1081--1092.

Published in

SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
June 2019, 2106 pages
ISBN: 9781450356435
DOI: 10.1145/3299869

Copyright © 2019 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SIGMOD '19 paper acceptance rate: 88 of 430 submissions (20%). Overall acceptance rate: 785 of 4,003 submissions (20%).
