Parallelizing the Data Cube

  • Conference paper
Database Theory — ICDT 2001 (ICDT 2001)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1973))

Abstract

This paper presents a general methodology for the efficient parallelization of existing data cube construction algorithms. We describe two partitioning strategies, one for top-down and one for bottom-up cube algorithms. Both assign subcubes to individual processors in such a way that the processor loads are balanced. Our methods reduce inter-processor communication overhead by partitioning the load in advance, instead of computing each individual group-by in parallel as previous parallel approaches do. In fact, after the initial load distribution phase, each processor can compute its assigned subcube without any communication with the other processors. Our methods enable code reuse by permitting the use of existing sequential (external-memory) data cube algorithms for the subcube computations on each processor, which supports the transfer of optimized sequential data cube code to a parallel setting. The bottom-up partitioning strategy balances the number of single-attribute external-memory sorts performed by each processor. The top-down strategy partitions a weighted tree in which weights reflect algorithm-specific cost measures such as estimated group-by sizes. Both partitioning approaches can be implemented on any shared-disk parallel machine composed of p processors connected via an interconnection fabric and with access to a shared parallel disk array. Experimental results show that our partitioning strategies generate a close-to-optimal load balance between processors.
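The idea of partitioning a weighted tree of group-bys in advance can be illustrated with a small sketch. The abstract does not spell out the partitioning procedure, so the code below swaps in a simple greedy heuristic (heaviest subtree goes to the least-loaded processor) purely for illustration; it is not the paper's algorithm. The tree shape, the weights, and the names `subtree_weight` and `assign_subtrees` are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def subtree_weight(tree: Dict[str, List[str]],
                   weight: Dict[str, float],
                   node: str) -> float:
    """Total estimated cost of the subtree rooted at `node`."""
    return weight[node] + sum(
        subtree_weight(tree, weight, child) for child in tree.get(node, [])
    )


def assign_subtrees(tree: Dict[str, List[str]],
                    weight: Dict[str, float],
                    root: str,
                    p: int) -> Tuple[Dict[int, List[str]], Dict[int, float]]:
    """Greedy heuristic (an assumption, not the paper's method):
    hand the heaviest remaining child subtree of `root` to the
    processor with the smallest load so far. The root group-by
    itself is not assigned here."""
    children = sorted(
        tree.get(root, []),
        key=lambda c: subtree_weight(tree, weight, c),
        reverse=True,
    )
    heap = [(0.0, proc) for proc in range(p)]  # (current load, processor id)
    heapq.heapify(heap)
    assignment: Dict[int, List[str]] = {proc: [] for proc in range(p)}
    for child in children:
        load, proc = heapq.heappop(heap)
        assignment[proc].append(child)
        heapq.heappush(heap, (load + subtree_weight(tree, weight, child), proc))
    loads = {proc: load for load, proc in heap}
    return assignment, loads


# Hypothetical schedule tree for a 3-attribute cube: each group-by is
# computed from its parent. Weights stand in for estimated group-by sizes.
tree = {"ABC": ["AB", "AC", "BC"], "AB": ["A", "B"], "AC": ["C"], "A": ["all"]}
weight = {"ABC": 100, "AB": 40, "AC": 30, "BC": 50,
          "A": 5, "B": 8, "C": 6, "all": 1}

assignment, loads = assign_subtrees(tree, weight, "ABC", 2)
print(assignment)  # which subtrees each processor computes on its own
print(loads)       # resulting per-processor load
```

Once the subtrees are assigned, each processor could run an unmodified sequential cube algorithm on its share, which mirrors the communication-free phase the abstract describes. A greedy split like this gives no optimality guarantee.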

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dehne, F., Eavis, T., Hambrusch, S., Rau-Chaplin, A. (2001). Parallelizing the Data Cube. In: Van den Bussche, J., Vianu, V. (eds) Database Theory — ICDT 2001. ICDT 2001. Lecture Notes in Computer Science, vol 1973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44503-X_9

  • DOI: https://doi.org/10.1007/3-540-44503-X_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41456-8

  • Online ISBN: 978-3-540-44503-6

  • eBook Packages: Springer Book Archive
