doi:10.1016/S0306-4379(02)00026-1
Copyright © 2003 Elsevier Science Ltd. All rights reserved.
Multi-query optimization for on-line analytical processing
Panos Kalnis and Dimitris Papadias
Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
Received 23 November 2001; accepted 18 April 2002.
Available online 18 February 2003.
Abstract
Multi-dimensional expressions (MDX) provide an interface for asking several related OLAP queries simultaneously. An interesting problem is how to optimize the execution of an MDX query, given that most data warehouses maintain a set of redundant materialized views to accelerate OLAP operations. A number of greedy and approximation algorithms have been proposed for different versions of the problem. In this paper we experimentally evaluate their performance and conclude that they do not scale well for realistic workloads. Motivated by this finding, we develop two novel greedy algorithms. They construct the execution plan in a top-down manner, identifying in each step the most beneficial view rather than the most promising query. Extensive experiments show that our methods outperform the existing ones in most cases.
Author Keywords: Query optimization; OLAP; Data warehouse; MDX
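The view-first idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' BVF or MBVF algorithm. It is a simplified greedy procedure under assumed conditions: a linear cost model in which scanning a view costs its size, queries assigned to the same view share a single scan, and per-query processing cost is ignored. The function names, view names, and sizes are all hypothetical.

```python
from typing import Dict, Set


def greedy_view_first(view_size: Dict[str, int],
                      answers: Dict[str, Set[str]]) -> Dict[str, str]:
    """Assign each query to a view, choosing one view per step.

    view_size: hypothetical scan cost of each materialized view.
    answers:   for each view, the set of queries it can answer.
    Assumes every query is answerable by at least one view.
    """
    queries = set().union(*answers.values())
    # Cost of answering each query alone from its cheapest view.
    standalone = {
        q: min(view_size[v] for v, qs in answers.items() if q in qs)
        for q in queries
    }
    plan: Dict[str, str] = {}
    unassigned = set(queries)
    while unassigned:
        best_view, best_benefit = None, None
        for v in sorted(answers):  # sorted for deterministic tie-breaking
            covered = answers[v] & unassigned
            if not covered:
                continue
            # Benefit: standalone cost saved minus one shared scan of v.
            benefit = sum(standalone[q] for q in covered) - view_size[v]
            if best_benefit is None or benefit > best_benefit:
                best_view, best_benefit = v, benefit
        for q in answers[best_view] & unassigned:
            plan[q] = best_view
        unassigned -= answers[best_view]
    return plan


def plan_cost(view_size: Dict[str, int], plan: Dict[str, str]) -> int:
    """Total cost under the assumed model: each used view is scanned once."""
    return sum(view_size[v] for v in set(plan.values()))


# Hypothetical instance: one large view answers all three queries,
# and three smaller views answer one query each.
view_size = {"V_all": 300, "V1": 150, "V2": 200, "V3": 100}
answers = {"V_all": {"q1", "q2", "q3"},
           "V1": {"q1"}, "V2": {"q2"}, "V3": {"q3"}}
plan = greedy_view_first(view_size, answers)
print(plan, plan_cost(view_size, plan))
```

In this instance the shared scan of the large view (cost 300) beats answering each query from its own smallest view (cost 450), so all three queries are assigned to it. If the large view were much bigger, the same procedure would fall back to the individual views.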
Fig. 1. A data warehouse schema. The dimensions are Product, Customer and Time: (a) The star schema; (b) The data-cube lattice.
Fig. 2. A multidimensional expression (MDX).
Fig. 3. Two instances of the multiple-query optimization problem: (a) =10000, =200; (b) =100, =150, =200.
Fig. 4. Details of the TPC-H and APB data sets: (a) The TPC-H database schema; (b) The dimensions of APB.
Fig. 5. Total execution cost for q2_2 query set: (a) SYNTH data set; (b) TPC-H data set; (c) APB data set.
Fig. 6. Total execution cost versus AVQ: (a) SYNTH data set; (b) TPC-H data set; (c) APB data set.
Fig. 7. Best view first (BVF) greedy algorithm.
Fig. 8. Multilevel best view first (MBVF) greedy algorithm.
Fig. 9. Total execution cost versus Smax. All queries use hash-based star join. The first row refers to the q50_1 query set, the second to the q25_2 and the third to the q1_50 query set: (a) TPC-H data set; (b) APB data set.
Fig. 10. Average number of shared operators in each MDX execution plan, versus Smax: (a) TPC-H data set; (b) APB data set.
Fig. 11. Total running time (in s) to generate the plan for 100 MDX queries versus Smax, for the TPC-H data set: (a) q2_2 query set (4 group-by queries per MDX); (b) q50_1 query set (50 group-by queries per MDX).
Fig. 12. Total execution cost versus Smax. 25% of the queries can use index-based star join. The first row refers to the q50_1 query set, the second to the q25_2 and the third to the q1_50 query set: (a) TPC-H data set; (b) APB data set.
Table 1. Cardinalities of the dimension tables for the SYNTH data set

Table 2. Details about the query sets
