Abstract
The typical workload in a database system consists of a mix of multiple queries of different types that run concurrently. Interactions among the different queries in a query mix can have a significant impact on database performance. Hence, optimizing database performance requires reasoning about query mixes rather than considering queries individually. Current database systems lack the ability to do such reasoning. We propose a new approach based on planning experiments and statistical modeling to capture the impact of query interactions. Our approach requires no prior assumptions about the internal workings of the database system or the nature and cause of query interactions, making it portable across systems. To demonstrate the potential of modeling and exploiting query interactions, we have developed a novel interaction-aware query scheduler for report-generation workloads. Our scheduler, called QShuffler, uses two query scheduling algorithms that leverage models of query interactions. The first algorithm is optimized for workloads where queries are submitted in large batches. The second algorithm targets workloads where queries arrive continuously, and scheduling decisions have to be made online. We report an experimental evaluation of QShuffler using TPC-H workloads running on IBM DB2. The evaluation shows that QShuffler, by modeling and exploiting query interactions, can consistently outperform (up to 4x) query schedulers in current database systems.
Cite this article
Ahmad, M., Aboulnaga, A., Babu, S. et al. Interaction-aware scheduling of report-generation workloads. The VLDB Journal 20, 589–615 (2011). https://doi.org/10.1007/s00778-011-0217-y
DOI: https://doi.org/10.1007/s00778-011-0217-y