Self-tuning cost modeling of user-defined functions in an object-relational DBMS

Published: 01 September 2005

Abstract

Query optimizers in object-relational database management systems typically require users to provide execution cost models for user-defined functions (UDFs). Despite this need, little work has been done on how to build such models. Existing approaches are static: they require users to train the model a priori with pregenerated UDF execution cost data. Static approaches cannot adapt to changing UDF execution patterns, and so lose accuracy when the executions used to generate training data do not reflect those performed during operation. This article proposes a new approach, based on the recent trend toward self-tuning DBMSs, in which the cost model is maintained dynamically and incrementally as UDFs are executed online. In the context of UDF cost modeling, this approach faces several challenges: it must work with limited memory, work with limited computation time, and adjust to fluctuations in execution costs (e.g., caching effects). We first provide a set of guidelines for developing techniques that meet these challenges while achieving accurate and fast cost prediction with small overhead. We then present two concrete techniques developed under the guidelines. One is an instance-based technique based on the conventional k-nearest neighbor (KNN) technique, which uses a multidimensional index such as the R*-tree. The other is a summary-based technique, which uses a quadtree to store summary values at multiple resolutions. We have performed extensive performance evaluations comparing these two techniques against existing histogram-based techniques and the KNN technique, using both real and synthetic UDFs and data sets. The results show that our techniques provide better performance in most situations considered.
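To make the summary-based idea more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a UDF with two numeric arguments, stores a running count and total cost in each quadtree node, and predicts from the finest node that has seen data. The names `QuadtreeCostModel`, `observe`, `predict`, `max_depth`, and `max_nodes` are hypothetical. The memory limit is handled here by simply refusing to split once the node budget is reached, which is a simplification chosen for brevity and may differ from the technique in the article.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Square region [x0, x0+size) x [y0, y0+size) with a running cost summary.
    x0: float
    y0: float
    size: float
    count: int = 0
    total: float = 0.0
    children: Optional[List["Node"]] = None

    def child_for(self, x: float, y: float) -> "Node":
        # Children are stored row by row: index = 2 * iy + ix.
        half = self.size / 2
        ix = 1 if x >= self.x0 + half else 0
        iy = 1 if y >= self.y0 + half else 0
        return self.children[2 * iy + ix]


class QuadtreeCostModel:
    """Hypothetical self-tuning cost model over a 2-D argument space."""

    def __init__(self, size: float = 1.0, max_depth: int = 4, max_nodes: int = 64):
        self.root = Node(0.0, 0.0, size)
        self.max_depth = max_depth
        self.max_nodes = max_nodes  # assumed memory budget, in nodes
        self.num_nodes = 1

    def observe(self, x: float, y: float, cost: float) -> None:
        # Incremental update: fold one observed execution cost into every
        # node on the path from the root to the finest available block.
        node, depth = self.root, 0
        while True:
            node.count += 1
            node.total += cost
            if depth == self.max_depth:
                return
            if node.children is None:
                if self.num_nodes + 4 > self.max_nodes:
                    return  # budget exhausted: keep the coarser summary only
                half = node.size / 2
                node.children = [
                    Node(node.x0 + ix * half, node.y0 + iy * half, half)
                    for iy in (0, 1)
                    for ix in (0, 1)
                ]
                self.num_nodes += 4
            node = node.child_for(x, y)
            depth += 1

    def predict(self, x: float, y: float, min_count: int = 1) -> Optional[float]:
        # Descend while blocks hold at least min_count observations and
        # return the average cost of the finest such block.
        node, best = self.root, None
        while node is not None and node.count >= min_count:
            best = node.total / node.count
            node = node.child_for(x, y) if node.children else None
        return best


model = QuadtreeCostModel(size=1.0, max_depth=2, max_nodes=64)
model.observe(0.1, 0.1, 10.0)
model.observe(0.1, 0.1, 20.0)
model.observe(0.9, 0.9, 100.0)
print(model.predict(0.1, 0.1))  # average of the two nearby observations
print(model.predict(0.9, 0.1))  # no local data: falls back to the root average
```

Keeping summaries at every level is what lets a query in an unseen region fall back to a coarser block instead of returning nothing, while `max_nodes` bounds memory regardless of how many executions are observed.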

Published in

ACM Transactions on Database Systems, Volume 30, Issue 3
September 2005
226 pages
ISSN: 0362-5915
EISSN: 1557-4644
DOI: 10.1145/1093382

Copyright © 2005 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
