Copyright © 2000 Elsevier Science B.V. All rights reserved.
Flexible and scalable cost-based query planning in mediators: A transformational approach
José Luis Ambite and Craig A. Knoblock
Received 30 October 1998;
Abstract
The Internet provides access to a wealth of information. For any given topic or application domain there are a variety of available information sources. However, current systems, such as search engines or topic directories in the World Wide Web, offer only very limited capabilities for locating, combining, and organizing information. Mediators, systems that provide integrated access and database-like query capabilities to information distributed over heterogeneous sources, are critical to realize the full potential of meaningful access to networked information.
Query planning, the task of generating a cost-efficient plan that computes a user query from the relevant information sources, is central to mediator systems. However, query planning is a computationally hard problem due to the large number of possible sources and possible orderings on the operations to process the data. Moreover, the choice of sources, data processing operations, and their ordering, strongly affects the plan cost.
In this paper, we present an approach to query planning in mediators based on a general planning paradigm called Planning by Rewriting (PbR) (Ambite and Knoblock, 1997). Our work makes several contributions. First, our PbR-based query planner combines the selection of the sources and the ordering of the operations into a single search space in which to optimize plan quality. Second, by using local search techniques, our planner explores the combined search space efficiently and produces high-quality plans. Third, because our query planner is an instantiation of a domain-independent framework, it is very flexible and can be extended in a principled way. Fourth, our planner exhibits anytime behavior. Finally, we provide empirical results showing that our PbR-based query planner compares favorably in scalability and plan quality with previous approaches, which include both classical AI planning and dynamic-programming query optimization techniques.
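The idea of searching a combined space of source choices and operation orderings by local rewriting can be illustrated with a small sketch. This is not the authors' PbR planner: it is a toy hill-climbing search, and the catalog, the access costs, the cardinalities, the selectivity, and the two rewriting moves are all hypothetical values chosen for illustration. The search is written as a generator so that a caller may stop at any point and keep the best plan found so far, which is one simple way to obtain anytime behavior.

```python
from itertools import combinations

# Hypothetical catalog (assumption): each relation is available from
# alternative sources, each with an invented access cost.
SOURCES = {
    "flights": {"src_a": 50, "src_b": 20},
    "hotels": {"src_c": 30, "src_d": 10},
    "weather": {"src_e": 5},
}
# Invented cardinalities; one in SELECTIVITY_DENOMINATOR tuple pairs joins.
CARDINALITY = {"flights": 1000, "hotels": 200, "weather": 10}
SELECTIVITY_DENOMINATOR = 100


def plan_cost(plan):
    """Cost of a left-deep plan given as a list of (relation, source) pairs.

    Cost = total source access cost + total size of intermediate results.
    """
    access = sum(SOURCES[rel][src] for rel, src in plan)
    size = CARDINALITY[plan[0][0]]
    intermediate = 0
    for rel, _ in plan[1:]:
        size = size * CARDINALITY[rel] // SELECTIVITY_DENOMINATOR
        intermediate += size
    return access + intermediate


def neighbors(plan):
    """Plans reachable by one rewriting: swap two positions or switch a source."""
    for i, j in combinations(range(len(plan)), 2):
        swapped = list(plan)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        yield swapped
    for i, (rel, src) in enumerate(plan):
        for alt in SOURCES[rel]:
            if alt != src:
                yield plan[:i] + [(rel, alt)] + plan[i + 1:]


def local_search(initial):
    """First-improvement hill climbing that yields every improved plan."""
    best = list(initial)
    yield best
    improved = True
    while improved:
        improved = False
        for candidate in neighbors(best):
            if plan_cost(candidate) < plan_cost(best):
                best = candidate
                improved = True
                yield best
                break


initial_plan = [("flights", "src_a"), ("hotels", "src_c"), ("weather", "src_e")]
history = list(local_search(initial_plan))
final_plan = history[-1]
```

In this toy setting a single neighborhood mixes both kinds of change, so reordering the joins and switching to a cheaper source are considered in the same search rather than in separate phases. Hill climbing of this kind can stop at a local optimum; the sketch makes no claim about the search strategies actually used in the paper.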
Author Keywords: Query optimization; Planning by Rewriting; Information integration
References
1. A. Aboulnaga and S. Chaudhuri, Self-tuning histograms: Building histograms without looking at data. In: A. Delis, C. Faloutsos and S. Ghandeharizadeh, Editors, Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD-99), SIGMOD Record, Vol. 28 2, ACM Press, New York (1999), pp. 181–192.
2. S. Adali, K. Selcuk Candan, Y. Papakonstantinou and V.S. Subrahmanian, Query caching and optimization in distributed mediator systems. SIGMOD Record (ACM Special Interest Group on Management of Data) Vol. 25 2 (1996), pp. 137–148.
3. J.L. Ambite. Planning by Rewriting, Ph.D. Thesis, University of Southern California, Marina del Rey, CA (1998).
4. J.L. Ambite and C.A. Knoblock, Planning by rewriting: Efficiently generating high-quality plans. In: Proc. AAAI-97, Providence, RI (1997).
5. J.L. Ambite, C.A. Knoblock, I. Muslea and A. Philpot, Compiling source descriptions for efficient and flexible information integration. J. Intelligent Information Systems (to appear).
6. J. Ambros-Ingerson. IPEM: Integrated planning, execution, and monitoring, Ph.D. Thesis, Department of Computer Science, University of Essex (1987).
7. Y. Arens, C.Y. Chee, C.-N. Hsu and C.A. Knoblock, Retrieving and integrating data from multiple information sources. Internat. J. Intelligent and Cooperative Information Systems Vol. 2 2 (1993), pp. 127–158.
8. Y. Arens, C.A. Knoblock and W.-M. Shen, Query reformulation for dynamic information integration. J. Intelligent Information Systems (Special Issue on Intelligent Information Integration) Vol. 6 2–3 (1996), pp. 99–130.
9. N. Ashish, C.A. Knoblock and A. Levy, Information gathering plans with sensing actions. In: S. Steel and R. Alami, Editors, Recent Advances in AI Planning: 4th European Conference on Planning, ECP'97, Springer, New York (1997).
10. R. Brachman and J. Schmolze, An overview of the KL-ONE knowledge representation system. Cognitive Sci. Vol. 9 2 (1985), pp. 171–216.
11. M. Cherniack and S.B. Zdonik, Rule languages and internal algebras for rule-based optimizers. SIGMOD Record (ACM Special Interest Group on Management of Data) Vol. 25 2 (1996), pp. 401–412.
12. M. Cherniack and S.B. Zdonik, Changing the rules: Transformations for rule-based optimizers. In: Proc. ACM SIGMOD International Conference on Management of Data, Seattle, WA (1998), pp. 61–72.
13. W.W. Chu and P. Hurley, Optimal query processing for distributed database systems. IEEE Trans. Comput. Vol. 31 9 (1982), pp. 835–850.
14. R.L. Cole and G. Graefe, Optimization of dynamic query evaluation plans. SIGMOD Record (ACM Special Interest Group on Management of Data) Vol. 23 2 (1994), pp. 150–160.
15. A. Deutsch, M. Fernandez, D. Florescu, A. Levy, D. Maier and D. Suciu, Querying XML data. Bull. Technical Committee on Data Engineering Vol. 22 3 (1999), pp. 27–34.
16. D. Draper, S. Hanks and D. Weld, Probabilistic planning with information gathering and contingent execution. In: Proc. 2nd International Conference on Artificial Intelligence Planning Systems, Chicago, IL (1994), pp. 31–36.
17. O.M. Duschka. Query planning and optimization in information integration, Ph.D. Thesis, Stanford University (1997).
18. O.M. Duschka and M.R. Genesereth, Answering recursive queries using views. In: Proc. 16th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Tucson, AZ (1997).
19. O.M. Duschka and M.R. Genesereth, Infomaster—An information integration tool. In: Proc. Internat. Workshop on Intelligent Information Integration, Freiburg, Germany (1997).
20. K. Erol, D. Nau and J. Hendler, UMCP: A sound and complete planning procedure for hierarchical task-network planning. In: Proc. 2nd International Conference on Artificial Intelligence Planning Systems, Chicago, IL (1994), pp. 249–254.
21. M. Friedman and D.S. Weld, Efficiently executing information-gathering plans. In: Proc. IJCAI-97, Nagoya, Japan (1997), pp. 785–791.
22. P. Gassner, G. Lohman, K.B. Schiefer and L. Wang, Query optimization in the IBM DB2 family. Bulletin of the Technical Committee on Data Engineering (Special Issue on Query Processing in Commercial Database Systems) Vol. 16 4 (1993), pp. 4–18.
23. G. Graefe, Query evaluation techniques for large databases. ACM Computing Surveys Vol. 25 2 (1993), pp. 73–170.
24. G. Graefe, Editor, Special Issue on Query Processing in Commercial Database Systems, Bulletin of the Technical Committee on Data Engineering Vol. 16 4 (1993).
25. G. Graefe, The Cascades framework for query optimization. Bulletin of the Technical Committee on Data Engineering (Special Issue on Database Query Processing) Vol. 18 3 (1995), pp. 19–29.
26. G. Graefe, Editor, Special Issue on Database Query Processing, Bulletin of the Technical Committee on Data Engineering Vol. 18 3 (1995).
27. G. Graefe, R.L. Cole, D.L. Davison, W.J. McKenna and R.H. Wolniewicz, Extensible query optimization and parallel execution in Volcano. In: J.C. Freytag, G. Vossen and D. Maier, Editors, Query Processing for Advanced Database Applications, Morgan Kaufmann, San Francisco, CA (1994), pp. 305–381.
28. G. Graefe and D.J. DeWitt, The EXODUS optimizer generator. In: Proc. 1987 ACM SIGMOD International Conference on Management of Data. SIGMOD Record Vol. 16 3 (1987), pp. 160–172.
29. G. Graefe and W.J. McKenna, The Volcano optimizer generator: Extensibility and efficient search. In: Proc. IEEE International Conference on Data Engineering, Vienna, Austria (1993), pp. 209–218.
30. G. Graefe and K. Ward, Dynamic query optimization plans. ACM SIGMOD Record Vol. 18 2 (1989) Also published in/as: 19th ACM SIGMOD Conference on the Management of Data, Portland, OR, May–June 1989.
31. L.M. Haas, J.C. Freytag, G.M. Lohman and H. Pirahesh, Extensible query processing in Starburst. In: J. Clifford, B.G. Lindsay and D. Maier, Editors, Proc. 1989 ACM SIGMOD International Conference on Management of Data, Portland, OR, ACM Press, New York (1989), pp. 377–388.
32. L.M. Haas, D. Kossmann, E.L. Wimmers and J. Yang, Optimizing queries across diverse data sources. In: Proc. 23rd International Conference on Very Large Data Bases (VLDB-97) (1997), pp. 276–285.
33. J. Hammer, H. Garcia-Molina, K. Ireland, Y. Papakonstantinou, J. Ullman and J. Widom, Information translation, mediation, and Mosaic-based browsing in the TSIMMIS system. In: Proc. ACM SIGMOD International Conference on Management of Data, San Jose, CA (1995).
34. Y. Ioannidis and Y.C. Kang, Randomized algorithms for optimizing large join queries. In: Proc. ACM SIGMOD International Conference on Management of Data, Atlantic City, NJ (1990), pp. 312–321.
35. Y.E. Ioannidis and S. Christodoulakis, On the propagation of errors in the size of join results. SIGMOD Record (ACM Special Interest Group on Management of Data) Vol. 20 2 (1991), pp. 268–277.
36. Z.G. Ives, D. Florescu, M. Friedman, A. Levy and D.S. Weld, An adaptive query execution system for data integration. In: A. Delis, C. Faloutsos and S. Ghandeharizadeh, Editors, Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD-99). SIGMOD Record Vol. 28 2 (1999), pp. 299–310.
37. M. Jarke and J. Koch, Query optimization in database systems. ACM Computing Surveys Vol. 16 2 (1984), pp. 111–152.
38. N. Kabra and D.J. DeWitt, Efficient mid-query re-optimization of sub-optimal query execution plans. In: Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD-98). SIGMOD Record Vol. 27 2 (1998), pp. 106–117.
39. S. Kambhampati, C.A. Knoblock and Q. Yang, Planning as refinement search: A unified framework for evaluating the design tradeoffs in partial order planning. Artificial Intelligence Vol. 76 1–2 (1995), pp. 167–238.
40. C.A. Knoblock, Planning, executing, sensing, and replanning for information gathering. In: Proc. IJCAI-95, Montreal, Quebec (1995).
41. C.A. Knoblock, Building a planner for information gathering: A report from the trenches. In: Proc. 3rd International Conference on Artificial Intelligence Planning Systems, Edinburgh, Scotland (1996).
42. C.A. Knoblock, S. Minton, J.L. Ambite, A.G. Philpot, N. Ashish, P.J. Modi, I. Muslea and S. Tejada, Modeling web sources for information integration. In: Proc. AAAI-98, Madison, WI (1998).
43. N. Kushmerick. Wrapper induction for information extraction, Ph.D. Thesis, Department of Computer Science and Engineering, University of Washington (1997).
44. C.T. Kwok and D.S. Weld, Planning to gather information. In: Proc. AAAI-96, Portland, OR (1996).
45. E. Lambrecht, S. Kambhampati and S. Gnanaprakasam, Optimizing recursive information gathering plans. In: Proc. IJCAI-99, Stockholm, Sweden (1999).
46. A.Y. Levy, A.O. Mendelzon, Y. Sagiv and D. Srivastava, Answering queries using views. In: Proc. 14th ACM Symposium on Principles of Database Systems, San Jose, CA (1995).
47. A.Y. Levy, A. Rajaraman and J.J. Ordille, Querying heterogeneous information sources using source descriptions. In: Proc. 22nd International Conference on Very Large Data Bases, Bombay, India (1996).
48. A.Y. Levy, D. Srivastava and T. Kirk, Data model and query evaluation in global information systems. J. Intelligent Information Systems (Special Issue on Networked Information Discovery and Retrieval) Vol. 5 2 (1995), pp. 121–143.
49. G.M. Lohman, Grammar-like functional rules for representing query optimization alternatives. In: H. Boral and P.-Å. Larson, Editors, Proc. 1988 ACM SIGMOD International Conference on Management of Data, Chicago, IL, ACM Press, New York (1988), pp. 18–27.
50. R. MacGregor, The evolving technology of classification-based knowledge representation systems. In: J. Sowa, Editor, Principles of Semantic Networks: Explorations in the Representation of Knowledge, Morgan Kaufmann, San Mateo, CA (1990).
51. S. Minton, Quantitative results concerning the utility of explanation-based learning. Artificial Intelligence Vol. 42 2–3 (1990), pp. 363–392.
52. I. Muslea, S. Minton and C.A. Knoblock, Wrapper induction for semistructured, web-based information sources. In: Proc. Conference on Automated Learning and Discovery Workshop on Learning from Text and the Web, Pittsburgh, PA (1998).
53. K. Ono and G.M. Lohman, Measuring the complexity of join enumeration in query optimization. In: D. McLeod, R. Sacks-Davis and H.-J. Schek, Editors, Proc. 16th International Conference on Very Large Data Bases, Brisbane, Queensland, Australia, Morgan Kaufmann, San Mateo, CA (1990), pp. 314–325.
54. J.S. Penberthy and D.S. Weld, UCPOP: A sound, complete, partial order planner for ADL. In: Proc. 3rd International Conference on Principles of Knowledge Representation and Reasoning, Cambridge, MA (1992), pp. 189–197.
55. M. Peot and D. Smith, Conditional nonlinear planning. In: J. Hendler, Editor, Proc. First International Conference on AI Planning Systems, College Park, MD, Morgan Kaufmann, San Mateo, CA (1992), pp. 189–197.
56. M.T. Roth, M. Arya, L.M. Haas, M.J. Carey, W. Cody, R. Fagin, P.M. Schwarz, J. Thomas and E.L. Wimmers, The Garlic project. SIGMOD Record (ACM Special Interest Group on Management of Data) Vol. 25 2 (1996), pp. 557–558.
57. M.T. Roth and P.M. Schwarz, Don't scrap it, wrap it! A wrapper architecture for legacy data sources. In: Proc. 23rd International Conference on Very Large Data Bases (VLDB-97) (1997), pp. 266–275.
58. S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach, Prentice Hall, Englewood Cliffs, NJ (1995).
59. A. Silberschatz, H.F. Korth and S. Sudarshan. Database System Concepts, McGraw-Hill, New York (1997).
60. A. Swami, Optimization of large join queries: Combining heuristic and combinatorial techniques. In: Proc. ACM SIGMOD International Conference on Management of Data, Portland, OR (1989), pp. 367–376.
61. A. Swami and A. Gupta, Optimization of large join queries. SIGMOD Record (ACM Special Interest Group on Management of Data) Vol. 17 3 (1988), pp. 8–17.
62. A. Tate, Generating project networks. In: Proc. IJCAI-77, Cambridge, MA (1977), pp. 888–893.
63. A. Tomasic, L. Raschid and P. Valduriez, Scaling access to heterogeneous data sources with DISCO. IEEE Trans. Knowledge and Data Engineering Vol. 10 5 (1998), pp. 808–823.
64. J.D. Ullman, Information integration using logical views. In: Proc. 6th International Conference on Database Theory, Delphi, Greece (1997).
65. T. Urhan, M.J. Franklin and L. Amsaleg, Cost based query scrambling for initial delays. In: Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD-98). SIGMOD Record Vol. 27 2 (1998), pp. 130–141.
66. D.S. Weld, An introduction to least commitment planning. AI Magazine Vol. 15 4 (1994).
67. D.S. Weld, Recent advances in AI planning. AI Magazine Vol. 20 2 (1999).
68. G. Wiederhold, Mediators in the architecture of future information systems. IEEE Computer (1992).
69. W.P. Yan and P.-Å. Larson, Performing group-by before join. In: A.K. Elmagarmid and E. Neuhold, Editors, Proc. 10th International Conference on Data Engineering, Houston, TX, IEEE Computer Society Press (1994).
70. W.P. Yan and P.-Å. Larson, Eager aggregation and lazy aggregation. In: D. McLeod, R. Sacks-Davis and H. Schek, Editors, Proc. 21st International Conference on Very Large Data Bases, Zurich, Switzerland (1995).
71. V. Zadorozhny, L. Bright, L. Raschid, T. Urhan and M.E. Vidal. Efficient evaluation of queries in a mediator for web sources, Technical Report, UMIACS, University of Maryland (1999).
Corresponding author; email: ambite@isi.edu





