ABSTRACT
The task of formulating queries is greatly facilitated when they can be generated automatically from some given data values, schema concepts or both (e.g., names of particular entities and XML tags). This automation is the basis of various database applications, such as keyword search and interactive query formulation. Usually, automatic query generation is realized by finding a set of small tree patterns that contain some given labels. More formally, the computational problem at hand is to find top-k patterns, that is, k minimum-weight tree patterns that contain a given bag of labels, conform to the schema, and are non-redundant. A plethora of systems and research papers include a component that deals with this problem. This paper presents an algorithm for this problem, with complexity guarantees, that allows nontrivial schema constraints and, hence, avoids generating patterns that cannot be instantiated. Specifically, this paper shows that for schemas with certain types of neighborhood constraints, the problem is fixed-parameter tractable (FPT), the parameter being the size of the given bag of labels. As machinery, an adaptation of Lawler-Murty's procedure is developed. This adaptation reduces a top-k problem, over an infinite space of solutions, to a prefix-constrained optimization problem. It is shown how to cast the problem of top-k patterns in this adaptation. A solution is developed for the corresponding prefix-constrained optimization problem, and it uses an algorithm for finding a (single) minimum-weight tree pattern. This algorithm generalizes an earlier work by handling leaf constraints (i.e., which labels may, must or should not be leaves). It all boils down to a reduction showing that, under a language for neighborhood constraints, finding top-k patterns is FPT if a certain variant of exact cover is FPT.
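To make the partitioning idea behind a Lawler-Murty style procedure concrete, the following is a minimal sketch of the classical scheme on a toy, finite problem: each solution picks one option per slot, and its weight is the sum of the chosen option weights. It is not the adaptation developed in the paper, which handles an infinite space of tree patterns under schema constraints. The names `best_completion` and `top_k` and the slot/option model are assumptions made purely for illustration. The sketch shows only how a top-k problem can be reduced to repeated calls of a prefix-constrained optimizer.

```python
import heapq
from itertools import count


def best_completion(options, prefix, banned):
    """Prefix-constrained optimizer for the toy problem (hypothetical helper).

    options: list of dicts, one per slot, mapping choice -> weight.
    prefix:  tuple of choices already fixed for the first len(prefix) slots.
    banned:  choices forbidden in the first free slot only.
    Returns (weight, solution) for a minimum-weight solution, or None.
    """
    first_free = len(prefix)
    weight = sum(options[j][c] for j, c in enumerate(prefix))
    solution = list(prefix)
    for j in range(first_free, len(options)):
        allowed = [(w, c) for c, w in options[j].items()
                   if j != first_free or c not in banned]
        if not allowed:
            return None  # the constraints leave no feasible solution
        w, c = min(allowed)
        weight += w
        solution.append(c)
    return weight, tuple(solution)


def top_k(options, k):
    """Return up to k minimum-weight solutions in nondecreasing weight order."""
    tie = count()  # tie-breaker so the heap never compares frozensets
    heap = []
    first = best_completion(options, (), frozenset())
    if first is not None:
        heapq.heappush(heap, (first[0], next(tie), first[1], 0, frozenset()))
    results = []
    while heap and len(results) < k:
        weight, _, sol, p, banned = heapq.heappop(heap)
        results.append((weight, sol))
        # Partition the remaining solutions of this subproblem: for each
        # position i >= p, keep sol[:i] fixed and forbid sol[i] at slot i.
        for i in range(p, len(sol)):
            inherited = banned if i == p else frozenset()
            new_banned = inherited | {sol[i]}
            cand = best_completion(options, sol[:i], new_banned)
            if cand is not None:
                heapq.heappush(
                    heap, (cand[0], next(tie), cand[1], i, new_banned))
    return results


if __name__ == "__main__":
    toy = [{"a": 1, "b": 3}, {"x": 2, "y": 3}, {"u": 0, "v": 4}]
    for w, s in top_k(toy, 3):
        print(w, s)
```

In this sketch the subproblems created from a popped solution are pairwise disjoint and together cover every other solution of its subproblem, so no solution is reported twice. Each iteration makes at most one optimizer call per slot.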
Index Terms
- Extracting minimum-weight tree patterns from a schema with neighborhood constraints