research-article

Extracting minimum-weight tree patterns from a schema with neighborhood constraints

Authors:
Benny Kimelfeld, IBM Research--Almaden, San Jose, CA
Yehoshua Sagiv, The Hebrew University, Jerusalem, Israel
ICDT '13: Proceedings of the 16th International Conference on Database Theory, March 2013, Pages 249–260, https://doi.org/10.1145/2448496.2448526

Published: 18 March 2013

ABSTRACT

The task of formulating queries is greatly facilitated when they can be generated automatically from some given data values, schema concepts or both (e.g., names of particular entities and XML tags). This automation is the basis of various database applications, such as keyword search and interactive query formulation. Usually, automatic query generation is realized by finding a set of small tree patterns that contain some given labels. More formally, the computational problem at hand is to find top-k patterns, that is, k minimum-weight tree patterns that contain a given bag of labels, conform to the schema, and are non-redundant. A plethora of systems and research papers include a component that deals with this problem. This paper presents an algorithm for this problem, with complexity guarantees, that allows nontrivial schema constraints and, hence, avoids generating patterns that cannot be instantiated. Specifically, this paper shows that for schemas with certain types of neighborhood constraints, the problem is fixed-parameter tractable (FPT), the parameter being the size of the given bag of labels. As machinery, an adaptation of Lawler-Murty's procedure is developed. This adaptation reduces a top-k problem, over an infinite space of solutions, to a prefix-constrained optimization problem. It is shown how to cast the problem of top-k patterns in this adaptation. A solution is developed for the corresponding prefix-constrained optimization problem, and it uses an algorithm for finding a (single) minimum-weight tree pattern. This algorithm generalizes an earlier work by handling leaf constraints (i.e., which labels may, must or should not be leaves). It all boils down to a reduction showing that, under a language for neighborhood constraints, finding top-k patterns is FPT if a certain variant of exact cover is FPT.
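The general idea behind a Lawler-Murty style reduction can be illustrated on a toy problem. The sketch below is not the paper's adaptation: it works over a finite space of solutions (one option chosen per slot, minimizing total weight) rather than an infinite space of tree patterns, and the names `best_in_subspace` and `top_k` are hypothetical. It only shows how a top-k problem reduces to repeatedly solving a prefix-constrained optimization problem: each subspace is given by a fixed prefix plus a set of choices excluded at the next position.

```python
import heapq
from itertools import count


def best_in_subspace(weights, prefix, excluded):
    """Toy prefix-constrained optimizer (an assumption, not the paper's solver).

    Returns (weight, solution) for the minimum-weight tuple that starts with
    `prefix` and whose next choice is not in `excluded`, or None if empty.
    """
    i = len(prefix)
    if i >= len(weights):
        return None
    allowed = [c for c in range(len(weights[i])) if c not in excluded]
    if not allowed:
        return None
    choice = min(allowed, key=lambda c: weights[i][c])
    # Positions after i are unconstrained, so take the cheapest option in each.
    tail = tuple(min(range(len(w)), key=w.__getitem__) for w in weights[i + 1:])
    solution = tuple(prefix) + (choice,) + tail
    return sum(weights[j][c] for j, c in enumerate(solution)), solution


def top_k(weights, k):
    """Enumerate up to k minimum-weight tuples in nondecreasing weight order."""
    tie = count()  # tie-breaker so the heap never compares solutions directly
    heap = []

    def push(prefix, excluded):
        best = best_in_subspace(weights, prefix, excluded)
        if best is not None:
            weight, solution = best
            heapq.heappush(heap, (weight, next(tie), solution, prefix, excluded))

    push((), frozenset())
    results = []
    while heap and len(results) < k:
        weight, _, solution, prefix, excluded = heapq.heappop(heap)
        results.append((weight, solution))
        # Partition the rest of this subspace into disjoint subspaces: keep a
        # longer prefix of the solution, then forbid its choice at position i.
        start = len(prefix)
        for i in range(start, len(solution)):
            banned = (excluded if i == start else frozenset()) | {solution[i]}
            push(solution[:i], banned)
    return results
```

For example, with `weights = [[1, 4], [2, 3]]`, calling `top_k(weights, 3)` returns `[(3, (0, 0)), (4, (0, 1)), (6, (1, 0))]`. Each popped solution triggers at most one optimizer call per position, which is why the cost of the whole enumeration is governed by the cost of the constrained optimization step.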

Index Terms

Extracting minimum-weight tree patterns from a schema with neighborhood constraints
1. Information systems
  1. Data management systems
  2. Information retrieval
    1. Information retrieval query processing

Published in
ICDT '13: Proceedings of the 16th International Conference on Database Theory
March 2013
301 pages
ISBN: 9781450315982
DOI: 10.1145/2448496
Editors:
Wang-Chiew Tan, UC Santa Cruz
Giovanna Guerrini, Università di Genova, Italy
Barbara Catania, Università di Genova, Italy
Anastasios Gounaris, Aristotle University of Thessaloniki, Greece
Copyright © 2013 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
graph search
minimal tree patterns
query extraction