Incorporating constraints in probabilistic XML

Authors:
Sara Cohen, The Hebrew University of Jerusalem, Jerusalem, Israel
Benny Kimelfeld, IBM Almaden Research Center, San Jose, CA
Yehoshua Sagiv, The Hebrew University of Jerusalem, Jerusalem, Israel

ACM Transactions on Database Systems, Volume 34, Issue 3, Article No. 18, pp. 1–45. https://doi.org/10.1145/1567274.1567280

Published: 03 September 2009

Abstract

Constraints are important, not only for maintaining data integrity, but also because they capture natural probabilistic dependencies among data items. A probabilistic XML database (PXDB) is the probability subspace comprising the instances of a p-document that satisfy a set of constraints. In contrast to existing models that can express probabilistic dependencies, PXDBs are shown to admit tractable query evaluation. The problems of sampling and of determining well-definedness (i.e., whether this subspace is nonempty) are also tractable. Furthermore, queries and constraints can include the aggregate functions count, max, min, and ratio. Finally, this approach can easily be extended to allow a probabilistic interpretation of constraints.
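To make the definition of a PXDB concrete, here is a minimal brute-force sketch in Python. It assumes a hypothetical, simplified p-document in which every non-root node is kept independently with a given probability (dropping a node drops its subtree) and node labels are unique; the names `PNode`, `worlds`, `pxdb`, and `query_probability` are illustrative and do not come from the paper. The sketch enumerates all instances, keeps those that satisfy the constraints C, and renormalizes, so a query Q has probability Pr(Q ∧ C) / Pr(C), and the PXDB is well-defined exactly when Pr(C) > 0. Enumeration is exponential in the number of nodes: it only illustrates the semantics and is not the tractable evaluation algorithm developed in the paper.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Callable, FrozenSet, List, Tuple

World = FrozenSet[str]                 # an instance, given by the labels of its nodes
Constraint = Callable[[World], bool]  # a Boolean condition on an instance


@dataclass
class PNode:
    """Hypothetical simplified p-document node: kept with probability `prob`,
    independently of its siblings; dropping a node drops its whole subtree."""
    label: str
    prob: Fraction = Fraction(1)
    children: List["PNode"] = field(default_factory=list)


def worlds(node: PNode) -> List[Tuple[World, Fraction]]:
    """Enumerate every instance of the subtree rooted at `node`, given that
    `node` itself is present, with its probability (brute force, exponential)."""
    result = [(frozenset([node.label]), Fraction(1))]
    for child in node.children:
        # Either the child (with its subtree) is dropped, or it is kept and
        # contributes one of its own instances.
        options = [(frozenset(), 1 - child.prob)]
        options += [(w, child.prob * p) for w, p in worlds(child)]
        result = [(w1 | w2, p1 * p2)
                  for w1, p1 in result
                  for w2, p2 in options
                  if p1 * p2 > 0]
    return result


def pxdb(root: PNode, constraints: List[Constraint]) -> List[Tuple[World, Fraction]]:
    """Keep the instances that satisfy every constraint and renormalize."""
    kept = [(w, p) for w, p in worlds(root) if all(c(w) for c in constraints)]
    total = sum(p for _, p in kept)  # Pr(all constraints hold)
    if total == 0:
        raise ValueError("not well-defined: no instance satisfies the constraints")
    return [(w, p / total) for w, p in kept]


def query_probability(root: PNode, constraints: List[Constraint],
                      query: Callable[[World], bool]) -> Fraction:
    """Probability that `query` holds in the PXDB defined by root and constraints."""
    return sum((p for w, p in pxdb(root, constraints) if query(w)), Fraction(0))


# Usage: two optional member nodes, each present with probability 1/2.
MEMBERS = {"alice", "bob"}
doc = PNode("dept", children=[PNode("alice", Fraction(1, 2)),
                              PNode("bob", Fraction(1, 2))])


def at_least_one(w: World) -> bool:
    """Count-style constraint: at least one member node is present."""
    return len(w & MEMBERS) >= 1


def has_alice(w: World) -> bool:
    return "alice" in w


print(query_probability(doc, [], has_alice))                           # 1/2
print(query_probability(doc, [at_least_one], has_alice))               # 2/3
print(query_probability(doc, [at_least_one], lambda w: MEMBERS <= w))  # 1/3
```

In this toy example, the constraint makes the two members dependent: once it is imposed, each member is present with probability 2/3, yet both are present with probability 1/3 rather than (2/3)² = 4/9. A constraint that no instance satisfies (for example, requiring three members) makes `pxdb` raise `ValueError`, which corresponds to the PXDB not being well-defined.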

Recommendations

Incorporating constraints in probabilistic XML
PODS '08: Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Constraints are important not just for maintaining data integrity, but also because they capture natural probabilistic dependencies among data items. A probabilistic XML database (PXDB) is the probability sub-space comprising the instances of a p-...

On the expressiveness of probabilistic XML models
Various known models of probabilistic XML can be represented as instantiations of the abstract notion of p-documents. In addition to ordinary nodes, p-documents have distributional nodes that specify the possible worlds and their probabilistic ...

Capturing continuous data and answering aggregate queries in probabilistic XML
Sources of data uncertainty and imprecision are numerous. A way to handle this uncertainty is to associate probabilistic annotations to data. Many such probabilistic database models have been proposed, both in the relational and in the semi-structured ...

Published in

ACM Transactions on Database Systems Volume 34, Issue 3
August 2009
269 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1567274

Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 3 September 2009
- Accepted: 1 June 2009
- Revised: 1 March 2009
- Received: 1 September 2008

Author Tags
- Probabilistic databases
- constraints
- probabilistic XML
- sampling probabilistic data
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- 13 Total Citations
- 543 Total Downloads
- Downloads (last 12 months): 2
- Downloads (last 6 weeks): 0