research-article

Incorporating constraints in probabilistic XML

Published: 03 September 2009

Abstract

Constraints are important, not only for maintaining data integrity, but also because they capture natural probabilistic dependencies among data items. A probabilistic XML database (PXDB) is the probability subspace comprising the instances of a p-document that satisfy a set of constraints. In contrast to existing models that can express probabilistic dependencies, it is shown that query evaluation is tractable in PXDBs. The problems of sampling and determining well-definedness (i.e., whether the aforesaid subspace is nonempty) are also tractable. Furthermore, queries and constraints can include the aggregate functions count, max, min, and ratio. Finally, this approach can be easily extended to allow a probabilistic interpretation of constraints.
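To make the definition of a PXDB concrete, the following is a minimal brute-force sketch, not the tractable algorithm the article reports. It assumes a drastically simplified p-document: a root whose children exist independently, each with a given probability. The names `possible_worlds`, `condition`, and `query_probability`, and the "alice"/"bob" data, are hypothetical and do not come from the article. The sketch enumerates all instances, keeps those that satisfy a constraint, and renormalizes; an empty result corresponds to a PXDB that is not well-defined.

```python
from fractions import Fraction
from itertools import product


def possible_worlds(children):
    """Yield (instance, probability) for every subset of the children.

    `children` maps a child name to its independent existence probability.
    This enumeration is exponential in the number of children.
    """
    names = list(children)
    for choices in product([True, False], repeat=len(names)):
        prob = Fraction(1)
        world = set()
        for name, keep in zip(names, choices):
            p = children[name]
            prob *= p if keep else 1 - p
            if keep:
                world.add(name)
        yield frozenset(world), prob


def condition(children, constraint):
    """Restrict to instances satisfying `constraint` and renormalize.

    Returns None when no instance with nonzero probability satisfies the
    constraint, i.e. the constrained subspace is empty.
    """
    satisfying = [(w, p) for w, p in possible_worlds(children)
                  if p > 0 and constraint(w)]
    total = sum(p for _, p in satisfying)
    if total == 0:
        return None
    return {w: p / total for w, p in satisfying}


def query_probability(pxdb, query):
    """Probability that a Boolean query holds in the conditioned space."""
    return sum(p for w, p in pxdb.items() if query(w))


# Hypothetical data: two candidate children and an "at most one" count constraint.
children = {"alice": Fraction(3, 4), "bob": Fraction(1, 2)}
pxdb = condition(children, lambda w: len(w) <= 1)
print(query_probability(pxdb, lambda w: "alice" in w))  # 3/5
```

In this toy case the prior probability that "alice" exists is 3/4, but it drops to 3/5 once the count constraint is imposed, which illustrates how a constraint induces a dependency between items that were independent in the underlying p-document.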



          • Published in

            ACM Transactions on Database Systems, Volume 34, Issue 3
            August 2009
            269 pages
            ISSN: 0362-5915
            EISSN: 1557-4644
            DOI: 10.1145/1567274

            Copyright © 2009 ACM


            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 3 September 2009
            • Accepted: 1 June 2009
            • Revised: 1 March 2009
            • Received: 1 September 2008


            Qualifiers

            • research-article
            • Research
            • Refereed
