Abstract
Constraints are important not only for maintaining data integrity but also because they capture natural probabilistic dependencies among data items. A probabilistic XML database (PXDB) is the probability subspace comprising the instances of a p-document that satisfy a set of constraints. It is shown that, in contrast to existing models that can express probabilistic dependencies, query evaluation is tractable in PXDBs. The problems of sampling and of determining well-definedness (i.e., whether this subspace is nonempty) are also tractable. Furthermore, queries and constraints can include the aggregate functions count, max, min, and ratio. Finally, this approach can be easily extended to allow a probabilistic interpretation of constraints.
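To make the definition of a PXDB concrete, here is a minimal brute-force sketch in Python. It assumes a hypothetical p-document whose only distributional node is an ind node, so each child is kept independently with its own probability. It also assumes a hypothetical count constraint and a hypothetical query. All names (`IND_CHILDREN`, `at_most_one_phone`, `has_phone`, `pxdb_probability`) are illustrative only. The sketch enumerates every random instance, keeps those that satisfy the constraint, and renormalizes, which is the conditioning that defines the probability subspace. Enumeration is exponential in the number of choices, so it illustrates the semantics only, not the polynomial-time evaluation claimed above.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy p-document: a root "person" with one ind distributional node.
# Each entry is (label of a child, probability that the child is present).
IND_CHILDREN = [
    ("phone", Fraction(1, 2)),
    ("phone", Fraction(1, 2)),
    ("email", Fraction(1, 4)),
]


def worlds(ind_children):
    """Yield every random instance (list of present child labels) with its probability."""
    for choice in product([True, False], repeat=len(ind_children)):
        prob = Fraction(1)
        labels = []
        for keep, (label, p) in zip(choice, ind_children):
            prob *= p if keep else 1 - p
            if keep:
                labels.append(label)
        yield labels, prob


def at_most_one_phone(labels):
    # Hypothetical constraint using the count aggregate: at most one phone child.
    return labels.count("phone") <= 1


def has_phone(labels):
    # Hypothetical Boolean query: the instance contains at least one phone child.
    return "phone" in labels


def pxdb_probability(ind_children, constraint, query):
    """Pr(query | constraint), computed by enumerating all instances (exponential)."""
    total = Fraction(0)  # mass of instances that satisfy the constraint
    hit = Fraction(0)    # mass of those instances that also satisfy the query
    for labels, prob in worlds(ind_children):
        if constraint(labels):
            total += prob
            if query(labels):
                hit += prob
    if total == 0:
        # Empty subspace: the PXDB is not well-defined.
        raise ValueError("no instance satisfies the constraints")
    return hit / total


if __name__ == "__main__":
    print(pxdb_probability(IND_CHILDREN, lambda labels: True, has_phone))  # 3/4
    print(pxdb_probability(IND_CHILDREN, at_most_one_phone, has_phone))    # 2/3
```

Without constraints the query holds with probability 3/4. Conditioning on the count constraint removes the instance with both phones, which makes the two phone children dependent and lowers the probability to 2/3. A constraint that no instance satisfies (for example, at least three phones) leaves an empty subspace, which the sketch reports by raising `ValueError`.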