Probabilistic query answering over inconsistent databases

Annals of Mathematics and Artificial Intelligence

Abstract

This paper presents a framework for querying inconsistent databases in the presence of functional dependencies. Most work on extracting reliable information from inconsistent databases is based on two notions: that of repair, a minimal set of tuple insertions and deletions that leads the database to a consistent state (called a repaired database), and that of consistent query answer, a query answer that can be obtained from every repaired database. In this work, both notions differ from the original ones. With functional dependencies, the original notion of repair restores consistency using tuple deletions only. However, deleting a tuple to remove an integrity violation potentially eliminates useful information in that tuple. To cope with this problem, we adopt a notion of repair based on tuple updates, which better preserves the information in the source database. A drawback of the notion of consistent query answer is that it does not discriminate among non-consistent answers, namely answers that can be obtained from a non-empty proper subset of the repaired databases. To obtain more informative query answers, we propose the notion of probabilistic query answer, in which query answers are tuples associated with probabilities. This new semantics of query answering over inconsistent databases allows us to attach a measure of uncertainty to query answers. We show that the problem of computing probabilistic query answers is FP^#P-complete. We also propose a technique for computing probabilistic answers to arbitrary relational algebra queries.
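
To make the difference between consistent and probabilistic answers concrete, the following is a minimal sketch and not the construction used in the paper. It assumes a single key-style functional dependency, deletion-based repairs (one tuple kept per key value) rather than the update-based repairs adopted here, and a uniform distribution over repairs. The names `repairs`, `probabilistic_answers`, and the `employees` relation are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def repairs(relation, key_len):
    """Yield every deletion-based repair of `relation` under the FD
    (first `key_len` attributes) -> (remaining attributes).
    Assumption: a repair keeps exactly one tuple per key value."""
    groups = defaultdict(list)
    for t in sorted(set(relation)):
        groups[t[:key_len]].append(t)
    for choice in product(*groups.values()):
        yield set(choice)


def probabilistic_answers(relation, key_len, query):
    """Map each answer tuple to the fraction of repairs that return it.
    Assumption: all repairs are equally likely."""
    counts = defaultdict(int)
    total = 0
    for repaired in repairs(relation, key_len):
        total += 1
        for answer in set(query(repaired)):
            counts[answer] += 1
    return {answer: Fraction(c, total) for answer, c in counts.items()}


# Hypothetical relation employee(name, dept) violating name -> dept.
employees = {("ann", "cs"), ("ann", "math"), ("bob", "cs")}


def departments(db):
    """Projection on dept."""
    return {(dept,) for _, dept in db}


answers = probabilistic_answers(employees, key_len=1, query=departments)
consistent = {a for a, p in answers.items() if p == 1}
```

Under these assumptions there are two repairs: `("cs",)` is returned by both and is therefore a consistent answer, while `("math",)` is returned by one of them and receives probability 1/2 instead of being discarded. The sketch enumerates all repairs, whose number can grow exponentially with the number of conflicting key values, so it is meant only as an illustration on small inputs.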

Author information

Corresponding author

Correspondence to Sergio Greco.

About this article

Cite this article

Greco, S., Molinaro, C. Probabilistic query answering over inconsistent databases. Ann Math Artif Intell 64, 185–207 (2012). https://doi.org/10.1007/s10472-012-9287-9

  • DOI: https://doi.org/10.1007/s10472-012-9287-9
