Abstract
One can conceive many reasonable ways of characterizing how dirty a database is with respect to a set of integrity constraints (e.g., functional dependencies). However, dirtiness measures, however good they may be, are difficult for an end-user to interpret and give the database administrator little hint about how to clean the database. This paper discusses these aspects and proposes methods aimed at helping either the user or the administrator overcome the limitations of dirtiness measures when it comes to handling dirty databases.
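As an illustration only (these are not the measures or methods the paper proposes), the Python sketch below scores a relation against a single functional dependency lhs -> rhs in three common ways: the number of conflicting tuple pairs, the fraction of tuples involved in at least one conflict, and the g3 measure of Kivinen and Mannila, i.e. the minimum fraction of tuples that must be deleted for the dependency to hold. Encoding the relation as a list of dicts and the function names `fd_conflicts`, `minority_tuples` and `dirtiness` are hypothetical choices made for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def fd_conflicts(rows, lhs, rhs):
    """Index pairs (i, j) of tuples that agree on lhs but differ on rhs."""
    pairs = []
    for i, j in combinations(range(len(rows)), 2):
        agree_lhs = all(rows[i][a] == rows[j][a] for a in lhs)
        agree_rhs = all(rows[i][a] == rows[j][a] for a in rhs)
        if agree_lhs and not agree_rhs:
            pairs.append((i, j))
    return pairs


def minority_tuples(rows, lhs, rhs):
    """Indices of tuples outside the majority rhs value of their lhs group.

    Deleting exactly these tuples is one minimum-size deletion that makes
    the single FD lhs -> rhs hold (ties broken by first occurrence).
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in lhs)].append(i)
    to_delete = []
    for members in groups.values():
        counts = Counter(tuple(rows[i][a] for a in rhs) for i in members)
        majority, _ = counts.most_common(1)[0]
        to_delete += [i for i in members
                      if tuple(rows[i][a] for a in rhs) != majority]
    return sorted(to_delete)


def dirtiness(rows, lhs, rhs):
    """Three illustrative dirtiness scores for one FD (all zero means clean)."""
    n = len(rows)
    if n == 0:
        return {"conflicting_pairs": 0, "involved_ratio": 0.0, "g3": 0.0}
    pairs = fd_conflicts(rows, lhs, rhs)
    involved = {i for pair in pairs for i in pair}
    return {
        "conflicting_pairs": len(pairs),
        "involved_ratio": len(involved) / n,
        "g3": len(minority_tuples(rows, lhs, rhs)) / n,
    }


if __name__ == "__main__":
    # Hypothetical relation with the FD zip -> city.
    rows = [
        {"name": "Ann", "zip": "75001", "city": "Paris"},
        {"name": "Bob", "zip": "75001", "city": "Paris"},
        {"name": "Eve", "zip": "75001", "city": "Lyon"},
        {"name": "Joe", "zip": "69001", "city": "Lyon"},
    ]
    print(dirtiness(rows, ["zip"], ["city"]))
    # {'conflicting_pairs': 2, 'involved_ratio': 0.75, 'g3': 0.25}
    print(minority_tuples(rows, ["zip"], ["city"]))
    # [2]
```

The example mirrors the contrast drawn in the abstract: each score is a single number that tells an end-user little, whereas `minority_tuples` points at specific tuples an administrator could inspect. Deleting the minority is only one possible repair; modifying values is another.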
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pivert, O., Prade, H. (2010). Handling Dirty Databases: From User Warning to Data Cleaning — Towards an Interactive Approach. In: Deshpande, A., Hunter, A. (eds.) Scalable Uncertainty Management. SUM 2010. Lecture Notes in Computer Science, vol. 6379. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15951-0_28
DOI: https://doi.org/10.1007/978-3-642-15951-0_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15950-3
Online ISBN: 978-3-642-15951-0
eBook Packages: Computer Science (R0)