
An Approach on ETL Attached Data Quality Management

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8646)


Abstract

This contribution introduces an approach to ETL-attached Data Quality Management by means of an autonomous Data Quality Monitoring System. The Data Quality Monitor can be attached, via lightweight connectors, to already implemented ETL processes. It quantifies data quality and suggests measures if, for instance, the quality of a particular data package falls below a certain limit. The long-term vision of this approach is to correct corrupted data (semi-)automatically according to user-defined Data Quality Rules. The Data Quality Monitor is attached to an ETL process by defining "snapshot points", where data samples to be validated are collected, and by introducing "approval points", where the ETL process can be interrupted in case of corrupted input data. Because the Data Quality Monitor is an autonomous module that is attached to ETL processes rather than embedded in them, this approach supports the division of work between ETL developers and dedicated data quality engineers.
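
The interplay of snapshot points, approval points, and Data Quality Rules described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `DataQualityMonitor`, the `Rule` type, the method names `snapshot` and `approve`, the `min_quality` limit, and the scoring formula (share of sampled rows that satisfy every rule) are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Rule:
    """A user-defined Data Quality Rule: a name plus a per-row predicate."""
    name: str
    check: Callable[[Row], bool]


@dataclass
class DataQualityMonitor:
    """Hypothetical monitor that runs beside an ETL process, not inside it."""
    rules: List[Rule]
    min_quality: float = 0.9  # assumed limit below which a package is rejected
    scores: Dict[str, float] = field(default_factory=dict)

    def snapshot(self, point: str, sample: List[Row]) -> float:
        """Snapshot point: validate a data sample and record its quality score."""
        if not sample:
            score = 1.0  # assumption: an empty sample violates no rule
        else:
            # Count the rows that satisfy every rule.
            passed = sum(all(r.check(row) for r in self.rules) for row in sample)
            score = passed / len(sample)
        self.scores[point] = score
        return score

    def approve(self, point: str) -> bool:
        """Approval point: True if the ETL process may continue past this point."""
        # A point without a recorded snapshot is never approved.
        return self.scores.get(point, 0.0) >= self.min_quality


# Illustrative rules and data; the column names are made up.
rules = [
    Rule("customer_id present", lambda row: row.get("customer_id") is not None),
    Rule(
        "amount non-negative",
        lambda row: isinstance(row.get("amount"), (int, float)) and row["amount"] >= 0,
    ),
]
monitor = DataQualityMonitor(rules, min_quality=0.75)

sample = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 2, "amount": -5.0},
    {"customer_id": None, "amount": 3.0},
    {"customer_id": 4, "amount": 7.5},
]
score = monitor.snapshot("after_extract", sample)
ok = monitor.approve("after_extract")
```

In this sketch the ETL process only calls `snapshot` and `approve` at the chosen points, so the rules can be maintained separately from the ETL code, which mirrors the division of work between ETL developers and data quality engineers.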

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Lettner, C., Stumptner, R., Bokesch, KH. (2014). An Approach on ETL Attached Data Quality Management. In: Bellatreche, L., Mohania, M.K. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2014. Lecture Notes in Computer Science, vol 8646. Springer, Cham. https://doi.org/10.1007/978-3-319-10160-6_1

  • DOI: https://doi.org/10.1007/978-3-319-10160-6_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10159-0

  • Online ISBN: 978-3-319-10160-6

  • eBook Packages: Computer Science, Computer Science (R0)
