AutoScale: Automatic ETL Scale Process

Martins, Pedro; Abbasi, Maryam; Furtado, Pedro

doi:10.1007/978-3-319-23201-0_3

Part of the book series: Communications in Computer and Information Science (CCIS, volume 539)

Included in the following conference series:

East European Conference on Advances in Databases and Information Systems

Abstract

In this paper we investigate the problem of providing automatic scalability and data freshness to data warehouses while also handling high-rate data efficiently. In general, data freshness is not guaranteed in these contexts, because data loading, transformation, and integration are heavy tasks that are performed only periodically rather than row by row.

Many current data warehouses are designed to be deployed and run on a single server. For many applications, however, problems related to data volume, processing times, data rates, and requirements for fresh and fast responses call for new solutions.

The solution is to use or build parallel architectures and mechanisms that speed up data integration and handle fresh data efficiently.

Ideally, users developing data warehouses should need to concentrate solely on the conceptual and logical design (e.g., business-driven requirements, logical warehouse schemas, workload, and ETL process), while physical details, including mechanisms for scalability, freshness, and integration of high-rate data, are left to automated tools.

We propose a universal data warehouse parallelization solution, that is, an approach that enables automatic scalability and freshness for any data warehouse and ETL process. Our results show that the proposed system can scale to provide the desired processing speed.
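
The abstract does not spell out the scaling mechanism, so the following is only an illustrative sketch of what a rate-based scale-out/scale-in decision could look like. It is not the authors' algorithm. The function names, the `headroom` parameter, and the assumption that every node sustains the same throughput are all hypothetical.

```python
import math


def nodes_needed(input_rate: float, per_node_rate: float, headroom: float = 0.2) -> int:
    """Smallest node count whose combined capacity covers the input rate plus headroom.

    Assumption (not from the paper): every node processes per_node_rate rows/s.
    """
    if input_rate < 0 or per_node_rate <= 0 or headroom < 0:
        raise ValueError("rates must be non-negative and per_node_rate positive")
    required_capacity = input_rate * (1 + headroom)
    # Always keep at least one node running, even when no data arrives.
    return max(1, math.ceil(required_capacity / per_node_rate))


def scaling_decision(current_nodes: int, input_rate: float,
                     per_node_rate: float, headroom: float = 0.2) -> tuple[str, int]:
    """Return ("scale_out" | "scale_in" | "hold", number of nodes to add or remove)."""
    target = nodes_needed(input_rate, per_node_rate, headroom)
    if target > current_nodes:
        return ("scale_out", target - current_nodes)
    if target < current_nodes:
        return ("scale_in", current_nodes - target)
    return ("hold", 0)


if __name__ == "__main__":
    # Hypothetical numbers: 1000 rows/s arriving, 300 rows/s per node, 2 nodes running.
    print(scaling_decision(current_nodes=2, input_rate=1000, per_node_rate=300, headroom=0.5))
```

A real system would also have to measure the rates, rebalance data across nodes, and damp oscillation between scaling out and scaling in. The sketch leaves all of that out.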

Author information

Authors and Affiliations

Department of Computer Sciences, University of Coimbra, Coimbra, Portugal
Pedro Martins, Maryam Abbasi & Pedro Furtado

Corresponding author

Correspondence to Pedro Martins.

Editor information

Editors and Affiliations

Poznan University of Technology, Poznan, Poland
Tadeusz Morzy
INRIA, Montpellier, France
Patrick Valduriez
National Engineering School for Mechanics and Aerotechnics, Poitiers, France
Ladjel Bellatreche

About this paper

Cite this paper

Martins, P., Abbasi, M., Furtado, P. (2015). AutoScale: Automatic ETL Scale Process. In: Morzy, T., Valduriez, P., Bellatreche, L. (eds) New Trends in Databases and Information Systems. ADBIS 2015. Communications in Computer and Information Science, vol 539. Springer, Cham. https://doi.org/10.1007/978-3-319-23201-0_3

DOI: https://doi.org/10.1007/978-3-319-23201-0_3
Published: 28 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23200-3
Online ISBN: 978-3-319-23201-0
eBook Packages: Computer Science, Computer Science (R0)