ABSTRACT
Finding relevant data for a specific task among the numerous data sources available in any organization is daunting. This is not only because of the number of possible data sources where the data of interest may reside, but also because the data is scattered across the enterprise and is typically dirty and inconsistent. In practice, data scientists routinely report that the majority (more than 80%) of their effort is spent finding, cleaning, integrating, and accessing the data of interest for the task at hand. We propose to demonstrate DATA CIVILIZER to ease the pain faced in analyzing data "in the wild". DATA CIVILIZER is an end-to-end big data management system with components for data discovery, data integration and stitching, data cleaning, and querying data from a large variety of storage engines running in large enterprises.
Index Terms
- A Demo of the Data Civilizer System