skip to main content
10.1145/3035918.3058740acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
short-paper

A Demo of the Data Civilizer System

Published:09 May 2017Publication History

ABSTRACT

Finding relevant data for a specific task from the numerous data sources available in any organization is a daunting task. This is not only because of the number of possible data sources where the data of interest resides, but also due to the data being scattered all over the enterprise and being typically dirty and inconsistent. In practice, data scientists are routinely reporting that the majority (more than 80%) of their effort is spent finding, cleaning, integrating, and accessing data of interest to a task at hand. We propose to demonstrate DATA CIVILIZER to ease the pain faced in analyzing data "in the wild". DATA CIVILIZER is an end-to-end big data management system with components for data discovery, data integration and stitching, data cleaning, and querying data from a large variety of storage engines, running in large enterprises.

References

  1. D. Agrawal, M. L. Ba, L. Berti-Equille, S. Chawla, A. K. Elmagarmid, H. Hammady, Y. Idris, Z. Kaoudi, Z. Khayyat, S. Kruse, M. Ouzzani, P. Papotti, J. Quiané-Ruiz, N. Tang, and M. J. Zaki. Rheem: Enabling multi-platform task execution. In SIGMOD 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. M. J. Cafarella, A. Y. Halevy, and N. Khoussainova. Data integration for the relational web. PVLDB, 2(1):1090--1101, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. D. Deng, R. C. Fernandez, Z. Abedjan, S. Wang, A. Elmagarmid, I. F. Ilyas, S. Madden, M. Ouzzani, M. Stonebraker, and N. Tang. The Data Civilizer System. In CIDR, 2017.Google ScholarGoogle Scholar
  4. J. Duggan, A. J. Elmore, M. Stonebraker, M. Balazinska, B. Howe, J. Kepner, S. Madden, D. Maier, T. Mattson, and S. B. Zdonik. The bigdawg polystore system. SIGMOD Record, 44(2):11--16, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. E. Rahm and H. H. Do. Data cleaning: Problems and current approaches. IEEE Data Eng. Bull., 23(4):3--13, 2000.Google ScholarGoogle Scholar

Index Terms

  1. A Demo of the Data Civilizer System

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
      May 2017
      1810 pages
      ISBN:9781450341974
      DOI:10.1145/3035918

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 9 May 2017

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • short-paper

      Acceptance Rates

      Overall Acceptance Rate785of4,003submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader