DOI: 10.1145/2856767.2856791
research-article

Maximizing Correctness with Minimal User Effort to Learn Data Transformations

Published: 7 March 2016

ABSTRACT

Data transformation often requires users to write many trivial, task-dependent programs to transform thousands of records. Recent programming-by-example (PBE) approaches enable users to transform data without coding. A key challenge for these PBE approaches is delivering correctly transformed results on large datasets, since the transformation programs are likely to be generated by non-expert users. To address this challenge, existing approaches aim to identify a small set of potentially incorrect records and ask users to examine these records instead of the entire dataset. However, because transformation scenarios are highly task-dependent, existing approaches cannot capture the incorrect records across the variety of scenarios. We present an approach that learns from past transformation scenarios to generate a meta-classifier that identifies the incorrect records. Our approach color-codes the transformed records and presents them for users to examine. Users can either enter an example for an incorrectly transformed record or confirm the correctness of a transformed record. Our approach then learns from the users' labels to refine the meta-classifier so that it identifies the incorrect records more accurately. Simulation results and a user study show that our method can identify the incorrectly transformed records and reduce the user effort required to examine the results.
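
The abstract does not spell out how the meta-classifier is built, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a weighted vote over a few simple heuristic checkers and a multiplicative weight update driven by user labels. Every name here (`MetaClassifier`, the checker functions, `threshold`, `penalty`) is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Record = Tuple[str, str]            # (raw input, transformed output)
Checker = Callable[[Record], bool]  # True means "this record looks incorrect"


# Hypothetical heuristic checkers; a real system would learn these signals.
def empty_output(rec: Record) -> bool:
    return rec[1].strip() == ""


def leftover_punctuation(rec: Record) -> bool:
    return any(ch in rec[1] for ch in ",;()")


def output_not_in_input(rec: Record) -> bool:
    return rec[1] not in rec[0]


@dataclass
class MetaClassifier:
    """Weighted vote over checkers, refined by user labels (assumed design)."""
    checkers: Dict[str, Checker]
    threshold: float = 0.3
    weights: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every checker starts with the same weight.
        for name in self.checkers:
            self.weights.setdefault(name, 1.0)

    def score(self, rec: Record) -> float:
        """Fraction of total weight held by checkers that flag the record."""
        total = sum(self.weights.values())
        fired = sum(w for name, w in self.weights.items()
                    if self.checkers[name](rec))
        return fired / total if total else 0.0

    def is_suspicious(self, rec: Record) -> bool:
        return self.score(rec) >= self.threshold

    def update(self, rec: Record, is_incorrect: bool, penalty: float = 0.5) -> None:
        """Down-weight every checker whose vote disagrees with the user's label."""
        for name, checker in self.checkers.items():
            if checker(rec) != is_incorrect:
                self.weights[name] *= penalty


clf = MetaClassifier({
    "empty_output": empty_output,
    "leftover_punctuation": leftover_punctuation,
    "output_not_in_input": output_not_in_input,
})

ok = ("Smith, John", "John Smith")   # reordered correctly
bad = ("Doe, Jane", "Doe, Jane")     # left untransformed

flagged_before = [clf.is_suspicious(ok), clf.is_suspicious(bad)]
clf.update(ok, is_incorrect=False)   # the user confirms the first record
flagged_after = [clf.is_suspicious(ok), clf.is_suspicious(bad)]
```

In this toy run both records are flagged at first, because one checker fires on each. Once the user confirms the first record, the checker that misfired on it loses weight, so only the untransformed record remains flagged. This mirrors, in a very simplified form, the feedback loop the abstract describes.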


Published in

IUI '16: Proceedings of the 21st International Conference on Intelligent User Interfaces
March 2016
446 pages
ISBN: 9781450341370
DOI: 10.1145/2856767

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

IUI '16 paper acceptance rate: 49 of 194 submissions, 25%
Overall acceptance rate: 746 of 2,811 submissions, 27%
