Maximizing Correctness with Minimal User Effort to Learn Data Transformations

Authors:
Bo Wu, University of Southern California, Los Angeles, CA, USA
Craig A. Knoblock, University of Southern California, Marina del Rey, CA, USA
IUI '16: Proceedings of the 21st International Conference on Intelligent User Interfaces, March 2016, Pages 375–384. https://doi.org/10.1145/2856767.2856791

Published: 07 March 2016

ABSTRACT

Data transformation often requires users to write many trivial, task-dependent programs to transform thousands of records. Recent programming-by-example (PBE) approaches enable users to transform data without coding. A key challenge for these PBE approaches is delivering correctly transformed results on large datasets, since the transformation programs are likely to be generated by non-expert users. To address this challenge, existing approaches aim to identify a small set of potentially incorrect records and ask users to examine these records instead of the entire dataset. However, because transformation scenarios are highly task-dependent, existing approaches cannot capture the incorrect records across varied scenarios. We present an approach that learns from past transformation scenarios to generate a meta-classifier that identifies the incorrect records. Our approach color-codes the transformed records and then presents them for users to examine. Users can either enter an example for an incorrectly transformed record or confirm that a transformed record is correct. Our approach then learns from the users' labels to refine the meta-classifier so that it identifies the incorrect records more accurately. Simulation results and a user study show that our method can identify the incorrectly transformed records and reduce the user effort needed to examine the results.
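The abstract does not describe the meta-classifier's internals, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes three hypothetical error detectors (`empty_output`, `unchanged`, `unusual_length`), scores each detector by its precision on labeled past scenarios, and color-codes a record "red" when any detector that fires on it has precision at or above a threshold. All names, detectors, and the threshold are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Record = Tuple[str, str]                     # (raw input, transformed output)
Scenario = Tuple[List[Record], List[bool]]   # records plus "is incorrect" labels
Detector = Callable[[Record, List[Record]], bool]


def empty_output(rec: Record, recs: List[Record]) -> bool:
    """Hypothetical detector: the transformed value is blank."""
    return rec[1].strip() == ""


def unchanged(rec: Record, recs: List[Record]) -> bool:
    """Hypothetical detector: the program left the input untouched."""
    return rec[0] == rec[1]


def unusual_length(rec: Record, recs: List[Record]) -> bool:
    """Hypothetical detector: output length is far from the median length."""
    lengths = sorted(len(out) for _, out in recs)
    median = lengths[len(lengths) // 2]
    return abs(len(rec[1]) - median) > 3


DETECTORS: List[Detector] = [empty_output, unchanged, unusual_length]


def learn_precisions(past: List[Scenario]) -> List[float]:
    """Estimate, per detector, how often a firing meant a truly incorrect record."""
    precisions = []
    for det in DETECTORS:
        fired = hits = 0
        for recs, labels in past:
            for rec, is_wrong in zip(recs, labels):
                if det(rec, recs):
                    fired += 1
                    hits += is_wrong
        # A detector that never fired gets a neutral prior of 0.5 (assumption).
        precisions.append(hits / fired if fired else 0.5)
    return precisions


def score_records(recs: List[Record], precisions: List[float]) -> List[float]:
    """Score each record by the best precision among the detectors that fire."""
    return [
        max((p for det, p in zip(DETECTORS, precisions) if det(rec, recs)),
            default=0.0)
        for rec in recs
    ]


def color_code(scores: List[float], threshold: float = 0.5) -> List[str]:
    """Mark records for review; "red" means the user should examine it."""
    return ["red" if s >= threshold else "none" for s in scores]
```

In this sketch, refinement from user labels amounts to appending the newly labeled records to the list of past scenarios and calling `learn_precisions` again. A real system would likely use a trained ensemble rather than a maximum over precisions.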

Published in
IUI '16: Proceedings of the 21st International Conference on Intelligent User Interfaces
March 2016
446 pages
ISBN: 9781450341370
DOI: 10.1145/2856767
General Chairs: Jeffrey Nichols (Google Inc, USA), Jalal Mahmud (IBM Research, USA), John O'Donovan (UC Santa Barbara, USA)
Program Chairs: Cristina Conati (University of British Columbia, Canada), Massimo Zancanaro (Bruno Kessler Foundation, Italy)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data transformation
program synthesis
programming by example
Qualifiers
- research-article
Acceptance Rates
IUI '16 Paper Acceptance Rate: 49 of 194 submissions, 25%
Overall Acceptance Rate: 746 of 2,811 submissions, 27%