ABSTRACT
Data transformation often requires users to write many trivial, task-dependent programs to transform thousands of records. Recent programming-by-example (PBE) approaches enable users to transform data without coding. A key challenge for these PBE approaches is delivering correctly transformed results on large datasets, since the transformation programs are likely to be generated by non-expert users. To address this challenge, existing approaches aim to identify a small set of potentially incorrect records and ask users to examine these records instead of the entire dataset. However, because transformation scenarios are highly task-dependent, existing approaches cannot capture the incorrect records across the variety of scenarios. We present an approach that learns from past transformation scenarios to generate a meta-classifier that identifies the incorrect records. Our approach color-codes the transformed records and presents them for users to examine. Users can either enter an example for a record that was transformed incorrectly or confirm that a transformed record is correct. Our approach then learns from the users' labels to refine the meta-classifier so that it identifies the incorrect records accurately. Simulation results and a user study show that our method can identify the incorrectly transformed records and reduce the user effort required to examine the results.
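The abstract does not specify the features or the learning algorithm behind the meta-classifier, so the following is only a minimal sketch of the general idea, under stated assumptions: several simple checkers vote on whether a transformed record looks incorrect, and each user label down-weights the checkers that disagreed with it. The checker functions, the `MetaClassifier` class, the 0.5 threshold, and the multiplicative penalty are all hypothetical choices made for illustration, not the authors' method.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# A record is an (input, transformed_output) pair.
Record = Tuple[str, str]
# A checker returns True when it suspects the transformed output is incorrect.
Checker = Callable[[Record], bool]


def length_outlier(record: Record) -> bool:
    """Hypothetical checker: flag outputs that are empty or longer than the input."""
    source, output = record
    return len(output) == 0 or len(output) > len(source)


def has_leftover_digits(record: Record) -> bool:
    """Hypothetical checker: flag outputs that still contain digits."""
    _, output = record
    return any(ch.isdigit() for ch in output)


@dataclass
class MetaClassifier:
    """Weighted vote over checkers; weights are refined from user labels."""
    checkers: Dict[str, Checker]
    weights: Dict[str, float] = field(default_factory=dict)
    penalty: float = 0.5  # assumed factor applied to a checker that disagrees with the user

    def __post_init__(self) -> None:
        # Every checker starts with the same weight unless one was supplied.
        for name in self.checkers:
            self.weights.setdefault(name, 1.0)

    def score(self, record: Record) -> float:
        """Weighted fraction of checkers that suspect the record is incorrect."""
        total = sum(self.weights.values())
        flagged = sum(
            self.weights[name]
            for name, checker in self.checkers.items()
            if checker(record)
        )
        return flagged / total if total else 0.0

    def is_suspicious(self, record: Record, threshold: float = 0.5) -> bool:
        """Records at or above the threshold would be highlighted for the user."""
        return self.score(record) >= threshold

    def update(self, record: Record, user_says_incorrect: bool) -> None:
        """Down-weight every checker whose vote disagrees with the user's label."""
        for name, checker in self.checkers.items():
            if checker(record) != user_says_incorrect:
                self.weights[name] *= self.penalty


if __name__ == "__main__":
    clf = MetaClassifier({"length": length_outlier, "digits": has_leftover_digits})
    good = ("John Smith (42)", "John Smith")
    bad = ("Ann Lee (37)", "Ann Lee (37")
    print(clf.is_suspicious(good), clf.is_suspicious(bad))
    clf.update(bad, user_says_incorrect=True)  # the user marks the record as wrong
    print(clf.weights)
```

In this sketch, a record the user marks as incorrect raises the relative influence of the checkers that had flagged it, which loosely mirrors how user labels refine which records get highlighted next.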
Index Terms
- Maximizing Correctness with Minimal User Effort to Learn Data Transformations