UnchartIt: an interactive framework for program recovery from charts

Authors:
Daniel Ramos (U. Lisboa, Lisboa, Portugal)
Jorge Pereira (U. Lisboa, Lisboa, Portugal)
Inês Lynce (U. Lisboa, Lisboa, Portugal)
Vasco Manquinho (U. Lisboa, Lisboa, Portugal)
Ruben Martins (Carnegie Mellon University)

ASE '20: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering
December 2020
Pages 175–186
https://doi.org/10.1145/3324884.3416613

Published: 27 January 2021

ABSTRACT

Charts are commonly used for data visualization. Generating a chart usually involves performing data transformations, including data pre-processing and aggregation. These tasks can be cumbersome and time-consuming, even for experienced data scientists. Reproducing existing charts can also be a challenging task when information about data transformations is no longer available.

In this paper, we tackle the problem of recovering data transformations from existing charts. Given an input table and a chart, our goal is to automatically recover the data transformation program underlying the chart. We divide our approach into four steps: (1) data extraction, (2) candidate generation, (3) candidate ranking, and (4) candidate disambiguation. We implemented our approach in a tool called UnchartIt and evaluated it on a set of 50 benchmarks from Kaggle. Experimental results show that UnchartIt successfully ranks the correct data transformation among the top-10 programs in 92% of the benchmarks. To disambiguate the top-ranking programs, we use our new interactive procedure, which successfully disambiguates 98% of the ambiguous benchmarks by asking the user fewer than two questions on average.
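To make the four steps concrete, the following is a minimal toy sketch in Python. It is not UnchartIt's implementation. It assumes that step (1) has already produced the chart's data as a label-to-value mapping, that programs come from a hypothetical three-field DSL `(group_col, agg, value_col)`, that ranking uses a made-up prior over aggregations, and that disambiguation works by showing the user a probe table on which the remaining candidates disagree. All names (`AGGS`, `PRIOR`, `run`, `generate_candidates`, `rank`, `disambiguate`) are illustrative.

```python
from itertools import product
from statistics import mean

# Hypothetical aggregation operators of the toy DSL (an assumption of this sketch).
AGGS = {"sum": sum, "mean": mean, "max": max, "count": len}
# Made-up ranking prior: a lower value means "assumed more likely".
PRIOR = {"sum": 0, "mean": 1, "count": 2, "max": 3}

def run(program, table):
    """Evaluate a (group_col, agg, value_col) program; return {label: value}."""
    group_col, agg, value_col = program
    groups = {}
    for row in table:
        groups.setdefault(row[group_col], []).append(row[value_col])
    return {label: AGGS[agg](vals) for label, vals in groups.items()}

def matches(output, chart, tol=1e-6):
    """True if both mappings have the same labels and values within tol."""
    return output.keys() == chart.keys() and all(
        abs(output[k] - chart[k]) <= tol for k in chart)

def generate_candidates(table, chart, group_cols, value_cols):
    """Step 2 (toy): enumerate all programs, keep those consistent with the chart."""
    return [p for p in product(group_cols, AGGS, value_cols)
            if matches(run(p, table), chart)]

def rank(candidates):
    """Step 3 (toy): order candidates by the made-up prior."""
    return sorted(candidates, key=lambda p: PRIOR[p[1]])

def disambiguate(candidates, probe_tables, oracle):
    """Step 4 (toy): ask only about probe tables that separate the candidates."""
    questions = 0
    for probe in probe_tables:
        outputs = [run(p, probe) for p in candidates]
        if all(o == outputs[0] for o in outputs):
            continue  # this probe cannot tell the candidates apart
        answer = oracle(probe)  # the user states the correct output
        questions += 1
        candidates = [p for p, o in zip(candidates, outputs)
                      if matches(o, answer)]
        if len(candidates) <= 1:
            break
    return candidates, questions

# Input table and chart data (step 1 is assumed to be done already).
table = [{"city": "Lisboa", "sales": 4}, {"city": "Porto", "sales": 3}]
chart = {"Lisboa": 4, "Porto": 3}

ranked = rank(generate_candidates(table, chart, ["city"], ["sales"]))

# A probe table on which sum, mean, and max give different results.
probe = [{"city": "Lisboa", "sales": 1},
         {"city": "Lisboa", "sales": 3},
         {"city": "Porto", "sales": 2}]
intended = ("city", "sum", "sales")  # simulated user intent
final, questions = disambiguate(ranked, [probe], lambda t: run(intended, t))
print(ranked, final, questions)
```

In this example every group holds a single row, so `sum`, `mean`, and `max` all reproduce the chart and the input alone cannot separate them. One question about the probe table is enough to leave only the intended program.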

Index Terms

Software and its engineering

Published in
ASE '20: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering
December 2020
1449 pages
ISBN: 9781450367684
DOI: 10.1145/3324884
General Chair: John Grundy
Program Chairs: Claire Le Goues, David Lo
Copyright © 2020 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
interactive disambiguation
program synthesis
recovering data transformations from charts
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 82 of 337 submissions, 24%
Article Metrics
- Total Citations: 2
- Total Downloads: 228
- Downloads (Last 12 months): 67
- Downloads (Last 6 weeks): 8