DOI: 10.1145/3324884.3416613
research-article
Open Access

UnchartIt: an interactive framework for program recovery from charts

Published: 27 January 2021

ABSTRACT

Charts are commonly used for data visualization. Generating a chart usually involves performing data transformations, including data pre-processing and aggregation. These tasks can be cumbersome and time-consuming, even for experienced data scientists. Reproducing existing charts can also be a challenging task when information about data transformations is no longer available.

In this paper, we tackle the problem of recovering data transformations from existing charts. Given an input table and a chart, our goal is to automatically recover the data transformation program underlying the chart. We divide our approach into four steps: (1) data extraction, (2) candidate generation, (3) candidate ranking, and (4) candidate disambiguation. We implemented our approach in a tool called UnchartIt and evaluated it on a set of 50 benchmarks from Kaggle. Experimental results show that UnchartIt successfully ranks the correct data transformation among the top-10 programs in 92% of the benchmarks. To disambiguate the top-ranking programs, we use our new interactive procedure, which successfully disambiguates 98% of the ambiguous benchmarks by asking the user fewer than two questions on average.
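The abstract does not describe how the disambiguation step works internally. As a rough illustration of the general idea only, the sketch below prunes a set of candidate programs by showing the user small inputs on which the candidates disagree and keeping those that match the user's answer. Everything in it is an assumption made for illustration: the names `disambiguate`, `total`, `mean`, `maximum`, the `sales` column, the fixed pool of probe tables, and the simulated user are hypothetical and are not taken from UnchartIt.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
Table = List[Row]
Output = Tuple[float, ...]
Program = Callable[[Table], Output]
# Hypothetical user model: given a table and the distinct candidate
# outputs for it, return the output the user considers correct.
Oracle = Callable[[Table, List[Output]], Output]


def disambiguate(
    candidates: Sequence[Program],
    probe_tables: Sequence[Table],
    oracle: Oracle,
) -> Tuple[List[Program], int]:
    """Prune candidates using tables on which they disagree.

    Returns the surviving candidates and the number of questions asked.
    """
    remaining = list(candidates)
    questions = 0
    for table in probe_tables:
        if len(remaining) <= 1:
            break
        outputs = [prog(table) for prog in remaining]
        options = sorted(set(outputs))
        if len(options) == 1:
            continue  # this table does not distinguish the candidates
        answer = oracle(table, options)
        questions += 1
        remaining = [p for p, out in zip(remaining, outputs) if out == answer]
    return remaining, questions


# Three hypothetical candidate aggregations over a "sales" column.
def total(t: Table) -> Output:
    return (sum(r["sales"] for r in t),)


def mean(t: Table) -> Output:
    return (sum(r["sales"] for r in t) / len(t),)


def maximum(t: Table) -> Output:
    return (max(r["sales"] for r in t),)


# Assumed pool of small probe tables; a real tool would have to
# generate distinguishing inputs rather than read them from a list.
probes: List[Table] = [
    [{"sales": 4.0}],                  # all candidates agree: no question
    [{"sales": 2.0}, {"sales": 2.0}],  # separates total from the rest
    [{"sales": 1.0}, {"sales": 3.0}],  # separates mean from maximum
]

# Simulated user whose intended program is `mean`.
survivors, asked = disambiguate(
    [total, mean, maximum], probes, lambda table, options: mean(table)
)
```

With this setup the first probe is skipped because all three candidates return the same value, and the remaining two probes each cost one question, so `mean` is isolated after two questions.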
