research-article
Artifacts Available / v1.1

Bolt-on, Compact, and Rapid Program Slicing for Notebooks

Published: 01 September 2022

Abstract

Computational notebooks are commonly used for iterative workflows, such as exploratory data analysis. This process lends itself to the accumulation of old code and hidden state, making it hard for users to reason about the lineage of, e.g., plots depicting insights or trained machine learning models. One way to reason about the code used to generate a notebook data artifact is to compute a program slice, but traditional static approaches to slicing can be both inaccurate (failing to include code relevant to an artifact) and conservative (including code unnecessary for an artifact). We present nbslicer, a dynamic slicer optimized for the notebook setting whose instrumentation for resolving dynamic data dependencies is both bolt-on (and therefore portable) and switchable (allowing it to be selectively disabled in order to reduce instrumentation overhead). We demonstrate nbslicer's ability to construct small and accurate backward slices (i.e., historical cell dependencies) and forward slices (i.e., cells affected by the "rerun" of an earlier cell), thereby improving reproducibility in notebooks and enabling faster reactive re-execution, respectively. Comparing nbslicer with a static slicer on 374 real notebook sessions, we found that nbslicer filters out far more superfluous program statements while maintaining slice correctness, giving slices that are, on average, 66% and 54% smaller for backward and forward slices, respectively.
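To make the backward-slicing idea concrete, the sketch below models a notebook session as an ordered list of cells, each hand-annotated with the variables it defines and uses, and walks the history backward to collect only the cells needed to reproduce a target cell's output. This is a minimal conceptual illustration under simplifying assumptions (def/use sets are given up front rather than recovered by instrumenting live Python execution, which is the hard part nbslicer actually addresses); the function and session names are hypothetical.

```python
# Minimal sketch of backward slicing over a notebook's cell history.
# Assumption: each cell is summarized by hand-annotated (defines, uses)
# sets of variable names, rather than by dynamic instrumentation.

def backward_slice(cells, target_idx):
    """Return indices of cells needed to reproduce the target cell.

    `cells` is a list of (defines, uses) pairs of variable-name sets,
    in execution order. Walking backward, a cell is included if it
    defines a variable some already-included cell still needs; its own
    inputs then become needed in turn.
    """
    needed = set(cells[target_idx][1])   # variables not yet resolved
    slice_idxs = [target_idx]
    for i in range(target_idx - 1, -1, -1):
        defines, uses = cells[i]
        if defines & needed:             # this cell produced something we need
            slice_idxs.append(i)
            needed -= defines            # those needs are now satisfied...
            needed |= uses               # ...but this cell's inputs are not
    return sorted(slice_idxs)

# Hypothetical session: c0 loads data, c1 is an unrelated inspection,
# c2 trains a model, c3 plots it. The slice for c3 should skip c1.
session = [
    ({"df"}, set()),          # c0: df = load_csv(...)
    (set(), {"df"}),          # c1: print(df.head())  -- defines nothing
    ({"model"}, {"df"}),      # c2: model = fit(df)
    ({"fig"}, {"model"}),     # c3: fig = plot(model)
]
print(backward_slice(session, 3))  # → [0, 2, 3]
```

A forward slice inverts the same dependency relation: starting from a rerun cell's defined variables, it walks forward collecting every later cell that (transitively) uses them.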

