research-article
Artifacts Available / v1.1

Bolt-on, Compact, and Rapid Program Slicing for Notebooks

Published: 01 September 2022

Abstract

Computational notebooks are commonly used for iterative workflows, such as exploratory data analysis. This process lends itself to the accumulation of old code and hidden state, making it hard for users to reason about the lineage of, e.g., plots depicting insights or trained machine learning models. One way to reason about the code used to generate a notebook data artifact is to compute a program slice, but traditional static approaches to slicing can be both inaccurate (failing to include code relevant to an artifact) and conservative (including code unnecessary for an artifact). We present nbslicer, a dynamic slicer optimized for the notebook setting whose instrumentation for resolving dynamic data dependencies is both bolt-on (and therefore portable) and switchable (allowing it to be selectively disabled in order to reduce instrumentation overhead). We demonstrate nbslicer's ability to construct small and accurate backward slices (i.e., historical cell dependencies) and forward slices (i.e., cells affected by the "rerun" of an earlier cell), thereby improving reproducibility in notebooks and enabling faster reactive re-execution, respectively. Comparing nbslicer with a static slicer on 374 real notebook sessions, we found that nbslicer filters out far more superfluous program statements while maintaining slice correctness, giving slices that are, on average, 66% and 54% smaller for backward and forward slices, respectively.
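To make the backward-slicing idea concrete, the sketch below models a notebook session as an ordered list of cells, each hand-annotated with the variables it defines and uses, and walks the history backward to collect only the cells needed to reproduce a target cell's output. This is a minimal conceptual illustration under simplifying assumptions (def/use sets are given up front rather than recovered by instrumenting live Python execution, which is the hard part nbslicer actually addresses); the function and session names are hypothetical.

```python
# Minimal sketch of backward slicing over a notebook's cell history.
# Assumption: each cell is summarized by hand-annotated (defines, uses)
# sets of variable names, rather than by dynamic instrumentation.

def backward_slice(cells, target_idx):
    """Return indices of cells needed to reproduce the target cell.

    `cells` is a list of (defines, uses) pairs of variable-name sets,
    in execution order. Walking backward, a cell is included if it
    defines a variable some already-included cell still needs; its own
    inputs then become needed in turn.
    """
    needed = set(cells[target_idx][1])   # variables not yet resolved
    slice_idxs = [target_idx]
    for i in range(target_idx - 1, -1, -1):
        defines, uses = cells[i]
        if defines & needed:             # this cell produced something we need
            slice_idxs.append(i)
            needed -= defines            # those needs are now satisfied...
            needed |= uses               # ...but this cell's inputs are not
    return sorted(slice_idxs)

# Hypothetical session: c0 loads data, c1 is an unrelated inspection,
# c2 trains a model, c3 plots it. The slice for c3 should skip c1.
session = [
    ({"df"}, set()),          # c0: df = load_csv(...)
    (set(), {"df"}),          # c1: print(df.head())  -- defines nothing
    ({"model"}, {"df"}),      # c2: model = fit(df)
    ({"fig"}, {"model"}),     # c3: fig = plot(model)
]
print(backward_slice(session, 3))  # → [0, 2, 3]
```

A forward slice inverts the same dependency relation: starting from a rerun cell's defined variables, it walks forward collecting every later cell that (transitively) uses them.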

