Abstract
Computational notebooks are commonly used for iterative workflows, such as exploratory data analysis. This process lends itself to the accumulation of old code and hidden state, making it hard for users to reason about the lineage of artifacts such as plots depicting insights or trained machine learning models. One way to reason about the code used to generate a given notebook data artifact is to compute a program slice, but traditional static approaches to slicing can be both inaccurate (failing to contain code relevant to an artifact) and conservative (containing code unnecessary for an artifact). We present nbslicer, a dynamic slicer optimized for the notebook setting whose instrumentation for resolving dynamic data dependencies is both bolt-on (and therefore portable) and switchable (allowing it to be selectively disabled in order to reduce instrumentation overhead). We demonstrate nbslicer's ability to construct small and accurate backward slices (i.e., historical cell dependencies) and forward slices (i.e., cells affected by the "rerun" of an earlier cell), thereby improving reproducibility in notebooks and enabling faster reactive re-execution, respectively. Comparing nbslicer with a static slicer on 374 real notebook sessions, we found that nbslicer filters out far more superfluous program statements while maintaining slice correctness, giving slices that are, on average, 66% and 54% smaller for backward and forward slices, respectively.
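To make the distinction between backward and forward slices concrete, the following is a minimal sketch at cell granularity. It is not nbslicer's implementation: it assumes that the variables each cell reads and writes are already known (in nbslicer these dependencies are resolved dynamically through instrumentation), and the names `build_deps`, `backward_slice`, and `forward_slice` are hypothetical helpers introduced only for illustration.

```python
from collections import defaultdict


def build_deps(cells):
    """Map each cell to the set of cells it directly depends on.

    `cells` is a list of (cell_id, reads, writes) tuples in execution order.
    A cell depends on the most recent earlier writer of each name it reads.
    (Assumption: read/write sets are given; a real slicer must discover them.)
    """
    last_writer = {}
    deps = defaultdict(set)
    for cell_id, reads, writes in cells:
        deps[cell_id]  # make sure every cell has an entry, even with no deps
        for name in reads:
            if name in last_writer:
                deps[cell_id].add(last_writer[name])
        for name in writes:
            last_writer[name] = cell_id
    return dict(deps)


def _reachable(graph, start):
    """Return every node reachable from `start` (inclusive) in `graph`."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def backward_slice(deps, target):
    """Cells needed to reproduce `target`: its transitive dependencies."""
    return _reachable(deps, target)


def forward_slice(deps, changed):
    """Cells affected by rerunning `changed`: its transitive dependents."""
    dependents = defaultdict(set)
    for cell, parents in deps.items():
        for parent in parents:
            dependents[parent].add(cell)
    return _reachable(dependents, changed)


# A hypothetical five-cell session: (cell id, names read, names written).
cells = [
    (1, set(), {"df"}),
    (2, {"df"}, {"clean"}),
    (3, {"df"}, {"summary"}),
    (4, {"clean"}, {"model"}),
    (5, set(), {"unused"}),
]
deps = build_deps(cells)

print(sorted(backward_slice(deps, 4)))  # cells needed to rebuild `model`
print(sorted(forward_slice(deps, 2)))   # cells to rerun after editing cell 2
```

In this toy session, the backward slice of cell 4 omits cells 3 and 5 because neither contributes to `model`, and the forward slice of cell 2 omits cell 3 because it reads only `df`. A conservative static analysis that cannot resolve such dependencies precisely would tend to include more cells in both directions.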