Work in Progress

Towards Transparent, Reusable, and Customizable Data Science in Computational Notebooks

Authors:
Frederick Choi, Grainger College of Engineering, University of Illinois at Urbana-Champaign, United States (ORCID: 0000-0002-8818-2456)
Sajjadur Rahman, Megagon Labs, United States (ORCID: 0000-0003-4210-1582)
Hannah Kim, Megagon Labs, United States (ORCID: 0000-0002-0137-7171)
Dan Zhang, Megagon Labs, United States (ORCID: 0000-0002-6330-0217)

CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, April 2023, Article No.: 290, Pages 1–8, https://doi.org/10.1145/3544549.3585807

Published: 19 April 2023

ABSTRACT

Data science workflows are human-centered processes involving on-demand programming and analysis. While programmable and interactive interfaces such as widgets embedded within computational notebooks are suitable for these workflows, they lack robust state management capabilities and do not support user-defined customization of the interactive components. The absence of such capabilities hinders workflow reusability and transparency and limits the scope of exploration available to end users. In response, we developed Magneton, a framework for authoring interactive widgets within computational notebooks that enables transparent, reusable, and customizable data science workflows. The framework enhances existing widgets to support fine-grained interaction history management, reusable states, and user-defined customizations. We conducted three case studies in a real-world knowledge graph construction and serving platform to evaluate the effectiveness of these widgets. Based on our observations, we discuss future implications of employing Magneton widgets for general-purpose data science workflows.

Footnotes

¹ Similar to Magneton, a robot-like Pokémon, our proposed framework stitches together three objectives (Magnemite): transparency, reusability, and customization, to enable robust programmable and interactive interfaces in computational notebooks.

Supplemental Material

- 3544549.3585807-video-figure.mp4 (mp4, 147.3 MB)
- 3544549.3585807-talk-video.mp4 (mp4, 34.9 MB)
- Supplemental Materials (zip, 79.4 KB)

Index Terms

Towards Transparent, Reusable, and Customizable Data Science in Computational Notebooks
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interactive systems and tools
  2. Visualization
2. Information systems

Published in
CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems
April 2023
3914 pages
ISBN: 9781450394222
DOI: 10.1145/3544549
Editors:
Albrecht Schmidt, LMU Munich, Germany
Kaisa Väänänen, Tampere University, Finland
Tesh Goyal, Google Research, USA
Per Ola Kristensson, University of Cambridge, UK
Anicia Peters, University of Namibia, Namibia
Copyright © 2023 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Computational notebooks
Data Science
Exploratory Programming
Interactive programming
Literate Programming
Qualifiers
- Work in Progress
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 6,164 of 23,696 submissions, 26%