ABSTRACT
Data science workflows are human-centered processes involving on-demand programming and analysis. Programmable, interactive interfaces such as widgets embedded in computational notebooks suit these workflows, but they lack robust state management and do not support user-defined customization of their interactive components. The absence of these capabilities hinders workflow reusability and transparency and limits the scope of exploration available to end users. In response, we developed Magneton, a framework for authoring interactive widgets within computational notebooks that enables transparent, reusable, and customizable data science workflows. The framework enhances existing widgets to support fine-grained interaction history management, reusable states, and user-defined customizations. We conducted three case studies on a real-world knowledge graph construction and serving platform to evaluate the effectiveness of these widgets. Based on our observations, we discuss the implications of employing Magneton widgets for general-purpose data science workflows.
Footnotes
1 Similar to Magneton, a robot-like Pokémon, our proposed framework stitches together three objectives (Magnemite): transparency, reusability, and customization, to enable robust programmable and interactive interfaces in computational notebooks.
Index Terms
- Towards Transparent, Reusable, and Customizable Data Science in Computational Notebooks