DOI: 10.1145/3544549.3585807
Work in Progress

Towards Transparent, Reusable, and Customizable Data Science in Computational Notebooks

Published: 19 April 2023

ABSTRACT

Data science workflows are human-centered processes involving on-demand programming and analysis. Programmable and interactive interfaces, such as widgets embedded within computational notebooks, suit these workflows, but they lack robust state management and do not support user-defined customization of their interactive components. The absence of these capabilities hinders workflow reusability and transparency and limits the scope of exploration available to end users. In response, we developed Magneton, a framework for authoring interactive widgets within computational notebooks that enables transparent, reusable, and customizable data science workflows. The framework enhances existing widgets to support fine-grained interaction history management, reusable states, and user-defined customizations. We conducted three case studies on a real-world knowledge graph construction and serving platform to evaluate the effectiveness of these widgets. Based on our observations, we discuss future implications of employing Magneton widgets in general-purpose data science workflows.
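The three capabilities named in the abstract (interaction history, reusable states, and user-defined customization) can be illustrated with a minimal, framework-independent sketch. This is not Magneton's API, which the abstract does not describe; every name below (`WidgetModel`, `HistoryEntry`, `register`, `dispatch`, `restore`) and the toy node data are hypothetical and chosen only for illustration.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical names throughout: this is NOT Magneton's API, only an
# illustration of the three ideas named in the abstract.
State = Dict[str, Any]
Action = Callable[[State, Any], State]


@dataclass
class HistoryEntry:
    action: str   # name of the interaction that was applied
    payload: Any  # argument supplied with the interaction
    state: State  # snapshot of the widget state after the action


@dataclass
class WidgetModel:
    state: State
    actions: Dict[str, Action] = field(default_factory=dict)
    history: List[HistoryEntry] = field(default_factory=list)

    def register(self, name: str, fn: Action) -> None:
        """Customization: attach a user-defined action to the widget."""
        self.actions[name] = fn

    def dispatch(self, name: str, payload: Any = None) -> State:
        """Transparency: apply an action and log it with a state snapshot."""
        new_state = self.actions[name](deepcopy(self.state), payload)
        self.state = new_state
        self.history.append(HistoryEntry(name, payload, deepcopy(new_state)))
        return new_state

    def restore(self, index: int) -> State:
        """Reusability: reload the state recorded at a past history entry."""
        self.state = deepcopy(self.history[index].state)
        return self.state


def filter_by_label(state: State, label: str) -> State:
    # Example user-defined action: keep only nodes with the given label.
    state["nodes"] = [n for n in state["nodes"] if n["label"] == label]
    return state


def take_first(state: State, k: int) -> State:
    # Example user-defined action: keep only the first k nodes.
    state["nodes"] = state["nodes"][:k]
    return state


# Made-up toy data standing in for a small graph view.
model = WidgetModel(state={"nodes": [
    {"id": 1, "label": "skill"},
    {"id": 2, "label": "job"},
    {"id": 3, "label": "skill"},
]})
model.register("filter_by_label", filter_by_label)
model.register("take_first", take_first)

model.dispatch("filter_by_label", "skill")
model.dispatch("take_first", 1)
print([e.action for e in model.history])     # the logged interactions, in order
restored = model.restore(0)                  # back to the state after the filter
print([n["id"] for n in restored["nodes"]])
```

Storing a full snapshot per entry keeps `restore` trivial at the cost of memory; a real framework might instead replay actions or store diffs.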

Footnotes

  1. Similar to Magneton, a robot-like Pokémon, our proposed framework stitches together three objectives (Magnemite): transparency, reusability, and customization, to enable robust programmable and interactive interfaces in computational notebooks.


Supplemental Material

• 3544549.3585807-video-figure.mp4 (mp4, 147.3 MB)
• 3544549.3585807-talk-video.mp4 (mp4, 34.9 MB)


        • Published in

          CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems
          April 2023
          3914 pages
          ISBN: 9781450394222
          DOI: 10.1145/3544549

          Copyright © 2023 Owner/Author

          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 19 April 2023


          Qualifiers

          • Work in Progress
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 6,164 of 23,696 submissions, 26%
