DOI: 10.1145/3397271.3401059
research-article

On Understanding Data Worker Interaction Behaviors

Published: 25 July 2020

Editorial Notes

A corrigendum was issued for this paper on December 7, 2020. You can download the corrigendum from the supplemental material section of this citation page.

ABSTRACT

Understanding how data workers interact with data and various pieces of information (e.g., code snippet examples) is key to designing systems that can better support them in exploring a given dataset. To date, however, there is a paucity of research studying the information-seeking patterns and strategies adopted by data workers as they carry out data curation activities. In this work, we aim to understand the behaviors of data workers in discovering data quality issues, and how these behavioral observations relate to their performance. Specifically, we investigate how data workers use information resources and tools to support their task completion. To this end, we collect a multi-modal dataset through a data-driven experiment that relies on eye-tracking technology and a purpose-designed platform built on top of iPython Notebook. The collected data reveal that: (i) searching in external resources is a prevalent action that can be leveraged to achieve better performance; (ii) 'copy-paste-modify' is a typical strategy for writing code to complete tasks; (iii) providing sample code within the system could help data workers get started with their task; and (iv) surfacing underlying data is an effective way to support exploration. By investigating the behaviors prior to each search action, we also find that the most common triggers for external search actions are the need for assistance in writing or debugging code and the need to find relevant code to reuse. Our findings provide insights into patterns of interaction with the various system components and information resources used to perform data curation tasks. This has implications for the design of domain-specific IR systems for data workers, such as code-base search.

Supplemental Material

3397271.3401059.mp4 (mp4, 24.3 MB)

Published in

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2020
2548 pages
ISBN: 9781450380164
DOI: 10.1145/3397271

Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 792 of 3,983 submissions, 20%