research-article

'Yes, I comply!': Motivations and Practices around Research Data Management and Reuse across Scientific Fields

Published: 15 October 2020

Abstract

As science becomes increasingly data-intensive, the requirements for comprehensive Research Data Management (RDM) grow. This often overwhelms scientists, requiring more workload and training. The failure to conduct effective RDM leads to producing research artefacts that cannot be reproduced or reused. Past research placed high value on supporting data science workers, but focused mainly on data production, collection, processing, and sensemaking. In order to understand practices and needs of data science workers in relation to documentation, preservation, sharing, and reuse, we conducted a cross-domain study with 15 scientists and data managers from diverse scientific domains. We identified five core concepts which describe requirements, drivers, and boundaries in the development of commitment for RDM, essential for generating reproducible research artefacts: Practice, Adoption, Barriers, Education, and Impact. Based on those concepts, we introduce a stage-based model of personal RDM commitment evolution. The model can be used to drive the design of future systems that support a transition to open science. We discuss infrastructure, policies, and motivations involved at the stages and transitions in the model. Our work supports designers in understanding the constraints and challenges involved in designing for reproducibility in an age of data-driven science.


      • Published in

        Proceedings of the ACM on Human-Computer Interaction  Volume 4, Issue CSCW2
        CSCW
        October 2020
        2310 pages
        EISSN: 2573-0142
        DOI: 10.1145/3430143

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States
