Abstract
As science becomes increasingly data-intensive, the requirements for comprehensive Research Data Management (RDM) grow. This often overwhelms scientists, demanding additional work and training. Failure to conduct effective RDM results in research artefacts that cannot be reproduced or reused. Past research placed high value on supporting data science workers, but focused mainly on data production, collection, processing, and sensemaking. To understand the practices and needs of data science workers in relation to documentation, preservation, sharing, and reuse, we conducted a cross-domain study with 15 scientists and data managers from diverse scientific domains. We identified five core concepts that describe requirements, drivers, and boundaries in the development of commitment to RDM, which is essential for generating reproducible research artefacts: Practice, Adoption, Barriers, Education, and Impact. Based on those concepts, we introduce a stage-based model of personal RDM commitment evolution. The model can be used to drive the design of future systems that support a transition to open science. We discuss the infrastructure, policies, and motivations involved at the stages and transitions in the model. Our work supports designers in understanding the constraints and challenges involved in designing for reproducibility in an age of data-driven science.
Supplemental Material
Contains several resources that foster the reproducibility of our work.
Index Terms
- 'Yes, I comply!': Motivations and Practices around Research Data Management and Reuse across Scientific Fields