DOI: 10.1145/3377325.3377501
research-article

Trust in AutoML: exploring information needs for establishing trust in automated machine learning systems

Published: 17 March 2020

ABSTRACT

We explore trust in a relatively new area of data science: Automated Machine Learning (AutoML). In AutoML, AI methods are used to generate and optimize machine learning models by automatically engineering features, selecting models, and optimizing hyperparameters. In this paper, we seek to understand what kinds of information influence data scientists' trust in the models produced by AutoML. We operationalize trust as a willingness to deploy a model produced using automated methods. We report results from three studies (qualitative interviews, a controlled experiment, and a card-sorting task) to understand the information needs of data scientists for establishing trust in AutoML systems. We find that including transparency features in an AutoML tool increased user trust in the tool and its understandability, and that, of all proposed features, model performance metrics and visualizations are the most important information to data scientists when establishing their trust in an AutoML tool.


            Published in

            IUI '20: Proceedings of the 25th International Conference on Intelligent User Interfaces
            March 2020
            607 pages
            ISBN: 9781450371186
            DOI: 10.1145/3377325

            Copyright © 2020 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            Overall Acceptance Rate: 746 of 2,811 submissions, 27%
