research-article

Trust in AutoML: exploring information needs for establishing trust in automated machine learning systems

Authors:
Jaimie Drozdal (Rensselaer Polytechnic Institute),
Justin Weisz (IBM Research),
Dakuo Wang (IBM Research),
Gaurav Dass (Rensselaer Polytechnic Institute),
Bingsheng Yao (Rensselaer Polytechnic Institute),
Changruo Zhao (Rensselaer Polytechnic Institute),
Michael Muller (IBM Research),
Lin Ju (IBM),
Hui Su (IBM Research)

IUI '20: Proceedings of the 25th International Conference on Intelligent User Interfaces, March 2020, Pages 297–307, https://doi.org/10.1145/3377325.3377501

Published: 17 March 2020

ABSTRACT

We explore trust in a relatively new area of data science: Automated Machine Learning (AutoML). In AutoML, AI methods are used to generate and optimize machine learning models by automatically engineering features, selecting models, and optimizing hyperparameters. In this paper, we seek to understand what kinds of information influence data scientists' trust in the models produced by AutoML. We operationalize trust as a willingness to deploy a model produced using automated methods. We report results from three studies (qualitative interviews, a controlled experiment, and a card-sorting task) to understand the information needs of data scientists for establishing trust in AutoML systems. We find that including transparency features in an AutoML tool increased user trust in, and the understandability of, the tool, and that, of all proposed features, model performance metrics and visualizations are the most important information to data scientists when establishing their trust in an AutoML tool.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
  2. Machine learning
2. Human-centered computing
  1. Human computer interaction (HCI)
    1. Empirical studies in HCI
    2. HCI design and evaluation methods
      1. User studies

Published in
IUI '20: Proceedings of the 25th International Conference on Intelligent User Interfaces
March 2020
607 pages
ISBN: 9781450371186
DOI: 10.1145/3377325
General Chairs: Fabio Paternò, Nuria Oliver
Program Chairs: Cristina Conati, Lucio Davide Spano, Nava Tintarev
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
AutoAI
AutoDS
AutoML
automated artificial intelligence
automated data science
automated machine learning
trust
Acceptance Rates
Overall Acceptance Rate: 746 of 2,811 submissions, 27%
Article Metrics
- Total Citations: 60
- Total Downloads: 1,623
- Downloads (Last 12 months): 262
- Downloads (Last 6 weeks): 33