ABSTRACT
We explore trust in a relatively new area of data science: Automated Machine Learning (AutoML). In AutoML, AI methods are used to generate and optimize machine learning models by automatically engineering features, selecting models, and optimizing hyperparameters. In this paper, we seek to understand what kinds of information influence data scientists' trust in the models produced by AutoML. We operationalize trust as a willingness to deploy a model produced using automated methods. We report results from three studies (qualitative interviews, a controlled experiment, and a card-sorting task) to understand the information needs of data scientists for establishing trust in AutoML systems. We find that including transparency features in an AutoML tool increased both user trust in the tool and its understandability, and that, of all the proposed features, model performance metrics and visualizations are the most important information for data scientists when establishing trust in an AutoML tool.