poster

Wizard of Errors: Introducing and Evaluating Machine Learning Errors in Wizard of Oz Studies

Authors:
Anniek Jansen, Department of Industrial Design, Eindhoven University of Technology, Netherlands
Sara Colombo, Department of Industrial Design, Eindhoven University of Technology, Netherlands
CHI EA '22: Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems, April 2022, Article No. 426, pages 1–7. https://doi.org/10.1145/3491101.3519684

Published: 28 April 2022

ABSTRACT

When designing Machine Learning (ML)-enabled solutions, designers often need to simulate ML behavior through the Wizard of Oz (WoZ) approach to test the user experience before the ML model is available. Although reproducing ML errors is essential for a realistic representation of ML behavior, such errors are rarely considered. We introduce Wizard of Errors (WoE), a tool for conducting WoZ studies on ML-enabled solutions that allows ML errors to be simulated during user experience assessment. We explored how the system can be used to simulate the behavior of a computer vision model. We tested WoE with design students to determine the importance of considering ML errors in design, the relevance of using descriptive error types instead of a confusion matrix, and the suitability of manual error control in WoZ studies. Our work identifies several challenges that prevent designers from representing errors realistically in such studies. We discuss the implications of these findings for design.
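The abstract contrasts descriptive error types with a confusion matrix and mentions manual error control by the wizard. As a minimal sketch, assuming a single-object image classifier, a hypothetical `"none"` label for "nothing detected", and three illustrative error types (false positive, false negative, misclassification), the Python below shows how a wizard's manual choice of error type could turn the true label into the output a participant sees, and how each (actual, predicted) pair maps back onto a confusion-matrix cell. The labels, error categories, and function names (`simulate_output`, `describe`, `summarize`) are hypothetical; this is not the WoE implementation.

```python
import random
from collections import Counter
from enum import Enum

# Hypothetical label meaning "the simulated model detected nothing".
NO_OBJECT = "none"


class ErrorType(Enum):
    """Illustrative descriptive error types (an assumption, not WoE's own list)."""
    CORRECT = "correct"
    FALSE_POSITIVE = "false positive"        # reports an object that is not there
    FALSE_NEGATIVE = "false negative"        # misses an object that is there
    MISCLASSIFICATION = "misclassification"  # reports the wrong object class


def simulate_output(true_label, error, labels, rng):
    """Turn the wizard's ground-truth label into the output the participant sees."""
    if error is ErrorType.CORRECT:
        return true_label
    if error is ErrorType.FALSE_POSITIVE:
        if true_label != NO_OBJECT:
            raise ValueError("a false positive needs an empty scene")
        return rng.choice(labels)
    if true_label == NO_OBJECT:
        raise ValueError(f"{error.value} needs an object in the scene")
    if error is ErrorType.FALSE_NEGATIVE:
        return NO_OBJECT
    # Misclassification: pick any class other than the true one.
    return rng.choice([label for label in labels if label != true_label])


def describe(actual, predicted):
    """Map one confusion-matrix cell (actual, predicted) to a descriptive error type."""
    if actual == predicted:
        return ErrorType.CORRECT
    if actual == NO_OBJECT:
        return ErrorType.FALSE_POSITIVE
    if predicted == NO_OBJECT:
        return ErrorType.FALSE_NEGATIVE
    return ErrorType.MISCLASSIFICATION


def summarize(pairs):
    """Count descriptive error types over a session's (actual, predicted) pairs."""
    return Counter(describe(actual, predicted) for actual, predicted in pairs)


if __name__ == "__main__":
    rng = random.Random(42)
    labels = ["apple", "banana", "carrot"]  # hypothetical object classes
    session = [
        ("apple", ErrorType.CORRECT),
        ("banana", ErrorType.MISCLASSIFICATION),
        ("carrot", ErrorType.FALSE_NEGATIVE),
        (NO_OBJECT, ErrorType.FALSE_POSITIVE),
    ]
    pairs = [(t, simulate_output(t, e, labels, rng)) for t, e in session]
    for error_type, count in summarize(pairs).items():
        print(f"{error_type.value}: {count}")
```

One possible design choice this illustrates: the wizard controls errors in descriptive terms (what the participant experiences), while the tool still logs (actual, predicted) pairs, so a session's error profile could later be compared with the confusion matrix of a real model.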

Supplemental Material

3491101.3519684-video-preview.mp4 (MP4 video, 10.5 MB)

Published in
CHI EA '22: Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems
April 2022, 3066 pages
ISBN: 9781450391566
DOI: 10.1145/3491101
Editors: Simone Barbosa (PUC-Rio, Brazil), Cliff Lampe (University of Michigan, USA), Caroline Appert (Université Paris-Saclay, France), David A. Shamma (Toyota Research Institute, USA)
Copyright © 2022 Owner/Author
This work is licensed under a Creative Commons Attribution 4.0 International License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Computer Vision
Interaction Design
Machine Learning
Machine Learning Errors
Prototyping Methods
User Experience Analysis
User Experience Design
Wizard of Oz
Qualifiers
- poster
- Research
- Refereed limited

Acceptance Rates
Overall acceptance rate: 6,164 of 23,696 submissions (26%)

Article Metrics
Total citations: 1
Total downloads: 302
Downloads (last 12 months): 98
Downloads (last 6 weeks): 8
