ABSTRACT
When designing Machine Learning (ML)-enabled solutions, designers often need to simulate ML behavior through the Wizard of Oz (WoZ) approach to test the user experience before the ML model is available. Although reproducing ML errors is essential for a realistic representation of ML behavior, such errors are rarely considered in WoZ studies. We introduce Wizard of Errors (WoE), a tool for conducting WoZ studies on ML-enabled solutions that allows ML errors to be simulated during user experience assessment. We explored how this system can be used to simulate the behavior of a computer vision model. We tested WoE with design students to determine the importance of considering ML errors in design, the relevance of using descriptive error types instead of a confusion matrix, and the suitability of manual error control in WoZ studies. Our work identifies several challenges that prevent designers from representing errors realistically in such studies. We discuss the implications of these findings for design.