DOI: 10.1145/3491101.3519684
poster

Wizard of Errors: Introducing and Evaluating Machine Learning Errors in Wizard of Oz Studies

Published: 28 April 2022

ABSTRACT

When designing Machine Learning (ML) enabled solutions, designers often need to simulate ML behavior through the Wizard of Oz (WoZ) approach to test the user experience before the ML model is available. Although reproducing ML errors is essential for a realistic representation of the system, such errors are rarely considered. We introduce Wizard of Errors (WoE), a tool for conducting WoZ studies on ML-enabled solutions that allows simulating ML errors during user experience assessment. We explored how this system can be used to simulate the behavior of a computer vision model. We tested WoE with design students to determine the importance of considering ML errors in design, the relevance of using descriptive error types instead of a confusion matrix, and the suitability of manual error control in WoZ studies. Our work identifies several challenges that prevent designers from representing errors realistically in such studies. We discuss the implications of these findings for design.

Supplemental Material

3491101.3519684-video-preview.mp4 (MP4, 10.5 MB)
