ABSTRACT
We present an extensible user simulation toolkit to facilitate automatic evaluation of conversational recommender systems. It builds on an established agenda-based approach and extends it with several novel elements, including user satisfaction prediction, persona and context modeling, and conditional natural language generation. We showcase the toolkit with a pre-existing movie recommender system and demonstrate its ability to simulate dialogues that mimic real conversations, while requiring only a handful of manually annotated dialogues as training data.
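As a rough illustration of the agenda-based idea the abstract refers to, the simulated user can be thought of as keeping a stack of pending dialogue acts that is revised after every system turn. The following is a minimal sketch under that assumption only; the class name `AgendaUser`, the intent labels, and the update rules are hypothetical and are not taken from the toolkit's actual API.

```python
class AgendaUser:
    """Toy agenda-based simulated user (hypothetical, not the toolkit's API)."""

    def __init__(self, preferences):
        # preferences: slot -> value the simulated user is looking for.
        self.preferences = dict(preferences)
        # The agenda is a stack (top = end of list). "bye" sits at the bottom;
        # informs are pushed in reverse so the first preference ends up on top.
        self.agenda = [("bye", None)]
        for slot, value in reversed(list(self.preferences.items())):
            self.agenda.append(("inform", (slot, value)))

    def respond(self, system_act):
        """Revise the agenda given the system act, then pop the next user act."""
        intent, arg = system_act
        if intent == "request" and arg in self.preferences:
            # Answer the system's question first: move that inform to the top.
            answer = ("inform", (arg, self.preferences[arg]))
            self.agenda = [a for a in self.agenda if a != answer]
            self.agenda.append(answer)
        elif intent == "recommend":
            # Assumed simplification: always accept and drop pending informs.
            self.agenda = [("bye", None), ("accept", arg)]
        return self.agenda.pop() if self.agenda else ("bye", None)


user = AgendaUser({"genre": "comedy", "year": "1990s"})
turn_1 = user.respond(("greet", None))
turn_2 = user.respond(("request", "year"))
turn_3 = user.respond(("recommend", "Groundhog Day"))
turn_4 = user.respond(("ack", None))
print(turn_1, turn_2, turn_3, turn_4)
```

A fuller simulator would replace the fixed accept rule with the satisfaction, persona, and context components mentioned in the abstract, and would pass each popped act through natural language generation before sending it to the system.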
Index Terms
- UserSimCRS: A User Simulation Toolkit for Evaluating Conversational Recommender Systems