ABSTRACT
Language systems have long been of great interest to the research community and have recently reached the mass market through various assistant platforms on the web. Reinforcement Learning methods that optimize dialogue policies have seen successes in past years and have recently been extended into methods that personalize the dialogue, e.g. by taking the personal context of users into account. These works, however, are limited to personalization for a single user, with whom they require multiple interactions, and do not generalize the usage of context across users. This work introduces a problem in which a generalized usage of context is relevant and proposes two Reinforcement Learning (RL)-based approaches to it. The first approach uses a single learner and extends the traditional POMDP formulation of the dialogue state with features that describe the user context. The second approach segments users by context and employs one learner per segment. We compare these approaches against a benchmark of existing non-RL and RL-based methods in three established application domains and one novel domain of financial product recommendation. We analyze the influence of context and of the number of training experiences on performance and find that learning approaches generally outperform a handcrafted gold standard.
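The two approaches described above can be sketched in a purely illustrative way. All class and action names below are hypothetical, and tabular Q-learning stands in for whatever policy optimizer the paper actually uses; the sketch only contrasts context-in-the-state (one learner) with context-as-segmentation (one learner per segment):

```python
from collections import defaultdict

class SingleLearner:
    """Approach 1 (sketch): one Q-learner whose state is extended
    with user-context features, so experience is shared across users."""
    ACTIONS = ("ask", "recommend")  # hypothetical dialogue actions

    def __init__(self, alpha=0.5, gamma=0.9):
        self.q = defaultdict(float)  # (state, context, action) -> value
        self.alpha, self.gamma = alpha, gamma

    def update(self, state, context, action, reward, next_state):
        key = (state, context, action)  # context is part of the state
        best_next = max(self.q[(next_state, context, a)] for a in self.ACTIONS)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])

    def act(self, state, context):
        return max(self.ACTIONS, key=lambda a: self.q[(state, context, a)])

class SegmentedLearners:
    """Approach 2 (sketch): users are segmented by context and each
    segment gets its own learner; context only selects the learner."""
    def __init__(self):
        self.learners = defaultdict(SingleLearner)

    def update(self, state, context, action, reward, next_state):
        # Within a segment the context is constant, so pass a dummy value.
        self.learners[context].update(state, None, action, reward, next_state)

    def act(self, state, context):
        return self.learners[context].act(state, None)
```

The trade-off the paper benchmarks follows directly from this structure: the single learner can generalize across contexts but must learn a larger state space, while the segmented learners face smaller problems but cannot share experience between segments.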
- [1] Gediminas Adomavicius and Alexander Tuzhilin. Context-aware recommender systems. In Recommender Systems Handbook, pages 217–253. Springer, 2011.
- [2] Grigoris Antoniou and Frank Van Harmelen. A Semantic Web Primer. MIT Press, 2004.
- [3] Jeesoo Bang, Hyungjong Noh, Yonghee Kim, and Gary Geunbae Lee. Example-based chat-oriented dialogue system with personalized long-term memory. In 2015 International Conference on Big Data and Smart Computing (BigComp), pages 238–243. IEEE, 2015.
- [4] Anouschka Bergmann, Kathleen Currie Hall, and Sharon Miriam Ross. Language Files: Materials for an Introduction to Language and Linguistics. Ohio State University Press, 2007.
- [5] Iñigo Casanueva, Paweł Budzianowski, Pei-Hao Su, Nikola Mrkšić, Tsung-Hsien Wen, Stefan Ultes, Lina Rojas-Barahona, Steve Young, and Milica Gašić. A benchmarking environment for reinforcement learning based task oriented dialogue management. In Deep Reinforcement Learning Symposium, 31st Conference on Neural Information Processing Systems, 2017.
- [6] Iñigo Casanueva, Thomas Hain, Heidi Christensen, Ricard Marxer, and Phil Green. Knowledge transfer between speakers for personalised dialogue management. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 12–21, 2015.
- [7] Mehdi Fatemi, Layla El Asri, Hannes Schulz, Jing He, and Kaheer Suleman. Policy networks with two-stage training for dialogue systems. In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 101–110, 2016.
- [8] Milica Gašić, Filip Jurčíček, Simon Keizer, François Mairesse, Blaise Thomson, Kai Yu, and Steve Young. Gaussian processes for fast policy optimisation of POMDP-based dialogue managers. In Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 201–204. Association for Computational Linguistics, 2010.
- [9] Aude Genevay and Romain Laroche. Transfer learning for user adaptation in spoken dialogue systems. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, pages 975–983, 2016.
- [10] Mehmet H. Göker and Cynthia A. Thompson. Personalized conversational case-based recommendation. In European Workshop on Advances in Case-Based Reasoning, pages 99–111. Springer, 2000.
- [11] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pages 1856–1865, 2018.
- [12] Matthew Henderson, Blaise Thomson, and Jason D. Williams. The second dialog state tracking challenge. In Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pages 263–272, 2014.
- [13] Alexandros Karatzoglou, Xavier Amatriain, Linas Baltrunas, and Nuria Oliver. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering. In Proceedings of the Fourth ACM Conference on Recommender Systems, pages 79–86. ACM, 2010.
- [14] Yonghee Kim, Jeesoo Bang, Junhwi Choi, Seonghan Ryu, Sangjun Koo, and Gary Geunbae Lee. Acquisition and use of long-term memory for personalized dialog systems. In International Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction, pages 78–87. Springer, 2014.
- [15] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, pages 661–670. ACM, 2010.
- [16] Diane J. Litman, Michael S. Kearns, Satinder Singh, and Marilyn A. Walker. Automatic optimization of dialogue management. In COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics, pages 502–508. Association for Computational Linguistics, 2000.
- [17] Omid Madani and Dennis DeCoste. Contextual recommender problems. In Proceedings of the 1st International Workshop on Utility-Based Data Mining, pages 86–89. ACM, 2005.
- [18] Tariq Mahmood, Ghulam Mujtaba, and Adriano Venturini. Dynamic personalization in conversational recommender systems. Information Systems and e-Business Management, 12(2):213–238, 2014.
- [19] John McCarthy, Marvin L. Minsky, Nathaniel Rochester, and Claude E. Shannon. A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955. AI Magazine, 27(4):12, 2006.
- [20] Kaixiang Mo, Yu Zhang, Shuangyin Li, Jiajun Li, and Qiang Yang. Personalizing a dialogue system with transfer reinforcement learning. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
- [21] Natalya F. Noy, Deborah L. McGuinness, et al. Ontology development 101: A guide to creating your first ontology. Technical Report SMI-2001-0880, Stanford Medical Informatics, 2001.
- [22] Michael J. Pazzani and Daniel Billsus. Content-based recommendation systems. In The Adaptive Web, pages 325–341. Springer, 2007.
- [23] Andrzej Pelc. Searching games with errors - fifty years of coping with liars. Theoretical Computer Science, 270(1-2):71–109, 2002.
- [24] Soujanya Poria, Erik Cambria, Newton Howard, Guang-Bin Huang, and Amir Hussain. Fusing audio, visual and textual clues for sentiment analysis from multimodal content. Neurocomputing, 174:50–59, 2016.
- [25] Nicholas Roy, Joelle Pineau, and Sebastian Thrun. Spoken dialogue management using probabilistic reasoning. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, pages 93–100, 2000.
- [26] Jost Schatzmann, Blaise Thomson, Karl Weilhammer, Hui Ye, and Steve Young. Agenda-based user simulation for bootstrapping a POMDP dialogue system. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers, pages 149–152, 2007.
- [27] Heung-Yeung Shum, Xiao-dong He, and Di Li. From Eliza to XiaoIce: challenges and opportunities with social chatbots. Frontiers of Information Technology & Electronic Engineering, 19(1):10–26, 2018.
- [28] Pei-Hao Su, Paweł Budzianowski, Stefan Ultes, Milica Gašić, and Steve Young. Sample-efficient actor-critic reinforcement learning with supervised data for dialogue management. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 147–157, 2017.
- [29] Cynthia A. Thompson, Mehmet H. Göker, and Pat Langley. A personalized system for conversational recommendations. Journal of Artificial Intelligence Research, 21:393–428, 2004.
- [30] A. M. Turing. Computing machinery and intelligence. Mind, LIX(236):433–460, October 1950.
- [31] Hado Van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double Q-learning. In Thirtieth AAAI Conference on Artificial Intelligence, volume 2, page 5, 2016.
- [32] Jason Williams, Antoine Raux, Deepak Ramachandran, and Alan Black. The dialog state tracking challenge. In Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pages 404–413, 2013.
- [33] Jason D. Williams and Steve Young. Partially observable Markov decision processes for spoken dialog systems. Computer Speech & Language, 21(2):393–422, 2007.
- [34] Ji Wu, Miao Li, and Chin-Hui Lee. An entropy minimization framework for goal-driven dialogue management. In Sixteenth Annual Conference of the International Speech Communication Association, 2015.
- [35] Steve Young, Milica Gašić, Blaise Thomson, and Jason D. Williams. POMDP-based statistical spoken dialog systems: A review. Proceedings of the IEEE, 101(5):1160–1179, 2013.
Index Terms
- Reinforcement Learning for Personalized Dialogue Management