DOI: 10.1145/3350546.3352501 · WI Conference Proceedings · Research article

Reinforcement Learning for Personalized Dialogue Management

Published: 14 October 2019

ABSTRACT

Language systems have long been of interest to the research community and have recently reached the mass market through various assistant platforms on the web. Reinforcement Learning (RL) methods that optimize dialogue policies have seen success in past years and have recently been extended into methods that personalize the dialogue, e.g., by taking the personal context of users into account. These works, however, are limited to personalization for a single user, require multiple interactions with that user, and do not generalize the usage of context across users. This work introduces a problem in which a generalized usage of context is relevant and proposes two RL-based approaches to it. The first approach uses a single learner and extends the traditional POMDP formulation of dialogue state with features that describe the user context. The second approach segments users by context and then employs one learner per context. We compare these approaches in a benchmark of existing non-RL and RL-based methods across three established application domains and one novel domain, financial product recommendation. We examine the influence of context and training experience on performance and find that the learning approaches generally outperform a handcrafted gold standard.
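The two approaches described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function and class names, the list-based state representation, and the string context key are all assumptions made for exposition.

```python
# Approach 1 (sketch): a single learner whose dialogue state is the usual
# POMDP belief state extended with user-context features, so one policy can
# condition on context directly.
def extended_state(belief_state, context_features):
    """Concatenate the dialogue belief state with user-context features."""
    return list(belief_state) + list(context_features)


# Approach 2 (sketch): segment users by a discrete context key and maintain
# one independent learner per segment; each learner only sees experience
# from its own user segment.
class PerContextLearners:
    def __init__(self, make_learner):
        self.make_learner = make_learner  # factory for a fresh learner
        self.learners = {}                # context key -> learner

    def learner_for(self, context_key):
        """Return the learner for this context, creating it on first use."""
        if context_key not in self.learners:
            self.learners[context_key] = self.make_learner()
        return self.learners[context_key]
```

Under this sketch, the first approach trades a larger state space for shared experience across all users, while the second keeps each policy's state small but splits the training data across segments.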

