DOI: 10.1145/3178876.3185994
research-article

DRN: A Deep Reinforcement Learning Framework for News Recommendation

Published: 23 April 2018

ABSTRACT

In this paper, we propose a novel Deep Reinforcement Learning framework for news recommendation. Online personalized news recommendation is a highly challenging problem because news features and user preferences change dynamically. Although some online recommendation models have been proposed to address this dynamic nature, these methods have three major issues. First, they model only the current reward (e.g., click-through rate). Second, very few studies consider using user feedback other than click / no-click labels (e.g., how frequently a user returns) to improve recommendation. Third, these methods tend to keep recommending similar news to users, which may cause users to get bored. To address these challenges, we propose a Deep Q-Learning based recommendation framework that models future reward explicitly. We further consider the user return pattern as a supplement to the click / no-click label in order to capture more user feedback information. In addition, an effective exploration strategy is incorporated to find new attractive news for users. Extensive experiments on an offline dataset and in the online production environment of a commercial news recommendation application show the superior performance of our methods.
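
The three ideas in the abstract (future reward, a return signal alongside clicks, and exploration) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the additive reward form, the weights `beta` and `gamma`, and the use of epsilon-greedy as a generic stand-in for the exploration strategy are all assumptions made for illustration.

```python
import random
from typing import Sequence


def combined_reward(click: int, return_score: float, beta: float = 0.05) -> float:
    """Immediate reward: click label plus a weighted user-return signal.

    The additive form and the weight `beta` are assumptions for this sketch.
    """
    return float(click) + beta * return_score


def td_target(reward: float, next_q_values: Sequence[float], gamma: float = 0.4) -> float:
    """One-step Q-learning target: immediate reward plus the discounted
    best estimated value among the next candidate items."""
    best_next = max(next_q_values) if next_q_values else 0.0
    return reward + gamma * best_next


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Generic exploration stand-in: with probability epsilon pick a random
    candidate, otherwise pick the candidate with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


if __name__ == "__main__":
    rng = random.Random(0)
    q_now = [0.1, 0.9, 0.3]           # hypothetical Q-values for three candidate news items
    chosen = epsilon_greedy(q_now, epsilon=0.1, rng=rng)
    r = combined_reward(click=1, return_score=0.5, beta=0.1)
    target = td_target(r, next_q_values=[0.2, 0.8], gamma=0.5)
    print(chosen, r, target)
```

A model that scored only the current click would correspond to setting `gamma` and `beta` to zero in this sketch; the nonzero values are what let the target reflect future reward and return behavior.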


Published in

WWW '18: Proceedings of the 2018 World Wide Web Conference
April 2018
2000 pages
ISBN: 9781450356398

Copyright © 2018 ACM

Publisher

International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland

Acceptance Rates

WWW '18 Paper Acceptance Rate: 170 of 1,155 submissions, 15%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
