ABSTRACT
In this paper, we propose a novel Deep Reinforcement Learning framework for news recommendation. Online personalized news recommendation is a highly challenging problem due to the dynamic nature of news features and user preferences. Although some online recommendation models have been proposed to address this dynamic nature, these methods have three major issues. First, they model only the current reward (e.g., click-through rate). Second, very few studies consider using user feedback other than click / no-click labels (e.g., how frequently a user returns) to improve recommendation. Third, these methods tend to keep recommending similar news to users, which may cause users to get bored. To address these challenges, we propose a Deep Q-Learning based recommendation framework that models future reward explicitly. We further consider the user return pattern as a supplement to the click / no-click label in order to capture more user feedback information. In addition, an effective exploration strategy is incorporated to find new attractive news for users. Extensive experiments on an offline dataset and in the online production environment of a commercial news recommendation application show the superior performance of our method.
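To make "modeling future reward explicitly" concrete, the following is a minimal sketch of the standard one-step Q-learning target, with the immediate reward formed from a click label plus a weighted user-return signal. It is an illustration under stated assumptions, not the paper's network or training procedure: the function names, the `return_score` input, and the `beta` and `gamma` values are all hypothetical.

```python
from typing import Sequence


def combined_reward(clicked: bool, return_score: float, beta: float = 0.1) -> float:
    """Immediate reward: click label plus a weighted user-return signal.

    `return_score` (e.g., a number summarizing how soon the user comes back)
    and the weight `beta` are illustrative assumptions.
    """
    click_reward = 1.0 if clicked else 0.0
    return click_reward + beta * return_score


def q_target(reward: float, next_q_values: Sequence[float], gamma: float = 0.9) -> float:
    """One-step Q-learning target: reward plus discounted best future value.

    `next_q_values` holds the estimated Q-values of the candidate news items
    in the next state; an empty sequence is treated as a terminal state.
    `gamma` is an illustrative discount factor.
    """
    best_future = max(next_q_values) if next_q_values else 0.0
    return reward + gamma * best_future


# Usage: a clicked item for a user with return score 0.5,
# followed by a state whose best candidate has Q-value 0.5.
r = combined_reward(clicked=True, return_score=0.5)
y = q_target(r, next_q_values=[0.2, 0.5])
```

A model that optimizes only the current reward corresponds to `gamma = 0`, in which case the target reduces to the immediate reward alone.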
Index Terms
- DRN: A Deep Reinforcement Learning Framework for News Recommendation