DRN: A Deep Reinforcement Learning Framework for News Recommendation

Authors:
Guanjie Zheng (Pennsylvania State University, University Park, PA, USA)
Fuzheng Zhang (Microsoft Research Asia, Beijing, China)
Zihan Zheng (Microsoft Research Asia, Beijing, China)
Yang Xiang (Microsoft Research Asia, Beijing, China)
Nicholas Jing Yuan (Microsoft Research Asia, Beijing, China)
Xing Xie (Microsoft Research Asia, Beijing, China)
Zhenhui Li (Pennsylvania State University, University Park, PA, USA)

WWW '18: Proceedings of the 2018 World Wide Web Conference, April 2018, Pages 167–176
https://doi.org/10.1145/3178876.3185994

Published: 23 April 2018

ABSTRACT

In this paper, we propose a novel Deep Reinforcement Learning framework for news recommendation. Online personalized news recommendation is a highly challenging problem due to the dynamic nature of news features and user preferences. Although some online recommendation models have been proposed to address this dynamic nature, these methods have three major issues. First, they only try to model current reward (e.g., Click-Through Rate). Second, very few studies consider using user feedback other than click / no-click labels (e.g., how frequently a user returns) to help improve recommendation. Third, these methods tend to keep recommending similar news to users, which may cause users to get bored. To address these challenges, we propose a Deep Q-Learning based recommendation framework that can model future reward explicitly. We further consider the user return pattern as a supplement to the click / no-click label in order to capture more user feedback information. In addition, an effective exploration strategy is incorporated to find new attractive news for users. Extensive experiments conducted on the offline dataset and in the online production environment of a commercial news recommendation application show the superior performance of our methods.
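
The difference between modeling only current reward and modeling future reward can be illustrated with the standard one-step Q-learning target. The sketch below is a generic textbook form, not the paper's implementation; the function name, the discount factor of 0.9, and the toy Q-values are all illustrative assumptions.

```python
def q_learning_target(immediate_reward, next_q_values, gamma=0.9):
    """One-step Q-learning target: r + gamma * max over a' of Q(s', a').

    immediate_reward: reward observed now (e.g. 1.0 for a click, 0.0 otherwise).
    next_q_values: estimated Q-values of candidate actions in the next state.
    gamma: discount factor applied to future reward (hypothetical value).
    """
    # With no candidate actions in the next state, there is no future reward.
    best_future = max(next_q_values) if next_q_values else 0.0
    return immediate_reward + gamma * best_future


# A model that only looks at current reward would score this step as 0.0
# (no click); the Q-learning target also credits the estimated future reward.
no_click_reward = 0.0
estimated_next_q = [0.2, 0.5, 0.1]  # toy values, not taken from the paper
target = q_learning_target(no_click_reward, estimated_next_q)
```

Under these assumed values the target is positive even though the immediate reward is zero, which is the sense in which a Q-learning formulation accounts for future reward.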

Index Terms

1. Information systems
  1. World Wide Web
    1. Web searching and information discovery

Published in
WWW '18: Proceedings of the 2018 World Wide Web Conference
April 2018
2000 pages
ISBN: 9781450356398
General Chairs: Pierre-Antoine Champin (Université Claude Bernard Lyon 1, France), Fabien Gandon (Inria, Université Côte d'Azur, CNRS, I3S, France), Lionel Médini (Université Claude Bernard Lyon 1, France)
Program Chairs: Mounia Lalmas (Spotify, UK), Panagiotis G. Ipeirotis (New York University, USA)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland
Author Tags
deep Q-Learning
news recommendation
reinforcement learning
Qualifiers
- research-article
Acceptance Rates
WWW '18 Paper Acceptance Rate: 170 of 1,155 submissions, 15%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%