Multi-Task Fusion via Reinforcement Learning for Long-Term User Satisfaction in Recommender Systems

ABSTRACT
Recommender Systems (RS) are important online applications that affect billions of users every day. The mainstream RS ranking framework consists of two parts: a Multi-Task Learning (MTL) model that predicts various types of user feedback, e.g., clicks, likes, and shares, and a Multi-Task Fusion (MTF) model that combines the multi-task outputs into one final ranking score with respect to user satisfaction. The fusion model has received little research attention even though, as the last crucial step of ranking, it has a great impact on the final recommendation. To optimize long-term user satisfaction rather than greedily maximize instant returns, we formulate the MTF task as a Markov Decision Process (MDP) within a recommendation session and propose a Batch Reinforcement Learning (RL) based Multi-Task Fusion framework (BatchRL-MTF) that consists of a Batch RL framework and an online exploration component. The former exploits Batch RL to learn an optimal recommendation policy offline from fixed batch data for long-term user satisfaction, while the latter explores potentially high-value actions online to escape the local-optimum dilemma. Based on a comprehensive investigation of user behaviors, we model the user satisfaction reward with subtle heuristics from two aspects: user stickiness and user activeness. Finally, we conduct extensive experiments on a billion-sample real-world dataset to show the effectiveness of our model. We propose a conservative offline policy estimator (Conservative-OPEstimator) to evaluate our model offline. Furthermore, we conduct online experiments in a real recommendation environment to compare the performance of different models. As one of the few successful applications of Batch RL to the MTF task, our model has also been deployed on a large-scale industrial short-video platform, serving hundreds of millions of users.
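The abstract describes MTF as combining the MTL model's per-task feedback predictions into a single ranking score, with the fusion parameters treated as the action chosen by the RL policy. A minimal sketch of such a fusion step, assuming a weighted-product form; the formula, task names, and function below are illustrative assumptions, not taken from the paper:

```python
# Hypothetical sketch: the abstract does not give the fusion formula, so the
# weighted-product form and all names here are illustrative assumptions.

def fusion_score(mtl_outputs: dict, action_weights: dict) -> float:
    """Combine multi-task predictions into one ranking score.

    One common fusion form is a weighted product, score = prod_k p_k ** w_k,
    where the per-task weights w_k are the action chosen by the RL policy.
    """
    score = 1.0
    for task, p in mtl_outputs.items():
        w = action_weights.get(task, 0.0)
        score *= max(p, 1e-6) ** w  # clamp to avoid 0 ** negative weight
    return score

# Example: MTL predictions for one candidate item and one sampled action.
preds = {"click": 0.30, "like": 0.05, "share": 0.01}
weights = {"click": 1.0, "like": 0.5, "share": 0.2}
print(f"{fusion_score(preds, weights):.4f}")  # prints 0.0267
```

Under the session-level MDP formulation, the state would capture the user and session context, and an episode's return would accumulate the satisfaction reward (stickiness and activeness signals) across the session rather than a single request.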