DOI: 10.1145/3534678.3539312
research-article

Neural Bandit with Arm Group Graph

Published: 14 August 2022

ABSTRACT

Contextual bandits aim to identify, among a set of arms, the one with the highest reward based on the arms' contextual information. Motivated by the observation that arms usually exhibit group behaviors and that groups influence one another, we introduce a new model, Arm Group Graph (AGG), in which the nodes represent groups of arms and the weighted edges model the correlations among groups. To leverage the rich information in AGG, we propose a bandit algorithm, AGG-UCB, in which neural networks estimate rewards and graph neural networks (GNNs) learn the representations of arm groups with correlations. To address the exploitation-exploration dilemma in bandits, we derive a new upper confidence bound (UCB) for exploration, built on the neural networks used for exploitation. Furthermore, we prove that AGG-UCB can achieve a near-optimal regret bound with over-parameterized neural networks, and we provide a convergence analysis of GNNs with fully-connected layers, which may be of independent interest. Finally, we conduct extensive experiments against state-of-the-art baselines on multiple public data sets, showing the effectiveness of the proposed algorithm.
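To make the arm-selection idea in the abstract concrete, the following is a minimal sketch and not the AGG-UCB algorithm itself. It assumes linear per-group reward estimators in place of the neural networks, a single normalized aggregation step over the group graph in place of the GNN, and a diagonal inverse-covariance confidence width in place of the derived UCB. All function names, parameters, and toy values are hypothetical.

```python
import math


def normalize_adjacency(adj):
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j).

    Assumes non-negative edge weights, so every degree is at least 1.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def propagate(norm_adj, group_weights):
    """One aggregation step: mix each group's weights with its neighbours'."""
    n, d = len(norm_adj), len(group_weights[0])
    return [[sum(norm_adj[g][h] * group_weights[h][k] for h in range(n))
             for k in range(d)]
            for g in range(n)]


def ucb_scores(contexts, arm_groups, adj, group_weights, inv_cov_diag, gamma=1.0):
    """Return (reward estimate + gamma * confidence width) for every arm."""
    mixed = propagate(normalize_adjacency(adj), group_weights)
    scores = []
    for x, g in zip(contexts, arm_groups):
        # Exploitation: linear estimate from the arm's (graph-mixed) group weights.
        estimate = sum(xi * wi for xi, wi in zip(x, mixed[g]))
        # Exploration: width sqrt(x^T diag(c) x), an assumed stand-in for the UCB.
        width = math.sqrt(sum(xi * xi * ci for xi, ci in zip(x, inv_cov_diag)))
        scores.append(estimate + gamma * width)
    return scores


def select_arm(scores):
    """Index of the arm with the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy setting (hypothetical values): two groups joined by one edge, one arm each.
adj = [[0.0, 1.0], [1.0, 0.0]]
group_weights = [[1.0, 0.0], [0.0, 1.0]]
contexts = [[1.0, 0.0], [0.0, 2.0]]
arm_groups = [0, 1]
inv_cov_diag = [1.0, 1.0]

scores = ucb_scores(contexts, arm_groups, adj, group_weights, inv_cov_diag)
chosen = select_arm(scores)
```

In this toy setting the edge between the two groups pulls their weight vectors toward each other before scoring, which is the role the abstract assigns to correlations among groups. Setting every entry of `adj` to zero recovers independent per-group estimates.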


Published in

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2022
5033 pages
ISBN: 9781450393850
DOI: 10.1145/3534678

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
