Abstract
We consider the problem of multiple users targeting the arms of a single multi-armed stochastic bandit. The motivation for this problem comes from cognitive radio networks, where selfish users must coexist without side communication, implicit cooperation, or common control. Even the number of users may be unknown and can vary as users join or leave the network. We propose an algorithm that combines an ε-greedy learning rule with a collision avoidance mechanism. We analyze its regret with respect to the system-wide optimum and show that sub-linear regret can be obtained in this setting. Experiments show a dramatic improvement over other algorithms proposed for this setting.
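The abstract names the two ingredients (ε-greedy learning and collision avoidance) but not their details. The following is a minimal Python sketch of that general idea, not the authors' algorithm. It rests on several assumptions that are not in the source: Bernoulli arms; collisions are observable and pay zero to every colliding user; a decaying exploration rate ε_t = min(1, cK/t); and a hypothetical back-off rule in which a colliding user, with probability 1 − `persist`, marks the arm unavailable for a random number of rounds. The function `simulate` and all of its parameters are hypothetical. It also reports an empirical regret against the system-wide optimum. That optimum is taken here as the sum of the N largest arm means, because N users can at best occupy the N best arms without colliding.

```python
import random


def simulate(means, n_users, horizon, c=5.0, persist=0.5, backoff_max=20, seed=0):
    """Sketch: n_users independent epsilon-greedy players share one Bernoulli bandit.

    Assumptions (not from the paper): collisions are observed and pay 0,
    epsilon_t = min(1, c * K / t), and a hypothetical back-off rule where a
    colliding user blocks the arm for a random number of rounds with
    probability 1 - persist.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [[0] * k for _ in range(n_users)]
    sums = [[0.0] * k for _ in range(n_users)]
    blocked_until = [[0] * k for _ in range(n_users)]  # arm usable once t >= value
    total_reward = 0.0
    collisions = 0  # counted once per colliding user per round

    for t in range(1, horizon + 1):
        eps = min(1.0, c * k / t)  # assumed decaying exploration schedule
        choices = []
        for u in range(n_users):
            available = [a for a in range(k) if blocked_until[u][a] <= t]
            if not available:  # every arm blocked: fall back to all arms
                available = list(range(k))
            if rng.random() < eps:
                arm = rng.choice(available)  # explore uniformly among available arms
            else:
                # Exploit: highest empirical mean; untried arms count as +inf.
                arm = max(
                    available,
                    key=lambda a: sums[u][a] / counts[u][a] if counts[u][a] else float("inf"),
                )
            choices.append(arm)

        # Number of users that picked each arm this round.
        load = [0] * k
        for arm in choices:
            load[arm] += 1

        for u, arm in enumerate(choices):
            if load[arm] > 1:
                # Collision: no reward and no sample of the arm's quality.
                collisions += 1
                if rng.random() >= persist:
                    blocked_until[u][arm] = t + rng.randint(1, backoff_max)
            else:
                reward = 1.0 if rng.random() < means[arm] else 0.0
                counts[u][arm] += 1
                sums[u][arm] += reward
                total_reward += reward

    # System-wide optimum: the n_users best arms, each held by one user every round.
    optimum = horizon * sum(sorted(means, reverse=True)[:n_users])
    return total_reward, collisions, optimum - total_reward  # empirical regret


if __name__ == "__main__":
    reward, coll, regret = simulate([0.9, 0.5, 0.2], n_users=2, horizon=5000)
    print(f"reward={reward:.0f} collisions={coll} empirical regret={regret:.1f}")
```

Running the sketch with increasing `horizon` values gives a rough, assumption-dependent sense of how regret grows. The back-off rule shown here is only one simple way to break symmetric collisions without communication. Under these assumptions a single user can never collide, which gives an easy sanity check.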
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Avner, O., Mannor, S. (2014). Concurrent Bandits and Cognitive Radio Networks. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science, vol 8724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44848-9_5
DOI: https://doi.org/10.1007/978-3-662-44848-9_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44847-2
Online ISBN: 978-3-662-44848-9
eBook Packages: Computer Science, Computer Science (R0)