Navigating Complexity in Collaborative Environments through Innovations in Multi-Agent Multi-Armed Bandit Algorithms
DOI: https://doi.org/10.62051/fynwzq88
Keywords: Multi-Agent Multi-Armed Bandit; Communication; Reward; Fairness
Abstract
As Multi-Armed Bandit (MAB) applications grow more complex, particularly when multiple agents collaborate or compete, traditional bandit algorithms face new challenges, underscoring the rising importance of research on multi-agent multi-armed bandits (MAMAB). Advances in MAMAB algorithms have driven significant progress across a variety of fields, addressing decision-making challenges in dynamic and uncertain environments. This paper offers a comprehensive review of recent progress in MAMAB algorithms, emphasizing major strides in cooperative decision-making and operational efficiency. We focus in particular on the work of Filippo Vannella et al., who study best arm identification and its sample complexity within the MAMAB framework; their research shifts attention toward identifying optimal global actions with minimal sample complexity and harnesses mean-field techniques in settings such as wireless network optimization. This paper also addresses communication complexity, a crucial aspect of MAMAB systems for which numerous algorithms have been developed: these methods balance learning performance against communication overhead, reducing the need for frequent and costly interactions among agents. On the application side, the adoption of MAMAB algorithms in areas such as clinical trials and wireless spectrum management showcases their potential to improve on conventional approaches. Through a detailed examination of current research trends and prospective future directions, this article contributes to the broader discourse on harnessing MAMAB algorithms to navigate collaborative environments effectively.
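The performance/communication trade-off described above can be made concrete with a small sketch. The following Python snippet is purely illustrative and is not taken from any of the cited papers: N agents run UCB1 on a shared Bernoulli bandit and merge their statistics only every comm_interval rounds, so widening the interval lowers communication cost at the price of slower information sharing. All constants (K, N, T, comm_interval, true_means) are hypothetical.

```python
# Illustrative sketch of communication-limited cooperative bandits
# (hypothetical parameters; not any cited paper's algorithm).
import math
import random

K, N, T = 5, 4, 2000
comm_interval = 50                          # rounds between synchronizations
true_means = [0.2, 0.4, 0.5, 0.7, 0.9]     # hypothetical Bernoulli arm means

shared_counts = [0] * K                     # statistics known to every agent
shared_sums = [0.0] * K
local_counts = [[0] * K for _ in range(N)]  # per-agent deltas since last sync
local_sums = [[0.0] * K for _ in range(N)]

def ucb_index(agent, arm, t):
    """UCB1 index from shared statistics plus the agent's local deltas."""
    n = shared_counts[arm] + local_counts[agent][arm]
    if n == 0:
        return float("inf")                 # force initial exploration
    mean = (shared_sums[arm] + local_sums[agent][arm]) / n
    return mean + math.sqrt(2 * math.log(t + 1) / n)

for t in range(T):
    for agent in range(N):
        arm = max(range(K), key=lambda a: ucb_index(agent, a, t))
        reward = 1.0 if random.random() < true_means[arm] else 0.0
        local_counts[agent][arm] += 1
        local_sums[agent][arm] += reward
    # Periodic synchronization: broadcast and pool the local deltas, so each
    # agent benefits from the others' exploration at a communication cost
    # that shrinks as comm_interval grows.
    if (t + 1) % comm_interval == 0:
        for arm in range(K):
            shared_counts[arm] += sum(local_counts[i][arm] for i in range(N))
            shared_sums[arm] += sum(local_sums[i][arm] for i in range(N))
        for agent in range(N):              # reset deltas after broadcasting
            local_counts[agent] = [0] * K
            local_sums[agent] = [0.0] * K

print("most-pulled arm after pooling:", max(range(K), key=lambda a: shared_counts[a]))
```

The fixed synchronization schedule is the simplest possible policy; the on-demand communication line of work cited below replaces it with data-dependent rules that trigger messages only when an agent's local view has drifted enough to matter.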
References
Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58(5), 527-535.
Gai, Y., Krishnamachari, B., & Jain, R. (2010, April). Learning multiuser channel allocations in cognitive radio networks: A combinatorial multi-armed bandit formulation. In 2010 IEEE Symposium on New Frontiers in Dynamic Spectrum (DySPAN) (pp. 1-9). IEEE.
Vannella, F., Proutiere, A., & Jeong, J. (2023, July). Best arm identification in multi-agent multi-armed bandits. In International Conference on Machine Learning (pp. 34875-34907). PMLR.
Pankayaraj, P., & Maithripala, D. H. S. (2020, May). A Decentralized Communication Policy for Multi Agent Multi Armed Bandit Problems. In 2020 European Control Conference (ECC) (pp. 356-361). IEEE.
Agarwal, M., Aggarwal, V., & Azizzadenesheli, K. (2022). Multi-agent multi-armed bandits with limited communication. Journal of Machine Learning Research, 23(212), 1-24.
Madhushani, U., Dubey, A., Leonard, N., & Pentland, A. (2021). One more step towards reality: Cooperative bandits with imperfect communication. Advances in Neural Information Processing Systems, 34, 7813-7824.
Chen, Y. Z. J., Yang, L., Wang, X., Liu, X., Hajiesmaili, M., Lui, J. C., & Towsley, D. (2023, April). On-demand communication for asynchronous multi-agent bandits. In International Conference on Artificial Intelligence and Statistics (pp. 3903-3930). PMLR.
Zhu, J., Sandhu, R., & Liu, J. (2020, December). A distributed algorithm for sequential decision making in multi-armed bandit with homogeneous rewards. In 2020 59th IEEE Conference on Decision and Control (CDC) (pp. 3078-3083). IEEE.
Xu, M., & Klabjan, D. (2024). Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards. Advances in Neural Information Processing Systems, 36.
Zhu, X., Huang, Y., Wang, X., & Wang, R. (2023). Emotion recognition based on brain-like multimodal hierarchical perception. Multimedia Tools and Applications, 1-19.
Dubey, A. (2020, November). Cooperative multi-agent bandits with heavy tails. In International Conference on Machine Learning (pp. 2730-2739). PMLR.
Tossou, A. C., Dimitrakakis, C., Rzepecki, J., & Hofmann, K. (2020, May). A novel individually rational objective in multi-agent multi-armed bandits: Algorithms and regret bounds. In Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (pp. 1395-1403).
Hossain, S., Micha, E., & Shah, N. (2021). Fair algorithms for multi-agent multi-armed bandits. Advances in Neural Information Processing Systems, 34, 24005-24017.
Wang, X., Ye, J., & Lui, J. C. (2022, May). Decentralized task offloading in edge computing: A multi-user multi-armed bandit approach. In IEEE INFOCOM 2022-IEEE Conference on Computer Communications (pp. 1199-1208). IEEE.
Jones, M., Nguyen, H., & Nguyen, T. (2023, June). An efficient algorithm for fair multi-agent multi-armed bandit with low regret. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 37, No. 7, pp. 8159-8167).
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.