Advancing Decision-Making in Dynamic Environments
DOI: https://doi.org/10.62051/kwdw5923

Keywords: Thompson Sampling algorithm; Opportunistic bandit; Adaptive method.

Abstract
This paper introduces the Adaptive Thompson Sampling (AdaTS) algorithm, designed for the challenges posed by opportunistic bandit problems. Unlike conventional multi-armed bandit scenarios, where the optimality of actions remains relatively static, opportunistic bandit environments are strongly influenced by fluctuating external conditions, and these variations demand a highly adaptive decision-making strategy. AdaTS meets this requirement by integrating real-time system load assessments that dynamically adjust the balance between exploration and exploitation. This approach not only adapts more effectively to changing conditions but also yields substantially better performance. By incorporating system load into the decision-making process, AdaTS reduces regret and increases cumulative reward under variable load intensities, most notably in environments characterized by binary-valued loads and stochastic reward distributions. The results highlight AdaTS's robustness and efficiency, making it a promising approach for complex environments where adaptability is crucial, and they mark a meaningful advance in adaptive algorithms for opportunistic bandit problems.
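To make the load-adaptive idea concrete, the sketch below shows one way a binary load signal could modulate the exploration-exploitation balance of a Bernoulli Thompson Sampling policy: under high load the posterior is sharpened so the policy mostly exploits, and under low load it samples from the full posterior and explores. The class name LoadAdaptiveThompsonSampling, the sharpen parameter, and the posterior-sharpening rule are illustrative assumptions, not the AdaTS mechanism described in the paper.

```python
import numpy as np

class LoadAdaptiveThompsonSampling:
    """Illustrative load-aware Thompson Sampling for Bernoulli arms.

    This is NOT the authors' AdaTS implementation: the posterior-sharpening
    rule and the `sharpen` parameter are assumptions used only to show how a
    binary load signal could steer exploration versus exploitation.
    """

    def __init__(self, n_arms: int, sharpen: float = 10.0):
        # Beta(1, 1) priors over each arm's Bernoulli reward probability.
        self.alpha = np.ones(n_arms)
        self.beta = np.ones(n_arms)
        # Factor by which high load concentrates the sampled posterior.
        self.sharpen = sharpen

    def select_arm(self, load: int) -> int:
        # Under high load (load == 1), scale the pseudo-counts so posterior
        # samples concentrate near their means (exploit); under low load,
        # sample from the unmodified posterior (explore).
        scale = self.sharpen if load == 1 else 1.0
        samples = np.random.beta(self.alpha * scale, self.beta * scale)
        return int(np.argmax(samples))

    def update(self, arm: int, reward: float) -> None:
        # Standard Bernoulli Thompson Sampling posterior update.
        self.alpha[arm] += reward
        self.beta[arm] += 1.0 - reward


# Toy run: two arms with unknown means, binary load drawn each round.
policy = LoadAdaptiveThompsonSampling(n_arms=2)
true_means = [0.3, 0.7]
for t in range(1000):
    load = np.random.binomial(1, 0.5)        # binary-valued external load
    arm = policy.select_arm(load)
    reward = float(np.random.rand() < true_means[arm])
    policy.update(arm, reward)
```

Any monotone mapping from observed load to exploration intensity could be substituted for the sharpening rule above; the sketch only illustrates the general principle of spending exploration when its opportunity cost is low.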
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







