Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (MIT Press) - Standard textbook covering foundational concepts of reinforcement learning, including multi-armed bandits and UCB methods.
Bandit Based Monte-Carlo Planning, Levente Kocsis and Csaba Szepesvári, 2006, Machine Learning: ECML 2006 (Springer-Verlag Berlin Heidelberg), DOI: 10.1007/11871842_29 - Introduces Upper Confidence Bounds for Trees (UCT), applying UCB for efficient search in Monte Carlo Tree Search.