Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (MIT Press) - Standard textbook covering foundational concepts of reinforcement learning, including multi-armed bandits and UCB methods.
Bandit Based Monte-Carlo Planning, Levente Kocsis and Csaba Szepesvári, 2006, Machine Learning: ECML 2006 (Springer-Verlag Berlin Heidelberg), DOI: 10.1007/11871842_29 - Introduces Upper Confidence Bounds for Trees (UCT), applying UCB for efficient search in Monte Carlo Tree Search.