Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (MIT Press) - Provides a comprehensive introduction to reinforcement learning, including a foundational explanation of off-policy prediction with importance sampling in Chapter 5 (Section 5.5).
High-Confidence Off-Policy Evaluation, P. S. Thomas, G. Theocharous, M. Ghavamzadeh, 2015. Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 29(1) (Association for the Advancement of Artificial Intelligence). DOI: 10.1609/aaai.v29i1.9541 - Addresses the significant challenge of high variance in off-policy evaluation, proposing methods to provide confidence intervals for importance sampling estimators.