Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (MIT Press) - Provides fundamental concepts of reinforcement learning, including off-policy evaluation and the principles of importance sampling.
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, Nan Jiang and Lihong Li, 2016, Proceedings of the 33rd International Conference on Machine Learning (ICML), Vol. 48 - Presents a doubly robust estimator for reinforcement learning that combines model-based and importance sampling approaches for off-policy evaluation.