Off-Policy Deep Reinforcement Learning without Exploration, Scott Fujimoto, David Meger, Doina Precup, 2019. Proceedings of the 36th International Conference on Machine Learning, Vol. 97 (PMLR). DOI: 10.48550/arXiv.1812.02900 - Introduces Batch-Constrained Q-learning (BCQ), a policy constraint method for offline RL, addressing distributional shift by limiting actions to those well-represented in the dataset.
A Survey of Offline Policy Evaluation in Reinforcement Learning, Shengchao Liu, Fangfang Gao, Zongkai Ding, Jincai Huang, and Bo Yang, 2023. IEEE Transactions on Artificial Intelligence, Vol. 4 (IEEE). DOI: 10.1109/TAI.2023.3323091 - A dedicated survey on offline policy evaluation (OPE) methods, detailing various techniques like importance sampling and model-based approaches, and discussing their bias-variance trade-offs.