Batch Reinforcement Learning, Sascha Lange, Martin Riedmiller, Alois Knoll, 2012 (Springer Berlin Heidelberg) DOI: 10.1007/978-3-642-27645-3_4 - A book chapter offering an early, detailed explanation of batch reinforcement learning concepts and techniques.
Off-Policy Deep Reinforcement Learning without Exploration, Scott Fujimoto, David Meger, Doina Precup, 2019, Proceedings of the 36th International Conference on Machine Learning (ICML), Vol. 97 (PMLR) DOI: 10.5555/3305890.3306013 - Introduces Batch-Constrained Q-learning (BCQ), a seminal algorithm designed to mitigate the issue of distributional shift in offline deep RL.
Conservative Q-Learning for Offline Reinforcement Learning, Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine, 2020, Advances in Neural Information Processing Systems (NeurIPS), Vol. 33 DOI: 10.48550/arXiv.2006.04779 - Presents Conservative Q-Learning (CQL), a principled method to address overestimation errors from out-of-distribution actions in offline RL by learning a lower-bound Q-function.