Spinning Up in Deep RL, Joshua Achiam and OpenAI, 2018 - A comprehensive practical guide and educational resource covering fundamental deep reinforcement learning algorithms, common pitfalls, and implementation details for effective agent training.
Proximal Policy Optimization Algorithms, John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford and Oleg Klimov, 2017, arXiv preprint, DOI: 10.48550/arXiv.1707.06347 - Introduces the Proximal Policy Optimization (PPO) algorithm, detailing its objective function, clipping mechanism, and practical considerations for stable policy updates, which are relevant for debugging learning plateaus and instability.
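The clipped surrogate objective this entry refers to can be sketched in a few lines; the function name and list-based interface below are illustrative choices, not from the paper, but the formula follows the paper's L^CLIP with its commonly used epsilon of 0.2:

```python
def ppo_clip_loss(ratios, advantages, eps=0.2):
    """Negated PPO clipped surrogate objective (a loss to minimize).

    ratios: probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages: advantage estimates, one per sample
    eps: clipping range epsilon (0.2 is the paper's typical setting)
    """
    total = 0.0
    for r, adv in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - eps), 1.0 + eps)  # clip(r, 1-eps, 1+eps)
        total += min(r * adv, clipped_r * adv)          # pessimistic (lower) bound
    return -total / len(ratios)                         # negate: objective -> loss
```

The `min` over the clipped and unclipped terms is what prevents excessively large policy updates, which is why this objective is a natural first place to look when debugging instability.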
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel and Sergey Levine, 2018, ICML 2018, DOI: 10.48550/arXiv.1801.01290 - Presents the Soft Actor-Critic (SAC) algorithm, emphasizing the role of entropy regularization for robust exploration and its off-policy learning framework, relevant for tuning and debugging exploration issues.
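The entropy regularization highlighted in this entry shows up most concretely in SAC's soft Bellman target for the critic; a minimal sketch, with illustrative names and default coefficients (gamma and the temperature alpha are hyperparameters, not fixed by the paper):

```python
def soft_q_target(reward, next_q, next_log_prob, gamma=0.99, alpha=0.2, done=False):
    """One-step soft Bellman target used by SAC-style critics.

    The bonus -alpha * log pi(a'|s') augments the usual TD target, so the
    agent is rewarded for keeping its policy stochastic (better exploration).
    reward: immediate reward r(s, a)
    next_q: critic estimate Q(s', a') for the sampled next action a'
    next_log_prob: log pi(a'|s') for that sampled action
    """
    if done:
        return reward  # no bootstrap past a terminal state
    return reward + gamma * (next_q - alpha * next_log_prob)
```

Because `next_log_prob` is negative for stochastic policies, the entropy term raises the target; an `alpha` tuned too low often shows up as premature policy collapse, which is the exploration failure mode the annotation mentions.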
Human-level control through deep reinforcement learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg and Demis Hassabis, 2015, Nature, Vol. 518, DOI: 10.1038/nature14236 - The foundational paper introducing Deep Q-Networks (DQN), highlighting key stabilization techniques like experience replay and target networks, which are crucial for understanding and debugging training instability.
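The two stabilization techniques this entry names can be sketched without a neural network at all; the class and function names below are illustrative, but the mechanics match the paper: uniform replay to decorrelate samples, and a TD target computed from a separately held (periodically synced) target network's Q-values:

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform experience replay: store transitions, sample i.i.d.
    minibatches to break the temporal correlations in online data."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # old transitions fall off the left

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def dqn_target(reward, target_next_q_values, gamma=0.99, done=False):
    """TD target y = r + gamma * max_a' Q_target(s', a').

    target_next_q_values come from the target network, which is only
    synced with the online network every N steps; freezing the target
    this way is what damps the instability the paper addresses.
    """
    if done:
        return reward
    return reward + gamma * max(target_next_q_values)
```

Watching the gap between online and target Q-values, and the diversity of sampled minibatches, is a practical way to debug the instability modes this paper describes.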