Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (MIT Press) - A comprehensive textbook that covers the fundamental concepts of reinforcement learning, including policy gradients, actor-critic methods, TD error, Monte Carlo methods, and value function estimation, providing the theoretical background for GAE.
Proximal Policy Optimization Algorithms, John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov, 2017 (arXiv preprint arXiv:1707.06347), DOI: 10.48550/arXiv.1707.06347 - Presents the Proximal Policy Optimization (PPO) algorithm, a widely used actor-critic method that effectively utilizes Generalized Advantage Estimation (GAE) for stable and efficient policy updates.