Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (MIT Press) - A standard textbook for reinforcement learning, offering a thorough explanation of policy gradient methods, variance reduction through baselines, and actor-critic algorithms.
Actor-Critic Algorithms, Vijay R. Konda and John N. Tsitsiklis, 2000, Advances in Neural Information Processing Systems, Vol. 12 (MIT Press) - The paper that introduced and formalized the Actor-Critic framework, illustrating how a learned value function (the critic) can serve as an effective baseline for policy gradient methods.
Spinning Up in Deep RL, Joshua Achiam, 2018-2023 (OpenAI) - An accessible online resource from OpenAI that provides practical explanations and implementations of policy gradient methods, including the application of baselines and advantage estimation in deep reinforcement learning.
High-Dimensional Continuous Control Using Generalized Advantage Estimation, John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel, 2016, International Conference on Learning Representations (ICLR 2016). DOI: 10.48550/arXiv.1506.02438 - This paper introduces Generalized Advantage Estimation (GAE), a widely adopted technique for reducing variance in policy gradient methods by constructing improved advantage estimates, a direct application of baselines.
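The GAE estimator from the last reference above can be summarized in a few lines: it blends one-step TD errors into an exponentially weighted advantage, with λ trading bias for variance. A minimal sketch (function name and the list-based interface are illustrative, not from the paper):

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one trajectory.

    rewards: list of rewards r_0 .. r_{T-1}
    values:  list of value estimates V(s_0) .. V(s_T)
             (one extra entry for the bootstrap value at the final state)
    """
    advantages = [0.0] * len(rewards)
    acc = 0.0
    # Work backwards: A_t = delta_t + (gamma * lam) * A_{t+1}
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        acc = delta + gamma * lam * acc
        advantages[t] = acc
    return advantages
```

Setting λ=0 recovers the one-step TD error (low variance, higher bias), while λ=1 recovers the Monte Carlo return minus the value baseline (unbiased, higher variance), as derived in the Schulman et al. paper.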