Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (MIT Press) - A standard textbook for reinforcement learning, offering a thorough explanation of policy gradient methods, variance reduction through baselines, and actor-critic algorithms.
Actor-Critic Algorithms, Vijay R. Konda and John N. Tsitsiklis, 2000, Advances in Neural Information Processing Systems, Vol. 12 (MIT Press) - The paper that introduced and formalized the Actor-Critic framework, illustrating how a learned value function (the critic) can serve as an effective baseline for policy gradient methods.
Spinning Up in Deep RL, Joshua Achiam, 2018-2023 (OpenAI) - An accessible online resource from OpenAI that provides practical explanations and implementations of policy gradient methods, including the application of baselines and advantage estimation in deep reinforcement learning.
High-Dimensional Continuous Control Using Generalized Advantage Estimation, John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel, 2016, International Conference on Learning Representations (ICLR 2016). DOI: 10.48550/arXiv.1506.02438 - This paper introduces Generalized Advantage Estimation (GAE), a widely adopted technique for reducing variance in policy gradient methods by constructing improved advantage estimates, a direct application of baselines.
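The GAE estimator from the last reference above can be summarized in a few lines: it blends one-step TD errors into an exponentially weighted advantage, with λ trading bias for variance. A minimal sketch (function name and the list-based interface are illustrative, not from the paper):

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one trajectory.

    rewards: list of rewards r_0 .. r_{T-1}
    values:  list of value estimates V(s_0) .. V(s_T)
             (one extra entry for the bootstrap value at the final state)
    """
    advantages = [0.0] * len(rewards)
    acc = 0.0
    # Work backwards: A_t = delta_t + (gamma * lam) * A_{t+1}
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        acc = delta + gamma * lam * acc
        advantages[t] = acc
    return advantages
```

Setting λ=0 recovers the one-step TD error (low variance, higher bias), while λ=1 recovers the Monte Carlo return minus the value baseline (unbiased, higher variance), as derived in the Schulman et al. paper.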