Q-Learning: Off-Policy TD Control


