In the preceding chapters, we established the framework of Markov Decision Processes (MDPs) and used dynamic programming (DP) methods to find optimal policies. A key limitation of DP is that it requires a complete model of the environment, including state transition probabilities and reward functions. Often, such a model is unavailable.
This chapter introduces Monte Carlo (MC) methods, a class of model-free Reinforcement Learning algorithms. MC methods learn directly from episodes of experience, without needing prior knowledge of the environment's dynamics. They operate by averaging the sample returns obtained from complete interaction sequences (episodes).
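To make the idea of averaging sample returns concrete, here is a minimal sketch of first-visit MC prediction, which is treated in detail later in the chapter. It assumes a hypothetical `generate_episode` helper that follows the policy π and returns a list of (state, reward) pairs; treat it as an illustration under those assumptions, not a reference implementation.

```python
from collections import defaultdict

def first_visit_mc_prediction(generate_episode, gamma=1.0, num_episodes=5000):
    """Estimate the state-value function V_pi by averaging sampled returns.

    `generate_episode` (an assumed helper) follows the policy pi and returns
    a list of (state, reward) pairs, where reward is R_{t+1} received after
    leaving `state`.
    """
    returns_sum = defaultdict(float)   # total first-visit return observed per state
    returns_count = defaultdict(int)   # number of first visits per state
    V = defaultdict(float)             # current value estimates

    for _ in range(num_episodes):
        episode = generate_episode()
        # Index of each state's first occurrence in this episode.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)

        # Walk backwards so G accumulates the discounted return from time t onward.
        G = 0.0
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = gamma * G + reward
            if first_visit[state] == t:  # first-visit MC: count each state once per episode
                returns_sum[state] += G
                returns_count[state] += 1
                V[state] = returns_sum[state] / returns_count[state]
    return V
```

The estimate for each state is simply the running average of the returns observed after its first visit in each episode; no transition probabilities or reward model are used.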
By working through MC methods, you'll gain insight into learning optimal behavior purely from sampled experience, a necessary step toward tackling problems where the environment's rules are unknown.
In this chapter, you will focus on the following sections:
4.1 Learning from Complete Episodes
4.2 Monte Carlo Prediction: Estimating Vπ
4.3 Monte Carlo Control: Estimating Qπ
4.4 On-Policy vs Off-Policy Learning
4.5 MC Control without Exploring Starts
4.6 On-Policy First-Visit MC Control Implementation
4.7 Off-Policy MC Prediction and Control Intro
4.8 Practice: Implementing MC Prediction