Introduction to Reinforcement Learning
Chapter 1: Foundations of Reinforcement Learning
What is Reinforcement Learning?
States, Actions, and Rewards
Policies: Mapping States to Actions
The RL Workflow: Interaction Loops
Types of RL Tasks: Episodic vs Continuing
Comparing RL with Other Learning Types
Setting up Your Python Environment for RL
Chapter 2: Markov Decision Processes (MDPs)
Modeling Sequential Decision Making
Formal Definition of an MDP
State Transition Probabilities
Return: Cumulative Future Rewards
Discounting Future Rewards
Policies and Value Functions (Vπ, Qπ)
Chapter 3: Estimating Value Functions
The Bellman Expectation Equation
The Bellman Optimality Equation
Solving Bellman Equations (Overview)
Dynamic Programming: Policy Iteration
Dynamic Programming: Value Iteration
Limitations of Dynamic Programming
Chapter 4: Monte Carlo Methods
Learning from Complete Episodes
Monte Carlo Prediction: Estimating Vπ
Monte Carlo Control: Estimating Qπ
On-Policy vs Off-Policy Learning
MC Control without Exploring Starts
On-Policy First-Visit MC Control Implementation
Off-Policy MC Prediction and Control (Overview)
Practice: Implementing MC Prediction
Chapter 5: Temporal-Difference Learning
Learning from Incomplete Episodes
TD(0) Prediction: Estimating Vπ
Advantages of TD Learning over MC
SARSA: On-Policy TD Control
Q-Learning: Off-Policy TD Control
Comparing SARSA and Q-Learning
Hands-on Practical: Implementing Q-Learning
Chapter 6: Function Approximation in RL
Handling Large State Spaces
Value Function Approximation (VFA)
Feature Vectors for State Representation
Gradient Descent for Parameter Learning
Using Neural Networks for VFA
Practice: Applying Linear VFA
Chapter 7: Introduction to Deep Q-Networks (DQN)
Combining Q-Learning with Deep Learning
Challenges with Neural Networks in RL
Experience Replay Mechanism
Fixed Q-Targets (Target Networks)
The DQN Algorithm Structure
Architectural Considerations for DQNs
Hands-on Practical: Building a Basic DQN
Chapter 8: Introduction to Policy Gradient Methods
Learning Policies Directly
Policy Gradient Theorem (Concept)
Baselines for Variance Reduction
Actor-Critic Methods Overview
Comparing Value-Based and Policy-Based Methods
Practice: Implementing REINFORCE