Schedule
- Session | 09/02/2025 17:00 (Tuesday) | First Lecture

- Lecture | 09/02/2025 (Tuesday) | Lecture 0: Course Overview and Logistics
  Lecture notes available.

- Lecture | 09/02/2025 (Tuesday) | Lecture 1: RL as a Learning Problem
- Lecture | 09/02/2025 (Tuesday) | Lecture 2: Optimal and Random Playing of Multi-armed Bandit
  Lecture notes available.
  Further Reads:
  - k-armed Bandit: Chapter 2 - Section 2.1 of [SB]
  - Robbins' Paper: "Some aspects of the sequential design of experiments" by H. Robbins, Bulletin of the American Mathematical Society, 1952, which formulates the multi-armed bandit problem as we know it today

- Lecture | 09/05/2025 (Friday) | Lecture 3: Exploiting Explorations in Multi-armed Bandit
  Lecture notes available.
  Further Reads:
  - k-armed Bandit: Chapter 2 - Section 2.1 of [SB]
  - Robbins' Paper: "Some aspects of the sequential design of experiments" by H. Robbins, Bulletin of the American Mathematical Society, 1952, which formulates the multi-armed bandit problem as we know it today
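The two bandit lectures above cover the exploration-exploitation trade-off in the k-armed bandit of [SB] Chapter 2. As a rough illustration only (not course-provided code; the function name, Gaussian reward model, and parameter values are assumptions for this sketch), a sample-average ε-greedy agent looks like:

```python
import random

def run_bandit(true_means, eps=0.1, steps=5000, seed=0):
    """Sample-average epsilon-greedy on a k-armed Gaussian bandit."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # action-value estimates
    n = [0] * k     # pull counts
    for _ in range(steps):
        if rng.random() < eps:                 # explore: random arm
            a = rng.randrange(k)
        else:                                  # exploit: greedy arm
            a = max(range(k), key=lambda i: q[i])
        r = rng.gauss(true_means[a], 1.0)      # noisy reward
        n[a] += 1
        q[a] += (r - q[a]) / n[a]              # incremental sample average
    return q, n

q, n = run_bandit([0.1, 0.5, 0.9])  # the best arm ends up pulled most often
```

With ε = 0.1 the agent keeps exploring forever, which is the "random playing" half of the trade-off; the greedy branch is the "optimal playing" half.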
- Lecture | 09/05/2025 (Friday) | Lecture 4: Formulating the RL Framework

- Lecture | 09/05/2025 (Friday) | Lecture 5: Environment as State-Dependent System

- Lecture | 09/09/2025 (Tuesday) | Lecture 6: Examples of RL Setting
  Lecture notes available.

- Lecture | 09/09/2025 (Tuesday) | Lecture 7: Policy and Its Value

- Lecture | 09/09/2025 (Tuesday) | Lecture 8: Playing Tic-Tac-Toe

- Lecture | 09/09/2025 (Tuesday) | Lecture 9: Optimal Policy
  Lecture notes available.

- Lecture | 09/12/2025 (Friday) | Lecture 10: Frozen Lake Example -- Terminal State and Episode

- Lecture | 09/12/2025 (Friday) | Lecture 11: Markov Decision Processes

- Lecture | 09/12/2025 (Friday) | Lecture 12: Value Function Calculation via MDPs -- Naive Approach
- Assignment | 09/16/2025 (Tuesday) | Assignment #1 - Basics of RL released!

- Lecture | 09/16/2025 (Tuesday) | Lecture 13: Bellman Equation

- Lecture | 09/16/2025 (Tuesday) | Lecture 14: Bellman Equation for Action-Value and Backup Diagram

- Lecture | 09/16/2025 (Tuesday) | Lecture 15: Bellman Optimality Equation

- Lecture | 09/19/2025 (Friday) | Lecture 16: Back-Tracking Optimal Policy

- Lecture | 09/19/2025 (Friday) | Lecture 17: Policy Evaluation by Dynamic Programming

- Lecture | 09/19/2025 (Friday) | Lecture 18: Policy Improvement and Policy Iteration
  Lecture notes available.
  Further Reads:
  - Policy Improvement and Iteration: Chapter 4 - Sections 4.2 and 4.3 of [SB]
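The dynamic-programming lectures above turn the Bellman equation into an iterative sweep. A minimal sketch of iterative policy evaluation on a hypothetical two-state MDP (illustrative only; the transition-table layout and names are assumptions, not course code):

```python
def policy_eval(P, pi, gamma=0.9, tol=1e-8):
    """Iterative policy evaluation: sweep V(s) <- sum_s' p(s'|s,pi(s)) [r + gamma V(s')]
    until the largest update falls below tol.
    P[s][a] is a list of (prob, next_state, reward); pi[s] is the chosen action."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][pi[s]])
            delta = max(delta, abs(v - V[s]))
            V[s] = v  # in-place (Gauss-Seidel) update
        if delta < tol:
            return V

# Toy two-state chain: action 0 stays put (reward 0); action 1 jumps to the
# other state (reward 1 from state 0, reward 0 from state 1).
P = [
    {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    {0: [(1.0, 1, 0.0)], 1: [(1.0, 0, 0.0)]},
]
V = policy_eval(P, pi=[1, 1])  # evaluate the always-jump policy
```

Policy improvement then replaces `pi[s]` with the action maximizing the one-step lookahead over these same backups, and policy iteration alternates the two steps.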
- Assignment | 09/21/2025 (Sunday) | Project Proposal released!

- Lecture | 09/23/2025 (Tuesday) | Lecture 19: Value Iteration

- Lecture | 09/23/2025 (Tuesday) | Lecture 20: Generalized Policy Iteration
  Lecture notes available.
  Further Reads:
  - Generalized Policy Iteration: Chapter 4 - Sections 4.6 and 4.7 of [SB]

- Lecture | 09/23/2025 (Tuesday) | Lecture 21: Model-free Policy Evaluation via Monte-Carlo

- Lecture | 09/26/2025 (Friday) | Lecture 22: GPI via Monte-Carlo

- Lecture | 09/26/2025 (Friday) | Lecture 23: Bootstrapping

- Lecture | 09/26/2025 (Friday) | Lecture 24: GPI via Temporal Difference
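The Monte-Carlo and temporal-difference lectures above replace the model-based backup with sampled transitions. A minimal TD(0) prediction sketch (illustrative only; the episode encoding and names are assumptions for this sketch):

```python
def td0(episodes, alpha=0.1, gamma=1.0):
    """Tabular TD(0) prediction: V(s) <- V(s) + alpha * (r + gamma V(s') - V(s)).
    Each episode is a list of (state, reward, next_state) transitions,
    with next_state None at termination."""
    V = {}
    for ep in episodes:
        for s, r, s_next in ep:
            v_next = 0.0 if s_next is None else V.get(s_next, 0.0)
            td_error = r + gamma * v_next - V.get(s, 0.0)  # bootstrapped target
            V[s] = V.get(s, 0.0) + alpha * td_error
    return V

# Deterministic two-step episodes A -> B -> terminal, reward 1 on the last step.
episodes = [[("A", 0.0, "B"), ("B", 1.0, None)]] * 200
V = td0(episodes)  # both values approach 1 under gamma = 1
```

Unlike Monte-Carlo, which waits for the full return, the update bootstraps on the current estimate of the next state, which is exactly the point of Lecture 23.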
- Lecture | 09/30/2025 (Tuesday) | Lecture 25: Deep Bootstrapping and TD-n

- Lecture | 09/30/2025 (Tuesday) | Lecture 26: TD-λ

- Due | 09/30/2025 23:59 (Tuesday) | Assignment #1 due

- Lecture | 10/03/2025 (Friday) | Lecture 27: TD with Eligibility Tracing
  Further Reads:
  - Eligibility Tracing: Chapter 12 - Sections 12.4 and 12.5 of [SB]
- Lecture | 10/03/2025 (Friday) | Lecture 28: Control Loop with Monte Carlo

- Lecture | 10/03/2025 (Friday) | Lecture 29: Adding Exploration to Control Loop

- Due | 10/03/2025 23:59 (Friday) | Proposal due

- Assignment | 10/06/2025 (Monday) | Assignment #2 - Tabular RL released!
- Lecture | 10/07/2025 (Tuesday) | Lecture 30: On-Policy RL via SARSA
  Further Reads:
  - Online Q-Learning: Article "On-Line Q-Learning Using Connectionist Systems" by G. Rummery and M. Niranjan, 1994, proposing SARSA as an online version of Q-Learning
  - Sarsa: Chapter 6 - Section 6.4 of [SB]
  - Sarsa: Chapter 10 - Sections 10.2 and 10.5 of [SB]
  - Sarsa: Chapter 12 - Section 12.7 of [SB]

- Lecture | 10/07/2025 (Tuesday) | Lecture 31: Off-Policy RL via Importance Sampling
  Further Reads:
  - Importance Sampling: Chapter 5 - Section 5.5 of [SB]
  - Off-Policy Learning: Chapter 12 - Sections 12.9 and 12.11 of [SB]

- Lecture | 10/07/2025 (Tuesday) | Lecture 32: Q-Learning
  Further Reads:
  - Q-Learning Paper: "Q-learning" by C. Watkins and P. Dayan, 1992, proposing off-policy learning as in the Q-learning algorithm
  - Q-Learning: Chapter 6 - Section 6.5 of [SB]
  - Q-Learning: Chapter 12 - Section 12.10 of [SB]
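The SARSA and Q-learning lectures above differ only in the backup target. A tabular Q-learning sketch on a hypothetical deterministic chain (illustrative only; the environment, names, and hyperparameters are assumptions, not course material):

```python
import random

def q_learning(n_states=4, actions=(0, 1), episodes=500, alpha=0.5,
               gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a chain: action 1 moves right, action 0 moves
    left; reaching the last state yields reward 1 and ends the episode."""
    rng = random.Random(seed)
    Q = [[0.0] * len(actions) for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy behavior policy
            a = rng.randrange(len(actions)) if rng.random() < eps \
                else max(actions, key=lambda act: Q[s][act])
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # off-policy target: max over next actions (zero at the terminal)
            target = r if s2 == n_states - 1 else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()  # Q[s][1] approaches gamma**(distance to goal - 1)
```

Swapping the `max(Q[s2])` in the target for the Q-value of the action the behavior policy actually takes next turns this into SARSA, the on-policy variant.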
- Lecture | 10/10/2025 (Friday) | Lecture 33: Convergence of Q-Learning and SARSA
  Further Reads:
  - Convergence Paper: "The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning" by V. Borkar and S. Meyn, 2000, studying convergence of Q-Learning and SARSA

- Lecture | 10/10/2025 (Friday) | Lecture 34: Why Deep RL?
  Further Reads:
  - Neuro-dynamic Programming: Paper "Neuro-dynamic programming: an overview" by D. Bertsekas and J. Tsitsiklis, 1995, discussing function approximation for value learning

- Lecture | 10/10/2025 (Friday) | Lecture 35: Using Function Approximation in RL
  Further Reads:
  - Function Approximation for RL: Chapter 9 of [SB]
  - Neuro-dynamic Programming: Paper "Neuro-dynamic programming: an overview" by D. Bertsekas and J. Tsitsiklis, 1995, discussing function approximation for value learning
- Lecture | 10/14/2025 (Tuesday) | Lecture 36: Flexibility of RL via Function Approximation

- Lecture | 10/14/2025 (Tuesday) | Lecture 37: Training Value Model for Prediction
  Further Reads:
  - TD with FA: Paper "An Analysis of Temporal-Difference Learning with Function Approximation" by J. Tsitsiklis and B. Van Roy, 1996, analyzing prediction with parameterized models

- Lecture | 10/14/2025 (Tuesday) | Lecture 38: Back to Tabular RL
  Further Reads:
  - FA vs Tabular: Paper "Analyzing feature generation for value-function approximation" by R. Parr et al., 2008, discussing connections of RL with FA to tabular RL

- Lecture | 10/14/2025 (Tuesday) | Lecture 39: Learning Action-Value via Function Approximation
  Further Reads:
  - RL with FA: Paper "Residual Algorithms: Reinforcement Learning with Function Approximation" by L. Baird, 1995, offering some criticism of RL with FA
- Lecture | 10/17/2025 (Friday) | Lecture 40: Control via Function Approximation and Deep Q-Learning

- Lecture | 10/17/2025 (Friday) | Lecture 41: Experience Replay in DQL
  Further Reads:
  - DQL: Chapter 4 - Section 4.3 of [CS]
  - Deep Q-Learning: Paper "Human-level control through deep reinforcement learning" by V. Mnih et al., 2015, proposing the legendary idea of Deep Q-Learning
  - DQL Paper I: Paper "Playing Atari with Deep Reinforcement Learning" by V. Mnih et al., 2013, describing DQL details
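The experience-replay lecture above rests on one data structure: a bounded buffer of transitions sampled uniformly to break temporal correlation in the training minibatches. A minimal sketch (the class name, capacity, and tuple layout are assumptions for illustration; a real DQN implementation batches these into tensors for gradient steps):

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform experience replay: store (s, a, r, s', done) transitions and
    sample random minibatches, decorrelating consecutive updates."""
    def __init__(self, capacity=10000, seed=0):
        self.buf = deque(maxlen=capacity)  # oldest transitions fall off
        self.rng = random.Random(seed)

    def push(self, s, a, r, s_next, done):
        self.buf.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        return self.rng.sample(list(self.buf), batch_size)

buf = ReplayBuffer(capacity=50)
for i in range(100):                      # overfill: only the last 50 survive
    buf.push(i, 0, 0.0, i + 1, False)
batch = buf.sample(8)
```

The fixed capacity is a design choice from the DQN line of work: it keeps memory bounded while still mixing recent and moderately old experience in each batch.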
- Exam | 10/21/2025 17:00 (Tuesday) | Midterm
  Details:
  - The exam is 3 hours long
  - No programming questions
  - Starts at 5:00 PM
- Lecture | 10/24/2025 (Friday) | Lecture 42: Target Network
  Further Reads:
  - DQL Paper I: Paper "Playing Atari with Deep Reinforcement Learning" by V. Mnih et al., 2013, describing DQL details

- Lecture | 10/24/2025 (Friday) | Lecture 43: Double DQL and Gorila
  Further Reads:
  - DQL Paper II: Paper "Deep Reinforcement Learning with Double Q-learning" by H. van Hasselt et al., 2015, proposing Double DQL
  - DQL Paper III: Paper "Dueling Network Architectures for Deep Reinforcement Learning" by Z. Wang et al., 2016, proposing Dueling DQL
  - DQL Paper IV: Paper "Prioritized Experience Replay" by T. Schaul et al., 2016, proposing a prioritized experience replay scheme
  - Gorila Paper: "Massively Parallel Methods for Deep Reinforcement Learning" by A. Nair et al., 2015, proposing Gorila
- Lecture | 10/24/2025 (Friday) | Lecture 44: Why Policy Net?
  Further Reads:
  - Why Policy Net: Article "Deep Deterministic Policy Gradient" at OpenAI Spinning Up

- Due | 10/24/2025 23:59 (Friday) | Assignment #2 due

- Lecture | 11/04/2025 (Tuesday) | Lecture 45: Policy Net and Its Learning Objective
  Further Reads:
  - REINFORCE: Paper "Simple statistical gradient-following algorithms for connectionist reinforcement learning" by R. Williams, 1992, introducing the REINFORCE algorithm

- Lecture | 11/04/2025 (Tuesday) | Lecture 46: Training Policy Net via SGD
  Further Reads:
  - PGM Theorem: Paper "Policy Gradient Methods for Reinforcement Learning with Function Approximation" by R. Sutton et al., 1999, developing the Policy Gradient Theorem

- Lecture | 11/07/2025 (Friday) | Lecture 47: Policy Gradient Theorem
  Further Reads:
  - PGM Theorem: Paper "Policy Gradient Methods for Reinforcement Learning with Function Approximation" by R. Sutton et al., 1999, developing the Policy Gradient Theorem

- Lecture | 11/07/2025 (Friday) | Lecture 48: Vanilla and Baseline PGM
  Further Reads:
  - Baseline Paper: "Policy invariance under reward transformations: Theory and application to reward shaping" by A. Ng et al., 1999
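The policy-gradient lectures above culminate in REINFORCE with a baseline. A toy sketch on a hypothetical 2-action bandit with a softmax policy and a running-mean baseline (illustrative only; names, the deterministic rewards, and hyperparameters are assumptions for this sketch):

```python
import math
import random

def reinforce_bandit(rewards=(0.0, 1.0), steps=2000, lr=0.1, seed=0):
    """REINFORCE with baseline on a 2-action bandit.
    Update: theta_i += lr * (r - baseline) * (1[a == i] - pi(i)),
    where (1[a == i] - pi(i)) is the score of a softmax-over-logits policy."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]          # one logit per action
    baseline, t = 0.0, 0
    for _ in range(steps):
        z = [math.exp(x) for x in theta]
        probs = [x / sum(z) for x in z]
        a = 0 if rng.random() < probs[0] else 1
        r = rewards[a]                      # deterministic reward for clarity
        t += 1
        baseline += (r - baseline) / t      # running-mean reward baseline
        adv = r - baseline                  # advantage estimate
        for i in range(2):
            theta[i] += lr * adv * ((1.0 if i == a else 0.0) - probs[i])
    z = [math.exp(x) for x in theta]
    return [x / sum(z) for x in z]

probs = reinforce_bandit()  # probability mass shifts onto the better action
```

The baseline leaves the gradient estimate unbiased (the score function has zero mean under the policy) while reducing its variance, which is the point of the vanilla-vs-baseline comparison in Lecture 48.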
- Lecture | 11/11/2025 (Tuesday) | Lecture 49: PGM as Sequential Surrogate Optimization

- Lecture | 11/11/2025 (Tuesday) | Lecture 50: Trust Region and Natural PGM

- Lecture | 11/14/2025 (Friday) | Lecture 51: TRPO Algorithm
  Further Reads:
  - TRPO Paper: "Trust Region Policy Optimization" by J. Schulman et al., 2015, proposing TRPO

- Lecture | 11/14/2025 (Friday) | Lecture 52: PPO Algorithm
  Further Reads:
  - PPO Paper: "Proximal Policy Optimization Algorithms" by J. Schulman et al., 2017, proposing PPO
