Schedule
- Session | 09/02/2025 17:00 (Tuesday) | First Lecture

- Lecture | 09/02/2025 (Tuesday) | Lecture 0: Course Overview and Logistics
  Lecture notes available.

- Lecture | 09/02/2025 (Tuesday) | Lecture 1: RL as a Learning Problem
- Lecture | 09/02/2025 (Tuesday) | Lecture 2: Optimal and Random Playing of Multi-armed Bandit
  Lecture notes available.
  Further Reads:
  - k-armed Bandit: Chapter 2 - Section 2.1 of [SB]
  - Robbins' Paper: "Some aspects of the sequential design of experiments" by H. Robbins, Bulletin of the American Mathematical Society, 1952, which formulates the multi-armed bandit problem as we know it today

- Lecture | 09/05/2025 (Friday) | Lecture 3: Exploiting Explorations in Multi-armed Bandit
  Lecture notes available.
  Further Reads:
  - k-armed Bandit: Chapter 2 - Section 2.1 of [SB]
  - Robbins' Paper: "Some aspects of the sequential design of experiments" by H. Robbins, Bulletin of the American Mathematical Society, 1952, which formulates the multi-armed bandit problem as we know it today
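The two bandit lectures above cover the exploration-exploitation trade-off in the k-armed bandit of [SB] Chapter 2. As a rough illustration only (not course-provided code; the function name, Gaussian reward model, and parameter values are assumptions for this sketch), a sample-average ε-greedy agent looks like:

```python
import random

def run_bandit(true_means, eps=0.1, steps=5000, seed=0):
    """Sample-average epsilon-greedy on a k-armed Gaussian bandit."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # action-value estimates
    n = [0] * k     # pull counts
    for _ in range(steps):
        if rng.random() < eps:                 # explore: random arm
            a = rng.randrange(k)
        else:                                  # exploit: greedy arm
            a = max(range(k), key=lambda i: q[i])
        r = rng.gauss(true_means[a], 1.0)      # noisy reward
        n[a] += 1
        q[a] += (r - q[a]) / n[a]              # incremental sample average
    return q, n

q, n = run_bandit([0.1, 0.5, 0.9])  # the best arm ends up pulled most often
```

With ε = 0.1 the agent keeps exploring forever, which is the "random playing" half of the trade-off; the greedy branch is the "optimal playing" half.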
- Lecture | 09/05/2025 (Friday) | Lecture 4: Formulating the RL Framework

- Lecture | 09/05/2025 (Friday) | Lecture 5: Environment as State-Dependent System

- Lecture | 09/09/2025 (Tuesday) | Lecture 6: Examples of RL Setting
  Lecture notes available.

- Lecture | 09/09/2025 (Tuesday) | Lecture 7: Policy and Its Value

- Lecture | 09/09/2025 (Tuesday) | Lecture 8: Playing Tic-Tac-Toe

- Lecture | 09/09/2025 (Tuesday) | Lecture 9: Optimal Policy
  Lecture notes available.

- Lecture | 09/12/2025 (Friday) | Lecture 10: Frozen Lake Example -- Terminal State and Episode

- Lecture | 09/12/2025 (Friday) | Lecture 11: Markov Decision Processes

- Lecture | 09/12/2025 (Friday) | Lecture 12: Value Function Calculation via MDPs -- Naive Approach
- Assignment | 09/16/2025 (Tuesday) | Assignment #1 - Basics of RL released!

- Lecture | 09/16/2025 (Tuesday) | Lecture 13: Bellman Equation

- Lecture | 09/16/2025 (Tuesday) | Lecture 14: Bellman Equation for Action-Value and Backup Diagram

- Lecture | 09/16/2025 (Tuesday) | Lecture 15: Bellman Optimality Equation

- Lecture | 09/19/2025 (Friday) | Lecture 16: Back-Tracking Optimal Policy

- Lecture | 09/19/2025 (Friday) | Lecture 17: Policy Evaluation by Dynamic Programming

- Lecture | 09/19/2025 (Friday) | Lecture 18: Policy Improvement and Policy Iteration
  Lecture notes available.
  Further Reads:
  - Policy Improvement and Iteration: Chapter 4 - Sections 4.2 and 4.3 of [SB]
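The dynamic-programming lectures above turn the Bellman equation into an iterative sweep. A minimal sketch of iterative policy evaluation on a hypothetical two-state MDP (illustrative only; the transition-table layout and names are assumptions, not course code):

```python
def policy_eval(P, pi, gamma=0.9, tol=1e-8):
    """Iterative policy evaluation: sweep V(s) <- sum_s' p(s'|s,pi(s)) [r + gamma V(s')]
    until the largest update falls below tol.
    P[s][a] is a list of (prob, next_state, reward); pi[s] is the chosen action."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][pi[s]])
            delta = max(delta, abs(v - V[s]))
            V[s] = v  # in-place (Gauss-Seidel) update
        if delta < tol:
            return V

# Toy two-state chain: action 0 stays put (reward 0); action 1 jumps to the
# other state (reward 1 from state 0, reward 0 from state 1).
P = [
    {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    {0: [(1.0, 1, 0.0)], 1: [(1.0, 0, 0.0)]},
]
V = policy_eval(P, pi=[1, 1])  # evaluate the always-jump policy
```

Policy improvement then replaces `pi[s]` with the action maximizing the one-step lookahead over these same backups, and policy iteration alternates the two steps.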
- Assignment | 09/21/2025 (Sunday) | Project Proposal released!

- Lecture | 09/23/2025 (Tuesday) | Lecture 19: Value Iteration

- Lecture | 09/23/2025 (Tuesday) | Lecture 20: Generalized Policy Iteration
  Lecture notes available.
  Further Reads:
  - Generalized Policy Iteration: Chapter 4 - Sections 4.6 and 4.7 of [SB]

- Lecture | 09/23/2025 (Tuesday) | Lecture 21: Model-free Policy Evaluation via Monte-Carlo

- Lecture | 09/26/2025 (Friday) | Lecture 22: GPI via Monte-Carlo

- Lecture | 09/26/2025 (Friday) | Lecture 23: Bootstrapping

- Lecture | 09/26/2025 (Friday) | Lecture 24: GPI via Temporal Difference
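The Monte-Carlo and temporal-difference lectures above replace the model-based backup with sampled transitions. A minimal TD(0) prediction sketch (illustrative only; the episode encoding and names are assumptions for this sketch):

```python
def td0(episodes, alpha=0.1, gamma=1.0):
    """Tabular TD(0) prediction: V(s) <- V(s) + alpha * (r + gamma V(s') - V(s)).
    Each episode is a list of (state, reward, next_state) transitions,
    with next_state None at termination."""
    V = {}
    for ep in episodes:
        for s, r, s_next in ep:
            v_next = 0.0 if s_next is None else V.get(s_next, 0.0)
            td_error = r + gamma * v_next - V.get(s, 0.0)  # bootstrapped target
            V[s] = V.get(s, 0.0) + alpha * td_error
    return V

# Deterministic two-step episodes A -> B -> terminal, reward 1 on the last step.
episodes = [[("A", 0.0, "B"), ("B", 1.0, None)]] * 200
V = td0(episodes)  # both values approach 1 under gamma = 1
```

Unlike Monte-Carlo, which waits for the full return, the update bootstraps on the current estimate of the next state, which is exactly the point of Lecture 23.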
- Lecture | 09/30/2025 (Tuesday) | Lecture 25: Deep Bootstrapping and TD-n

- Lecture | 09/30/2025 (Tuesday) | Lecture 26: TD-λ

- Due | 09/30/2025 23:59 (Tuesday) | Assignment #1 due

- Lecture | 10/03/2025 (Friday) | Lecture 27: TD with Eligibility Tracing
  Further Reads:
  - Eligibility Tracing: Chapter 12 - Sections 12.4 and 12.5 of [SB]
- Lecture | 10/03/2025 (Friday) | Lecture 28: Control Loop with Monte Carlo

- Lecture | 10/03/2025 (Friday) | Lecture 29: Adding Exploration to Control Loop

- Due | 10/03/2025 23:59 (Friday) | Proposal due

- Assignment | 10/06/2025 (Monday) | Assignment #2 - Tabular RL released!
- Lecture | 10/07/2025 (Tuesday) | Lecture 30: On-Policy RL via SARSA
  Further Reads:
  - Online Q-Learning: Article "On-Line Q-Learning Using Connectionist Systems" by G. Rummery and M. Niranjan, 1994, proposing SARSA as an online version of Q-Learning
  - Sarsa: Chapter 6 - Section 6.4 of [SB]
  - Sarsa: Chapter 10 - Sections 10.2 and 10.5 of [SB]
  - Sarsa: Chapter 12 - Section 12.7 of [SB]

- Lecture | 10/07/2025 (Tuesday) | Lecture 31: Off-Policy RL via Importance Sampling
  Further Reads:
  - Importance Sampling: Chapter 5 - Section 5.5 of [SB]
  - Off-Policy Learning: Chapter 12 - Sections 12.9 and 12.11 of [SB]

- Lecture | 10/07/2025 (Tuesday) | Lecture 32: Q-Learning
  Further Reads:
  - Q-Learning Paper: "Q-learning" by C. Watkins and P. Dayan, 1992, proposing off-policy learning as in the Q-learning algorithm
  - Q-Learning: Chapter 6 - Section 6.5 of [SB]
  - Q-Learning: Chapter 12 - Section 12.10 of [SB]
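The SARSA and Q-learning lectures above differ only in the backup target. A tabular Q-learning sketch on a hypothetical deterministic chain (illustrative only; the environment, names, and hyperparameters are assumptions, not course material):

```python
import random

def q_learning(n_states=4, actions=(0, 1), episodes=500, alpha=0.5,
               gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a chain: action 1 moves right, action 0 moves
    left; reaching the last state yields reward 1 and ends the episode."""
    rng = random.Random(seed)
    Q = [[0.0] * len(actions) for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy behavior policy
            a = rng.randrange(len(actions)) if rng.random() < eps \
                else max(actions, key=lambda act: Q[s][act])
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # off-policy target: max over next actions (zero at the terminal)
            target = r if s2 == n_states - 1 else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()  # Q[s][1] approaches gamma**(distance to goal - 1)
```

Swapping the `max(Q[s2])` in the target for the Q-value of the action the behavior policy actually takes next turns this into SARSA, the on-policy variant.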
- Lecture | 10/10/2025 (Friday) | Lecture 33: Convergence of Q-Learning and SARSA
  Further Reads:
  - Convergence Paper: "The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning" by V. Borkar and S. Meyn, 2000, studying convergence of Q-Learning and SARSA

- Lecture | 10/10/2025 (Friday) | Lecture 34: Why Deep RL?
  Further Reads:
  - Neuro-dynamic Programming: Paper "Neuro-dynamic programming: an overview" by D. Bertsekas and J. Tsitsiklis, 1995, discussing function approximation for value learning

- Lecture | 10/10/2025 (Friday) | Lecture 35: Using Function Approximation in RL
  Further Reads:
  - Function Approximation for RL: Chapter 9 of [SB]
  - Neuro-dynamic Programming: Paper "Neuro-dynamic programming: an overview" by D. Bertsekas and J. Tsitsiklis, 1995, discussing function approximation for value learning
- Lecture | 10/14/2025 (Tuesday) | Lecture 36: Flexibility of RL via Function Approximation

- Lecture | 10/14/2025 (Tuesday) | Lecture 37: Training Value Model for Prediction
  Further Reads:
  - TD with FA: Paper "An Analysis of Temporal-Difference Learning with Function Approximation" by J. Tsitsiklis and B. Van Roy, 1996, analyzing prediction with parameterized models

- Lecture | 10/14/2025 (Tuesday) | Lecture 38: Back to Tabular RL
  Further Reads:
  - FA vs Tabular: Paper "Analyzing feature generation for value-function approximation" by R. Parr et al., 2008, discussing connections of RL with FA to tabular RL

- Lecture | 10/14/2025 (Tuesday) | Lecture 39: Learning Action-Value via Function Approximation
  Further Reads:
  - RL with FA: Paper "Residual Algorithms: Reinforcement Learning with Function Approximation" by L. Baird, 1995, offering some criticism of RL with FA
- Lecture | 10/17/2025 (Friday) | Lecture 40: Control via Function Approximation and Deep Q-Learning

- Lecture | 10/17/2025 (Friday) | Lecture 41: Experience Replay in DQL
  Further Reads:
  - DQL: Chapter 4 - Section 4.3 of [CS]
  - Deep Q-Learning: Paper "Human-level control through deep reinforcement learning" by V. Mnih et al., 2015, proposing the legendary idea of Deep Q-Learning
  - DQL Paper I: Paper "Playing Atari with Deep Reinforcement Learning" by V. Mnih et al., 2013, describing DQL details
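The experience-replay lecture above rests on one data structure: a bounded buffer of transitions sampled uniformly to break temporal correlation in the training minibatches. A minimal sketch (the class name, capacity, and tuple layout are assumptions for illustration; a real DQN implementation batches these into tensors for gradient steps):

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform experience replay: store (s, a, r, s', done) transitions and
    sample random minibatches, decorrelating consecutive updates."""
    def __init__(self, capacity=10000, seed=0):
        self.buf = deque(maxlen=capacity)  # oldest transitions fall off
        self.rng = random.Random(seed)

    def push(self, s, a, r, s_next, done):
        self.buf.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        return self.rng.sample(list(self.buf), batch_size)

buf = ReplayBuffer(capacity=50)
for i in range(100):                      # overfill: only the last 50 survive
    buf.push(i, 0, 0.0, i + 1, False)
batch = buf.sample(8)
```

The fixed capacity is a design choice from the DQN line of work: it keeps memory bounded while still mixing recent and moderately old experience in each batch.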
- Exam | 10/21/2025 17:00 (Tuesday) | Midterm
  Details:
  - The exam is 3 hours long
  - No programming questions
  - Starts at 5:00 PM
- Lecture | 10/24/2025 (Friday) | Lecture 42: Target Network
  Further Reads:
  - DQL Paper I: Paper "Playing Atari with Deep Reinforcement Learning" by V. Mnih et al., 2013, describing DQL details

- Lecture | 10/24/2025 (Friday) | Lecture 43: Double DQL and Gorila
  Further Reads:
  - DQL Paper II: Paper "Deep Reinforcement Learning with Double Q-learning" by H. van Hasselt et al., 2015, proposing Double DQL
  - DQL Paper III: Paper "Dueling Network Architectures for Deep Reinforcement Learning" by Z. Wang et al., 2016, proposing Dueling DQL
  - DQL Paper IV: Paper "Prioritized Experience Replay" by T. Schaul et al., 2016, proposing a prioritized experience replay scheme
  - Gorila Paper: "Massively Parallel Methods for Deep Reinforcement Learning" by A. Nair et al., 2015, proposing Gorila
- Lecture | 10/24/2025 (Friday) | Lecture 44: Why Policy Net?
  Further Reads:
  - Why Policy Net: Article "Deep Deterministic Policy Gradient" at OpenAI Spinning Up

- Due | 10/24/2025 23:59 (Friday) | Assignment #2 due

- Lecture | 11/04/2025 (Tuesday) | Lecture 45: Policy Net and Its Learning Objective
  Further Reads:
  - REINFORCE: Paper "Simple statistical gradient-following algorithms for connectionist reinforcement learning" by R. Williams, 1992, introducing the REINFORCE algorithm

- Lecture | 11/04/2025 (Tuesday) | Lecture 46: Training Policy Net via SGD
  Further Reads:
  - PGM Theorem: Paper "Policy Gradient Methods for Reinforcement Learning with Function Approximation" by R. Sutton et al., 1999, developing the Policy Gradient Theorem

- Lecture | 11/07/2025 (Friday) | Lecture 47: Policy Gradient Theorem
  Further Reads:
  - PGM Theorem: Paper "Policy Gradient Methods for Reinforcement Learning with Function Approximation" by R. Sutton et al., 1999, developing the Policy Gradient Theorem

- Lecture | 11/07/2025 (Friday) | Lecture 48: Vanilla and Baseline PGM
  Further Reads:
  - Baseline Paper: "Policy invariance under reward transformations: Theory and application to reward shaping" by A. Ng et al., 1999
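The policy-gradient lectures above culminate in REINFORCE with a baseline. A toy sketch on a hypothetical 2-action bandit with a softmax policy and a running-mean baseline (illustrative only; names, the deterministic rewards, and hyperparameters are assumptions for this sketch):

```python
import math
import random

def reinforce_bandit(rewards=(0.0, 1.0), steps=2000, lr=0.1, seed=0):
    """REINFORCE with baseline on a 2-action bandit.
    Update: theta_i += lr * (r - baseline) * (1[a == i] - pi(i)),
    where (1[a == i] - pi(i)) is the score of a softmax-over-logits policy."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]          # one logit per action
    baseline, t = 0.0, 0
    for _ in range(steps):
        z = [math.exp(x) for x in theta]
        probs = [x / sum(z) for x in z]
        a = 0 if rng.random() < probs[0] else 1
        r = rewards[a]                      # deterministic reward for clarity
        t += 1
        baseline += (r - baseline) / t      # running-mean reward baseline
        adv = r - baseline                  # advantage estimate
        for i in range(2):
            theta[i] += lr * adv * ((1.0 if i == a else 0.0) - probs[i])
    z = [math.exp(x) for x in theta]
    return [x / sum(z) for x in z]

probs = reinforce_bandit()  # probability mass shifts onto the better action
```

The baseline leaves the gradient estimate unbiased (the score function has zero mean under the policy) while reducing its variance, which is the point of the vanilla-vs-baseline comparison in Lecture 48.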
- Lecture | 11/11/2025 (Tuesday) | Lecture 49: PGM as Sequential Surrogate Optimization

- Lecture | 11/11/2025 (Tuesday) | Lecture 50: Trust Region and Natural PGM

- Lecture | 11/14/2025 (Friday) | Lecture 51: TRPO Algorithm
  Further Reads:
  - TRPO Paper: "Trust Region Policy Optimization" by J. Schulman et al., 2015, proposing TRPO

- Lecture | 11/14/2025 (Friday) | Lecture 52: PPO Algorithm
  Further Reads:
  - PPO Paper: "Proximal Policy Optimization Algorithms" by J. Schulman et al., 2017, proposing PPO
