Schedule

  • Event
    Date
    Day
    Description
  • Session
    09/02/2025 17:00
    Tuesday
    First Lecture
  • Lecture
    09/02/2025
    Tuesday
    Lecture 0: Course Overview and Logistics

    Lecture Notes:

  • Lecture
    09/02/2025
    Tuesday
    Lecture 1: RL as a Learning Problem

    Lecture Notes:

    Further Reads:

  • Lecture
    09/02/2025
    Tuesday
    Lecture 2: Optimal and Random Playing of Multi-armed Bandit

    Lecture Notes:

    Further Reads:

    • k-armed Bandit: Chapter 2 - Section 2.1 of [SB]
    • Robbins’ Paper: Some aspects of the sequential design of experiments by H. Robbins, published in the Bulletin of the American Mathematical Society in 1952, formulating the multi-armed bandit problem as we know it today
  • Lecture
    09/05/2025
    Friday
    Lecture 3: Exploiting Explorations in Multi-armed Bandit

    Lecture Notes:

    Further Reads:

    • k-armed Bandit: Chapter 2 - Section 2.1 of [SB]
    • Robbins’ Paper: Some aspects of the sequential design of experiments by H. Robbins, published in the Bulletin of the American Mathematical Society in 1952, formulating the multi-armed bandit problem as we know it today
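Not part of the course materials — a minimal sketch of the ε-greedy strategy for the k-armed bandit covered in these lectures and in Section 2.1 of [SB]; the reward means, step count, and ε value below are illustrative.

```python
import random

def epsilon_greedy_bandit(true_means, steps=1000, eps=0.1, seed=0):
    """Sample-average epsilon-greedy on a k-armed Gaussian bandit."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k  # running value estimates, one per arm
    n = [0] * k    # pull counts per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(k)                   # explore: random arm
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit: greedy arm
        r = rng.gauss(true_means[a], 1.0)          # noisy reward
        n[a] += 1
        q[a] += (r - q[a]) / n[a]                  # incremental sample mean
        total += r
    return q, total / steps
```

With a long enough run, the estimate for the best arm dominates and the average reward approaches the best mean, minus the cost of exploration.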
  • Lecture
    09/05/2025
    Friday
    Lecture 4: Formulating the RL Framework

    Lecture Notes:

    Further Reads:

  • Lecture
    09/05/2025
    Friday
    Lecture 5: Environment as State-Dependent System

    Lecture Notes:

    Further Reads:

  • Lecture
    09/09/2025
    Tuesday
    Lecture 6: Examples of RL Setting

    Lecture Notes:

  • Lecture
    09/09/2025
    Tuesday
    Lecture 7: Policy and Its Value

    Lecture Notes:

    Further Reads:

  • Lecture
    09/09/2025
    Tuesday
    Lecture 8: Playing Tic-Tac-Toe

    Lecture Notes:

    Further Reads:

  • Lecture
    09/09/2025
    Tuesday
    Lecture 9: Optimal Policy

    Lecture Notes:

  • Lecture
    09/12/2025
    Friday
    Lecture 10: Frozen Lake Example -- Terminal State and Episode

    Lecture Notes:

    Further Reads:

  • Lecture
    09/12/2025
    Friday
    Lecture 11: Markov Decision Processes

    Lecture Notes:

    Further Reads:

  • Lecture
    09/12/2025
    Friday
    Lecture 12: Value Function Calculation via MDPs -- Naive Approach

    Lecture Notes:

    Further Reads:

  • Assignment
    09/16/2025
    Tuesday
    Assignment #1 - Basics of RL released!
  • Lecture
    09/16/2025
    Tuesday
    Lecture 13: Bellman Equation

    Lecture Notes:

    Further Reads:

  • Lecture
    09/16/2025
    Tuesday
    Lecture 14: Bellman Equation for Action-Value and Backup Diagram

    Lecture Notes:

    Further Reads:

  • Lecture
    09/16/2025
    Tuesday
    Lecture 15: Bellman Optimality Equation

    Lecture Notes:

    Further Reads:

  • Lecture
    09/19/2025
    Friday
    Lecture 16: Back-Tracking Optimal Policy

    Lecture Notes:

    Further Reads:

  • Lecture
    09/19/2025
    Friday
    Lecture 17: Policy Evaluation by Dynamic Programming

    Lecture Notes:

    Further Reads:

  • Lecture
    09/19/2025
    Friday
    Lecture 18: Policy Improvement and Policy Iteration

    Lecture Notes:

    Further Reads:

  • Assignment
    09/21/2025
    Sunday
    Project Proposal released!
  • Lecture
    09/23/2025
    Tuesday
    Lecture 19: Value Iteration

    Lecture Notes:

    Further Reads:

  • Lecture
    09/23/2025
    Tuesday
    Lecture 20: Generalized Policy Iteration

    Lecture Notes:

    Further Reads:

  • Lecture
    09/23/2025
    Tuesday
    Lecture 21: Model-free Policy Evaluation via Monte-Carlo

    Lecture Notes:

    Further Reads:

  • Lecture
    09/26/2025
    Friday
    Lecture 22: GPI via Monte-Carlo

    Lecture Notes:

    Further Reads:

  • Lecture
    09/26/2025
    Friday
    Lecture 23: Bootstrapping

    Lecture Notes:

    Further Reads:

  • Lecture
    09/26/2025
    Friday
    Lecture 24: GPI via Temporal Difference

    Further Reads:

    • TD-0: Chapter 6 - Sections 6.2 and 6.3 of [SB]
  • Lecture
    09/30/2025
    Tuesday
    Lecture 25: Deep Bootstrapping and TD-n

    Further Reads:

    • TD-n: Chapter 7 - Sections 7.1 and 7.2 of [SB]
  • Lecture
    09/30/2025
    Tuesday
    Lecture 26: TD-λ

    Further Reads:

  • Due
    09/30/2025 23:59
    Tuesday
    Assignment #1 due
  • Lecture
    10/03/2025
    Friday
    Lecture 27: TD with Eligibility Traces

    Further Reads:

  • Lecture
    10/03/2025
    Friday
    Lecture 28: Control Loop with Monte Carlo

    Further Reads:

  • Lecture
    10/03/2025
    Friday
    Lecture 29: Adding Exploration to Control Loop

    Further Reads:

  • Due
    10/03/2025 23:59
    Friday
    Proposal due
  • Assignment
    10/06/2025
    Monday
    Assignment #2 - Tabular RL released!
  • Lecture
    10/07/2025
    Tuesday
    Lecture 30: On-Policy RL via SARSA

    Further Reads:

    • Online Q-Learning Article: On-Line Q-Learning Using Connectionist Systems, published in 1994 by G. Rummery and M. Niranjan, proposing SARSA as an online variant of Q-Learning
    • Sarsa: Chapter 6 - Section 6.4 of [SB]
    • Sarsa: Chapter 10 - Sections 10.2 and 10.5 of [SB]
    • Sarsa: Chapter 12 - Section 12.7 of [SB]
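Not part of the course materials — a minimal sketch of the tabular SARSA update these readings cover, on a toy chain environment of my own choosing (actions 0 = left, 1 = right; reward 1 at the right end). All parameters are illustrative.

```python
import random

def sarsa_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular SARSA (on-policy TD control) on a simple chain MDP."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action]

    def policy(s):  # epsilon-greedy over current estimates
        if rng.random() < eps:
            return rng.randrange(2)
        return 0 if Q[s][0] > Q[s][1] else 1

    for _ in range(episodes):
        s = 0
        a = policy(s)
        while True:
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s2 == n_states - 1
            r = 1.0 if done else 0.0
            if done:
                Q[s][a] += alpha * (r - Q[s][a])  # terminal: no bootstrap
                break
            a2 = policy(s2)  # on-policy: bootstrap with the action actually taken next
            Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])
            s, a = s2, a2
    return Q
```

The defining SARSA feature is that the bootstrap target uses `Q[s2][a2]` for the action the behavior policy actually selects, rather than a max over actions.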
  • Lecture
    10/07/2025
    Tuesday
    Lecture 31: Off-Policy RL via Importance Sampling

    Further Reads:

  • Lecture
    10/07/2025
    Tuesday
    Lecture 32: Q-Learning

    Further Reads:

    • Q-Learning Paper: Q-learning, published in 1992 by C. Watkins and P. Dayan, proposing the off-policy learning used in the Q-learning algorithm
    • Q-Learning: Chapter 6 - Section 6.5 of [SB]
    • Q-Learning: Chapter 12 - Section 12.10 of [SB]
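Not part of the course materials — a minimal sketch of the tabular Q-learning update on the same kind of toy chain (actions 0 = left, 1 = right; reward 1 at the right end); parameters are illustrative. It differs from SARSA only in the bootstrap target.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning (off-policy TD control) on a simple chain MDP."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # behavior policy: epsilon-greedy over current estimates
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s2 == n_states - 1
            r = 1.0 if done else 0.0
            # off-policy: bootstrap with the greedy (max) action, whatever was taken
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```

Because the target takes `max(Q[s2])` regardless of the behavior action, Q-learning estimates the optimal action-value function even while exploring.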
  • Lecture
    10/10/2025
    Friday
    Lecture 33: Convergence of Q-Learning and SARSA

    Further Reads:

    • Convergence Paper The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning published in 2000 by V. Borkar and S. Meyn studying convergence of Q-Learning and SARSA

  • Lecture
    10/10/2025
    Friday
    Lecture 34: Why Deep RL?

    Further Reads:

    • Neuro-dynamic Programming Paper Neuro-dynamic programming: an overview published in 1995 by D. Bertsekas and J. Tsitsiklis discussing function approximation for value learning
  • Lecture
    10/10/2025
    Friday
    Lecture 35: Using Function Approximation in RL

    Further Reads:

  • Lecture
    10/14/2025
    Tuesday
    Lecture 36: Flexibility of RL via Function Approximation

    Further Reads:

  • Lecture
    10/14/2025
    Tuesday
    Lecture 37: Training Value Model for Prediction

    Further Reads:

    • TD with FA Paper Analysis of Temporal-Difference Learning with Function Approximation published in 1996 by J. Tsitsiklis and B. Van Roy analyzing prediction with parameterized models
  • Lecture
    10/14/2025
    Tuesday
    Lecture 38: Back to Tabular RL

    Further Reads:

    • FA vs Tabular Paper Analyzing feature generation for value-function approximation published in 2008 by R. Parr et al. discussing connections between RL with FA and tabular RL
  • Lecture
    10/14/2025
    Tuesday
    Lecture 39: Learning Action-Value via Function Approximation

    Further Reads:

    • RL with FA Paper Residual Algorithms: Reinforcement Learning with Function Approximation published in 1995 by L. Baird raising some criticisms of RL with FA
  • Lecture
    10/17/2025
    Friday
    Lecture 40: Control via Function Approximation and Deep Q-Learning

    Further Reads:

  • Lecture
    10/17/2025
    Friday
    Lecture 41: Experience Replay in DQL

    Further Reads:

    • DQL: Chapter 4 - Section 4.3 of [CS]
    • Deep Q-Learning Paper Human-level control through deep reinforcement learning published in 2015 by V. Mnih et al. proposing the legendary idea of Deep Q-Learning
    • DQL Paper I: Playing Atari with Deep Reinforcement Learning, published in 2013 by V. Mnih et al., describing DQL details
  • Exam
    10/21/2025 17:00
    Tuesday
    Midterm

    Details:

    • The exam is 3 hours long
    • No programming questions
    • Starts at 5:00 PM
  • Lecture
    10/24/2025
    Friday
    Lecture 42: Target Network

    Further Reads:

    • DQL Paper I: Playing Atari with Deep Reinforcement Learning, published in 2013 by V. Mnih et al., describing DQL details
  • Lecture
    10/24/2025
    Friday
    Lecture 43: Double DQL and Gorila

    Further Reads:

    • DQL Paper II Paper Deep Reinforcement Learning with Double Q-learning published in 2015 by H. van Hasselt et al. proposing Double DQL
    • DQL Paper III Paper Dueling Network Architectures for Deep Reinforcement Learning published in 2016 by Z. Wang et al. proposing Dueling DQL
    • DQL Paper IV Paper Prioritized Experience Replay published in 2016 by T. Schaul et al. proposing a prioritizing experience replay scheme
    • Gorila Paper Massively Parallel Methods for Deep Reinforcement Learning published in 2015 by A. Nair et al. proposing Gorila
  • Lecture
    10/24/2025
    Friday
    Lecture 44: Why Policy Net?

    Further Reads:

    • Why Policy Net Article Deep Deterministic Policy Gradient at OpenAI Spinning Up
  • Due
    10/24/2025 23:59
    Friday
    Assignment #2 due
  • Lecture
    11/04/2025
    Tuesday
    Lecture 45: Policy Net and Its Learning Objective

    Further Reads:

    • REINFORCE Paper Simple statistical gradient-following algorithms for connectionist reinforcement learning published by R. Williams in 1992 introducing REINFORCE algorithm
  • Lecture
    11/04/2025
    Tuesday
    Lecture 46: Training Policy Net via SGD

    Further Reads:

    • PGM Theorem Paper Policy Gradient Methods for Reinforcement Learning with Function Approximation published by R. Sutton et al. in 1999 developing the Policy Gradient Theorem
  • Lecture
    11/07/2025
    Friday
    Lecture 47: Policy Gradient Theorem

    Further Reads:

    • PGM Theorem Paper Policy Gradient Methods for Reinforcement Learning with Function Approximation published by R. Sutton et al. in 1999 developing the Policy Gradient Theorem
  • Lecture
    11/07/2025
    Friday
    Lecture 48: Vanilla and Baseline PGM

    Further Reads:

    • Baseline Paper Policy invariance under reward transformations: Theory and application to reward shaping published by A. Ng et al. in 1999
  • Lecture
    11/11/2025
    Tuesday
    Lecture 49: PGM as Sequential Surrogate Optimization

    Further Reads:

    • Nat PGM Paper A Natural Policy Gradient published by S. Kakade in 2001 proposing a basic natural PGM
    • TRPO Paper Trust Region Policy Optimization published by J. Schulman et al. in 2015 proposing TRPO
  • Lecture
    11/11/2025
    Tuesday
    Lecture 50: Trust Region and Natural PGM

    Further Reads:

    • Nat PGM Paper A Natural Policy Gradient published by S. Kakade in 2001 proposing a basic natural PGM
    • TRPO Paper Trust Region Policy Optimization published by J. Schulman et al. in 2015 proposing TRPO
  • Lecture
    11/14/2025
    Friday
    Lecture 51: TRPO Algorithm

    Further Reads:

    • TRPO Paper Trust Region Policy Optimization published by J. Schulman et al. in 2015 proposing TRPO
  • Lecture
    11/14/2025
    Friday
    Lecture 52: PPO Algorithm

    Further Reads:

    • PPO Paper Proximal Policy Optimization Algorithms published by J. Schulman et al. in 2017 proposing PPO

Tutorial Schedule