Schedule

  • Event
    Date
    Day
    Description
  • Session
    09/02/2025 17:00
    Tuesday
    First Lecture
  • Lecture
    09/02/2025
    Tuesday
    Lecture 0: Course Overview and Logistics

    Lecture Notes:

  • Lecture
    09/02/2025
    Tuesday
    Lecture 1: RL as a Learning Problem

    Lecture Notes:

    Further Reads:

  • Lecture
    09/02/2025
    Tuesday
    Lecture 2: Optimal and Random Playing of Multi-armed Bandit

    Lecture Notes:

    Further Reads:

    • k-armed Bandit: Chapter 2 - Section 2.1 of [SB]
    • Robbins’ Paper: "Some Aspects of the Sequential Design of Experiments" by H. Robbins, published in the Bulletin of the American Mathematical Society in 1952, which formulated the multi-armed bandit problem as we know it today
  • Lecture
    09/05/2025
    Friday
    Lecture 3: Exploiting Explorations in Multi-armed Bandit

    Lecture Notes:

    Further Reads:

    • k-armed Bandit: Chapter 2 - Section 2.1 of [SB]
    • Robbins’ Paper: "Some Aspects of the Sequential Design of Experiments" by H. Robbins, published in the Bulletin of the American Mathematical Society in 1952, which formulated the multi-armed bandit problem as we know it today
  • Lecture
    09/05/2025
    Friday
    Lecture 4: Formulating the RL Framework

    Lecture Notes:

    Further Reads:

  • Lecture
    09/05/2025
    Friday
    Lecture 5: Environment as State-Dependent System

    Lecture Notes:

    Further Reads:

  • Lecture
    09/09/2025
    Tuesday
    Lecture 6: Examples of RL Setting

    Lecture Notes:

  • Lecture
    09/09/2025
    Tuesday
    Lecture 7: Policy and Its Value

    Lecture Notes:

    Further Reads:

  • Lecture
    09/09/2025
    Tuesday
    Lecture 8: Playing Tic-Tac-Toe

    Lecture Notes:

    Further Reads:

  • Lecture
    09/09/2025
    Tuesday
    Lecture 9: Optimal Policy

    Lecture Notes:

  • Lecture
    09/12/2025
    Friday
    Lecture 10: Frozen Lake Example -- Terminal State and Episode

    Lecture Notes:

    Further Reads:

  • Lecture
    09/12/2025
    Friday
    Lecture 11: Markov Decision Processes

    Lecture Notes:

    Further Reads:

  • Lecture
    09/12/2025
    Friday
    Lecture 12: Value Function Calculation via MDPs -- Naive Approach

    Lecture Notes:

    Further Reads:

  • Assignment
    09/16/2025
    Tuesday
    Assignment #1 - Basics of RL released!
  • Lecture
    09/16/2025
    Tuesday
    Lecture 13: Bellman Equation

    Lecture Notes:

    Further Reads:

  • Lecture
    09/16/2025
    Tuesday
    Lecture 14: Bellman Equation for Action-Value and Backup Diagram

    Lecture Notes:

    Further Reads:

  • Lecture
    09/16/2025
    Tuesday
    Lecture 15: Bellman Optimality Equation

    Lecture Notes:

    Further Reads:

  • Lecture
    09/19/2025
    Friday
    Lecture 16: Back-Tracking Optimal Policy

    Lecture Notes:

    Further Reads:

  • Lecture
    09/19/2025
    Friday
    Lecture 17: Policy Evaluation by Dynamic Programming

    Lecture Notes:

    Further Reads:

  • Lecture
    09/19/2025
    Friday
    Lecture 18: Policy Improvement and Policy Iteration

    Lecture Notes:

    Further Reads:

  • Assignment
    09/21/2025
    Sunday
    Project Proposal released!
  • Lecture
    09/23/2025
    Tuesday
    Lecture 19: Value Iteration

    Lecture Notes:

    Further Reads:

  • Lecture
    09/23/2025
    Tuesday
    Lecture 20: Generalized Policy Iteration

    Lecture Notes:

    Further Reads:

  • Lecture
    09/23/2025
    Tuesday
    Lecture 21: Model-free Policy Evaluation via Monte-Carlo

    Lecture Notes:

    Further Reads:

  • Lecture
    09/26/2025
    Friday
    Lecture 22: GPI via Monte-Carlo

    Lecture Notes:

    Further Reads:

  • Lecture
    09/26/2025
    Friday
    Lecture 23: Bootstrapping

    Lecture Notes:

    Further Reads:

  • Lecture
    09/26/2025
    Friday
    Lecture 24: GPI via Temporal Difference

    Further Reads:

    • TD-0: Chapter 6 - Sections 6.2 and 6.3 of [SB]
  • Lecture
    09/30/2025
    Tuesday
    Lecture 25: Deep Bootstrapping and TD-n

    Further Reads:

    • TD-n: Chapter 7 - Sections 7.1 and 7.2 of [SB]
  • Lecture
    09/30/2025
    Tuesday
    Lecture 26: TD-λ

    Further Reads:

  • Due
    09/30/2025 23:59
    Tuesday
    Assignment #1 due
  • Lecture
    10/03/2025
    Friday
    Lecture 27: TD with Eligibility Tracing

    Further Reads:

  • Lecture
    10/03/2025
    Friday
    Lecture 28: Control Loop with Monte Carlo

    Further Reads:

  • Lecture
    10/03/2025
    Friday
    Lecture 29: Adding Exploration to Control Loop

    Further Reads:

  • Due
    10/03/2025 23:59
    Friday
    Proposal due
  • Assignment
    10/06/2025
    Monday
    Assignment #2 - Tabular RL released!
  • Exam
    10/21/2025 17:00
    Tuesday
    Midterm

    Details:

    • The exam is 3 hours long
    • No programming questions
    • Starts at 5:00 PM
  • Due
    10/24/2025 23:59
    Friday
    Assignment #2 due

Tutorial Schedule