Schedule
--------

| Event | Date | Description |
|-------|------|-------------|
| Session | 09/02/2025 (Tuesday), 17:00 | First Lecture |
| Lecture | 09/02/2025 (Tuesday) | Lecture 0: Course Overview and Logistics (Lecture Notes) |
| Lecture | 09/02/2025 (Tuesday) | Lecture 1: RL as a Learning Problem |
| Lecture | 09/02/2025 (Tuesday) | Lecture 2: Optimal and Random Playing of Multi-armed Bandit (Lecture Notes). Further Reads: k-armed Bandit, Chapter 2, Section 2.1 of [SB]; H. Robbins, "Some aspects of the sequential design of experiments", Bulletin of the American Mathematical Society, 1952, which formulated the multi-armed bandit problem as we know it today |
| Lecture | 09/05/2025 (Friday) | Lecture 3: Exploiting Explorations in Multi-armed Bandit (Lecture Notes). Further Reads: k-armed Bandit, Chapter 2, Section 2.1 of [SB]; H. Robbins, "Some aspects of the sequential design of experiments", Bulletin of the American Mathematical Society, 1952 |
| Lecture | 09/05/2025 (Friday) | Lecture 4: Formulating the RL Framework |
| Lecture | 09/05/2025 (Friday) | Lecture 5: Environment as State-Dependent System |
| Lecture | 09/09/2025 (Tuesday) | Lecture 6: Examples of RL Setting (Lecture Notes) |
| Lecture | 09/09/2025 (Tuesday) | Lecture 7: Policy and Its Value |
| Lecture | 09/09/2025 (Tuesday) | Lecture 8: Playing Tic-Tac-Toe |
| Lecture | 09/09/2025 (Tuesday) | Lecture 9: Optimal Policy (Lecture Notes) |
| Lecture | 09/12/2025 (Friday) | Lecture 10: Frozen Lake Example: Terminal State and Episode |
| Lecture | 09/12/2025 (Friday) | Lecture 11: Markov Decision Processes |
| Lecture | 09/12/2025 (Friday) | Lecture 12: Value Function Calculation via MDPs: Naive Approach |
| Assignment | 09/16/2025 (Tuesday) | Assignment #1 (Basics of RL) released |
| Lecture | 09/16/2025 (Tuesday) | Lecture 13: Bellman Equation |
| Lecture | 09/16/2025 (Tuesday) | Lecture 14: Bellman Equation for Action-Value and Backup Diagram |
| Lecture | 09/16/2025 (Tuesday) | Lecture 15: Bellman Optimality Equation |
| Lecture | 09/19/2025 (Friday) | Lecture 16: Back-Tracking Optimal Policy |
| Lecture | 09/19/2025 (Friday) | Lecture 17: Policy Evaluation by Dynamic Programming |
| Lecture | 09/19/2025 (Friday) | Lecture 18: Policy Improvement and Policy Iteration (Lecture Notes). Further Reads: Policy Improvement and Iteration, Chapter 4, Sections 4.2 and 4.3 of [SB] |
| Assignment | 09/21/2025 (Sunday) | Project Proposal released |
| Lecture | 09/23/2025 (Tuesday) | Lecture 19: Value Iteration |
| Lecture | 09/23/2025 (Tuesday) | Lecture 20: Generalized Policy Iteration (Lecture Notes). Further Reads: Generalized Policy Iteration, Chapter 4, Sections 4.6 and 4.7 of [SB] |
| Lecture | 09/23/2025 (Tuesday) | Lecture 21: Model-free Policy Evaluation via Monte Carlo |
| Lecture | 09/26/2025 (Friday) | Lecture 22: GPI via Monte Carlo |
| Lecture | 09/26/2025 (Friday) | Lecture 23: Bootstrapping |
| Lecture | 09/26/2025 (Friday) | Lecture 24: GPI via Temporal Difference |
| Lecture | 09/30/2025 (Tuesday) | Lecture 25: Deep Bootstrapping and TD-n |
| Lecture | 09/30/2025 (Tuesday) | Lecture 26: TD-λ |
| Due | 09/30/2025 (Tuesday), 23:59 | Assignment #1 due |
| Lecture | 10/03/2025 (Friday) | Lecture 27: TD with Eligibility Tracing. Further Reads: Eligibility Tracing, Chapter 12, Sections 12.4 and 12.5 of [SB] |
| Lecture | 10/03/2025 (Friday) | Lecture 28: Control Loop with Monte Carlo |
| Lecture | 10/03/2025 (Friday) | Lecture 29: Adding Exploration to Control Loop |
| Due | 10/03/2025 (Friday), 23:59 | Proposal due |
| Assignment | 10/06/2025 (Monday) | Assignment #2 (Tabular RL) released |
| Exam | 10/21/2025 (Tuesday), 17:00 | Midterm. Logistics: the exam is 3 hours long; no programming questions; starts at 5:00 PM |
| Due | 10/24/2025 (Friday), 23:59 | Assignment #2 due |