Lecture Notes

The lecture notes are uploaded throughout the semester. For each chapter, the notes are provided section by section.

Chapter 0: Course Overview and Logistics

  • Handouts: All Sections included in a single file

Chapter 1: RL Framework

Chapter 2: Model-based RL

Chapter 3: Model-free Tabular RL

Tutorial Notebooks and Videos

The tutorial notebooks can be accessed below.

Book

Most of the content covered in the first two parts of the course can be further studied in [SB]: R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd edition, MIT Press, 2018, the textbook referenced throughout the reading list below.

The following older preprint also gives a good summary of important RL algorithms.

Most of the material in the third part, i.e., deep RL, is collected from research papers. The following textbook is also a good resource for practicing hands-on skills.

Reading List

This section will be completed gradually throughout the semester.

Chapter 1: RL Framework

Introduction

Multi-armed Bandit

  • k-armed Bandit: Chapter 2 - Section 2.1 of [SB]; a minimal ε-greedy sketch follows this list
  • Robbins’ Paper: “Some aspects of the sequential design of experiments” by H. Robbins, published in the Bulletin of the American Mathematical Society in 1952, which formulated the multi-armed bandit problem as we know it today
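
To make the reference concrete, below is a minimal sketch of an ε-greedy agent with sample-average value estimates on a k-armed Gaussian bandit, in the spirit of Chapter 2 of [SB]. It is an illustration, not code from the course; the Gaussian reward model and all identifiers are assumptions.

```python
import numpy as np

def epsilon_greedy_bandit(k=10, steps=1000, eps=0.1, seed=0):
    """Sample-average epsilon-greedy agent on a k-armed Gaussian bandit."""
    rng = np.random.default_rng(seed)
    q_true = rng.normal(0.0, 1.0, k)   # unknown true action values (assumed Gaussian)
    Q = np.zeros(k)                    # sample-average value estimates
    N = np.zeros(k)                    # pull counts per arm
    rewards = np.empty(steps)
    for t in range(steps):
        if rng.random() < eps:         # explore with probability eps
            a = int(rng.integers(k))
        else:                          # otherwise exploit the current estimates
            a = int(np.argmax(Q))
        r = rng.normal(q_true[a], 1.0) # noisy reward from the chosen arm
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]      # incremental sample-average update
        rewards[t] = r
    return Q, rewards

Q, rewards = epsilon_greedy_bandit()
print("estimated action values:", np.round(Q, 2))
```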

RL Problem Formulation

Terminal State and Episode

Chapter 2: Model-based RL

MDPs

Bellman Equation and Optimal Policy

Policy Iteration

Value Iteration

Chapter 3: Model-free Tabular RL

Monte Carlo Approach

Temporal Difference

  • TD-0: Chapter 6 - Sections 6.1, 6.2 and 6.3 of [SB]; a minimal TD(0) sketch follows below
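
For reference, here is a minimal sketch of TD(0) value prediction on the symmetric random-walk chain of Example 6.2 in [SB]. It is an illustrative sketch, not code from the lecture notes; the environment parameters and identifiers are assumptions.

```python
import numpy as np

def td0_random_walk(episodes=1000, alpha=0.1, gamma=1.0, n_states=5, seed=0):
    """TD(0) prediction on a symmetric random walk.

    States 1..n_states are non-terminal; 0 and n_states+1 are terminal.
    Reward is +1 on exiting to the right, 0 otherwise (as in [SB] Example 6.2).
    """
    rng = np.random.default_rng(seed)
    V = np.zeros(n_states + 2)             # terminal values stay 0
    for _ in range(episodes):
        s = (n_states + 1) // 2            # start in the middle state
        while 1 <= s <= n_states:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            r = 1.0 if s_next == n_states + 1 else 0.0
            # TD(0) update: V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V[1:-1]

print(np.round(td0_random_walk(), 2))      # true values: 1/6, 2/6, ..., 5/6
```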

Deep Temporal Difference

  • TD-n: Chapter 7 - Sections 7.1 and 7.2 of [SB]; an n-step TD sketch follows below
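
Along the same lines, below is a sketch of n-step TD prediction (Section 7.1 of [SB]) on the same random walk as the TD(0) sketch above; again an illustrative assumption rather than course material.

```python
import numpy as np

def td_n_random_walk(episodes=1000, n=2, alpha=0.1, gamma=1.0,
                     n_states=5, seed=0):
    """n-step TD prediction on the random walk used in the TD(0) sketch."""
    rng = np.random.default_rng(seed)
    V = np.zeros(n_states + 2)                    # terminal values stay 0
    for _ in range(episodes):
        states = [(n_states + 1) // 2]            # S_0: middle state
        rewards = [0.0]                           # R_0 placeholder (unused)
        T, t = float("inf"), 0
        while True:
            if t < T:                             # simulate one more step
                s_next = states[t] + (1 if rng.random() < 0.5 else -1)
                rewards.append(1.0 if s_next == n_states + 1 else 0.0)
                states.append(s_next)
                if not (1 <= s_next <= n_states):
                    T = t + 1                     # episode terminates at time T
            tau = t - n + 1                       # time whose state is updated
            if tau >= 0:
                # n-step return: discounted rewards plus bootstrapped tail
                G = sum(gamma ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, min(tau + n, T) + 1))
                if tau + n < T:
                    G += gamma ** n * V[states[tau + n]]
                V[states[tau]] += alpha * (G - V[states[tau]])
            if tau == T - 1:
                break
            t += 1
    return V[1:-1]

print(np.round(td_n_random_walk(), 2))            # true values: 1/6, ..., 5/6
```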

Credit Assignment

Eligibility Trace

Monte-Carlo Control

ε-Greedy Improvement

Temporal-Difference Control

Sarsa Algorithm

Importance Sampling and Off-policy Learning

Q-Learning