Reinforcement Learning / Fall 2025

Updates

  • New Assignment released: [Assignment #2 - Tabular RL]
  • New Lecture is up: Lecture 29: Adding Exploration to Control Loop
  • New Lecture is up: Lecture 28: Control Loop with Monte Carlo
  • New Lecture is up: Lecture 27: TD with Eligibility Traces
  • New Lecture is up: Lecture 26: TD-λ
  • New Lecture is up: Lecture 25: Deep Bootstrapping and TD-n
  • New Lecture is up: Lecture 24: GPI via Temporal Difference

For the Quercus page of the course, please click here

Course Description

This course provides a concrete understanding of reinforcement learning and its applications. The ultimate goal is to develop hands-on skills in deep reinforcement learning, for which the fundamentals of reinforcement learning are discussed first and deep reinforcement learning algorithms are studied afterward. The course is designed in three major parts. Part I welcomes the students by taking them through the basic definitions and fundamental concepts. Part II explains fundamental reinforcement learning methods, touching on the key model-based and model-free techniques and providing a deep understanding of each. Part III explores deep reinforcement learning, where deep neural networks are employed to efficiently approximate the techniques developed in Part II; in this part, we look into several algorithms, such as deep Q-learning, policy gradient methods (e.g., trust-region and proximal policy optimization), and actor-critic methods.

Time and Place

Lectures

Lectures start on September 2, 2025. Please note that the lecture halls are different on Tuesdays and Fridays.

Day Time Place
Tuesdays 5 PM - 7 PM BA-1170 - Bahen Centre for Information Technology
Fridays 5 PM - 7 PM BA-1180 - Bahen Centre for Information Technology

Tutorials

Tutorial sessions start on September 16, 2025.

Day Time Place
Tuesdays 4 PM - 5 PM BA-1160 - Bahen Centre for Information Technology

Course Office Hours

Day Time
Thursdays 12 PM - 1 PM

Instructor

Assistant Professor (TS)

ECE Department

Bahen 7208

Course Outline

Part I: First Things in Reinforcement Learning

  1. General framework of reinforcement learning
    • The multi-armed bandit problem
    • Components: Agent, Environment, State, Action, Reward, Policy
    • Comparison to supervised learning
    • Value function and policy design
    • Problem of Credit Assignment
  2. Exploration versus Exploitation
    • Revisiting the multi-armed bandit problem
    • Trade-off between exploration and exploitation
  3. Introduction to the Gymnasium library
    • Generating an environment in Gymnasium
    • Our first try: a simple game (see the sketch after this list)
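
To give a flavor of item 3, here is a minimal sketch of generating a Gymnasium environment and running one episode with a random policy. The environment name "CartPole-v1" is an illustrative choice here, not necessarily the simple game used in class.

    # Minimal sketch: create a Gymnasium environment and play one episode
    # with random actions. "CartPole-v1" is only an illustrative choice.
    import gymnasium as gym

    env = gym.make("CartPole-v1")
    obs, info = env.reset(seed=0)

    terminated = truncated = False
    total_reward = 0.0
    while not (terminated or truncated):
        action = env.action_space.sample()  # random policy for a first try
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward

    print(f"Episode return: {total_reward}")
    env.close()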

Part II: Fundamentals of Reinforcement Learning

  1. Model-based reinforcement learning
    • Markov Decision Processes (MDPs)
    • Value and policy with MDPs
    • Dynamic programming and Bellman equation
    • Value iteration and Policy iteration algorithms
  2. Model-free reinforcement learning
    • On-policy versus off-policy approaches for model-free reinforcement learning
      • Differences and properties of on-policy and off-policy methods
    • On-policy approach 1: Monte-Carlo (MC) learning
    • On-policy approach 2: Temporal Difference (TD) learning
    • From value function to Q-function
    • On-policy approach 3: State-Action-Reward-State-Action (SARSA)
    • Off-policy approach: Q-learning
  3. Revisiting our simple game
    • Implementing value and policy iteration in Gymnasium (a value-iteration sketch follows this list)
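
As a preview of item 3, below is a minimal value-iteration sketch. It assumes a Gymnasium toy-text environment such as FrozenLake-v1, which exposes its transition model through env.unwrapped.P; this may not be the exact simple game used in class.

    # Minimal sketch of tabular value iteration, assuming a Gymnasium toy-text
    # environment (e.g., FrozenLake-v1) that exposes its transition model.
    import gymnasium as gym
    import numpy as np

    env = gym.make("FrozenLake-v1")
    P = env.unwrapped.P  # P[s][a] = [(prob, next_state, reward, terminated), ...]
    n_states, n_actions = env.observation_space.n, env.action_space.n
    gamma, theta = 0.99, 1e-8  # discount factor and convergence threshold

    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup: V(s) = max_a E[r + gamma * V(s')]
            q = [sum(p * (r + gamma * V[s2] * (not done)) for p, s2, r, done in P[s][a])
                 for a in range(n_actions)]
            delta = max(delta, abs(max(q) - V[s]))
            V[s] = max(q)
        if delta < theta:
            break

    # Extract the greedy policy from the converged value function.
    policy = [int(np.argmax([sum(p * (r + gamma * V[s2] * (not done))
                                 for p, s2, r, done in P[s][a])
                             for a in range(n_actions)]))
              for s in range(n_states)]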

Part III: Deep Reinforcement Learning

  1. Reviewing main concepts in deep learning
    • Universal approximation theorem
    • Deep neural networks
      • Training a neural net via gradient descent
      • Reviewing neural network implementation in PyTorch
  2. Preliminaries of Deep Reinforcement Learning
    • Function approximation
      • Space reduction via function approximation
    • Simple function approximator
    • Deep neural networks as function approximators
    • Looking into a new example
  3. Deep off-policy reinforcement learning
    • Value networks: value function approximation via deep neural networks
    • Deep Q-learning and Deep Q-networks (DQNs); a minimal update sketch follows this outline
    • Properties of deep Q-learning: sample efficiency and instability
    • Visiting our new example
  4. Deep on-policy methods
    • Policy networks: policy approximation via deep neural networks
    • Policy gradient methods
      • Direct policy update
      • Properties of deep policy networks: sample inefficiency versus stability
    • Trust Region Policy Optimization (TRPO)
      • Constraining policy update via Kullback-Leibler divergence
      • Idea of surrogate objective function
    • Proximal Policy Optimization (PPO)
      • Clipping
      • Complexity of PPO
    • Revisiting our new example
  5. Actor-Critic Methods
    • Advantage Actor Critic (A2C)
    • TRPO and PPO with value network
    • Deterministic Policy Gradient
      • Deep Deterministic Policy Gradient (DDPG)
    • Soft Actor Critic (SAC)
    • Extensions and modifications
  6. Applications and advancements of deep reinforcement learning
    • Looking into some successful examples: AlphaGo, AlphaZero, Pluribus, and OpenAI Five
    • Sample applications of deep reinforcement learning and project poster session
    • Recent advancements in deep reinforcement learning
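
To make item 3 above concrete, here is a minimal sketch of the core deep Q-learning update in PyTorch: a Q-network trained toward a TD target computed by a frozen target network. The layer sizes and hyperparameters are illustrative, and the replay buffer and exploration loop are omitted.

    # Minimal sketch of one deep Q-learning gradient step (PyTorch).
    # Sizes and hyperparameters are illustrative; replay buffer omitted.
    import torch
    import torch.nn as nn

    STATE_DIM, N_ACTIONS, GAMMA = 4, 2, 0.99

    q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
    target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
    target_net.load_state_dict(q_net.state_dict())  # frozen copy for stable targets
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

    def dqn_update(states, actions, rewards, next_states, dones):
        """One gradient step on a batch of transitions.

        All arguments are torch tensors; actions is int64, dones is float in {0, 1}.
        """
        # Q(s, a) for the actions actually taken.
        q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        with torch.no_grad():  # do not differentiate through the TD target
            next_q = target_net(next_states).max(dim=1).values
            targets = rewards + GAMMA * (1.0 - dones) * next_q
        loss = nn.functional.mse_loss(q_values, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()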

Course Evaluation

The course evaluation consists of three components:

Component    Weight  Notes
Assignments  42%     3 assignment sets
Exam         25%     midterm exam
Project      33%     selected from a predefined set of projects, or open-ended

Let's go through them in a bit more detail.

Assignments

This is the most important part! This is an applied course, and hence we should implement whatever we learn. There will be three sets of assignments. Roughly speaking, the first one goes through the fundamentals of reinforcement learning. The second assignment gets more serious in terms of implementation and develops your knowledge of tabular reinforcement learning methods. The last one gives you the chance to implement a mini-project on deep reinforcement learning. The assignments count for 42% of your final mark.

Exam

This exam covers the theory side of the course. It takes place once we are done with Part II, and it contains no programming questions. The exam only evaluates the understanding of fundamental concepts and reinforcement learning methods, through questions that can be either explained in words or solved simply by hand. It comprises 25% of the total mark.

Course Project

This is the exciting part of the course, where we can challenge ourselves and test the development of our skills. The topic of the project is an application of deep reinforcement learning. Each predefined project is designed in advance, with clear milestones and tasks, and the required documents will be shared. You then need to use your knowledge to gradually finish the required tasks by the end of the semester. You also have the option to define your own project; an open-ended project must be of the same level as the predefined projects, with clearly specified milestones. The final projects will be presented in a seminar session held in the last week, and the project accounts for 33% of the final mark.