Reinforcement Learning / Fall 2025
Updates
- New Assignment released: [Assignment #2 - Tabular RL]
- New Lecture is up: Lecture 29: Adding Exploration to the Control Loop
- New Lecture is up: Lecture 28: Control Loop with Monte Carlo
- New Lecture is up: Lecture 27: TD with Eligibility Traces
- New Lecture is up: Lecture 26: TD-λ
- New Lecture is up: Lecture 25: Deep Bootstrapping and TD-n
- New Lecture is up: Lecture 24: GPI via Temporal Difference
For the course's Quercus page, please click here.
Course Description
This course provides a concrete understanding of reinforcement learning and its applications. Its ultimate goal is to develop hands-on skills in deep reinforcement learning: the fundamentals of reinforcement learning are discussed first, and deep reinforcement learning algorithms are studied afterwards. The course is designed in three major parts. Part I welcomes students by taking them through the basic definitions and fundamental concepts. Part II explains fundamental reinforcement learning methods, touching on the key model-based and model-free techniques and providing a deep understanding of each. Part III explores deep reinforcement learning, where deep neural networks are employed to efficiently approximate the techniques developed in Part II; this part looks into several algorithms, such as deep Q-learning, policy gradient methods (e.g., trust-region and proximal policy optimization), and actor-critic methods.
Time and Place
Lectures
Lectures start on September 2, 2025. Please note that the lecture hall differs between Tuesdays and Fridays.
| Day | Time | Place |
| --- | --- | --- |
| Tuesdays | 5 PM - 7 PM | BA-1170, Bahen Centre for Information Technology |
| Fridays | 5 PM - 7 PM | BA-1180, Bahen Centre for Information Technology |
Tutorials
Tutorial sessions start on September 16, 2025.
| Day | Time | Place |
| --- | --- | --- |
| Tuesdays | 4 PM - 5 PM | BA-1160, Bahen Centre for Information Technology |
Course Office Hours
| Day | Time |
| --- | --- |
| Thursdays | 12 PM - 1 PM |
Course Outline
Part I: First Things in Reinforcement Learning
- General framework of reinforcement learning
  - The multi-armed bandit problem
  - Components: Agent, Environment, State, Action, Reward, Policy
  - Comparison to supervised learning
  - Value function and policy design
  - The problem of credit assignment
- Exploration versus exploitation
  - Revisiting the multi-armed bandit problem
  - The trade-off between exploration and exploitation
- Introduction to the Gymnasium library
  - Generating an environment in Gymnasium
  - Our first try: a simple game (see the sketch after this list)
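As a taste of the Gymnasium material above, here is a minimal sketch of creating an environment and playing one episode with random actions. The environment name `FrozenLake-v1` is only an illustrative stand-in; the course's own "simple game" may differ.

```python
# Minimal Gymnasium loop: reset an environment and act randomly until the
# episode ends. "FrozenLake-v1" is an illustrative choice of environment.
import gymnasium as gym

env = gym.make("FrozenLake-v1")
obs, info = env.reset(seed=0)
done = False
while not done:
    action = env.action_space.sample()   # random policy, for now
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```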
Part II: Fundamentals of Reinforcement Learning
- Model-based reinforcement learning
  - Markov Decision Processes (MDPs)
  - Value and policy with MDPs
  - Dynamic programming and the Bellman equation
  - Value iteration and policy iteration algorithms
- Model-free reinforcement learning
  - On-policy versus off-policy approaches
  - Differences and properties of on-policy and off-policy methods
  - On-policy approach 1: Monte Carlo (MC) learning
  - On-policy approach 2: Temporal Difference (TD) learning
  - From value function to Q-function
  - On-policy approach 3: State-Action-Reward-State-Action (SARSA)
  - Off-policy approach: Q-learning (see the sketch after this list)
- Revisiting our simple game
  - Implementing value and policy iteration in Gymnasium
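To make the tabular methods concrete, below is a hedged sketch of Q-learning, the off-policy method listed above, on a small Gymnasium environment. The environment and hyperparameters are illustrative assumptions, not the course's official settings.

```python
# Tabular Q-learning sketch: epsilon-greedy behavior policy, greedy target.
import numpy as np
import gymnasium as gym

env = gym.make("FrozenLake-v1")                  # illustrative environment
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1           # step size, discount, exploration rate

for episode in range(5000):
    s, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy: explore with probability epsilon, else act greedily.
        a = env.action_space.sample() if np.random.rand() < epsilon else int(np.argmax(Q[s]))
        s_next, r, terminated, truncated, _ = env.step(a)
        # Off-policy update: bootstrap from the greedy action in s_next.
        target = r + gamma * np.max(Q[s_next]) * (not terminated)
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
        done = terminated or truncated
```

Replacing the greedy target with the action actually taken in the next state would turn this update into SARSA, the on-policy variant listed above.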
Part III: Deep Reinforcement Learning
- Reviewing main concepts in deep learning
  - Universal approximation theorem
  - Deep neural networks
  - Training a neural network via gradient descent
  - Reviewing neural network implementation in PyTorch
- Preliminaries of deep reinforcement learning
  - Function approximation
  - Space reduction via function approximation
  - A simple function approximator
  - Deep neural networks as function approximators
  - Looking into a new example
- Deep off-policy reinforcement learning
  - Value networks: value function approximation via deep neural networks
  - Deep Q-learning and deep Q-networks (DQNs)
  - Properties of deep Q-learning: sample efficiency and instability
  - Visiting our new example
- Deep on-policy methods
  - Policy networks: policy approximation via deep neural networks
  - Policy gradient methods
  - Direct policy updates
  - Properties of deep policy networks: sample inefficiency versus stability
  - Trust Region Policy Optimization (TRPO)
    - Constraining the policy update via the Kullback-Leibler divergence
    - The idea of a surrogate objective function
  - Proximal Policy Optimization (PPO)
    - Clipping (see the sketch after this list)
    - Complexity of PPO
  - Revisiting our new example
- Actor-critic methods
  - Advantage Actor-Critic (A2C)
  - TRPO and PPO with a value network
  - Deterministic policy gradient
  - Deep Deterministic Policy Gradient (DDPG)
  - Soft Actor-Critic (SAC)
  - Extensions and modifications
- Applications and advancements of deep reinforcement learning
  - Looking into some successful examples: AlphaGo, AlphaZero, Pluribus, and OpenAI Five
  - Sample applications of deep reinforcement learning and the project poster session
  - Recent advancements in deep reinforcement learning
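As one concrete fragment from this part, here is an illustrative PyTorch implementation of PPO's clipped surrogate objective (the "Clipping" item above). The function and tensor names are assumptions made for this sketch, not the course's reference code.

```python
# PPO clipped surrogate loss (to be minimized with a gradient step).
import torch

def ppo_clip_loss(log_probs_new: torch.Tensor,
                  log_probs_old: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    ratio = torch.exp(log_probs_new - log_probs_old)   # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the elementwise minimum keeps the update conservative,
    # mirroring TRPO's trust-region idea without a KL-constrained solver.
    return -torch.min(unclipped, clipped).mean()
```

The clipping plays the role of TRPO's KL constraint at a fraction of the computational cost, which is why PPO follows TRPO in the outline above.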
Course Evaluation
Your final grade consists of three components:
| Component | Grade | Details |
| --- | --- | --- |
| Assignments | 42% | Three assignment sets |
| Exam | 25% | Midterm exam |
| Project | 33% | Selected from a predefined set of projects, or open-ended |
Assignments
This is the most important part! This is an applied course, so we implement whatever we learn. There will be three assignment sets. Roughly speaking, the first goes through the fundamentals of reinforcement learning, the second is more demanding in terms of implementation and develops your knowledge of tabular reinforcement learning methods, and the last gives you the chance to implement a mini-project on deep reinforcement learning. The assignments count for 42% of your final mark.
Exam
The exam covers the theory side of the course and takes place once Part II is complete. There will, of course, be no programming questions: the exam only evaluates your understanding of fundamental concepts and reinforcement learning methods, through questions that can be answered in words or solved simply by hand. It comprises 25% of the total mark.
Course Project
This is the exciting part of the course, where you can challenge yourself and test the skills you have developed. The project topic is an application of deep reinforcement learning. The predefined projects are designed in advance, with their milestones and tasks made clear, and the required documents will be shared. You then use your knowledge to complete the required tasks gradually by the end of the semester. You also have the option to define your own project: an open-ended project must be of the same level as the predefined projects, with its milestones clearly specified. The final projects will be presented in a seminar session held in the last week of the semester.