Applied Deep Learning / Fall 2025

Updates

  • New Lecture is up: Lecture 27: Overfitting
  • New Lecture is up: Lecture 26: RMSprop and Adam
  • New Lecture is up: Lecture 25: Optimizer Boosting -- Scheduling, Momentum and Rprop Ideas
  • New Assignment released: Assignment #2 - Feedforward Neural Networks
  • New Lecture is up: Lecture 24: Linear and Sub-linear Convergence Speed
  • New Lecture is up: Lecture 23: Evaluation and Generalization Measures
  • New Lecture is up: Lecture 22: Mini-batch SGD and Complexity-Variance Tradeoff

For the Quercus page of the course, please click here

Course Description

The key goal of this course is to provide a fundamental understanding of Computational Learning, its functionality, and its deployment toward building information processing units. These concepts are key to Deep Learning and its applications. The course is designed to give a good mixture of fundamental notions and hands-on skills by teaching both the theory of deep learning and recent advances in the area.

Time and Place

Lectures

Lectures start on September 2, 2025. Please note that the lecture halls are different on Tuesdays and Fridays.

Day        Time           Place
Tuesdays   11 AM - 1 PM   SF-1101, Sandford Fleming Building
Fridays    11 AM - 1 PM   MP-103, McLennan Physical Laboratories

Tutorials

Tutorial sessions start on September 19, 2025.

Day        Time          Place
Fridays    1 PM - 2 PM   SU B120, Student Commons at 230 College Street

Course Office Hours

Day        Time
Thursdays  1 PM - 2 PM

Instructor

Assistant Professor (TS)

ECE Department

Bahen 7208

Course Syllabus

The course is given in three parts, namely Principles of Computational Learning, Deep Neural Networks, and Advances in Deep Learning. The topics covered in each part are listed below:

Part I: Principles of Computational Learning

  1. Learning from Data
    • Data-driven Approaches
    • Concept of Learning
  2. Basic Definitions
    • Types of Learning Tasks
    • Key Components of Computational Learning
      • Data
      • Model
      • Loss
    • Neural Networks
      • Basic Computational Unit: Artificial Neuron
      • Perceptron Machine
      • Artificial Neural Networks
      • Deep Neural Networks
  3. Learning via Deep Models
    • Universal Approximation Theorem
    • Training Deep Models via Empirical Risk Minimization
    • Inference via Deep Models
    • Building Training Loop via Gradient Descent Algorithm
      • Review: Function Optimization
      • Review: Gradient Descent Algorithm
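
To make the training-loop and gradient-descent review listed above concrete, here is a minimal sketch (not course or assignment code) of plain gradient descent minimizing the empirical risk of a linear least-squares model, using only NumPy and made-up synthetic data.

    import numpy as np

    # Synthetic data, made up for illustration: y = X @ w_true + noise
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    w_true = np.array([1.5, -2.0, 0.5])
    y = X @ w_true + 0.1 * rng.normal(size=200)

    w = np.zeros(3)   # model parameters, initialized at zero
    lr = 0.1          # learning rate (step size)

    for step in range(500):
        residual = X @ w - y                  # prediction errors on all samples
        grad = 2.0 * X.T @ residual / len(y)  # gradient of the mean squared error
        w -= lr * grad                        # gradient-descent update

    print("estimated weights:", w)            # should land close to w_true

Mini-batch SGD, covered in Part II, replaces the full-dataset gradient used here with an estimate computed on a small random batch at each step.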

Part II: Deep Neural Networks

  1. Fully-connected Feedforward Neural Networks
    • Deep Multilayer Perceptrons (MLPs)
    • Inference and Training of MLPs
      • Inference via Forward Propagation
      • Training by Sample Gradients
      • Challenge of Computing Gradient
    • Backpropagation Algorithm
      • Computation Graph
      • Computing Gradients on Graphs via Forward and Backward Pass
      • Backward Pass on Basic Vector Operations
      • Backpropagating over MLPs
      • Duality of Forward and Backward Computations in MLPs
    • Building Training Loop for MLPs
      • Mini-Batch Stochastic Gradient Descent (SGD)
      • Complexity-Variance Trade-off in SGD
    • Evaluating Trained Models
      • Independence of Test and Training Samples
      • Learning Curves and Classic Evaluation Metrics
    • Examples: Classification and Regression by MLPs
  2. Advances in Optimization, Model Fitting and Data Preprocessing
    • Gradient-based Optimizers
      • Stabilizing SGD via Learning-Rate Scheduling
      • Momentum and Its Bias-Variance Trade-off
      • Dimension-level Learning-Rate Scheduling: Resilient Backpropagation (Rprop)
      • Dimension-level Scheduling by Momentum: Root Mean Square Propagation (RMSprop)
      • Adaptive Moment Estimation (Adam) Algorithm
    • Overfitting and Underfitting
      • Concept of Over- and Underfitting
      • Roots of Overfitting: Model Complexity, Data Sufficiency and Co-adaptation
      • Dealing with Overfitting I: Hyperparameter Tuning over Validation Dataset
      • Dealing with Overfitting II: Regularization and Dropout
    • Statistical Viewpoint on Data
      • Data Space and Data Distribution
      • Dealing with Overfitting III: Data Augmentation and Synthetic Data Generation
      • Basics of Data Cleaning
    • Standardization and Normalization
      • Input Normalization
      • Batch Normalization
      • Layer Normalization
  3. Convolutional Neural Networks (CNNs)
    • Locally-Connected Models with Shared Weights
      • Local Connectivity versus Full Connectivity
      • Computational Models with Shared Parameters
    • Components of CNNs
      • Tensor-type Convolution and Convolutional Layer
      • Pooling Layer
      • Upsampling and Downsampling Layers
    • Building a Custom CNN
      • Deep CNN for Image Processing
      • Resizing and Normalization
      • Receptive Field and Feature Extraction
    • Training CNNs
      • Backpropagation through Convolutional Layer
      • Backpropagation through Pooling Layer
      • Backpropagation through Resampling Layers
    • Examples of Deep CNNs
  4. Residual Learning
    • ILSVRC: ImageNet Large Scale Visual Recognition Challenge
    • Depth Challenge
      • Vanishing Gradient
      • Exploding Gradient
    • Residual Learning
      • Learning Residual Function
      • Skip Connection
      • Concentration Property of Gradient in Residual Learning
    • Residual Networks (ResNets)
      • ResNet Units
      • Deep ResNets
      • Short and Long Skip Connections
    • State-of-the-Art Residual Architectures
  5. Recurrent Neural Networks (RNNs)
    • Sequence Learning by Recurrence
      • Sequential Data and State-Space Models
      • Building Computational State-Space Models by Recurrence
      • Elman and Jordan Networks
      • Forward Propagation Through Time
      • Deep Recurrent Neural Networks (RNNs)
    • Training RNNs
      • Backpropagation Through Time (BPTT)
      • Vanishing and Exploding Gradient in BPTT
      • Truncated BPTT
    • Concept of Gating
      • Notion of Gate and Gating Mechanism
      • Building a Computational Gate
      • Gated Architecture I: Gated Recurrent Unit (GRU)
      • Gated Architecture II: Long Short-Term Memory (LSTM)
    • Bidirectional Sequence Processing
    • Segmentation in Sequence Learning
      • Pre-segmentation of Data versus Labeling Unsegmented Data
      • Connectionist Temporal Classification (CTC) Method
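
As a small bridge between the topics above and Part III, here is a minimal sketch, using only NumPy, of a two-layer MLP trained with mini-batch SGD, with the backward pass written out by hand as in the backpropagation topics listed above. The architecture, synthetic data and hyperparameters are made up for illustration and are not the ones used in the course.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy 1-D regression data (made up): learn y = sin(3x) from noisy samples
    X = rng.uniform(-1.0, 1.0, size=(512, 1))
    y = np.sin(3.0 * X) + 0.05 * rng.normal(size=(512, 1))

    # Parameters of a 1-16-1 MLP with a tanh hidden layer
    W1 = rng.normal(scale=0.5, size=(1, 16)); b1 = np.zeros(16)
    W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)

    lr, batch_size = 0.05, 32
    for epoch in range(200):
        order = rng.permutation(len(X))          # reshuffle samples each epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]

            # Forward pass
            h = np.tanh(xb @ W1 + b1)            # hidden activations
            pred = h @ W2 + b2                   # network output

            # Backward pass (chain rule, layer by layer)
            dpred = 2.0 * (pred - yb) / len(xb)  # gradient of mean squared error
            dW2 = h.T @ dpred
            db2 = dpred.sum(axis=0)
            dh = dpred @ W2.T
            dz = dh * (1.0 - h ** 2)             # tanh'(z) = 1 - tanh(z)^2
            dW1 = xb.T @ dz
            db1 = dz.sum(axis=0)

            # Mini-batch SGD update
            W2 -= lr * dW2; b2 -= lr * db2
            W1 -= lr * dW1; b1 -= lr * db1

A framework's automatic differentiation performs the same chain-rule computation over a computation graph, which is what the backpropagation topics above cover; writing it by hand once makes the forward-backward duality easier to see.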

Part III: Advances in Deep Learning

  1. Sequence-to-Sequence Models
    • Encoding and Decoding via Computational Models
      • General Learning Tasks with Sequence Data
      • Learning Universal Context
    • Basic Language Model
      • Processing Text into Tokens
      • Learning to Predict the Next Token
    • Neural Machine Translation (NMT)
      • Basic Translator by RNNs
      • Restricted Memory of Recurrent NMT
    • Attention Mechanism and Transformer
      • Attention Mechanism
      • Computational Modeling of Attention: Key and Query Correlation
      • Using Attention for Context Extraction: Cross-Attention
      • Sequence Processing with Attention
      • Multihead Self-Attention
      • Transformer: Encoding-Decoding via Attention
  2. Deep Unsupervised Learning
    • Review: Principal Component Analysis (PCA)
    • Autoencoding
      • Nonlinear PCA by Computational Models
      • Notion of Latent Representation
      • Autoencoders
    • Training Autoencoders
      • Training Vanilla Autoencoder
      • Training Denoising Autoencoder
      • Constraining Latent Representation by Regularization
    • Data Generation by Autoencoders
      • Naive Generative Model by Vanilla Autoencoder
      • Variational Autoencoders (VAE)
      • Training and Sampling VAEs
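
As a preview of the attention-mechanism topics listed above, the following sketch implements scaled dot-product attention in NumPy. The function and the toy shapes are illustrative assumptions, not course code.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Q: (n_q, d), K: (n_k, d), V: (n_k, d_v); returns (n_q, d_v)."""
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                    # query-key correlations
        scores -= scores.max(axis=-1, keepdims=True)     # shift for numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
        return weights @ V                               # attention-weighted sum of values

    # Tiny example with random queries, keys and values
    rng = np.random.default_rng(0)
    Q = rng.normal(size=(2, 4))   # 2 queries of dimension 4
    K = rng.normal(size=(5, 4))   # 5 keys of dimension 4
    V = rng.normal(size=(5, 3))   # 5 values of dimension 3
    print(scaled_dot_product_attention(Q, K, V).shape)   # (2, 3)

Multihead self-attention runs several such attention computations in parallel on learned linear projections of the same sequence and concatenates the results.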

Course Evaluation

The course evaluation consists of three components:

Component     Grade   Details
Assignments   45%     4 assignment sets
Exam          25%     midterm exam
Project       30%     open-ended
Let's go through each of them in a bit more detail.

Assignments

Needless to say, this is the most important part! We are learning Deep Learning, which means we should get our hands dirty and implement whatever we learn. There will be four sets of assignments. Roughly speaking, the first one goes through the basics and gets you into the mood of the course. The next three go through MLPs, CNNs and Sequence Models. There will also be some optional items for self-practice. The assignments count for 45% of your final mark.

Exam

Let us call it a midterm, though to be fair it is really a theory-side-of-the-course exam! Once we are done with the fundamental topics, somewhere halfway through Part II in Week 8, there will be an exam. The exam asks questions on theory only; there will of course be no programming questions. It evaluates your understanding of the fundamental concepts through questions that can either be explained in words or solved simply by hand. This exam comprises 25% of the total mark.

Course Project

Last but not least, there will be a final project. This is the exciting part of the course, where we can challenge ourselves and test how far our skills have developed! Each group suggests its own problem to be solved by Deep Learning, works on it through the semester, and presents its final results at the end in a report and a short presentation. Through this project, we build confidence that we are now experts in Deep Learning. This part is worth 30% of the total mark. Note that the final project plays the role of the final evaluation for this course.