Applied Deep Learning / Fall 2025

Updates

  • New Lecture is up: Lecture 27: Overfitting
  • New Lecture is up: Lecture 26: RMSprop and Adam
  • New Lecture is up: Lecture 25: Optimizer Boosting -- Scheduling, Momentum and Rprop Ideas
  • New Assignment released: Assignment #2 - Feedforward Neural Networks
  • New Lecture is up: Lecture 24: Linear and Sub-linear Convergence Speed
  • New Lecture is up: Lecture 23: Evaluation and Generalization Measures
  • New Lecture is up: Lecture 22: Mini-batch SGD and Complexity-Variance Tradeoff

For the Quercus page of the course, please click here

Course Description

The key goal of this course is to provide a fundamental understanding of Computational Learning, its functionality, and its deployment toward building information processing units. These concepts are key to Deep Learning and its applications. The course is designed to give a good mixture of fundamental notions and hands-on skills by teaching both the theory of deep learning and recent advances in the area.

Time and Place

Lectures

Lectures start on September 2, 2025. Please note that the lecture halls are different on Tuesdays and Fridays.

Day        Time           Place
Tuesdays   11 AM - 1 PM   SF-1101, Sandford Fleming Building
Fridays    11 AM - 1 PM   MP-103, McLennan Physical Laboratories

Tutorials

Tutorial sessions start on September 19, 2025.

Day        Time          Place
Fridays    1 PM - 2 PM   SU B120, Student Commons at 230 College Street

Course Office Hours

Day        Time
Thursdays  1 PM - 2 PM

Instructor

Assistant Professor (TS)

ECE Department

Bahen 7208

Course Syllabus

The course is given in three parts, namely Principles of Computational Learning, Deep Neural Networks, and Advances in Deep Learning. The topics covered in each part are listed below:

Part I: Principles of Computational Learning

  1. Learning from Data
    • Data-driven Approaches
    • Concept of Learning
  2. Basic Definitions
    • Types of Learning Tasks
    • Key Components of Computational Learning
      • Data
      • Model
      • Loss
    • Neural Networks
      • Basic Computational Unit: Artificial Neuron
      • Perceptron Machine
      • Artificial Neural Networks
      • Deep Neural Networks
  3. Learning via Deep Models
    • Universal Approximation Theorem
    • Training Deep Models via Empirical Risk Minimization
    • Inference via Deep Models
    • Building Training Loop via Gradient Descent Algorithm
      • Review: Function Optimization
      • Review: Gradient Descent Algorithm
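
To make the training-loop and gradient-descent review listed above concrete, here is a minimal sketch (not course or assignment code) of plain gradient descent minimizing the empirical risk of a linear least-squares model, using only NumPy and made-up synthetic data.

    import numpy as np

    # Synthetic data, made up for illustration: y = X @ w_true + noise
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    w_true = np.array([1.5, -2.0, 0.5])
    y = X @ w_true + 0.1 * rng.normal(size=200)

    w = np.zeros(3)   # model parameters, initialized at zero
    lr = 0.1          # learning rate (step size)

    for step in range(500):
        residual = X @ w - y                  # prediction errors on all samples
        grad = 2.0 * X.T @ residual / len(y)  # gradient of the mean squared error
        w -= lr * grad                        # gradient-descent update

    print("estimated weights:", w)            # should land close to w_true

Mini-batch SGD, covered in Part II, replaces the full-dataset gradient used here with an estimate computed on a small random batch at each step.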

Part II: Deep Neural Networks

  1. Fully-connected Feedforward Neural Networks
    • Deep Multilayer Perceptrons (MLPs)
    • Inference and Training of MLPs
      • Inference via Forward Propagation
      • Training by Sample Gradients
      • Challenge of Computing Gradient
    • Backpropagation Algorithm
      • Computation Graph
      • Computing Gradients on Graphs via Forward and Backward Pass
      • Backward Pass on Basic Vector Operations
      • Backpropagating over MLPs
      • Duality of Forward and Backward Computations in MLPs
    • Building Training Loop for MLPs
      • Mini-Batch Stochastic Gradient Descent (SGD)
      • Complexity-Variance Trade-off in SGD
    • Evaluating Trained Models
      • Independence of Test and Training Samples
      • Learning Curves and Classic Evaluation Metrics
    • Examples: Classification and Regression by MLPs
  2. Advances in Optimization, Model Fitting and Data Preprocessing
    • Gradient-based Optimizers
      • Stabilizing SGD via Learning-Rate Scheduling
      • Momentum and Its Bias-Variance Trade-off
      • Dimension-level Learning-Rate Scheduling: Resilient Backpropagation (Rprop)
      • Dimension-level Scheduling by Momentum: Root Mean Square Propagation (RMSprop)
      • Adaptive Moment Estimation (Adam) Algorithm
    • Overfitting and Underfitting
      • Concept of Over- and Underfitting
      • Roots of Overfitting: Model Complexity, Data Sufficiency and Co-adaptation
      • Dealing with Overfitting I: Hyperparameter Tuning over Validation Dataset
      • Dealing with Overfitting II: Regularization and Dropout
    • Statistical Viewpoint on Data
      • Data Space and Data Distribution
      • Dealing with Overfitting III: Data Augmentation and Synthetic Data Generation
      • Basics of Data Cleaning
    • Standardization and Normalization
      • Input Normalization
      • Batch Normalization
      • Layer Normalization
  3. Convolutional Neural Networks (CNNs)
    • Locally-Connected Models with Shared Weights
      • Local Connectivity versus Full Connectivity
      • Computational Models with Shared Parameters
    • Components of CNNs
      • Tensor-type Convolution and Convolutional Layer
      • Pooling Layer
      • Upsampling and Downsampling Layers
    • Building a Custom CNN
      • Deep CNN for Image Processing
      • Resizing and Normalization
      • Receptive Field and Feature Extraction
    • Training CNNs
      • Backpropagation through Convolutional Layer
      • Backpropagation through Pooling Layer
      • Backpropagation through Resampling Layers
    • Examples of Deep CNNs
  4. Residual Learning
    • ILSVRC: ImageNet Large Scale Visual Recognition Challenge
    • Depth Challenge
      • Vanishing Gradient
      • Exploding Gradient
    • Residual Learning
      • Learning Residual Function
      • Skip Connection
      • Concentration Property of Gradient in Residual Learning
    • Residual Networks (ResNets)
      • ResNet Units
      • Deep ResNets
      • Short and Long Skip Connections
    • State-of-the-Art Residual Architectures
  5. Recurrent Neural Networks (RNNs)
    • Sequence Learning by Recurrence
      • Sequential Data and State-Space Models
      • Building Computational State-Space Models by Recurrence
      • Elman and Jordan Networks
      • Forward Propagation Through Time
      • Deep Recurrent Neural Networks (RNNs)
    • Training RNNs
      • Backpropagation Through Time (BPTT)
      • Vanishing and Exploding Gradient in BPTT
      • Truncated BPTT
    • Concept of Gating
      • Notion of Gate and Gating Mechanism
      • Building a Computational Gate
      • Gated Architecture I: Gated Recurrent Unit (GRU)
      • Gated Architecture II: Long Short-Term Memory (LSTM)
    • Bidirectional Sequence Processing
    • Segmentation in Sequence Learning
      • Pre-segmentation of Data versus Labeling Unsegmented Data
      • Connectionist Temporal Classification (CTC) Method
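
As a small bridge between the topics above and Part III, here is a minimal sketch, using only NumPy, of a two-layer MLP trained with mini-batch SGD, with the backward pass written out by hand as in the backpropagation topics listed above. The architecture, synthetic data and hyperparameters are made up for illustration and are not the ones used in the course.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy 1-D regression data (made up): learn y = sin(3x) from noisy samples
    X = rng.uniform(-1.0, 1.0, size=(512, 1))
    y = np.sin(3.0 * X) + 0.05 * rng.normal(size=(512, 1))

    # Parameters of a 1-16-1 MLP with a tanh hidden layer
    W1 = rng.normal(scale=0.5, size=(1, 16)); b1 = np.zeros(16)
    W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)

    lr, batch_size = 0.05, 32
    for epoch in range(200):
        order = rng.permutation(len(X))          # reshuffle samples each epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]

            # Forward pass
            h = np.tanh(xb @ W1 + b1)            # hidden activations
            pred = h @ W2 + b2                   # network output

            # Backward pass (chain rule, layer by layer)
            dpred = 2.0 * (pred - yb) / len(xb)  # gradient of mean squared error
            dW2 = h.T @ dpred
            db2 = dpred.sum(axis=0)
            dh = dpred @ W2.T
            dz = dh * (1.0 - h ** 2)             # tanh'(z) = 1 - tanh(z)^2
            dW1 = xb.T @ dz
            db1 = dz.sum(axis=0)

            # Mini-batch SGD update
            W2 -= lr * dW2; b2 -= lr * db2
            W1 -= lr * dW1; b1 -= lr * db1

A framework's automatic differentiation performs the same chain-rule computation over a computation graph, which is what the backpropagation topics above cover; writing it by hand once makes the forward-backward duality easier to see.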

Part III: Advances in Deep Learning

  1. Sequence-to-Sequence Models
    • Encoding and Decoding via Computational Models
      • General Learning Tasks with Sequence Data
      • Learning Universal Context
    • Basic Language Model
      • Processing Text into Tokens
      • Learning to Predict the Next Token
    • Neural Machine Translation (NMT)
      • Basic Translator by RNNs
      • Restricted Memory of Recurrent NMT
    • Attention Mechanism and Transformer
      • Attention Mechanism
      • Computational Modeling of Attention: Key and Query Correlation
      • Using Attention for Context Extraction: Cross-Attention
      • Sequence Processing with Attention
      • Multihead Self-Attention
      • Transformer: Encoding-Decoding via Attention
  2. Deep Unsupervised Learning
    • Review: Principal Component Analysis (PCA)
    • Autoencoding
      • Nonlinear PCA by Computational Models
      • Notion of Latent Representation
      • Autoencoders
    • Training Autoencoders
      • Training Vanilla Autoencoder
      • Training Denoising Autoencoder
      • Constraining Latent Representation by Regularization
    • Data Generation by Autoencoders
      • Naive Generative Model by Vanilla Autoencoder
      • Variational Autoencoders (VAE)
      • Training and Sampling VAEs
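
As a preview of the attention-mechanism topics listed above, the following sketch implements scaled dot-product attention in NumPy. The function and the toy shapes are illustrative assumptions, not course code.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Q: (n_q, d), K: (n_k, d), V: (n_k, d_v); returns (n_q, d_v)."""
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                    # query-key correlations
        scores -= scores.max(axis=-1, keepdims=True)     # shift for numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
        return weights @ V                               # attention-weighted sum of values

    # Tiny example with random queries, keys and values
    rng = np.random.default_rng(0)
    Q = rng.normal(size=(2, 4))   # 2 queries of dimension 4
    K = rng.normal(size=(5, 4))   # 5 keys of dimension 4
    V = rng.normal(size=(5, 3))   # 5 values of dimension 3
    print(scaled_dot_product_attention(Q, K, V).shape)   # (2, 3)

Multihead self-attention runs several such attention computations in parallel on learned linear projections of the same sequence and concatenates the results.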

Course Evaluation

The course evaluation consists of three components:

Component     Grade   Details
Assignments   45%     4 assignment sets
Exam          25%     midterm exam
Project       30%     open-ended
Let's go through each of them in a bit more detail.

Assignments

Needless to say, this is the most important part! We are learning Deep Learning, which means we should get our hands dirty and implement whatever we learn. There will be four sets of assignments. Roughly speaking, the first one goes through the basics and gets you into the mood of the course. The next three go through MLPs, CNNs and Sequence Models. There will also be some optional items for self-practice. The assignments count for 45% of your final mark.

Exam

Let us call it a midterm, though to be fair it is really a theory-side-of-the-course exam! Once we are done with the fundamental topics, somewhere halfway through Part II in Week 8, there will be an exam. The exam asks questions on theory only; there will of course be no programming questions. It evaluates your understanding of the fundamental concepts through questions that can either be explained in words or solved simply by hand. This exam comprises 25% of the total mark.

Course Project

Last but not least, there will be a final project. This is the exciting part of the course, where we can challenge ourselves and test how far our skills have developed! Each group suggests its own problem to be solved by Deep Learning, works on it through the semester, and presents its final results at the end in a report and a short presentation. Through this project, we build confidence that we are now experts in Deep Learning. This part is worth 30% of the total mark. Note that the final project plays the role of the final evaluation for this course.