Lecture Notes

The lecture notes are uploaded throughout the semester. For each chapter, the notes are provided section by section.

Chapter 0: Course Overview and Logistics

  • Handouts: all sections in a single file

Chapter 1: Fundamentals of Deep Learning

Chapter 2: Feedforward NNs

  • Section 1: Forward Pass in MLPs
  • Section 2: Computing Gradient via Backpropagation on Computation Graph
  • Section 3: Multiclass Classification
  • Section 4: Mini-batch Training and SGD Algorithm
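
As a rough companion to Sections 1, 2 and 4 above, the following minimal NumPy sketch trains a one-hidden-layer MLP with backpropagation and mini-batch SGD. It is illustrative only: the architecture, synthetic data and hyperparameters are arbitrary choices, not the ones used in the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic XOR-like data: label is 1 when the two coordinates share a sign.
X = rng.normal(size=(512, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)

W1, b1 = rng.normal(scale=0.5, size=(2, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.5, size=(16, 1)), np.zeros(1)
lr, batch = 0.5, 32

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

for epoch in range(200):
    perm = rng.permutation(len(X))
    for i in range(0, len(X), batch):
        xb, yb = X[perm[i:i+batch]], y[perm[i:i+batch]]
        # forward pass
        h = np.tanh(xb @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        # backward pass (binary cross-entropy), following the computation graph
        dz2 = (p - yb) / len(xb)           # gradient at the output pre-activation
        dW2, db2 = h.T @ dz2, dz2.sum(0)
        dz1 = (dz2 @ W2.T) * (1 - h**2)    # chain rule through tanh
        dW1, db1 = xb.T @ dz1, dz1.sum(0)
        # mini-batch SGD update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

acc = ((sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5) == y).mean()
print(f"train accuracy: {acc:.2f}")
```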

Chapter 3: Advances - Part I

  • Section 1: More on Optimizers
  • Section 2: Overfitting, Regularization and Dropout
  • Section 3: Data Distribution and Preprocessing
  • Section 4: Standardization and Batch Normalization

Chapter 4: Convolutional NNs

Chapter 5: Residual Learning

Chapter 6: Sequence Processing

Chapter 7: Sequence to Sequence Models

Tutorial Notebooks

The tutorial notebooks can be accessed below.

Book

There is no single textbook for this course; we draw on various resources throughout. The following textbooks cover the key notions of the course.

The following textbooks are also good resources for practicing hands-on skills. Note that we are not simply learning implementation! We study the fundamentals of deep learning; of course, we also get our hands dirty and learn how to implement what we study.

Reading List

This section will be completed gradually through the semester.

Chapter 1: Preliminaries

Introduction to DL

ML Components

Review of Probability Theory

Classification Problem

  • Binary Classification: Chapter 5 - Sections 5.1 and 5.2 of [BB]
  • McCulloch-Pitts Model: Paper A logical calculus of the ideas immanent in nervous activity published in the Bulletin of Mathematical Biophysics by Warren McCulloch and Walter Pitts in 1943, proposing a computational model for the neuron. This paper is regarded as the pioneering study that led to the idea of the artificial neuron (see the sketch below)
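
A toy illustration of the McCulloch-Pitts unit as a simple threshold gate (the function name and examples are ours, not from the paper):

```python
def mp_neuron(inputs, threshold):
    """McCulloch-Pitts unit: fires (1) iff the number of active
    binary inputs reaches the threshold."""
    return int(sum(inputs) >= threshold)

# AND and OR over two binary inputs arise from different thresholds:
assert mp_neuron([1, 1], threshold=2) == 1   # AND fires only when both are on
assert mp_neuron([0, 1], threshold=1) == 1   # OR fires when any input is on
```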

Training via Risk Minimization

  • Overview of Risk Minimization: Paper An overview of statistical learning theory published in the IEEE Transactions on Neural Networks by Vladimir N. Vapnik in 1999 as an overview of his lifelong work in ML

Perceptron Algorithm

Universal Approximation Theorem

  • Universal Approximation: Paper Approximation by superpositions of a sigmoidal function published in Mathematics of Control, Signals and Systems by George V. Cybenko in 1989
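
To build intuition for Cybenko's result, the following sketch (our construction, not the paper's proof) shows how differences of steep sigmoids form "bumps" whose weighted sum approximates a continuous function:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

x = np.linspace(0.0, 1.0, 1001)
k = 500.0                          # steepness: larger k -> sharper "step"
target = np.sin(2 * np.pi * x)     # continuous function to approximate

# A weighted sum of sigmoid "bumps", one per sub-interval of [0, 1]:
edges = np.linspace(0.0, 1.0, 41)
approx = sum(
    np.sin(2 * np.pi * (lo + hi) / 2)                    # bump height
    * (sigmoid(k * (x - lo)) - sigmoid(k * (x - hi)))    # ~indicator of (lo, hi)
    for lo, hi in zip(edges[:-1], edges[1:])
)
print(np.max(np.abs(approx - target)))  # shrinks as the partition is refined
```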

Deep NNs

  • DNNs: Chapter 6 - Sections 6.2 and 6.3 of [BB]

Optimization via Gradient Descent
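
A bare-bones illustration of gradient descent on a toy least-squares problem (synthetic data; the step size is chosen for illustration only):

```python
import numpy as np

# Minimize f(w) = ||Xw - y||^2 / (2n) by following the negative gradient.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=100)

w = np.zeros(3)
lr = 0.1
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)   # gradient of the quadratic loss
    w -= lr * grad                      # gradient descent step
print(w)  # close to w_true
```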

Chapter 2: Fully-connected FNNs

Forward Propagation

Backpropagation

  • Backpropagation: Chapter 6 - Section 6.5 of [GYC]
  • Backpropagation: Chapter 8 of [BB]
  • Backpropagation of Error: Paper Learning representations by back-propagating errors published in Nature by D. Rumelhart, G. Hinton and R. Williams in 1986, advocating the idea of systematic gradient computation on a computation graph

Multi-class Classification

Full-batch, sample-level and mini-batch SGD

  • SGD: Chapter 5 - Section 5.9 of [GYC]
  • SGD: Chapter 7 - Section 7.2 of [BB]

Generalization

  • Generalization: Chapter 6 of the book Patterns, predictions, and actions: A story about machine learning by M. Hardt and B. Recht published in 2021

Chapter 3: Optimizers, Regularization and Data

More on Optimizers

  • Learning Rate Scheduling Paper Cyclical Learning Rates for Training Neural Networks published in Winter Conference on Applications of Computer Vision (WACV) by Leslie N. Smith in 2017 discussing learning rate scheduling
  • Rprop Paper A direct adaptive method for faster backpropagation learning: the RPROP algorithm published in IEEE International Conference on Neural Networks by M. Riedmiller and H. Braun in 1993 proposing Rprop algorithm
  • RMSprop Lecture notes by Geoffrey Hinton proposing RMSprop
  • RMSprop Analysis Paper RMSProp and equilibrated adaptive learning rates for non-convex optimization by Y. Dauphin et al. published in 2015 discussing RMSprop and citing Hinton's lecture notes
  • Adam Paper Adam: A Method for Stochastic Optimization published in 2014 by D. Kingma and J. Ba proposing Adam
  • Notes on Optimizers Lecture notes of the course Optimization for Machine Learning by Ashok Cutkosky at Boston University: a good resource on optimizers
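
For quick reference, here are the RMSprop and Adam update rules from the papers above as minimal NumPy sketches (function names are ours; the default hyperparameters are the commonly quoted values, not course-prescribed):

```python
import numpy as np

def rmsprop_step(w, g, v, lr=1e-3, rho=0.9, eps=1e-8):
    """One RMSprop update: scale the gradient by a running RMS of gradients."""
    v = rho * v + (1 - rho) * g**2          # running mean of squared gradients
    return w - lr * g / (np.sqrt(v) + eps), v

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update (t is the 1-based step count for bias correction)."""
    m = b1 * m + (1 - b1) * g               # first-moment estimate
    v = b2 * v + (1 - b2) * g**2            # second-moment estimate
    m_hat = m / (1 - b1**t)                 # bias-corrected moments
    v_hat = v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

Both keep an exponential moving average of squared gradients; Adam additionally averages the gradient itself and corrects both averages for their initialization bias.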

Overfitting and Regularization

  • Regularization: Chapter 7 of [GYC]
  • Overfitting and Regularization: Chapter 9 - Sections 9.1 to 9.3 of [BB]
  • Tikhonov Paper Tikhonov Regularization and Total Least Squares published in 1999 by G. Golub et al. discussing Tikhonov regularization
  • Lasso Paper Regression Shrinkage and Selection Via the Lasso published in 1996 by R. Tibshirani proposing the legendary Lasso
  • Dropout 1 Paper Improving neural networks by preventing co-adaptation of feature detectors published in 2012 by G. Hinton et al. proposing Dropout
  • Dropout 2 Paper Dropout: A Simple Way to Prevent Neural Networks from Overfitting published in 2014 by N. Srivastava et al. providing some analysis and illustrations on Dropout
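
A minimal sketch of inverted dropout, the variant commonly used in practice: activations are zeroed and rescaled at training time so that inference needs no change (the function name and defaults are ours):

```python
import numpy as np

def dropout(h, p, rng, train=True):
    """Zero each activation with probability p during training and
    rescale the survivors by 1/(1-p) to preserve the expected value."""
    if not train or p == 0.0:
        return h
    mask = (rng.random(h.shape) >= p) / (1.0 - p)
    return h * mask

rng = np.random.default_rng(0)
h = np.ones((4, 8))
print(dropout(h, p=0.5, rng=rng).mean())  # close to 1 in expectation
```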

Data: Data Distribution, Data Cleaning, and Outliers

  • Data: Chapter 8 of the book Patterns, predictions, and actions: A story about machine learning by M. Hardt and B. Recht published in 2021
  • Data Processing in Python Open Book Minimalist Data Wrangling with Python by Marek Gagolewski going through data processing in Python

Normalization

  • Normalization Paper Is normalization indispensable for training deep neural network? published in 2020 by J. Shao et al. discussing the meaning and effects of normalization
  • Batch-Norm Paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift published in 2015 by S. Ioffe and C. Szegedy proposing Batch Normalization
  • Batch-Norm Meaning Paper How Does Batch Normalization Help Optimization? published in 2018 by S. Santurkar et al. discussing why Batch Normalization works: they argue that the main reason is that the loss landscape becomes much smoother
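
A minimal training-time sketch of Batch Normalization over a mini-batch, following Ioffe and Szegedy's formulation (the function name is ours; the running statistics used at inference are omitted for brevity):

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Standardize each feature over the mini-batch (N, D), then
    rescale and shift with the learnable parameters gamma and beta."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(32, 4))
out = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))  # ~0 and ~1
```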

Chapter 4: Convolutional NNs

Development of CNNs

  • Hubel and Wiesel Study Paper Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex published in 1962 by D. Hubel and T. Wiesel presenting their findings on visual processing in the cat’s visual cortex
  • Neocognitron Paper Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position published in 1980 by K. Fukushima proposing the Neocognitron as a computational model for visual learning
  • Backpropagating on LeNet Paper Backpropagation Applied to Handwritten Zip Code Recognition published in 1989 by Y. LeCun et al. developing backpropagation for LeNet
  • LeNet Paper Gradient-Based Learning Applied to Document Recognition published in 1998 by Y. LeCun et al. discussing LeNet

Components of CNN

Deep CNNs

  • Convolution: Chapter 9 - Sections 9.4 and 9.6 of [GYC]
  • VGG Paper Very Deep Convolutional Networks for Large-Scale Image Recognition published in 2014 by K. Simonyan and A. Zisserman proposing VGG Architectures
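
A naive sketch of the 2-D convolution (strictly, cross-correlation) computed by CNN layers, on a single channel with "valid" padding; the function name and the edge-detector example are ours:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image and take a weighted sum at each
    position; no padding, so the output shrinks by the kernel size - 1."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

edge = np.array([[1.0, 0.0, -1.0]] * 3)       # vertical-edge detector
img = np.tile([0.0, 0.0, 1.0, 1.0], (4, 1))   # image with a step edge
print(conv2d_valid(img, edge))                # strong response at the edge
```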

Backpropagation on CNN

  • LeCun’s Paper Paper Gradient-Based Learning Applied to Document Recognition published in 1998 by Y. LeCun et al. summarizing the learning process in CNNs
  • Efficient Backpropagation on CNN Paper High Performance Convolutional Neural Networks for Document Processing published in 2006 by K. Chellapilla et al. discussing efficient backpropagation on CNNs.

Chapter 5: Residual Learning

  • ResNet Paper Deep Residual Learning for Image Recognition published in 2015 by K. He et al. proposing ResNet
  • ResNet-1001 Paper Identity Mappings in Deep Residual Networks published in 2016 by K. He et al. demonstrating how deep ResNet can go
  • U-Net Paper U-Net: Convolutional Networks for Biomedical Image Segmentation published in 2015 by O. Ronneberger et al. proposing U-Net
  • DenseNet Paper Densely Connected Convolutional Networks published in 2017 by G. Huang et al. proposing DenseNet
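
A minimal PyTorch sketch of a basic residual block in the style of He et al. (channel counts and layer choices are illustrative, and dimension-changing shortcuts are omitted):

```python
import torch
from torch import nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with an identity skip connection:
    y = F(x) + x, assuming input and output dimensions match."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # the residual (skip) connection

x = torch.randn(2, 8, 16, 16)
print(ResidualBlock(8)(x).shape)  # torch.Size([2, 8, 16, 16])
```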

Chapter 6: Sequence Processing via NNs

Basics of Sequence Processing

  • Jordan Network Paper Attractor dynamics and parallelism in a connectionist sequential machine published in 1986 by M. Jordan proposing his RNN
  • Elman Network Paper Finding structure in time published in 1990 by J. Elman proposing a revision to the Jordan network
  • Seq Models Article The Unreasonable Effectiveness of Recurrent Neural Networks written in May 2015 by A. Karpathy discussing different types of sequence problems

Backpropagation Through Time

  • BPTT Paper Backpropagation through time: What it does and how to do it published in 1990 by P. Werbos explaining BPTT
  • Vanishing Gradient with BPTT Paper On the difficulty of training recurrent neural networks published in 2013 by R. Pascanu et al. discussing challenges in training with BPTT
  • Truncated BPTT Paper An efficient gradient-based algorithm for on-line training of recurrent network trajectories published in 1990 by R. Williams and J. Peng explaining truncated BPTT
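
A hedged PyTorch sketch of truncated BPTT: the sequence is processed in chunks of k steps, and the hidden state is detached between chunks so that gradients stop at chunk boundaries (the model, data and truncation length are synthetic placeholders):

```python
import torch
from torch import nn

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
opt = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)

seq = torch.randn(2, 100, 4)       # (batch, time, features), synthetic data
target = torch.randn(2, 100, 1)
k = 20                             # truncation length
h = torch.zeros(1, 2, 8)           # (layers, batch, hidden)

for t in range(0, seq.size(1), k):
    h = h.detach()                 # cut the graph: no gradient past this chunk
    out, h = rnn(seq[:, t:t+k], h)
    loss = nn.functional.mse_loss(head(out), target[:, t:t+k])
    opt.zero_grad()
    loss.backward()
    opt.step()
```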

Gating

  • Gating Principle Chapter Long Short-Term Memory published in 2012 in the book Supervised Sequence Labelling with Recurrent Neural Networks by A. Graves explaining the gating idea
  • LSTM Paper Long short-term memory published in 1997 by S. Hochreiter and J. Schmidhuber proposing LSTM
  • GRU Paper On the Properties of Neural Machine Translation: Encoder-Decoder Approaches published in 2014 by K. Cho et al. proposing GRU
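
A single GRU step as a NumPy sketch (the function name is ours; biases are dropped for brevity, and gate/sign conventions vary slightly between the original papers and common libraries):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def gru_cell(x, h, P):
    """One GRU step with update gate z and reset gate r."""
    z = sigmoid(x @ P["Wz"] + h @ P["Uz"])              # update gate
    r = sigmoid(x @ P["Wr"] + h @ P["Ur"])              # reset gate
    h_tilde = np.tanh(x @ P["Wh"] + (r * h) @ P["Uh"])  # candidate state
    return (1 - z) * h + z * h_tilde                    # blend old and new

rng = np.random.default_rng(0)
d, k = 4, 8
P = {name: rng.normal(scale=0.1, size=s)
     for name, s in [("Wz", (d, k)), ("Uz", (k, k)), ("Wr", (d, k)),
                     ("Ur", (k, k)), ("Wh", (d, k)), ("Uh", (k, k))]}
h = gru_cell(rng.normal(size=(1, d)), np.zeros((1, k)), P)
print(h.shape)  # (1, 8)
```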

CTC Algorithm

  • CTC Paper Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks published in 2006 by A. Graves et al. proposing CTC Algorithm
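
Rather than re-deriving the CTC forward-backward recursion here, the following is a usage sketch of PyTorch's built-in torch.nn.CTCLoss with synthetic shapes (blank index 0, time-major log-probabilities, as per the library's conventions):

```python
import torch
from torch import nn

T, N, C, S = 50, 2, 20, 10           # time steps, batch, classes, target length
log_probs = torch.randn(T, N, C).log_softmax(dim=2)
targets = torch.randint(1, C, (N, S))          # class 0 is reserved for blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```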