Lecture Notes

The lecture notes are uploaded throughout the semester. For each chapter, the notes are provided section by section.

Chapter 0: Course Overview and Logistics

  • Handouts: all sections in a single file

Chapter 1: Fundamentals of Deep Learning

Chapter 2: Feedforward NNs

  • Section 1: Forward Pass in MLPs
  • Section 2: Computing Gradient via Backpropagation on Computation Graph
  • Section 3: Multiclass Classification
  • Section 4: Mini-batch Training and SGD Algorithm
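
As a rough companion to Sections 1, 2 and 4 above, the following minimal NumPy sketch trains a one-hidden-layer MLP with backpropagation and mini-batch SGD. It is illustrative only: the architecture, synthetic data and hyperparameters are arbitrary choices, not the ones used in the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic XOR-like data: label is 1 when the two coordinates share a sign.
X = rng.normal(size=(512, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)

W1, b1 = rng.normal(scale=0.5, size=(2, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.5, size=(16, 1)), np.zeros(1)
lr, batch = 0.5, 32

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

for epoch in range(200):
    perm = rng.permutation(len(X))
    for i in range(0, len(X), batch):
        xb, yb = X[perm[i:i+batch]], y[perm[i:i+batch]]
        # forward pass
        h = np.tanh(xb @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        # backward pass (binary cross-entropy), following the computation graph
        dz2 = (p - yb) / len(xb)           # gradient at the output pre-activation
        dW2, db2 = h.T @ dz2, dz2.sum(0)
        dz1 = (dz2 @ W2.T) * (1 - h**2)    # chain rule through tanh
        dW1, db1 = xb.T @ dz1, dz1.sum(0)
        # mini-batch SGD update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

acc = ((sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5) == y).mean()
print(f"train accuracy: {acc:.2f}")
```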

Chapter 3: Advances - Part I

  • Section 1: More on Optimizers
  • Section 2: Overfitting, Regularization and Dropout
  • Section 3: Data Distribution and Preprocessing
  • Section 4: Standardization and Batch Normalization

Chapter 4: Convolutional NNs

Chapter 5: Residual Learning

Chapter 6: Sequence Processing

Chapter 7: Sequence to Sequence Models

Tutorial Notebooks

The tutorial notebooks can be accessed below.

Book

There is no single textbook for this course; we draw on various resources throughout. The following textbooks cover the key notions of the course.

The following textbooks are also good resources for practicing hands-on skills. Note that we are not simply learning implementation! We study the fundamentals of deep learning; of course, we also get our hands dirty and learn how to implement what we study.

Reading List

This section will be completed gradually through the semester.

Chapter 1: Preliminaries

Introduction to DL

ML Components

Review of Probability Theory

Classification Problem

  • Binary Classification: Chapter 5 - Sections 5.1 and 5.2 of [BB]
  • McCulloch-Pitts Model: Paper A logical calculus of the ideas immanent in nervous activity published in the Bulletin of Mathematical Biophysics by Warren McCulloch and Walter Pitts in 1943, proposing a computational model for the neuron. This paper is regarded as the pioneering study that led to the idea of the artificial neuron (see the sketch below)
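
A toy illustration of the McCulloch-Pitts unit as a simple threshold gate (the function name and examples are ours, not from the paper):

```python
def mp_neuron(inputs, threshold):
    """McCulloch-Pitts unit: fires (1) iff the number of active
    binary inputs reaches the threshold."""
    return int(sum(inputs) >= threshold)

# AND and OR over two binary inputs arise from different thresholds:
assert mp_neuron([1, 1], threshold=2) == 1   # AND fires only when both are on
assert mp_neuron([0, 1], threshold=1) == 1   # OR fires when any input is on
```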

Training via Risk Minimization

  • Overview of Risk Minimization: Paper An overview of statistical learning theory published in the IEEE Transactions on Neural Networks by Vladimir N. Vapnik in 1999 as an overview of his lifelong work in ML

Perceptron Algorithm

Universal Approximation Theorem

  • Universal Approximation: Paper Approximation by superpositions of a sigmoidal function published in Mathematics of Control, Signals and Systems by George V. Cybenko in 1989
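
To build intuition for Cybenko's result, the following sketch (our construction, not the paper's proof) shows how differences of steep sigmoids form "bumps" whose weighted sum approximates a continuous function:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

x = np.linspace(0.0, 1.0, 1001)
k = 500.0                          # steepness: larger k -> sharper "step"
target = np.sin(2 * np.pi * x)     # continuous function to approximate

# A weighted sum of sigmoid "bumps", one per sub-interval of [0, 1]:
edges = np.linspace(0.0, 1.0, 41)
approx = sum(
    np.sin(2 * np.pi * (lo + hi) / 2)                    # bump height
    * (sigmoid(k * (x - lo)) - sigmoid(k * (x - hi)))    # ~indicator of (lo, hi)
    for lo, hi in zip(edges[:-1], edges[1:])
)
print(np.max(np.abs(approx - target)))  # shrinks as the partition is refined
```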

Deep NNs

  • DNNs: Chapter 6 - Sections 6.2 and 6.3 of [BB]

Optimization via Gradient Descent
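
A bare-bones illustration of gradient descent on a toy least-squares problem (synthetic data; the step size is chosen for illustration only):

```python
import numpy as np

# Minimize f(w) = ||Xw - y||^2 / (2n) by following the negative gradient.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=100)

w = np.zeros(3)
lr = 0.1
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)   # gradient of the quadratic loss
    w -= lr * grad                      # gradient descent step
print(w)  # close to w_true
```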

Chapter 2: Fully-connected FNNs

Forward Propagation

Backpropagation

  • Backpropagation: Chapter 6 - Section 6.5 of [GYC]
  • Backpropagation: Chapter 8 of [BB]
  • Backpropagation of Error: Paper Learning representations by back-propagating errors published in Nature by D. Rumelhart, G. Hinton and R. Williams in 1986, advocating the idea of systematic gradient computation on a computation graph

Multi-class Classification

Full-batch, sample-level and mini-batch SGD

  • SGD: Chapter 5 - Section 5.9 of [GYC]
  • SGD: Chapter 7 - Section 7.2 of [BB]

Generalization

  • Generalization: Chapter 6 of the book Patterns, predictions, and actions: A story about machine learning by M. Hardt and B. Recht published in 2021

Chapter 3: Optimizers, Regularization and Data

More on Optimizers

  • Learning Rate Scheduling Paper Cyclical Learning Rates for Training Neural Networks published in Winter Conference on Applications of Computer Vision (WACV) by Leslie N. Smith in 2017 discussing learning rate scheduling
  • Rprop Paper A direct adaptive method for faster backpropagation learning: the RPROP algorithm published in IEEE International Conference on Neural Networks by M. Riedmiller and H. Braun in 1993 proposing Rprop algorithm
  • RMSprop Lecture notes by Geoffrey Hinton proposing RMSprop
  • RMSprop Analysis Paper RMSProp and equilibrated adaptive learning rates for non-convex optimization by Y. Dauphin et al. published in 2015 discussing RMSprop and citing Hinton's lecture notes
  • Adam Paper Adam: A Method for Stochastic Optimization published in 2014 by D. Kingma and J. Ba proposing Adam
  • Notes on Optimizers Lecture notes of the course Optimization for Machine Learning by Ashok Cutkosky at Boston University: a good resource on optimizers
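
For quick reference, here are the RMSprop and Adam update rules from the papers above as minimal NumPy sketches (function names are ours; the default hyperparameters are the commonly quoted values, not course-prescribed):

```python
import numpy as np

def rmsprop_step(w, g, v, lr=1e-3, rho=0.9, eps=1e-8):
    """One RMSprop update: scale the gradient by a running RMS of gradients."""
    v = rho * v + (1 - rho) * g**2          # running mean of squared gradients
    return w - lr * g / (np.sqrt(v) + eps), v

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update (t is the 1-based step count for bias correction)."""
    m = b1 * m + (1 - b1) * g               # first-moment estimate
    v = b2 * v + (1 - b2) * g**2            # second-moment estimate
    m_hat = m / (1 - b1**t)                 # bias-corrected moments
    v_hat = v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

Both keep an exponential moving average of squared gradients; Adam additionally averages the gradient itself and corrects both averages for their initialization bias.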

Overfitting and Regularization

  • Regularization: Chapter 7 of [GYC]
  • Overfitting and Regularization: Chapter 9 - Sections 9.1 to 9.3 of [BB]
  • Tikhonov Paper Tikhonov Regularization and Total Least Squares published in 1999 by G. Golub et al. discussing Tikhonov regularization
  • Lasso Paper Regression Shrinkage and Selection Via the Lasso published in 1996 by R. Tibshirani proposing the legendary Lasso
  • Dropout 1 Paper Improving neural networks by preventing co-adaptation of feature detectors published in 2012 by G. Hinton et al. proposing Dropout
  • Dropout 2 Paper Dropout: A Simple Way to Prevent Neural Networks from Overfitting published in 2014 by N. Srivastava et al. providing some analysis and illustrations on Dropout
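
A minimal sketch of inverted dropout, the variant commonly used in practice: activations are zeroed and rescaled at training time so that inference needs no change (the function name and defaults are ours):

```python
import numpy as np

def dropout(h, p, rng, train=True):
    """Zero each activation with probability p during training and
    rescale the survivors by 1/(1-p) to preserve the expected value."""
    if not train or p == 0.0:
        return h
    mask = (rng.random(h.shape) >= p) / (1.0 - p)
    return h * mask

rng = np.random.default_rng(0)
h = np.ones((4, 8))
print(dropout(h, p=0.5, rng=rng).mean())  # close to 1 in expectation
```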

Data: Data Distribution, Data Cleaning, and Outliers

  • Data: Chapter 8 of the book Patterns, predictions, and actions: A story about machine learning by M. Hardt and B. Recht published in 2021
  • Data Processing in Python Open Book Minimalist Data Wrangling with Python by Marek Gagolewski going through data processing in Python

Normalization

  • Normalization Paper Is normalization indispensable for training deep neural network? published in 2020 by J. Shao et al. discussing the meaning and effects of normalization
  • Batch-Norm Paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift published in 2015 by S. Ioffe and C. Szegedy proposing Batch Normalization
  • Batch-Norm Meaning Paper How Does Batch Normalization Help Optimization? published in 2018 by S. Santurkar et al. discussing why Batch Normalization works: they argue that the main reason is that the loss landscape becomes much smoother
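
A minimal training-time sketch of Batch Normalization over a mini-batch, following Ioffe and Szegedy's formulation (the function name is ours; the running statistics used at inference are omitted for brevity):

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Standardize each feature over the mini-batch (N, D), then
    rescale and shift with the learnable parameters gamma and beta."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(32, 4))
out = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))  # ~0 and ~1
```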

Chapter 4: Convolutional NNs

Development of CNNs

  • Hubel and Wiesel Study Paper Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex published in 1962 by D. Hubel and T. Wiesel presenting their findings on visual processing in the cat’s visual cortex
  • Neocognitron Paper Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position published in 1980 by K. Fukushima proposing the Neocognitron as a computational model for visual learning
  • Backpropagating on LeNet Paper Backpropagation Applied to Handwritten Zip Code Recognition published in 1989 by Y. LeCun et al. developing backpropagation for LeNet
  • LeNet Paper Gradient-Based Learning Applied to Document Recognition published in 1998 by Y. LeCun et al. discussing LeNet

Components of CNN

Deep CNNs

  • Convolution: Chapter 9 - Sections 9.4 and 9.6 of [GYC]
  • VGG Paper Very Deep Convolutional Networks for Large-Scale Image Recognition published in 2014 by K. Simonyan and A. Zisserman proposing VGG Architectures
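
A naive sketch of the 2-D convolution (strictly, cross-correlation) computed by CNN layers, on a single channel with "valid" padding; the function name and the edge-detector example are ours:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image and take a weighted sum at each
    position; no padding, so the output shrinks by the kernel size - 1."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

edge = np.array([[1.0, 0.0, -1.0]] * 3)       # vertical-edge detector
img = np.tile([0.0, 0.0, 1.0, 1.0], (4, 1))   # image with a step edge
print(conv2d_valid(img, edge))                # strong response at the edge
```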

Backpropagation on CNN

  • LeCun’s Paper Paper Gradient-Based Learning Applied to Document Recognition published in 1998 by Y. LeCun et al. summarizing the learning process in CNNs
  • Efficient Backpropagation on CNN Paper High Performance Convolutional Neural Networks for Document Processing published in 2006 by K. Chellapilla et al. discussing efficient backpropagation on CNNs.

Chapter 5: Residual Learning

  • ResNet Paper Deep Residual Learning for Image Recognition published in 2015 by K. He et al. proposing ResNet
  • ResNet-1001 Paper Identity Mappings in Deep Residual Networks published in 2016 by K. He et al. demonstrating how deep ResNet can go
  • U-Net Paper U-Net: Convolutional Networks for Biomedical Image Segmentation published in 2015 by O. Ronneberger et al. proposing U-Net
  • DenseNet Paper Densely Connected Convolutional Networks published in 2017 by G. Huang et al. proposing DenseNet
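
A minimal PyTorch sketch of a basic residual block in the style of He et al. (channel counts and layer choices are illustrative, and dimension-changing shortcuts are omitted):

```python
import torch
from torch import nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with an identity skip connection:
    y = F(x) + x, assuming input and output dimensions match."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # the residual (skip) connection

x = torch.randn(2, 8, 16, 16)
print(ResidualBlock(8)(x).shape)  # torch.Size([2, 8, 16, 16])
```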

Chapter 6: Sequence Processing via NNs

Basics of Sequence Processing

  • Jordan Network Paper Attractor dynamics and parallelism in a connectionist sequential machine published in 1986 by M. Jordan proposing his RNN
  • Elman Network Paper Finding structure in time published in 1990 by J. Elman proposing a revision to the Jordan network
  • Seq Models Article The Unreasonable Effectiveness of Recurrent Neural Networks written in May 2015 by A. Karpathy discussing different types of sequence problems

Backpropagation Through Time

  • BPTT Paper Backpropagation through time: What it does and how to do it published in 1990 by P. Werbos explaining BPTT
  • Vanishing Gradient with BPTT Paper On the difficulty of training recurrent neural networks published in 2013 by R. Pascanu et al. discussing challenges in training with BPTT
  • Truncated BPTT Paper An efficient gradient-based algorithm for on-line training of recurrent network trajectories published in 1990 by R. Williams and J. Peng explaining truncated BPTT
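
A hedged PyTorch sketch of truncated BPTT: the sequence is processed in chunks of k steps, and the hidden state is detached between chunks so that gradients stop at chunk boundaries (the model, data and truncation length are synthetic placeholders):

```python
import torch
from torch import nn

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
opt = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)

seq = torch.randn(2, 100, 4)       # (batch, time, features), synthetic data
target = torch.randn(2, 100, 1)
k = 20                             # truncation length
h = torch.zeros(1, 2, 8)           # (layers, batch, hidden)

for t in range(0, seq.size(1), k):
    h = h.detach()                 # cut the graph: no gradient past this chunk
    out, h = rnn(seq[:, t:t+k], h)
    loss = nn.functional.mse_loss(head(out), target[:, t:t+k])
    opt.zero_grad()
    loss.backward()
    opt.step()
```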

Gating

  • Gating Principle Chapter Long Short-Term Memory published in 2012 in the book Supervised Sequence Labelling with Recurrent Neural Networks by A. Graves explaining the gating idea
  • LSTM Paper Long short-term memory published in 1997 by S. Hochreiter and J. Schmidhuber proposing LSTM
  • GRU Paper On the Properties of Neural Machine Translation: Encoder-Decoder Approaches published in 2014 by K. Cho et al. proposing GRU
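
A single GRU step as a NumPy sketch (the function name is ours; biases are dropped for brevity, and gate/sign conventions vary slightly between the original papers and common libraries):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def gru_cell(x, h, P):
    """One GRU step with update gate z and reset gate r."""
    z = sigmoid(x @ P["Wz"] + h @ P["Uz"])              # update gate
    r = sigmoid(x @ P["Wr"] + h @ P["Ur"])              # reset gate
    h_tilde = np.tanh(x @ P["Wh"] + (r * h) @ P["Uh"])  # candidate state
    return (1 - z) * h + z * h_tilde                    # blend old and new

rng = np.random.default_rng(0)
d, k = 4, 8
P = {name: rng.normal(scale=0.1, size=s)
     for name, s in [("Wz", (d, k)), ("Uz", (k, k)), ("Wr", (d, k)),
                     ("Ur", (k, k)), ("Wh", (d, k)), ("Uh", (k, k))]}
h = gru_cell(rng.normal(size=(1, d)), np.zeros((1, k)), P)
print(h.shape)  # (1, 8)
```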

CTC Algorithm

  • CTC Paper Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks published in 2006 by A. Graves et al. proposing CTC Algorithm
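
Rather than re-deriving the CTC forward-backward recursion here, the following is a usage sketch of PyTorch's built-in torch.nn.CTCLoss with synthetic shapes (blank index 0, time-major log-probabilities, as per the library's conventions):

```python
import torch
from torch import nn

T, N, C, S = 50, 2, 20, 10           # time steps, batch, classes, target length
log_probs = torch.randn(T, N, C).log_softmax(dim=2)
targets = torch.randint(1, C, (N, S))          # class 0 is reserved for blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```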