Materials
Lecture Notes
The lecture notes are uploaded throughout the semester. For each chapter, the notes are provided section by section.
Chapter 0: Course Overview and Logistics
- Handouts: All Sections included in a single file
Chapter 1: Fundamentals of Deep Learning
- Section 1: Motivation to Learn DL
- Section 2: Learning from Data: Basics
- Section 3: Perceptron Machine
- Section 4: Deep Neural Networks
- Section 5: Function Optimization
Chapter 2: Feedforward NNs
- Section 1: Forward Pass in MLPs
- Section 2: Computing Gradient via Backpropagation on Computation Graph
- Section 3: Multiclass Classification
- Section 4: Mini-batch Training and SGD Algorithm
Chapter 3: Advances - Part I
- Section 1: More on Optimizers
- Section 2: Overfitting, Regularization and Dropout
- Section 3: Data Distribution and Preprocessing
- Section 4: Standardization and Batch Normalization
Chapter 4: Convolutional NNs
- Section 1: Why Convolution?
- Section 2: Components of CNNs
- Section 3: Deep CNNs
- Section 4: Training CNNs
Chapter 5: Residual Learning
Chapter 6: Sequence Processing
- Section 1: Sequence Data
- Section 2: RNNs
- Section 3: Training RNNs
- Section 4: Gating
- Section 5: Bidirectional Sequence Processing
- Section 6: CTC Algorithm
Chapter 7: Sequence to Sequence Models
- Section 1: Seq2Seq
- Section 2: Encoder-Decoder
- Section 3: Attention
- Section 4: Self-Attention and Transformer
Tutorial Notebooks
The tutorial notebooks can be accessed below.
- Tutorial 1: Intro to Python, Basic ML in Python, by Saleh Tabatabaei
- Tutorial 2: Intro to PyTorch, Auto-grad, by Saleh Tabatabaei
- Tutorial 3: Underfitting and Overfitting: How to Prevent Them, by Saleh Tabatabaei
- Tutorial 4: CNNs, by Saleh Tabatabaei
- Tutorial 5: Midterm Review, by Saleh Tabatabaei
- Tutorial 6: ResNet, by Saleh Tabatabaei
- Tutorial 7: RNNs, by Saleh Tabatabaei
Books
There is no single textbook for this course; we use a variety of resources. The following textbooks cover the key notions of the course.
- [GYC] Goodfellow, Ian, et al. Deep Learning. MIT Press, 2016.
- [BB] Bishop, Christopher M., and Hugh Bishop. Deep Learning: Foundations and Concepts. Springer Nature, 2023.
- [Ag] Aggarwal, Charu C. Neural Networks and Deep Learning. Springer, 2018.
The following textbooks are also good resources for practicing hands-on skills. Note that we are not merely learning to implement: we study the fundamentals of deep learning. Of course, we also get our hands dirty and learn how to implement these ideas.
- Chollet, François. Deep Learning with Python. Manning Publications, 2021.
- Müller, Andreas, and Sarah Guido. Introduction to Machine Learning with Python. O’Reilly Media, Inc., 2016.
Reading List
This section will be completed gradually throughout the semester.
Chapter 1: Preliminaries
Introduction to DL
- Motivation: Chapter 1 - Section 1.1 of [BB]
ML Components
- Review on Linear Algebra: Chapter 2 of [GYC]
- ML Components: Chapter 1 - Sections 1.2.1 to 1.2.4 of [BB]
- ML Basics: Chapter 5 of [GYC]
Review on Probability Theory
- Probability Theory: Chapter 2 of [BB]
- Probability Review: Chapter 3 of [GYC]
Classification Problem
- Binary Classification: Chapter 5 - Sections 5.1 and 5.2 of [BB]
- McCulloch-Pitts Model: Paper A logical calculus of the ideas immanent in nervous activity published in the Bulletin of Mathematical Biophysics by Warren McCulloch and Walter Pitts in 1943, proposing a computational model for the neuron. This paper is regarded as the pioneering study leading to the idea of the artificial neuron.
Training via Risk Minimization
- Overview of Risk Minimization: Paper An overview of statistical learning theory published in the IEEE Transactions on Neural Networks by Vladimir N. Vapnik in 1999, giving an overview of his lifelong developments in ML
Perceptron Algorithm
- Perceptron Simulation Experiments: Paper Perceptron Simulation Experiments presented by Frank Rosenblatt in the Proceedings of the IRE in 1960
- Perceptron: Chapter 1 - Section 1.2.1 of [Ag]
Universal Approximation Theorem
- Universal Approximation: Paper Approximation by superpositions of a sigmoidal function published in Mathematics of Control, Signals and Systems by George V. Cybenko in 1989
Deep NNs
Optimization via Gradient Descent
- Gradient-based Optimization: Chapter 4 - Sections 4.3 and 4.4 of [GYC]
- Gradient Descent: Chapter 7 - Sections 7.1 to 7.2 of [BB]
Chapter 2: Fully-connected FNNs
Forward Propagation
Backpropagation
- Backpropagation: Chapter 6 - Section 6.5 of [GYC]
- Backpropagation: Chapter 8 of [BB]
- Backpropagation of Error: Paper Learning representations by back-propagating errors published in Nature by D. Rumelhart, G. Hinton, and R. Williams in 1986, advocating the idea of systematic gradient computation on a computation graph
Multi-class Classification
- Binary Classification: Chapter 6 - Section 6.6 of [BB]
- Multi-class Models: Chapter 2 - Section 2.3 of [Ag]
Full-batch, sample-level and mini-batch SGD
Generalization
- Generalization: Chapter 6 of the book Patterns, Predictions, and Actions: A Story About Machine Learning by M. Hardt and B. Recht, published in 2021
Chapter 3: Optimizers, Regularization and Data
More on Optimizers
- Learning Rate Scheduling: Paper Cyclical Learning Rates for Training Neural Networks published in the IEEE Winter Conference on Applications of Computer Vision (WACV) by Leslie N. Smith in 2017, discussing learning rate scheduling
- Rprop: Paper A direct adaptive method for faster backpropagation learning: the RPROP algorithm published in the IEEE International Conference on Neural Networks by M. Riedmiller and H. Braun in 1993, proposing the Rprop algorithm
- RMSprop: Lecture note by Geoffrey Hinton proposing RMSprop
- RMSprop Analysis: Paper RMSProp and equilibrated adaptive learning rates for non-convex optimization by Y. Dauphin et al. published in 2015, discussing RMSprop and citing Hinton's lecture notes
- Adam: Paper Adam: A Method for Stochastic Optimization published in 2014 by D. Kingma and J. Ba, proposing Adam
- Notes on Optimizers: Lecture notes of the course Optimization for Machine Learning by Ashok Cutkosky at Boston University: a good resource on optimizers
Overfitting and Regularization
- Regularization: Chapter 7 of [GYC]
- Overfitting and Regularization: Chapter 9 - Sections 9.1 to 9.3 of [BB]
- Tikhonov: Paper Tikhonov Regularization and Total Least Squares published in 1999 by G. Golub et al., illustrating Tikhonov regularization
- Lasso: Paper Regression Shrinkage and Selection Via the Lasso published in 1996 by R. Tibshirani, proposing the legendary Lasso
- Dropout 1: Paper Improving neural networks by preventing co-adaptation of feature detectors published in 2012 by G. Hinton et al., proposing Dropout
- Dropout 2: Paper Dropout: A Simple Way to Prevent Neural Networks from Overfitting published in 2014 by N. Srivastava et al., providing some analysis and illustrations of Dropout
Data: Data Distribution, Data Cleaning, and Outliers
- Data: Chapter 8 of the book Patterns, Predictions, and Actions: A Story About Machine Learning by M. Hardt and B. Recht, published in 2021
- Data Processing in Python: Open book Minimalist Data Wrangling with Python by Marek Gagolewski, going through data processing in Python
Normalization
- Normalization: Paper Is normalization indispensable for training deep neural network? published in 2020 by J. Shao et al., discussing the meaning and effects of normalization
- Batch-Norm: Paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift published in 2015 by S. Ioffe and C. Szegedy, proposing Batch Normalization
- Batch-Norm Meaning: Paper How Does Batch Normalization Help Optimization? published in 2018 by S. Santurkar et al., discussing why Batch Normalization works: they argue that the main reason is that it makes the loss landscape much smoother
Chapter 4: Convolutional NNs
Development of CNNs
- Hubel and Wiesel Study: Paper Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex published in 1962 by D. Hubel and T. Wiesel, elaborating on their findings about visual processing
- Neocognitron: Paper Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position published in 1980 by K. Fukushima, proposing the Neocognitron as a computational model for visual learning
- Backpropagation on LeNet: Paper Backpropagation Applied to Handwritten Zip Code Recognition published in 1989 by Y. LeCun et al., developing backpropagation for LeNet
- LeNet: Paper Gradient-Based Learning Applied to Document Recognition published in 1998 by Y. LeCun et al., discussing LeNet
Components of CNN
- Convolution: Chapter 9 - Sections 9.1 and 9.2 of [GYC]
- Convolution: Chapter 10 - Sections 10.2.1 and 10.2.2 of [BB]
- Multi-channel Convolution: Chapter 10 - Sections 10.2.3 to 10.2.5 of [BB]
- Pooling: Chapter 9 - Section 9.3 of [GYC]
- Pooling: Chapter 10 - Section 10.2.6 of [BB]
- Flattening: Chapter 10 - Sections 10.2.7 and 10.2.8 of [BB]
Deep CNNs
- Convolution: Chapter 9 - Sections 9.4 and 9.6 of [GYC]
- VGG: Paper Very Deep Convolutional Networks for Large-Scale Image Recognition published in 2014 by K. Simonyan and A. Zisserman, proposing the VGG architectures
Backpropagation on CNN
- LeCun’s Paper: Paper Gradient-based learning applied to document recognition published in 1998 by Y. LeCun et al., summarizing the learning process in CNNs
- Efficient Backpropagation on CNN: Paper High Performance Convolutional Neural Networks for Document Processing published in 2006 by K. Chellapilla et al., discussing efficient backpropagation on CNNs
Chapter 5: Residual Learning
- ResNet: Paper Deep Residual Learning for Image Recognition published in 2015 by K. He et al., proposing ResNet
- ResNet-1001: Paper Identity Mappings in Deep Residual Networks published in 2016 by K. He et al., demonstrating how deep ResNets can go
- U-Net: Paper U-Net: Convolutional Networks for Biomedical Image Segmentation published in 2015 by O. Ronneberger et al., proposing U-Net
- DenseNet: Paper Densely Connected Convolutional Networks published in 2017 by G. Huang et al., proposing DenseNet
Chapter 6: Sequence Processing via NNs
Basics of Sequence Processing
- Jordan Network: Paper Attractor dynamics and parallelism in a connectionist sequential machine published in 1986 by M. Jordan, proposing his RNN
- Elman Network: Paper Finding structure in time published in 1990 by J. Elman, proposing a revision of the Jordan network
- Seq Models: Article The Unreasonable Effectiveness of Recurrent Neural Networks written in May 2015 by A. Karpathy, discussing different types of sequence problems
Backpropagation Through Time
- BPTT: Paper Backpropagation through time: What it does and how to do it published in 1990 by P. Werbos, explaining BPTT
- Vanishing Gradient with BPTT: Paper On the difficulty of training recurrent neural networks published in 2013 by R. Pascanu et al., discussing challenges in training with BPTT
- Truncated BPTT: Paper An efficient gradient-based algorithm for on-line training of recurrent network trajectories published in 1990 by R. Williams and J. Peng, explaining truncated BPTT
Gating
- Gating Principle: Chapter Long Short-Term Memory published in 2012 in the book Supervised Sequence Labelling with Recurrent Neural Networks by A. Graves, explaining the gating idea
- LSTM: Paper Long short-term memory published in 1997 by S. Hochreiter and J. Schmidhuber, proposing LSTM
- GRU: Paper On the Properties of Neural Machine Translation: Encoder-Decoder Approaches published in 2014 by K. Cho et al., proposing GRU
CTC Algorithm
- CTC: Paper Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks published in 2006 by A. Graves et al., proposing the CTC algorithm
