Materials
Lecture Notes
The lecture notes are uploaded throughout the semester. For each chapter, the notes are provided section by section.
Chapter 0: Course Overview and Logistics
- Handouts: all sections included in a single file
Chapter 1: Fundamentals of Deep Learning
- Section 1: Motivation to Learn DL
- Section 2: Learning from Data: Basics
- Section 3: Perceptron Machine
- Section 4: Deep Neural Networks
- Section 5: Function Optimization
Chapter 2: Feedforward NNs
- Section 1: Forward Pass in MLPs
- Section 2: Computing Gradient via Backpropagation on Computation Graph
- Section 3: Multiclass Classification
- Section 4: Mini-batch Training and SGD Algorithm
Chapter 3: Advances - Part I
- Section 1: More on Optimizers
- Section 2: Overfitting, Regularization and Dropout
- Section 3: Data Distribution and Preprocessing
- Section 4: Standardization and Batch Normalization
Tutorial Notebooks
The tutorial notebooks can be accessed below.
- Tutorial 1: Intro to Python, Basic ML in Python, by Saleh Tabatabaei
- Tutorial 2: Intro to PyTorch, Auto-grad, by Saleh Tabatabaei (Watch the Video)
- Tutorial 3: Underfitting and Overfitting: How to Prevent Them, by Saleh Tabatabaei (Watch the Video)
Book
There is no single textbook for this course; we use various resources instead. The following textbooks cover the key notions of the course.
- [GYC] Goodfellow, Ian, et al. Deep Learning. MIT Press, 2016.
- [BB] Bishop, Christopher M., and Hugh Bishop. Deep Learning: Foundations and Concepts. Springer Nature, 2023.
- [Ag] C. Aggarwal. Neural Networks and Deep Learning. Springer, 2018.
The following textbooks are also good resources for practicing hands-on skills. Note that this course is not only about implementation: we study the fundamentals of deep learning. Of course, we also get our hands dirty and learn how to implement what we study.
- Chollet, François. Deep Learning with Python. Manning Publications, 2021.
- Müller, Andreas, and Sarah Guido. Introduction to Machine Learning with Python. O’Reilly Media, Inc., 2016.
Reading List
This section will be completed gradually throughout the semester.
Chapter 1: Preliminaries
Introduction to DL
- Motivation: Chapter 1 - Section 1.1 of [BB]
ML Components
- Review on Linear Algebra: Chapter 2 of [GYC]
- ML Components: Chapter 1 - Sections 1.2.1 to 1.2.4 of [BB]
- ML Basics: Chapter 5 of [GYC]
Review on Probability Theory
- Probability Theory: Chapter 2 of [BB]
- Probability Review: Chapter 3 of [GYC]
Classification Problem
- Binary Classification: Chapter 5 - Sections 5.1 and 5.2 of [BB]
- McCulloch-Pitts Model: Paper A logical calculus of the ideas immanent in nervous activity, published in the Bulletin of Mathematical Biophysics by Warren McCulloch and Walter Pitts in 1943, proposing a computational model of the neuron. This paper is regarded as the pioneering study that led to the idea of the artificial neuron.
Training via Risk Minimization
- Overview of Risk Minimization: Paper An overview of statistical learning theory, published in the IEEE Transactions on Neural Networks by Vladimir N. Vapnik in 1999 as a survey of his lifelong work on statistical learning.
Perceptron Algorithm
- Perceptron Simulation Experiments: Paper Perceptron Simulation Experiments, published in the Proceedings of the IRE by Frank Rosenblatt in 1960.
- Perceptron: Chapter 1 - Section 1.2.1 of [Ag]
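As a companion to the readings above, the perceptron update rule can be sketched in a few lines of Python: whenever a sample is misclassified, the sample itself, scaled by its label, is added to the weight vector. The function name and toy dataset below are made up for illustration only.

```python
# Minimal perceptron sketch; labels are +1/-1 and the bias is folded
# into the weights via a constant 1.0 feature appended to each sample.

def train_perceptron(data, epochs=20):
    w = [0.0] * len(data[0][0])  # weight vector, bias included
    for _ in range(epochs):
        for x, y in data:
            s = sum(wi * xi for wi, xi in zip(w, x))
            if y * s <= 0:  # misclassified (or on the decision boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

# Toy linearly separable data; the last coordinate is the bias feature.
data = [([2.0, 1.0, 1.0], 1), ([1.0, 3.0, 1.0], 1),
        ([-1.0, -1.0, 1.0], -1), ([-2.0, 1.0, 1.0], -1)]
w = train_perceptron(data)
```

Since the toy data is linearly separable, the perceptron convergence theorem guarantees that the loop above stops making updates after finitely many mistakes.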
Universal Approximation Theorem
- Universal Approximation: Paper Approximation by superpositions of a sigmoidal function published in Mathematics of Control, Signals and Systems by George V. Cybenko in 1989
Deep NNs
Optimization via Gradient Descent
- Gradient-based Optimization: Chapter 4 - Sections 4.3 and 4.4 of [GYC]
- Gradient Descent: Chapter 7 - Sections 7.1 and 7.2 of [BB]
Chapter 2: Fully-connected FNNs
Forward Propagation
Backpropagation
- Backpropagation: Chapter 6 - Section 6.5 of [GYC]
- Backpropagation: Chapter 8 of [BB]
- Backpropagation of Error: Paper Learning representations by back-propagating errors, published in Nature by D. Rumelhart, G. Hinton, and R. Williams in 1986, advocating systematic gradient computation over a computation graph.
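The idea of systematic gradient computation over a computation graph, which PyTorch's autograd automates, can be illustrated with a minimal scalar reverse-mode sketch. The `Var` class below is hypothetical, not any library's API; a real implementation would process nodes in reverse topological order rather than recursing.

```python
# Minimal scalar reverse-mode autodiff: each Var records its parents
# together with a local derivative, and backward() walks the graph in
# reverse, accumulating gradients along every path to each leaf.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # pairs of (parent Var, local derivative)

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

# For z = x*y + x at (x, y) = (3, 4): dz/dx = y + 1 = 5, dz/dy = x = 3.
x, y = Var(3.0), Var(4.0)
z = x * y + x
z.backward()
```

Note how `x` receives gradient contributions from both the `x*y` node and the `+ x` node; the `+=` in `backward` is what sums the two paths, exactly as the chain rule on the graph prescribes.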
Multi-class Classification
- Multi-class Classification: Chapter 6 - Section 6.6 of [BB]
- Multi-class Models: Chapter 2 - Section 2.3 of [Ag]
Full-batch, sample-level and mini-batch SGD
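The three regimes named in this heading differ only in how many samples enter each gradient estimate: the whole dataset (full-batch), one sample (sample-level), or a small batch (mini-batch). A minimal sketch for 1-D least squares, with made-up toy data and an illustrative function name:

```python
import random

# Mini-batch SGD for fitting y ≈ w * x by least squares.
# batch_size = len(data) gives full-batch gradient descent,
# batch_size = 1 gives sample-level SGD, anything in between
# is mini-batch SGD.

def sgd(data, batch_size, lr=0.05, steps=500, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        # gradient of the batch mean squared error (1/B) * sum (w*x - y)^2
        grad = sum(2 * (w * x - y) * x for x, y in batch) / batch_size
        w -= lr * grad
    return w

# Toy noiseless data from y = 2x, so every regime should recover w ≈ 2.
data = [(x / 10, 2 * x / 10) for x in range(1, 11)]
w_full = sgd(data, batch_size=len(data))
w_mini = sgd(data, batch_size=4)
w_single = sgd(data, batch_size=1)
```

On noiseless data all three regimes share the same minimizer; with real, noisy data the smaller the batch, the noisier each step, which is the trade-off the lectures discuss.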
Generalization
- Generalization: Chapter 6 of the book Patterns, Predictions, and Actions: A Story about Machine Learning by M. Hardt and B. Recht, published in 2021.
Chapter 3: Optimizers, Regularization and Data
More on Optimizers
- Learning Rate Scheduling: Paper Cyclical Learning Rates for Training Neural Networks, published in the IEEE Winter Conference on Applications of Computer Vision (WACV) by Leslie N. Smith in 2017, discussing learning rate scheduling.
- Rprop: Paper A direct adaptive method for faster backpropagation learning: the RPROP algorithm, published in the IEEE International Conference on Neural Networks by M. Riedmiller and H. Braun in 1993, proposing the Rprop algorithm.
- RMSprop: Lecture note by Geoffrey Hinton proposing RMSprop.
- RMSprop Analysis: Paper RMSProp and equilibrated adaptive learning rates for non-convex optimization by Y. Dauphin et al., published in 2015, discussing RMSprop and citing Hinton's lecture notes.
- Adam: Paper Adam: A Method for Stochastic Optimization, published in 2014 by D. Kingma and J. Ba, proposing Adam.
- Notes on Optimizers: Lecture notes of the course Optimization for Machine Learning by Ashok Cutkosky at Boston University: a good resource on optimizers.
Overfitting and Regularization
- Regularization: Chapter 7 of [GYC]
- Overfitting and Regularization: Chapter 9 - Sections 9.1 to 9.3 of [BB]
- Dropout 1: Paper Improving neural networks by preventing co-adaptation of feature detectors, published in 2012 by G. Hinton et al., proposing Dropout.
- Dropout 2: Paper Dropout: A Simple Way to Prevent Neural Networks from Overfitting, published in 2014 by N. Srivastava et al., providing analysis and illustrations of Dropout.
Data: Data Distribution, Data Cleaning, and Outliers
- Data: Chapter 8 of the book Patterns, Predictions, and Actions: A Story about Machine Learning by M. Hardt and B. Recht, published in 2021.
- Data Processing in Python: Open book Minimalist Data Wrangling with Python by Marek Gagolewski, going through data processing in Python.