Lecture Notes

The lecture notes are uploaded throughout the semester. For each chapter, the notes are provided section by section.

Chapter 0: Course Overview and Logistics

  • Handouts: All Sections included in a single file

Chapter 1: Fundamentals of Deep Learning

Chapter 2: Feedforward NNs

  • Section 1: Forward Pass in MLPs
  • Section 2: Computing Gradient via Backpropagation on Computation Graph
  • Section 3: Multiclass Classification
  • Section 4: Mini-batch Training and SGD Algorithm

Chapter 3: Advances - Part I

  • Section 1: More on Optimizers
  • Section 2: Overfitting, Regularization and Dropout
  • Section 3: Data Distribution and Preprocessing
  • Section 4: Standardization and Batch Normalization

Tutorial Notebooks

The tutorial notebooks can be accessed below.

Book

There is no single textbook for this course; instead, we use various resources. The following textbooks cover the key notions of the course.

The following textbooks are also good resources for practicing hands-on skills. Note that this course is not only about implementation: we study the fundamentals of deep learning. Of course, we also get our hands dirty and learn how to implement these ideas.

Reading List

This section will be completed gradually throughout the semester.

Chapter 1: Preliminaries

Introduction to DL

ML Components

Review on Probability Theory

Classification Problem

  • Binary Classification: Chapter 5 - Sections 5.1 and 5.2 of [BB]
  • McCulloch-Pitts Model: Paper A logical calculus of the ideas immanent in nervous activity published in the Bulletin of Mathematical Biophysics by Warren McCulloch and Walter Pitts in 1943, proposing a computational model for the neuron. This paper is regarded as the pioneering study that led to the idea of the artificial neuron

Training via Risk Minimization

  • Overview on Risk Minimization: Paper An overview of statistical learning theory published in the IEEE Transactions on Neural Networks by Vladimir N. Vapnik in 1999 as an overview of his lifelong work on ML

Perceptron Algorithm

Universal Approximation Theorem

  • Universal Approximation: Paper Approximation by superpositions of a sigmoidal function published in Mathematics of Control, Signals and Systems by George V. Cybenko in 1989

Deep NNs

  • DNNs: Chapter 6 - Sections 6.2 and 6.3 of [BB]

Optimization via Gradient Descent

Chapter 2: Fully-connected FNNs

Forward Propagation

Backpropagation

  • Backpropagation: Chapter 6 - Section 6.5 of [GYC]
  • Backpropagation: Chapter 8 of [BB]
  • Backpropagation of Error: Paper Learning representations by back-propagating errors published in Nature by D. Rumelhart, G. Hinton, and R. Williams in 1986, advocating the idea of systematic gradient computation on a computation graph

Multi-class Classification

Full-batch, sample-level and mini-batch SGD

  • SGD: Chapter 5 - Section 5.9 of [GYC]
  • SGD: Chapter 7 - Section 7.2 of [BB]

Generalization

  • Generalization: Chapter 6 of the book Patterns, predictions, and actions: A story about machine learning by Moritz Hardt and B. Recht published in 2021

Chapter 3: Optimizers, Regularization and Data

More on Optimizers

  • Learning Rate Scheduling: Paper Cyclical Learning Rates for Training Neural Networks published in the Winter Conference on Applications of Computer Vision (WACV) by Leslie N. Smith in 2017, discussing learning rate scheduling
  • Rprop: Paper A direct adaptive method for faster backpropagation learning: the RPROP algorithm published in the IEEE International Conference on Neural Networks by M. Riedmiller and H. Braun in 1993, proposing the Rprop algorithm
  • RMSprop: Lecture note by Geoffrey Hinton proposing RMSprop
  • RMSprop Analysis: Paper RMSProp and equilibrated adaptive learning rates for non-convex optimization by Y. Dauphin et al. published in 2015, discussing RMSprop and citing Hinton's lecture notes
  • Adam: Paper Adam: A Method for Stochastic Optimization published in 2014 by D. Kingma and J. Ba, proposing Adam
  • Notes on Optimizers: Lecture notes of the course Optimization for Machine Learning by Ashok Cutkosky at Boston University, a good resource for optimizers

Overfitting and Regularization

  • Regularization: Chapter 7 of [GYC]
  • Overfitting and Regularization: Chapter 9 - Sections 9.1 to 9.3 of [BB]
  • Dropout 1: Paper Improving neural networks by preventing co-adaptation of feature detectors published in 2012 by G. Hinton et al., proposing Dropout
  • Dropout 2: Paper Dropout: A Simple Way to Prevent Neural Networks from Overfitting published in 2014 by N. Srivastava et al., providing some analysis and illustrations of Dropout

Data: Data Distribution, Data Cleaning, and Outliers

  • Data: Chapter 8 of the book Patterns, predictions, and actions: A story about machine learning by Moritz Hardt and B. Recht published in 2021
  • Data Processing in Python: Open book Minimalist Data Wrangling with Python by Marek Gagolewski, covering data processing in Python