Schedule

Event | Date | Description
-
Lecture | 09/02/2025 (Tuesday) | Lecture 0: Course Overview and Logistics
Lecture Notes:
-
Lecture | 09/02/2025 (Tuesday) | Lecture 1: Why Deep Learning
-
Lecture | 09/02/2025 (Tuesday) | Lecture 2: Machine Learning vs Analysis
Lecture Notes:
-
Lecture | 09/02/2025 (Tuesday) | Lecture 3: ML Component 1 - Data
-
Session | 09/02/2025 15:00 (Tuesday) | First Lecture
-
Lecture | 09/05/2025 (Friday) | Lecture 4: Supervised, Unsupervised and Semi-supervised
Lecture Notes:
-
Lecture | 09/05/2025 (Friday) | Lecture 5: Components 2 and 3: Model and Loss
Lecture Notes:
Further Reads:
- ML Components: Chapter 1 - Sections 1.2.1 to 1.2.4 of [BB]
- ML Basics: Chapter 5 of [GYC]
-
Lecture | 09/05/2025 (Friday) | Lecture 6: First Example -- Classification by Perceptron
Lecture Notes:
Further Reads:
- Binary Classification: Chapter 5 - Sections 5.1 and 5.2 of [BB]
- McCulloch-Pitts Model: Paper A logical calculus of the ideas immanent in nervous activity, published in the Bulletin of Mathematical Biophysics by Warren McCulloch and Walter Pitts in 1943, proposing a computational model for the neuron. This paper is regarded as the pioneering study that led to the idea of the artificial neuron
-
Lecture | 09/05/2025 (Friday) | Lecture 7: Recap -- Law of Large Numbers
Lecture Notes:
Further Reads:
- Probability Theory: Chapter 2 of [BB]
- Probability Review: Chapter 3 of [GYC]
-
Lecture | 09/09/2025 (Tuesday) | Lecture 8: Training via Empirical Risk Minimization
Lecture Notes:
Further Reads:
- Overview on Risk Minimization: Paper An overview of statistical learning theory, published in the IEEE Transactions on Neural Networks by Vladimir N. Vapnik in 1999 as an overview of his lifelong developments in ML
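As a quick companion to this topic, a minimal sketch of empirical risk minimization in plain Python: among a small set of candidate models, pick the one with the lowest average loss on the data. The dataset and candidate slopes below are made up for illustration, not course material.

```python
# Toy ERM: choose the linear model x -> w * x minimizing the empirical risk,
# i.e. the average squared-error loss over the (made-up) dataset.

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (x, y) pairs, roughly y = 2x

def empirical_risk(w):
    # average loss of the model over the sample
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

candidates = [0.5, 1.0, 1.5, 2.0, 2.5]
best_w = min(candidates, key=empirical_risk)  # the empirical risk minimizer
```

Restricting the search to a finite candidate set keeps the sketch trivial; the lectures replace this with gradient-based search over continuous parameters.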
-
Lecture | 09/09/2025 (Tuesday) | Lecture 9: Training Perceptron Machine
Lecture Notes:
Further Reads:
- Perceptron Simulation Experiments: Paper Perceptron Simulation Experiments, presented by Frank Rosenblatt in the Proceedings of the IRE in 1960
- Perceptron: Chapter 1 - Section 1.2.1 of [Ag]
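To accompany this lecture, a hedged sketch of the mistake-driven perceptron update rule on a toy linearly separable set; the data, pass count, and zero initialization are illustrative choices, not from the course.

```python
# Toy perceptron training: predict sign(w . x + b), and on each mistake
# move the decision boundary toward the misclassified point.

data = [((1.0, 1.0), 1), ((2.0, 2.0), 1), ((-1.0, -1.0), -1), ((-2.0, -1.0), -1)]
w = [0.0, 0.0]
b = 0.0

for _ in range(10):  # a few passes over the data
    for (x1, x2), label in data:
        pred = 1 if w[0] * x1 + w[1] * x2 + b >= 0 else -1
        if pred != label:  # update only on mistakes
            w[0] += label * x1
            w[1] += label * x2
            b += label

errors = sum(
    1 for (x1, x2), label in data
    if (1 if w[0] * x1 + w[1] * x2 + b >= 0 else -1) != label
)
```

On separable data like this, the classic convergence theorem guarantees the loop stops making mistakes after finitely many updates.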
-
Lecture | 09/09/2025 (Tuesday) | Lecture 10: From Perceptron to NNs -- Universal Approximation
Lecture Notes:
Further Reads:
- Universal Approximation: Paper Approximation by superpositions of a sigmoidal function published in Mathematics of Control, Signals and Systems by George V. Cybenko in 1989
-
Assignment | 09/12/2025 (Friday) | Assignment #1 - Fundamentals of Machine Learning released!
-
Lecture | 09/12/2025 (Friday) | Lecture 11: Deep Neural Networks
-
Lecture | 09/12/2025 (Friday) | Lecture 12: Iterative Optimization by Gradient Descent
Lecture Notes:
Further Reads:
- Gradient-based Optimization: Chapter 4 - Sections 4.3 and 4.4 of [GYC]
- Gradient Descent: Chapter 7 - Sections 7.1 and 7.2 of [BB]
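A minimal gradient-descent sketch to go with this lecture, run on the 1-D quadratic f(x) = (x - 3)^2; the step size and iteration count are arbitrary illustrative values.

```python
# Gradient descent on f(x) = (x - 3)**2, whose minimizer is x = 3.

def grad(x):
    return 2.0 * (x - 3.0)  # derivative of (x - 3)**2

x = 0.0    # starting point
lr = 0.1   # learning rate (step size)
for _ in range(100):
    x -= lr * grad(x)  # step against the gradient
```

Each step shrinks the gap to the minimizer by a constant factor (1 - 2*lr), which is exactly the linear convergence rate discussed later in the course.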
-
Lecture | 09/16/2025 (Tuesday) | Lecture 13: More on Gradient Descent
-
Lecture | 09/16/2025 (Tuesday) | Lecture 14: Forward Propagation in MLPs
Lecture Notes:
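As a small illustration of forward propagation in an MLP, a one-hidden-layer network with sigmoid activations; all weights and inputs below are made-up numbers, not anything from the lecture notes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    # hidden layer: h = sigmoid(W1 @ x + b1), computed row by row
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    # output layer: y = sigmoid(W2 . h + b2)
    return sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)

W1 = [[0.5, -0.2], [0.3, 0.8]]   # 2 hidden units, each seeing 2 inputs
b1 = [0.0, 0.1]
W2 = [1.0, -1.0]                 # single output unit
b2 = 0.05
y = forward([1.0, 2.0], W1, b1, W2, b2)
```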
-
Lecture | 09/19/2025 (Friday) | Lecture 15: Training Neural Networks via GD
-
Lecture | 09/19/2025 (Friday) | Lecture 16: Chain Rule on Computation Graph
-
Lecture | 09/19/2025 (Friday) | Lecture 17: Backward Pass on Computation Graph
Lecture Notes:
Further Reads:
- Backpropagation: Chapter 6 - Section 6.5 of [GYC]
- Backpropagation: Chapter 8 of [BB]
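A hand-worked backward pass on a tiny computation graph, f(x, y) = (x + y) * y, to make the chain-rule bookkeeping concrete; the example function is illustrative only.

```python
# Forward pass: store intermediate node values of the graph.
x, y = 2.0, 3.0
a = x + y        # a = 5
f = a * y        # f = 15

# Backward pass: walk from the output back to the inputs.
df_da = y               # d(a*y)/da
df_dy_direct = a        # d(a*y)/dy, holding a fixed
da_dx = 1.0             # d(x+y)/dx
da_dy = 1.0             # d(x+y)/dy

df_dx = df_da * da_dx                 # chain rule through node a
df_dy = df_da * da_dy + df_dy_direct  # y reaches f along two paths: sum them
```

Note the key rule of backpropagation: when a variable feeds the output along several paths (here y, both directly and through a), their contributions add.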
-
Assignment | 09/21/2025 (Sunday) | Project Proposal released!
-
Lecture | 09/23/2025 (Tuesday) | Lecture 18: Backpropagation over MLP
Lecture Notes:
Further Reads:
- Backpropagation: Chapter 8 of [BB]
- Backpropagation of Error: Paper Learning representations by back-propagating errors, published in Nature by D. Rumelhart, G. Hinton and R. Williams in 1986, advocating the idea of systematic gradient computation over a computation graph
-
Lecture | 09/23/2025 (Tuesday) | Lecture 19: First Neural Classifier
-
Lecture | 09/26/2025 (Friday) | Lecture 20: Multiclass Classification
-
Lecture | 09/26/2025 (Friday) | Lecture 21: Stochastic Gradient Descent
-
Due | 09/26/2025 23:59 (Friday) | Assignment #1 due
-
Lecture | 09/30/2025 (Tuesday) | Lecture 22: Mini-batch SGD and Complexity-Variance Tradeoff
-
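To illustrate the mini-batch SGD of Lecture 22, a hedged sketch fitting the 1-D least-squares model y ≈ w * x; the dataset, batch size, and learning rate are all illustrative choices.

```python
import random

random.seed(0)
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]]  # exact fit at w = 2

w = 0.0
lr = 0.05
batch_size = 2
for _ in range(200):
    batch = random.sample(data, batch_size)  # draw a small random mini-batch
    # average gradient of the loss (w*x - y)**2 over the batch
    g = sum(2.0 * (w * x - y) * x for x, y in batch) / batch_size
    w -= lr * g
```

Larger batches lower the variance of the gradient estimate at a higher per-step cost, which is exactly the complexity-variance tradeoff in the lecture title.
-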
Lecture | 09/30/2025 (Tuesday) | Lecture 23: Evaluation and Generalization Measures
Lecture Notes:
Further Reads:
- Generalization: Chapter 6 of the book Patterns, predictions, and actions: A story about machine learning by M. Hardt and B. Recht, published in 2021
-
Lecture | 09/30/2025 (Tuesday) | Lecture 24: Linear and Sub-linear Convergence Speed
Lecture Notes:
Further Reads:
- Notes on Optimizers: Lecture notes of the course Optimization for Machine Learning by Ashok Cutkosky at Boston University; a good resource on optimizers
-
Assignment | 10/01/2025 (Wednesday) | Assignment #2 - Feedforward Neural Networks released!
-
Lecture | 10/03/2025 (Friday) | Lecture 25: Optimizer Boosting -- Scheduling, Momentum and Rprop Ideas
Lecture Notes:
Further Reads:
- Learning Rate Scheduling: Paper Cyclical Learning Rates for Training Neural Networks, published in the Winter Conference on Applications of Computer Vision (WACV) by Leslie N. Smith in 2017, discussing learning-rate scheduling
- Rprop: Paper A direct adaptive method for faster backpropagation learning: the RPROP algorithm, published in the IEEE International Conference on Neural Networks by M. Riedmiller and H. Braun in 1993, proposing the Rprop algorithm
-
Lecture | 10/03/2025 (Friday) | Lecture 26: RMSprop and Adam
Lecture Notes:
Further Reads:
- RMSprop: Lecture note by Geoffrey Hinton proposing RMSprop
- RMSprop Analysis: Paper RMSProp and equilibrated adaptive learning rates for non-convex optimization by Y. Dauphin et al., published in 2015, discussing RMSprop and citing Hinton's lecture notes
- Adam: Paper Adam: A Method for Stochastic Optimization, published in 2014 by D. Kingma and J. Ba, proposing Adam
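As a compact companion to this lecture, a sketch of the standard Adam recursions run on the toy objective f(w) = w**2; the hyperparameters are the common defaults and the learning rate and step count are arbitrary.

```python
import math

w = 1.0
lr, beta1, beta2, eps = 0.05, 0.9, 0.999, 1e-8
m = v = 0.0
for t in range(1, 301):
    g = 2.0 * w                          # gradient of w**2
    m = beta1 * m + (1 - beta1) * g      # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * g * g  # second-moment (RMSprop-style) estimate
    m_hat = m / (1 - beta1 ** t)         # bias corrections for zero init
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)
```

Dropping the m recursion (using g directly) recovers an RMSprop-style update; dropping v recovers plain momentum, which is how the lecture's two ingredients combine into Adam.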
-
Lecture | 10/03/2025 (Friday) | Lecture 27: Overfitting
-
Due | 10/06/2025 23:59 (Monday) | Proposal due
-
Lecture | 10/07/2025 (Tuesday) | Lecture 28: Sources of Overfitting
Lecture Notes:
Further Reads:
- Overfitting and Regularization: Chapter 9 - Sections 9.1 to 9.3 of [BB]
-
Lecture | 10/07/2025 (Tuesday) | Lecture 29: Regularization
Lecture Notes:
Further Reads:
- Overfitting and Regularization: Chapter 9 - Sections 9.1 to 9.3 of [BB]
- Tikhonov: Paper Tikhonov Regularization and Total Least Squares, published in 1999 by G. Golub et al., illustrating Tikhonov regularization
- Lasso: Paper Regression Shrinkage and Selection Via the Lasso, published in 1996 by R. Tibshirani, proposing the legendary Lasso
-
Lecture | 10/07/2025 (Tuesday) | Lecture 30: Dropout
Lecture Notes:
Further Reads:
- Dropout 1: Paper Improving neural networks by preventing co-adaptation of feature detectors, published in 2012 by G. Hinton et al., proposing Dropout
- Dropout 2: Paper Dropout: A Simple Way to Prevent Neural Networks from Overfitting, published in 2014 by N. Srivastava et al., providing analysis and illustrations of Dropout
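A minimal sketch of (inverted) dropout on one hidden activation vector, to accompany this lecture: each unit is zeroed with probability p at training time, and survivors are rescaled by 1/(1-p) so no rescaling is needed at test time. The activations and seed are made up.

```python
import random

random.seed(1)
p = 0.5                                   # drop probability
h = [0.2, 1.5, -0.7, 0.9, 0.4, -1.1]      # toy hidden-layer activations

# 0 with probability p, else 1/(1-p): "inverted" dropout scaling
mask = [0.0 if random.random() < p else 1.0 / (1.0 - p) for _ in h]
h_dropped = [m * hi for m, hi in zip(mask, h)]
```

The rescaling keeps the expected activation equal to the original one, which is why the trained network can be used at test time with dropout simply switched off.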
-
Lecture | 10/10/2025 (Friday) | Lecture 31: Statistical Viewpoint on Data
Lecture Notes:
Further Reads:
- Data: Chapter 8 of the book Patterns, predictions, and actions: A story about machine learning by M. Hardt and B. Recht, published in 2021
- Data Processing in Python: Open book Minimalist Data Wrangling with Python by Marek Gagolewski, going through data processing in Python
-
Lecture | 10/10/2025 (Friday) | Lecture 32: Normalization
Lecture Notes:
Further Reads:
- Normalization: Paper Is normalization indispensable for training deep neural network?, published in 2020 by J. Shao et al., discussing the meaning and effects of normalization
-
Lecture | 10/14/2025 (Tuesday) | Lecture 33: Batch Normalization
Lecture Notes:
Further Reads:
- Batch-Norm: Paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, published in 2015 by S. Ioffe and C. Szegedy, proposing Batch Normalization
- Batch-Norm Meaning: Paper How Does Batch Normalization Help Optimization?, published in 2018 by S. Santurkar et al., discussing why Batch Normalization works: they argue that the main effect is a much smoother loss landscape
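To accompany this lecture, a hedged sketch of the batch-normalization forward pass for a single feature: standardize over the mini-batch, then apply the learnable scale and shift (gamma, beta). The batch values and parameter settings below are illustrative.

```python
batch = [2.0, 4.0, 6.0, 8.0]       # one feature across a toy mini-batch
gamma, beta, eps = 1.0, 0.0, 1e-5  # learnable scale/shift, numerical fudge

mean = sum(batch) / len(batch)
var = sum((x - mean) ** 2 for x in batch) / len(batch)
normalized = [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in batch]
```

With gamma = 1 and beta = 0 the output has (near) zero mean and unit variance over the batch; training adjusts gamma and beta so the layer can undo the standardization if that helps.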
-
Lecture | 10/14/2025 (Tuesday) | Lecture 34: Why Convolution?
Lecture Notes:
Further Reads:
- Hubel and Wiesel Study: Paper Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, published in 1962 by D. Hubel and T. Wiesel, elaborating their findings on visual understanding
- Neocognitron: Paper Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, published in 1980 by K. Fukushima, proposing the Neocognitron as a computational model for visual learning
- Backpropagation on LeNet: Paper Backpropagation Applied to Handwritten Zip Code Recognition, published in 1989 by Y. LeCun et al., developing backpropagation for LeNet
- LeNet: Paper Gradient-Based Learning Applied to Document Recognition, published in 1998 by Y. LeCun et al., discussing LeNet
-
Due | 10/15/2025 23:59 (Wednesday) | Assignment #2 due
-
Lecture | 10/17/2025 (Friday) | Lecture 35: Quick Preview on CNN
-
Lecture | 10/17/2025 (Friday) | Lecture 36: Convolution Operation and Resampling
-
Lecture | 10/17/2025 (Friday) | Lecture 37: Padding and Multichannel Convolution
Lecture Notes:
Further Reads:
- Multi-channel Convolution: Chapter 10 - Sections 10.2.3 to 10.2.5 of [BB]
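A minimal sketch of the 2-D convolution underlying this lecture (strictly a cross-correlation, as in most deep learning frameworks): a single-channel input, a 2x2 kernel, no padding, stride 1. The input and kernel values are made up; the multichannel case simply sums this computation over input channels.

```python
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 0],
    [0, -1],
]

# slide the 2x2 kernel over every valid position ("valid" padding, stride 1)
out = [
    [
        sum(image[i + a][j + b] * kernel[a][b] for a in range(2) for b in range(2))
        for j in range(len(image[0]) - 1)
    ]
    for i in range(len(image) - 1)
]
```

This kernel computes a diagonal difference, so on the linear ramp above every output entry is the same constant, a tiny example of convolution's translation equivariance.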
-
Lecture | 10/21/2025 (Tuesday) | Lecture 38: Pooling and Flattening
Lecture Notes:
Further Reads:
- Pooling: Chapter 10 - Section 10.2.6 of [BB]
- Flattening: Chapter 10 - Sections 10.2.7 and 10.2.8 of [BB]
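A small sketch of the two operations in this lecture's title: 2x2 max pooling with stride 2, followed by flattening the pooled map into a vector for a dense head. The feature map below is made up.

```python
fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 8, 6],
    [2, 2, 7, 9],
]

# 2x2 max pooling with stride 2: keep the largest value in each block
pooled = [
    [
        max(fmap[2 * i + a][2 * j + b] for a in range(2) for b in range(2))
        for j in range(2)
    ]
    for i in range(2)
]

# flattening: concatenate rows into one vector for the fully connected layers
flat = [v for row in pooled for v in row]
```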
-
Lecture | 10/21/2025 (Tuesday) | Lecture 39: Deep CNNs
-
Lecture | 10/21/2025 (Tuesday) | Lecture 40: Example of VGG-16
Lecture Notes:
Further Reads:
- VGG: Paper Very Deep Convolutional Networks for Large-Scale Image Recognition, published in 2014 by K. Simonyan and A. Zisserman, proposing the VGG architectures
-
Exam | 10/24/2025 11:00 (Friday) | Midterm
Topics:
- The exam is 3 hours long
- No programming questions
- Starts at 11:00 AM
-
Lecture | 11/04/2025 (Tuesday) | Lecture 41: Backpropagation Through CNNs
Lecture Notes:
Further Reads:
- LeCun's Paper: Paper Gradient-based learning applied to document recognition, published in 1998 by Y. LeCun et al., summarizing the learning process in CNNs
- Efficient Backpropagation on CNN: Paper High Performance Convolutional Neural Networks for Document Processing, published in 2006 by K. Chellapilla et al., discussing efficient backpropagation on CNNs
-
Lecture | 11/07/2025 (Friday) | Lecture 42: Vanishing Gradient in Deep Networks
Lecture Notes:
Further Reads:
- ResNet: Paper Deep Residual Learning for Image Recognition, published in 2015 by K. He et al., proposing ResNet
-
Lecture | 11/07/2025 (Friday) | Lecture 43: Skip Connection and ResNet
Lecture Notes:
Further Reads:
- ResNet: Paper Deep Residual Learning for Image Recognition, published in 2015 by K. He et al., proposing ResNet
- ResNet-1001: Paper Identity Mappings in Deep Residual Networks, published in 2016 by K. He et al., demonstrating how deep ResNets can go
- U-Net: Paper U-Net: Convolutional Networks for Biomedical Image Segmentation, published in 2015 by O. Ronneberger et al., proposing U-Net
- DenseNet: Paper Densely Connected Convolutional Networks, published in 2017 by G. Huang et al., proposing DenseNet
-
Lecture | 11/11/2025 (Tuesday) | Lecture 44: Processing Sequence Data
Lecture Notes:
Further Reads:
- Jordan Network: Paper Attractor dynamics and parallelism in a connectionist sequential machine, published in 1986 by M. Jordan, proposing his RNN
- Elman Network: Paper Finding structure in time, published in 1990 by J. Elman, proposing a revision of the Jordan Network
-
Lecture | 11/11/2025 (Tuesday) | Lecture 45: Sequence Processing by Recursion
Lecture Notes:
Further Reads:
- BPTT: Paper Backpropagation through time: What it does and how to do it, published in 1990 by P. Werbos, explaining BPTT
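As a companion to this lecture, a scalar sketch of the recursion at the heart of an Elman-style RNN, h_t = tanh(w_x * x_t + w_h * h_{t-1}); the weights and input sequence are illustrative, and the same weights are reused at every step.

```python
import math

w_x, w_h = 0.5, 0.8  # input-to-hidden and hidden-to-hidden weights (shared over time)
h = 0.0              # initial hidden state
states = []
for x in [1.0, -1.0, 0.5]:            # a toy input sequence
    h = math.tanh(w_x * x + w_h * h)  # the recurrence: new state from input + old state
    states.append(h)
```

Unrolling this loop over time yields exactly the computation graph that BPTT differentiates.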
-
Lecture | 11/14/2025 (Friday) | Lecture 46: Different Sequence Problems
Lecture Notes:
Further Reads:
- Seq Models: Article The Unreasonable Effectiveness of Recurrent Neural Networks, written in May 2015 by A. Karpathy, discussing different types of sequence problems
-
Lecture | 11/14/2025 (Friday) | Lecture 47: Backpropagation Through Time
Lecture Notes:
Further Reads:
- Vanishing Gradient with BPTT: Paper On the difficulty of training recurrent neural networks, published in 2013 by R. Pascanu et al., discussing challenges in training with BPTT
- Truncated BPTT: Paper An efficient gradient-based algorithm for on-line training of recurrent network trajectories, published in 1990 by R. Williams and J. Peng, explaining truncated BPTT
-
Lecture | 11/14/2025 (Friday) | Lecture 48: Gating Principle
Lecture Notes:
Further Reads:
- Gating Principle: Chapter Long Short-Term Memory, published in 2012 in the book Supervised Sequence Labelling with Recurrent Neural Networks by A. Graves, explaining the gating idea
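A minimal sketch of the gating principle behind LSTM/GRU cells: a sigmoid gate g in (0, 1) decides how much previous memory to keep versus how much new content to write. The memory values and the gate pre-activation below are made-up numbers, not an actual LSTM cell.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

c_prev = 2.0      # previous memory content
candidate = -1.0  # new candidate content
g = sigmoid(4.0)  # gate pushed toward "keep" (close to 1)

# convex mix of old memory and new content, controlled by the gate
c = g * c_prev + (1.0 - g) * candidate
```

Because the gate multiplies the memory path by a value near 1, gradients through that path neither vanish nor explode, which is how gating addresses the BPTT difficulties discussed in the previous lecture.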
