Schedule

Event

Date

Description

Session

05/05/2026 13:00
Tuesday

First Lecture
Lecture

05/05/2026
Tuesday

Lecture 0: Course Overview and Logistics
Lecture Notes:
- Chapter 0
Lecture

05/05/2026
Tuesday

Lecture 1: Language Modeling
Lecture Notes:
- Chapter 1 - Section 1
Further Reads:
- Tokenization: Chapter 2 of [JM]
- Embedding: Chapter 6 of [JM]
- Original BPE Algorithm: The Byte Pair Encoding (BPE) algorithm, originally proposed by Philip Gage in 1994 for data compression
- BPE for Tokenization: Paper Neural Machine Translation of Rare Words with Subword Units by Rico Sennrich, Barry Haddow, and Alexandra Birch, presented at ACL 2016, which adapted BPE for subword tokenization in NLP
- LMs: Chapter 12 of [BB] Section 12.2
- N-Gram LMs: Chapter 3 of [JM] Section 3.1 on N-gram LMs
- Maximum Likelihood: Chapter 2 of [BB] Section 2.3
- Recurrent LMs: Chapter 8 of [JM]
- LSTM LMs: Paper Regularizing and Optimizing LSTM Language Models by Stephen Merity, Nitish Shirish Keskar, and Richard Socher, presented at ICLR 2018, which introduced regularization and optimization strategies that let LSTMs perform strongly on word-level language modeling
- High-Rank Recurrent LMs: Paper Breaking the Softmax Bottleneck: A High-Rank RNN Language Model by Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, and William W. Cohen, presented at ICLR 2018, which proposed Mixture of Softmaxes (MoS) and achieved state-of-the-art results at the time
Lecture

05/12/2026
Tuesday

Lecture 2 - Part 1/2: Transformer-based Language Models
Lecture Notes:
- Chapter 1 - Section 2
Further Reads:
- Transformer Paper: Paper Attention Is All You Need by Ashish Vaswani et al., presented at NeurIPS 2017, which marked a major turning point in sequence processing
- Transformers: Chapter 9 of [JM]
- Transformers: Chapter 12 of [BB] Section 12.1
- LLMs via Transformers: Chapter 10 of [JM]
Lecture

05/12/2026
Tuesday

Lecture 2 - Part 2/2: Large Language Models
Lecture Notes:
- Chapter 1 - Section 3
Further Reads:

GPT Papers:
- GPT-1: Paper Improving Language Understanding by Generative Pre-Training by Alec Radford et al. (OpenAI, 2018) that introduced GPT-1 and revived the idea of pretraining transformers as LMs followed by supervised fine-tuning
- GPT-2: Paper Language Models are Unsupervised Multitask Learners by Alec Radford et al. (OpenAI, 2019) that introduced GPT-2, a 1.5B-parameter model trained on web text
- GPT-3: Paper Language Models are Few-Shot Learners by Tom B. Brown et al. (OpenAI, 2020) that introduces GPT-3, a 175B-parameter transformer LM
- GPT-4: GPT-4 Technical Report by OpenAI (2023) that provides an overview of GPT-4’s capabilities
Data for LLMs:
- The Pile: Paper The Pile: An 800GB Dataset of Diverse Text for Language Modeling by Leo Gao et al., released in 2020, introducing the dataset The Pile
- Documentation Debt: Paper Addressing “Documentation Debt” in Machine Learning Research: A Retrospective Datasheet for BookCorpus by Jack Bandy and Nicholas Vincent, published in 2021, discussing the quality and legality of data collection by examining BookCorpus
Fine-tuning:
- SSL: Paper Semi-supervised Sequence Learning by Andrew M. Dai and Quoc V. Le, published in 2015, which explores unsupervised pretraining followed by supervised fine-tuning; this was an early influential work advocating the pretraining idea for LMs
- LoRA: Paper LoRA: Low-Rank Adaptation of Large Language Models by Edward J. Hu et al. presented at ICLR in 2022 introducing LoRA
Prompt Design:
- Prefix-Tuning: Paper Prefix-Tuning: Optimizing Continuous Prompts for Generation by Xiang Lisa Li et al., presented at ACL in 2021, proposing the prefix-tuning approach to prompting
- Prompt-Tuning: Paper The Power of Scale for Parameter-Efficient Prompt Tuning by B. Lester et al., presented at EMNLP in 2021, proposing prompt tuning, i.e., learning the prompt itself
- Zero-Shot LLMs: Paper Large Language Models are Zero-Shot Reasoners by T. Kojima et al., presented at NeurIPS in 2022, studying zero-shot reasoning with LLMs
Assignment

05/14/2026
Thursday

Assignment #1 - Language Modeling released!

[Assignment #1 - Language Modeling]
Assignment

05/19/2026
Tuesday

Project Proposal released!

[Project Proposal]
Lecture

05/19/2026
Tuesday

Lecture 3 - Part 1/3: Fundamentals of Data Generation
Lecture Notes:
- Chapter 2 - Section 1
Further Reads:
- Probabilistic Model: Chapter 2 of [BB] Sections 2.4 to 2.6
- Statistics: Chapter 3 of [M] Sections 3.1 to 3.3
Lecture

05/19/2026
Tuesday

Lecture 3 - Part 2/3: Discriminative vs Generative Learning
Lecture Notes:
- Chapter 2 - Section 2
Further Reads:
- Discriminative and Generative Models: Chapter 5 of [BB]
Lecture

05/19/2026
Tuesday

Lecture 3 - Part 3/3: Generative Learning and Naive Bayes
Lecture Notes:
- Chapter 2 - Section 3
Further Reads:
- Naive Bayes: Paper Idiot’s Bayes: Not So Stupid After All? by D. Hand and K. Yu, published in International Statistical Review in 2001, discussing the effectiveness of Naive Bayes for classification
- Naive Bayes vs Logistic Regression: Paper On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes by A. Ng and M. Jordan, presented at NeurIPS in 2001, showing the data efficiency of Naive Bayes and the asymptotic superiority of Logistic Regression
- Generative Models – Overview: Chapter 20 of [M] Sections 20.1 to 20.3
Session

05/26/2026 13:00
Tuesday

Guest Lecture
Erik Saarenvirta from Google will give a talk on Building AI Supercomputers on Google Cloud. Check the slides here
- Video - Part 1
- Video - Part 2
Due

05/28/2026 23:30
Thursday

Assignment #1 due
Lecture

06/02/2026
Tuesday

Lecture 4 - Part 1/2: Autoregressive Models
Lecture Notes:
- Chapter 3 - Section 1
Further Reads:
- Sampling Overview: Chapter 14 of [BB]
- Sampling: The book Pattern Recognition and Machine Learning by Christopher Bishop; read Chapter 11 to learn how challenging sampling from a distribution is
- Sampling Methods: Chapter 17 of [GYC] Sections 17.1 and 17.2
- KL Divergence and MLE: Chapter 5 of [M] Sections 5.1 to 5.2
- MLE: Chapter 5 of [GYC] Section 5.5
- Maximum Likelihood Learning: The book Information Theory, Inference, and Learning Algorithms by David MacKay, which discusses MLE for clustering in Chapter 22
- Autoregressive Models: Chapter 22 of [M]
Lecture

06/02/2026
Tuesday

Lecture 4 - Part 2/2: Computational Autoregressive Models
Lecture Notes:
- Chapter 3 - Section 2
Further Reads:
- PixelRNN and PixelCNN: Paper Pixel Recurrent Neural Networks by A. van den Oord et al., presented at ICML in 2016, proposing PixelRNN and PixelCNN
- ImageGPT: Paper Generative Pretraining from Pixels by M. Chen et al. presented at ICML in 2020 proposing ImageGPT
Assignment

06/03/2026
Wednesday

Assignment #2 - Explicit Generative Models released!

[Assignment #2 - Explicit Generative Models]
Lecture

06/09/2026
Tuesday

Lecture 5 - Part 1/2: Energy-Based Models
Lecture Notes:
- Chapter 3 - Section 3
Further Reads:
- EBMs: Chapter 24 of [M]
- Partition Function and Normalizing: Chapter 16 of [GYC] Section 16.2
- Universality of EBMs: Paper Representational Power of Restricted Boltzmann Machines and Deep Belief Networks by N. Le Roux and Y. Bengio, published in Neural Computation in 2008, elaborating on the representational power of EBMs
- Tutorial on EBMs: Survey A Tutorial on Energy-Based Learning by Y. LeCun et al., published in 2006
Lecture

06/09/2026
Tuesday

Lecture 5 - Part 2/2: EBMs and MCMC Algorithms
Lecture Notes:
- Chapter 3 - Section 3
Further Reads:
- MCMC Algorithms: Chapter 12 of [M] Sections 12.3, 12.6 and 12.7
- Gibbs Sampling and Langevin: Chapter 14 of [BB]
- Contrastive Divergence: Paper Training Products of Experts by Minimizing Contrastive Divergence by G. Hinton, published in Neural Computation in 2002, proposing the idea of Contrastive Divergence
- Training by MCMC: Paper Implicit Generation and Generalization in Energy-Based Models by Y. Du and I. Mordatch, published at NeurIPS 2019, discussing the efficiency of MCMC algorithms for EBM training
- Improved CD: Paper Improved Contrastive Divergence Training of Energy-Based Models by Y. Du et al., published at ICML 2021, proposing efficient training based on Hinton’s CD idea
Due

06/12/2026 23:30
Friday

Project Proposal due
Lecture

06/16/2026
Tuesday

Lecture 6 - Part 1/2: Normalizing Flow
Lecture Notes:
- Chapter 3 - Section 4
Further Reads:
- Latent Variable: Chapter 16 of [BB] Section 16.2
- Normalizing Flow: Chapter 18 of [BB]
Lecture

06/16/2026
Tuesday

Lecture 6 - Part 2/2: Flow-based Models
Lecture Notes:
- Chapter 3 - Section 4
Further Reads:
- Flow-based Models: Chapter 23 of [M]
- Tutorial on Normalizing Flow: Paper Normalizing Flows for Probabilistic Modeling and Inference by G. Papamakarios et al., published in JMLR in 2021, discussing the training and inference of flow-based models
- Real NVP: Paper Density Estimation Using Real NVP by L. Dinh et al., published at ICLR in 2017, proposing the Real NVP model
- Flow Matching: Paper Flow Matching for Generative Modeling by Y. Lipman et al., published at ICLR 2023
Exam

06/18/2026 13:00
Thursday

Exam 1
The exam will be held during the tutorial session
- We start at 1 PM in BA1160
- One double-sided cheat sheet is allowed
Lecture

06/23/2026
Tuesday

Lecture 7 - Part 1/2: Generative Adversarial Nets
Lecture Notes:
- Chapter 4 - Section 1
- Chapter 4 - Section 2
Further Reads:
- Tutorial on GANs: Tutorial Generative Adversarial Networks given by I. Goodfellow at NeurIPS in 2016
- GANs: Paper Generative Adversarial Nets by I. Goodfellow et al., published at NeurIPS in 2014, proposing GANs
Lecture

06/23/2026
Tuesday

Lecture 7 - Part 2/2: Wasserstein GAN
Lecture Notes:
- Chapter 4 - Section 3
- Chapter 4 - Section 4
Further Reads:
- WGAN: Paper Wasserstein GAN by M. Arjovsky et al., published at ICML in 2017, proposing Wasserstein GANs
- DCGAN: Paper Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks by A. Radford et al., published at ICLR in 2016, proposing DCGAN
- StyleGAN: Paper A Style-Based Generator Architecture for Generative Adversarial Networks by T. Karras et al., published at CVPR in 2019, proposing StyleGAN
- BigGAN: Paper Large Scale GAN Training for High Fidelity Natural Image Synthesis by A. Brock et al., published at ICLR in 2019, proposing BigGAN
- SAGAN: Paper Self-Attention Generative Adversarial Networks by H. Zhang et al., published at ICML in 2019, proposing Self-Attention GAN
Due

06/24/2026 23:30
Wednesday

Assignment #2 due
Assignment

06/26/2026
Friday

Assignment #3 - Generative Adversarial Networks released!

[Assignment #3 - Generative Adversarial Networks]
Lecture

07/07/2026
Tuesday

Lecture 8 - Part 1/2: Probabilistic Latent-Space Generation
Lecture Notes:
- Chapter 5 - Section 1
Further Reads:
- Probabilistic Latent: Chapter 16 of [BB] Sections 16.1 and 16.2
- Mixture Models: Paper On the Number of Components in a Gaussian Mixture Model by G. McLachlan and S. Rathnayake, published in 2014, reviewing some key properties of Gaussian mixtures and their approximation power
Lecture

07/07/2026
Tuesday

Lecture 8 - Part 2/2: Variational Inference
Lecture Notes:
- Chapter 5 - Section 2
Further Reads:
- ELBO: Chapter 16 of [BB] Section 16.3
- VI for Likelihood: The early paper Computing Upper and Lower Bounds on Likelihoods in Intractable Networks by T. Jaakkola and M. Jordan, published at UAI in 1996
- Tutorials on VI: Review paper Variational Inference: A Review for Statisticians by D. Blei, A. Kucukelbir, and J. McAuliffe, published in 2016, giving a good overview of the VI framework
- Introduction to VI: Book An Introduction to Variational Autoencoders by D. Kingma and M. Welling, published by Now Publishers in 2019
Due

07/09/2026 23:30
Thursday

Assignment #3 due
Exam

07/30/2026 13:00
Thursday

Exam 2
The exam will be held during the tutorial session
- We start at 1 PM in BA1160
- One double-sided cheat sheet is allowed

Overall Course Calendar

Week	Topic	Assignment	Project	Exam	Submission
1	Language Modeling
2	LLMs	Assgn 1
3	Fundamentals of Generative Learning
4	Guest Lecture				Assgn 1
5	Autoregressive Models	Assgn 2
6	Energy-based Models		Proposal		Proposal
7	Normalizing Flow			Exam 1 on Jun 18
8	Generative Adversarial Networks	Assgn 3			Assgn 2
9	Holiday
10	Variational Inference	Assgn 4			Assgn 3
11	VAEs
12	Score-based Diffusion	Assgn 5			Assgn 4
13	DPMs			Exam 2 on Jul 30
14	Multimodality and Conditioning				Assgn 5
15	Final Lecture - Reserved		Presentation		Presentation
16	No Lecture - Reserved		Code and Paper		Code and Paper

Tutorial Schedule

Date	Topic	Tutor
May 14	PyTorch Overview -- Tokenization and Embedding	Amir Hossein
May 21	Transformer-based Language Models	Amir Hossein
May 28	Generative vs Discriminative Learning	Mohammadreza
June 4	Autoregressive Models	Mohammadreza
June 11	Energy-based Models	Amir Hossein
June 18	Exam 1
June 25	Sample Project	Cassie
July 2	Holiday
July 9	Normalizing Flow and GAN	Mohammadreza
July 16	VAE and VQ-VAE	Amir Hossein
July 23	Score-based Diffusion	Mohammadreza
July 30	Exam 2
August 6	DDPM and DDIM	Mohammadreza