Materials
Lecture Notes
The lecture notes are uploaded throughout the semester. For each chapter, the notes are provided section by section.
Chapter 0: Course Overview and Logistics
- Handouts: All Sections included in a single file
Chapter 1: Text Generation via Language Models
- Section 1: Fundamentals of Language Modeling - Primary LMs
- Section 2: Transformer-based LMs
- Section 3: Large Language Models
Chapter 2: Data Generation Problem
- Section 1: Basic Definitions
- Section 2: Generative and Discriminative Learning
- Section 3: Generative Modeling
Book
There is no single textbook for this course; we use various resources, most of which are research papers included in the reading list below and added throughout the semester. The following textbooks, however, cover some key notions and related topics.
- [BB] Bishop, Christopher M., and Hugh Bishop. Deep Learning: Foundations and Concepts. Springer Nature, 2023.
- [M] Murphy, Kevin P. Probabilistic Machine Learning: Advanced Topics. MIT Press, 2023.
- [GYC] Goodfellow, Ian, et al. Deep Learning. MIT Press, 2016.
For the first part of the course, the following book is a good read:
- [JM] Jurafsky, Dan, and James H. Martin. Speech and Language Processing. 3rd ed. draft.
The following recent textbooks are also good resources for practicing hands-on skills. Note that the course is not only about implementation: we study the fundamentals that led to the development of the framework nowadays known as generative AI. Of course, we also get our hands dirty and learn how to implement these models.
- Sanseviero, Omar, et al. Hands-On Generative AI with Transformers and Diffusion Models. O’Reilly Media, Inc., 2024.
- Alammar, Jay, and Maarten Grootendorst. Hands-On Large Language Models: Language Understanding and Generation. O’Reilly Media, Inc., 2024.
Reading List
This section will be completed gradually throughout the semester. I will try to break down the essence of each item so that you can go over them easily.
Review
You may review the idea of Seq2Seq learning in the following references:
- SimpleLM: Initial ideas on making a language model
- SeqGen: Sequence generation via RNNs; an old idea, but still worth thinking about!
- Seq2Seq: How we can do sequence-to-sequence learning via neural networks
You may review the idea of transformers in the following resources:
- Transformer Paper: Paper Attention Is All You Need published in 2017 that marked a turning point in sequence processing; a minimal attention sketch follows this list
- Transformers: Chapter 9 of [JM]
- Transformers: Chapter 12 of [BB] Section 12.1
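As a quick refresher on the core operation behind the Transformer paper above, here is a minimal NumPy sketch of scaled dot-product attention; the function and variable names are my own, not those of any particular implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention as described in Attention Is All You Need.

    Q, K: (sequence_length, d_k) arrays; V: (sequence_length, d_v) array.
    Returns the attended values and the attention weights.
    """
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy usage: 4 tokens with 8-dimensional queries, keys, and values
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.shape)  # (4, 8) (4, 4)
```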
Chapter 1: Text Generation and Language Models
Tokenization and Embedding
- Tokenization: Chapter 2 of [JM]
- Original BPE Algorithm: The byte pair encoding compression algorithm proposed by Philip Gage in 1994; a minimal sketch of BPE merging follows this list
- BPE for Tokenization: Paper Neural machine translation of rare words with subword units by Rico Sennrich, Barry Haddow, and Alexandra Birch presented in ACL 2016 that adapted BPE for NLP
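To make the BPE entries above concrete, here is a minimal sketch of the merge-learning loop in the spirit of the Sennrich et al. paper; the toy corpus and the simplified string-based merging are mine, not the paper's reference implementation.

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs over a vocabulary of word -> frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with its concatenation."""
    a, b = pair
    return {word.replace(f"{a} {b}", f"{a}{b}"): freq for word, freq in vocab.items()}

# Toy corpus: each word is a sequence of characters plus an end-of-word marker
vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for _ in range(5):  # learn five merge operations
    pairs = get_pair_counts(vocab)
    best = max(pairs, key=pairs.get)  # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
    print("merged:", best)
```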
Other Embedding Approaches
- Word2Vec: Paper Efficient Estimation of Word Representations in Vector Space by Mikolov et al. published in 2013 introducing Word2Vec
- GloVe: Paper GloVe: Global Vectors for Word Representation by Pennington et al. published in 2014 introducing GloVe
- WordPiece: Paper Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation by Yonghui Wu et al. published in 2016 introducing WordPiece (used in BERT)
- SentencePiece: Paper SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing by Taku Kudo and John Richardson presented in EMNLP 2018 that introduces a language-independent tokenizer
- ELMo: Paper Deep contextualized word representations by Peters et al. presented at NAACL in 2018 introducing ELMo, a context-sensitive embedding
- ByT5: Paper ByT5: Towards a token-free future with pre-trained byte-to-byte models by Xue et al. presented in ACL 2022 proposing ByT5
Language Modelling
- LMs: Chapter 12 of [BB] Section 12.2
- N-Gram LMs: Chapter 3 of [JM] (Speech and Language Processing), Section 3.1 on N-gram LMs; a minimal bigram sketch follows this list
- Maximum Likelihood: Chapter 2 of [BB] Sections 2.1 – 2.3
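The N-gram and maximum-likelihood entries above go together: an n-gram LM is nothing but a relative-frequency (maximum-likelihood) estimate of next-token probabilities. A minimal bigram sketch on a toy corpus of my own:

```python
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams and the contexts they condition on
bigram_counts = Counter(zip(corpus, corpus[1:]))
context_counts = Counter(corpus[:-1])

def bigram_prob(w_prev, w):
    """Maximum-likelihood estimate P(w | w_prev) = count(w_prev, w) / count(w_prev)."""
    if context_counts[w_prev] == 0:
        return 0.0
    return bigram_counts[(w_prev, w)] / context_counts[w_prev]

print(bigram_prob("the", "cat"))  # 0.25: "the" occurs 4 times as a context, once before "cat"
print(bigram_prob("sat", "on"))   # 1.0: "sat" is always followed by "on"
```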
Recurrent LMs
- Recurrent LMs: Chapter 8 of [JM]
- LSTM LMs: Paper Regularizing and Optimizing LSTM Language Models by Stephen Merity, Nitish Shirish Keskar, and Richard Socher published in ICLR 2018 enabling LSTMs to perform strongly on word-level language modeling
- High-Rank Recurrent LMs: Paper Breaking the Softmax Bottleneck: A High-Rank RNN Language Model by Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, and William W. Cohen presented at ICLR 2018 proposing Mixture of Softmaxes (MoS) and achieving state-of-the-art results at the time
Transformer-based LMs and LLMs
- Transformer LMs: Chapter 12 of [BB] Section 12.3
- LLMs via Transformers: Chapter 10 of [JM]
GPTs
- GPT-1: Paper Improving Language Understanding by Generative Pre-Training by Alec Radford et al. (OpenAI, 2018) that introduced GPT-1 and revived the idea of pretraining transformers as LMs followed by supervised fine-tuning
- GPT-2: Paper Language Models are Unsupervised Multitask Learners by Alec Radford et al. (OpenAI, 2019) that introduces GPT-2, a 1.5B-parameter LM trained on web text
- GPT-3: Paper Language Models are Few-Shot Learners by Tom B. Brown et al. (OpenAI, 2020) that introduces GPT-3, a 175B-parameter transformer LM
- GPT-4: GPT-4 Technical Report by OpenAI (2023) that provides an overview of GPT-4’s capabilities
Other LLMs
- BERT: Paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin et al. presented at NAACL 2019 that introduced BERT
- RoBERTa: Paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, et al. (Facebook AI, 2019) that shows BERT’s performance can be significantly improved by more data, longer training, and removing next sentence prediction
- T5: Paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel et al. (JMLR 2020) that reformulates all NLP tasks as text-to-text problems introducing the T5 model
Data for LLMs
- The Pile: Paper The Pile: An 800GB Dataset of Diverse Text for Language Modeling by Leo Gao et al. published in 2020 introducing the dataset The Pile
- RACE: Paper RACE: Large-scale Reading Comprehension Dataset from Examinations by Guokun Lai et al. presented at EMNLP in 2017 introducing a large-scale dataset of English reading comprehension questions from real-world exams
- BookCorpus: Paper Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books by Yukun Zhu et al. presented at ICCV in 2015 introducing the dataset BookCorpus. It was used to pre-train GPT-1 and BERT; nevertheless, it turned out that the dataset was collected without the authors' consent; see the Wikipedia article. It was hence later replaced with BookCorpusOpen
- Documentation Debt: Paper Addressing “Documentation Debt” in Machine Learning Research: A Retrospective Datasheet for BookCorpus by Jack Bandy and Nicholas Vincent published in 2021 discussing the efficiency and legality of data collection by looking into BookCorpus
Earlier Work on Pretraining
- SSL: Paper Semi-supervised Sequence Learning by Andrew M. Dai et al. published in 2015 that explores using unsupervised pretraining followed by supervised fine-tuning; an early, solid work advocating the pre-training idea for LMs
- ULMFiT: Paper Universal Language Model Fine-tuning for Text Classification by Jeremy Howard et al. presented at ACL in 2018 introducing ULMFiT that uses pre-trained LMs with task-specific fine-tuning
Fine-tuning
- LMs: Chapter 12 of [BB] Section 12.3.5
- LoRA: Paper LoRA: Low-Rank Adaptation of Large Language Models by Edward J. Hu et al. presented at ICLR in 2022 introducing LoRA; a minimal sketch of the idea follows this list
- ReFT: Paper ReFT: Representation Finetuning for Language Models by Z. Wu et al. presented at NeurIPS in 2024 proposing an alternative fine-tuning algorithm
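To illustrate the core idea of the LoRA paper above (freeze the pretrained weight W and learn a low-rank update BA), here is a minimal PyTorch sketch; the class and variable names are mine and not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        in_f, out_f = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)  # down-projection (Gaussian init)
        self.B = nn.Parameter(torch.zeros(out_f, rank))        # up-projection (zero init)
        self.scaling = alpha / rank

    def forward(self, x):
        # Original output plus the scaled low-rank correction
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

# Toy usage: wrap a "pretrained" projection and confirm only A and B are trainable
layer = LoRALinear(nn.Linear(64, 64))
print([name for name, p in layer.named_parameters() if p.requires_grad])  # ['A', 'B']
```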
Prompt Design
- Zero-Shot: Paper Zero-shot Learning — A Comprehensive Evaluation of the Good, the Bad and the Ugly by Yongqin Xian et al. published in IEEE Trans. PAMI in 2018 presenting an overview of zero-shot learning
- Chain-of-Thought: Paper Chain-of-Thought Prompting Elicits Reasoning in Large Language Models by Jason Wei et al. presented at NeurIPS in 2022 introducing chain-of-thought prompting
- Prefix-Tuning: Paper Prefix-Tuning: Optimizing Continuous Prompts for Generation by Xiang Lisa Li et al. presented at ACL in 2021 proposing the prefix-tuning approach for prompting
- Prompt-Tuning: Paper The Power of Scale for Parameter-Efficient Prompt Tuning by B. Lester et al. presented at EMNLP in 2021 proposing the prompt tuning idea, i.e., learning to prompt
- Zero-Shot LLMs: Paper Large Language Models are Zero-Shot Reasoners by T. Kojima et al. presented at NeurIPS in 2022 studying zero-shot reasoning with LLMs; a toy prompt example follows this list
- Prompt Engineering is Dead: Article AI Prompt Engineering Is Dead: Long Live AI Prompt Engineering by Dina Genkina published in IEEE Spectrum in 2024
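As a concrete illustration of the prompting entries above, the toy sketch below contrasts a plain zero-shot prompt with the zero-shot chain-of-thought trigger phrase from the Kojima et al. paper; the question is only an example and query_llm is a placeholder, not a real API.

```python
question = ("A cafeteria had 23 apples. It used 20 for lunch and bought 6 more. "
            "How many apples does it have now?")

# Plain zero-shot prompt: ask for the answer directly.
zero_shot = f"Q: {question}\nA:"

# Zero-shot chain-of-thought (Kojima et al., 2022): append a reasoning trigger
# so the model spells out intermediate steps before the final answer.
zero_shot_cot = f"Q: {question}\nA: Let's think step by step."

def query_llm(prompt: str) -> str:
    """Placeholder for a call to whichever LLM API you use in the assignments."""
    raise NotImplementedError

print(zero_shot)
print(zero_shot_cot)
```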
Foundation Models
- CRFM: The Center for Research on Foundation Models (Stanford), which coined the term Foundation Model
Chapter 2: Data Generation Problem
Basic Definitions
- Probabilistic Model: Chapter 2 of [BB] Sections 2.4 to 2.6
- Statistics: Chapter 3 of [M] Sections 3.1 to 3.3
- Bayesian Statistics: Chapter 5 of [GYC] Section 5.6; a short summary of the key formulas follows this list
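Since the three pointers above revolve around the same handful of definitions, here is a compact summary in standard notation (not tied to any one of the books) of the likelihood, the maximum-likelihood estimate, and the Bayesian posterior:

```latex
% Likelihood of parameters \theta given i.i.d. data \mathcal{D} = \{x_1, \dots, x_N\}
p(\mathcal{D} \mid \theta) = \prod_{n=1}^{N} p(x_n \mid \theta)

% Maximum-likelihood estimate
\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta} \, \log p(\mathcal{D} \mid \theta)

% Bayesian posterior via Bayes' theorem, with prior p(\theta)
p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}
                                  {\int p(\mathcal{D} \mid \theta')\, p(\theta')\, \mathrm{d}\theta'}

% Maximum a posteriori (MAP) estimate adds the log-prior to the objective
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \, \big[ \log p(\mathcal{D} \mid \theta) + \log p(\theta) \big]
```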
Generative and Discriminative Learning
- Discriminative and Generative Models: Chapter 5 of [BB]
Generative Models
- Naive Bayes: Paper Idiot’s Bayes—Not So Stupid After All? by D. Hand and K. Yu published in International Statistical Review in 2001 discussing the effectiveness of Naive Bayes for classification
- Naive Bayes vs Logistic Regression: Paper On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes by A. Ng and M. Jordan presented at NeurIPS in 2001 elaborating on the data efficiency of Naive Bayes and the asymptotic superiority of Logistic Regression; a small experiment sketch follows this list
- Generative Models – Overview: Chapter 20 of [M] Sections 20.1 to 20.3
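To see the Ng and Jordan comparison above in practice, here is a small experiment sketch that trains both classifiers on increasing amounts of synthetic Gaussian data; the dataset sizes and parameters are arbitrary choices for illustration, not the paper's setup.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n):
    """Two Gaussian classes in 10 dimensions with slightly shifted means."""
    X0 = rng.normal(0.0, 1.0, size=(n // 2, 10))
    X1 = rng.normal(0.5, 1.0, size=(n // 2, 10))
    X = np.vstack([X0, X1])
    y = np.array([0] * (n // 2) + [1] * (n // 2))
    return X, y

X_test, y_test = make_data(4000)

# Generative (Naive Bayes) tends to reach its asymptote with little data;
# discriminative (Logistic Regression) tends to win as the training set grows.
for n_train in (20, 100, 1000):
    X_train, y_train = make_data(n_train)
    nb_acc = GaussianNB().fit(X_train, y_train).score(X_test, y_test)
    lr_acc = LogisticRegression(max_iter=1000).fit(X_train, y_train).score(X_test, y_test)
    print(f"n={n_train:5d}  NaiveBayes={nb_acc:.3f}  LogisticRegression={lr_acc:.3f}")
```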