Schedule

  • Event
    Date
    Day
    Description
  • Session
    05/06/2025 22:00
    Tuesday
    First Lecture
  • Lecture
    05/06/2025
    Tuesday
    Lecture 0: Course Overview and Logistics

    Lecture Notes:

  • Lecture
    05/06/2025
    Tuesday
    Lecture 1: Tokenization and Embedding

    Lecture Notes:

    Further Reads:

  • Lecture
    05/08/2025
    Thursday
    Lecture 2: Language Distribution and Bi-Gram Model

    Lecture Notes:

    Further Reads:

  • Lecture
    05/08/2025
    Thursday
    Lecture 3: Recurrent LMs

    Lecture Notes:

    Further Reads:

    • Recurrent LMs: Chapter 8 of [JM]
    • LSTM LMs: Paper Regularizing and Optimizing LSTM Language Models by Stephen Merity, Nitish Shirish Keskar, and Richard Socher published in ICLR 2018 enabling LSTMs to perform strongly on word-level language modeling
    • High-Rank Recurrent LMs: Paper Breaking the Softmax Bottleneck: A High-Rank RNN Language Model by Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, and William W. Cohen presented at ICLR 2018 proposing Mixture of Softmaxes (MoS) and achieving state-of-the-art results at the time
  • Lecture
    05/13/2025
    Tuesday
    Lecture 4: Context Extraction via Self-Attention

    Lecture Notes:

    Further Reads:

  • Lecture
    05/13/2025
    Tuesday
    Lecture 5: Transformer LM

    Lecture Notes:

    Further Reads:

  • Lecture
    05/15/2025
    Thursday
    Lecture 6: LLM Examples

    Lecture Notes:

    Further Reads:

    • GPT-1: Paper Improving Language Understanding by Generative Pre-Training by Alec Radford et al. (OpenAI, 2018) that introduced GPT-1 and revived the idea of pretraining transformers as LMs followed by supervised fine-tuning
    • GPT-2: Paper Language Models are Unsupervised Multitask Learners by Alec Radford et al. (OpenAI, 2019) that introduces GPT-2, a 1.5B-parameter LM trained on web text
    • GPT-3: Paper Language Models are Few-Shot Learners by Tom B. Brown et al. (OpenAI, 2020) that introduces GPT-3, a 175B-parameter transformer LM
    • GPT-4: GPT-4 Technical Report by OpenAI (2023) that provides an overview of GPT-4’s capabilities

    • The Pile: Paper The Pile: An 800GB Dataset of Diverse Text for Language Modeling by Leo Gao et al. released in 2020 introducing the dataset The Pile
    • Documentation Debt: Paper Addressing “Documentation Debt” in Machine Learning Research: A Retrospective Datasheet for BookCorpus by Jack Bandy and Nicholas Vincent published in 2021 discussing the efficiency and legality of data collection by looking into BookCorpus
  • Lecture
    05/15/2025
    Thursday
    Lecture 7: Pre-training vs Fine-tuning

    Lecture Notes:

    Further Reads:

    • SSL: Paper Semi-supervised Sequence Learning by Andrew M. Dai et al. published in 2015 that explores unsupervised pretraining followed by supervised fine-tuning; this was an early solid work advocating the pre-training idea for LMs
    • GPT-1: Paper Improving Language Understanding by Generative Pre-Training by Alec Radford et al. (OpenAI, 2018) that introduced GPT-1 and revived the idea of pretraining transformers as LMs followed by supervised fine-tuning
  • Lecture
    05/15/2025
    Thursday
    Lecture 8: Statistical View and LoRA

    Lecture Notes:

    Further Reads:

    • LMs: Chapter 12 of [BB] Section 12.3.5
    • LoRA: Paper LoRA: Low-Rank Adaptation of Large Language Models by Edward J. Hu et al. presented at ICLR in 2022 introducing LoRA
  • Assignment
    05/20/2025
    Tuesday
    Assignment #1 - Language Modeling released!
  • Lecture
    05/20/2025
    Tuesday
    Lecture 9: Prompt Design

    Lecture Notes:

    Further Reads:

    • Chain-of-Thought: Paper Chain-of-Thought Prompting Elicits Reasoning in Large Language Models by Jason Wei et al. presented at NeurIPS in 2022 introducing chain-of-thought prompting
    • Prefix-Tuning: Paper Prefix-Tuning: Optimizing Continuous Prompts for Generation by Xiang Lisa Li et al. presented at ACL in 2021 proposing prefix-tuning approach for prompting
    • Prompt-Tuning: Paper The Power of Scale for Parameter-Efficient Prompt Tuning by B. Lester et al. presented at EMNLP in 2021 proposing the prompt tuning idea, i.e., learning to prompt
    • Zero-Shot LLMs: Paper Large Language Models are Zero-Shot Reasoners by T. Kojima et al. presented at NeurIPS in 2022 studying zero-shot learning with LLMs
  • Lecture
    05/20/2025
    Tuesday
    Lecture 10: Data Generation Problem - Basic Definitions

    Lecture Notes:

    Further Reads:

  • Lecture
    05/22/2025
    Thursday
    Lecture 11: Discriminative vs Generative Learning

    Lecture Notes:

    Further Reads:

  • Lecture
    05/22/2025
    Thursday
    Lecture 12: Naive Bayes - Most Basic Generative Model

    Lecture Notes:

    Further Reads:

    • Naive Bayes: Paper Idiot’s Bayes—Not So Stupid After All? by D. Hand and K. Yu published in International Statistical Review in 2001 discussing the efficiency of Naive Bayes for classification
    • Naive Bayes vs Logistic Regression: Paper On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes by A. Ng and M. Jordan presented at NeurIPS in 2001 elaborating the data efficiency of Naive Bayes and the asymptotic superiority of Logistic Regression
    • Generative Models – Overview: Chapter 20 of [M] Sections 20.1 to 20.3
  • Lecture
    05/27/2025
    Tuesday
    Lecture 13: Explicit Distribution Learning - Sampling

    Lecture Notes:

    Further Reads:

    • Sampling Overview: Chapter 14 of [BB]
    • Sampling: The book Pattern Recognition and Machine Learning by Christopher Bishop; read Chapter 11 to see how challenging sampling from a distribution is
    • Sampling Methods: Chapter 17 of [GYC] Sections 17.1 and 17.2
  • Lecture
    05/27/2025
    Tuesday
    Lecture 14: Maximum Likelihood Learning

    Lecture Notes:

    Further Reads:

  • Lecture
    05/27/2025
    Tuesday
    Lecture 15: Autoregressive Modeling

    Lecture Notes:

    Further Reads:

  • Lecture
    05/29/2025
    Thursday
    Lecture 16: Computational AR Models

    Lecture Notes:

    Further Reads:

  • Lecture
    05/29/2025
    Thursday
    Lecture 17: PixelRNN

    Lecture Notes:

    Further Reads:

    • PixelRNN and PixelCNN: Paper Pixel Recurrent Neural Networks by A. Oord et al. presented at ICML in 2016 proposing PixelRNN and PixelCNN
  • Lecture
    06/03/2025
    Tuesday
    Lecture 18: Masked AR Models - PixelCNN and ImageGPT

    Lecture Notes:

    Further Reads:

    • PixelRNN and PixelCNN: Paper Pixel Recurrent Neural Networks by A. Oord et al. presented at ICML in 2016 proposing PixelRNN and PixelCNN
    • ImageGPT: Paper Generative Pretraining from Pixels by M. Chen et al. presented at ICML in 2020 proposing ImageGPT
  • Lecture
    06/03/2025
    Tuesday
    Lecture 19: Energy Based Models - Boltzmann Distribution

    Lecture Notes:

    Further Reads:

    • EBMs: Chapter 24 of [M]
    • Partition Function and Normalizing: Chapter 16 of [GYC] Section 16.2
    • Universality of EBMs: Paper Representational power of restricted Boltzmann machines and deep belief networks by N. Le Roux and Y. Bengio published in Neural Computation in 2008 elaborating the representational power of EBMs
    • Tutorial on EBMs: Survey A Tutorial on Energy-Based Learning by Y. LeCun et al. published in 2006
  • Lecture
    06/05/2025
    Thursday
    Lecture 20: Computational EBMs - Training and Sampling

    Lecture Notes:

    Further Reads:

  • Lecture
    06/05/2025
    Thursday
    Lecture 21: MCMC Algorithms - Gibbs Sampling

    Lecture Notes:

    Further Reads:

  • Due
    06/05/2025 23:59
    Thursday
    Assignment #1 due
  • Lecture
    06/10/2025
    Tuesday
    Lecture 22: MCMC - Langevin and Contrastive Divergence

    Lecture Notes:

    Further Reads:

    • Gibbs Sampling and Langevin: Chapter 14 of [BB]
    • Contrastive Divergence: Paper Training Products of Experts by Minimizing Contrastive Divergence by G. Hinton published in Neural Computation in 2002 proposing the idea of Contrastive Divergence (CD)
    • Training by MCMC: Paper Implicit Generation and Generalization in Energy-Based Models published by Y. Du and I. Mordatch at NeurIPS in 2019 discussing the efficiency of MCMC algorithms for EBM training
    • Improved CD: Paper Improved Contrastive Divergence Training of Energy-Based Models published by Y. Du et al. at ICML in 2021 proposing an efficient training method based on Hinton’s CD idea
  • Lecture
    06/10/2025
    Tuesday
    Lecture 23: Latent Space

    Lecture Notes:

    Further Reads:

  • Lecture
    06/10/2025
    Tuesday
    Lecture 24: Normalizing Flow

    Lecture Notes:

    Further Reads:

  • Lecture
    06/12/2025
    Thursday
    Lecture 25: Learning Flow

    Lecture Notes:

    Further Reads:

    • Flow-based Models: Chapter 23 of [M]
    • Tutorial on Normalizing Flow: Paper Normalizing Flows for Probabilistic Modeling and Inference published by G. Papamakarios et al. at JMLR in 2021 discussing the training and inference of flow-based models
  • Lecture
    06/12/2025
    Thursday
    Lecture 26: NICE, RealNVP and Glow

    Lecture Notes:

    Further Reads:

    • NICE: Paper NICE: Non-linear Independent Components Estimation published by L. Dinh et al. at ICLR in 2015 proposing the NICE model
    • Real NVP: Paper Density estimation using Real NVP published by L. Dinh et al. at ICLR in 2017 proposing the Real NVP model
    • Glow: Paper Glow: Generative Flow with Invertible 1x1 Convolutions published by D. Kingma and P. Dhariwal at NeurIPS in 2018 proposing the Glow model
  • Lecture
    06/12/2025
    Thursday
    Lecture 27: Introduction to GAN

    Lecture Notes:

    Further Reads:

    • Tutorial on GANs: Tutorial Generative Adversarial Networks given by I. Goodfellow at NeurIPS in 2016
  • Assignment
    06/17/2025
    Tuesday
    Assignment #2 - Explicit Methods for Generation released!
  • Lecture
    06/17/2025
    Tuesday
    Lecture 28: Vanilla GAN

    Lecture Notes:

    Further Reads:

    • GANs: Paper Generative Adversarial Nets published by I. Goodfellow et al. at NeurIPS in 2014 proposing GANs
  • Lecture
    06/17/2025
    Tuesday
    Lecture 29: Implicit MLE via GAN

    Lecture Notes:

    Further Reads:

    • GANs: Paper Generative Adversarial Nets published by I. Goodfellow et al. at NeurIPS in 2014 proposing GANs
    • Tutorial on GANs: Tutorial Generative Adversarial Networks given by I. Goodfellow at NeurIPS in 2016
  • Lecture
    06/19/2025
    Thursday
    Lecture 30: Wasserstein Distance

    Lecture Notes:

    Further Reads:

    • W-GANs: Paper Wasserstein GAN published by M. Arjovsky et al. at ICML in 2017 proposing Wasserstein GANs
    • Tutorial on GANs: Tutorial Generative Adversarial Networks given by I. Goodfellow at NeurIPS in 2016
  • Lecture
    06/19/2025
    Thursday
    Lecture 31: Wasserstein GAN

    Lecture Notes:

    Further Reads:

    • W-GANs: Paper Wasserstein GAN published by M. Arjovsky et al. at ICML in 2017 proposing Wasserstein GANs
  • Lecture
    06/19/2025
    Thursday
    Lecture 32: GAN Samples

    Lecture Notes:

    Further Reads:

    • DCGAN: Paper Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks published by A. Radford et al. at ICLR in 2016 proposing DCGAN
    • StyleGAN: Paper A Style-Based Generator Architecture for Generative Adversarial Networks published by T. Karras et al. at CVPR in 2019 proposing StyleGAN
    • BigGAN: Paper Large Scale GAN Training for High Fidelity Natural Image Synthesis published by A. Brock et al. at ICLR in 2019 proposing BigGAN
    • SAGAN: Paper Self-Attention Generative Adversarial Networks published by H. Zhang et al. at ICML in 2019 proposing Self-Attention GAN
  • Exam
    06/24/2025 18:00
    Tuesday
    Midterm

    Topics:

    • The exam covers Chapters 1 to 3
    • The exam is 3 hours long
    • No programming questions
    • Starts at 6:00 PM in EX-320
  • Lecture
    07/03/2025
    Thursday
    Lecture 33: Probabilistic Latent-Space Generation

    Lecture Notes:

    Further Reads:

    • Probabilistic Latent: Chapter 16 of [BB] Sections 16.1 and 16.2
    • Mixture Models: Paper On the number of components in a Gaussian mixture model published by G. McLachlan and S. Rathnayake in 2014 reviewing some key properties of Gaussian mixtures and their approximation power
  • Lecture
    07/03/2025
    Thursday
    Lecture 34: Variational Inference

    Lecture Notes:

    Further Reads:

    • ELBO: Chapter 16 of [BB] Section 16.3
    • VI for Likelihood: The early paper Computing Upper and Lower Bounds on Likelihoods in Intractable Networks published by T. Jaakkola and M. Jordan at UAI in 1996
    • Tutorials on VI: Review paper Variational Inference: A Review for Statisticians published by D. Blei, A. Kucukelbir, and J. McAuliffe in 2016 giving a good overview of the VI framework
    • Introduction to VI: Book An Introduction to Variational Autoencoders written by D. Kingma and M. Welling and published by NOW in 2019
  • Assignment
    07/05/2025
    Saturday
    Project Briefing released!
  • Due
    07/07/2025 23:59
    Monday
    Assignment #2 due
  • Due
    07/16/2025 23:59
    Wednesday
    Project Briefing due

Tutorial Schedule

Session      Topics                                           Tutor
Tutorial 1   PyTorch Overview -- Tokenization and Embedding   A. Mobasheri
Tutorial 2   Transformers and Large Language Models           A. Mobasheri
Tutorial 3   Auto-regressive Models                           M. Safavi
Tutorial 4   Energy-based Models                              A. Mobasheri
Tutorial 5   Generative Adversarial Networks | Exam Overview  M. Safavi
Reading Week & Exam - No Lecture                              N/A
Tutorial 6   Variational Inference and VAEs                   A. Mobasheri
Tutorial 7   Diffusion Models I                               M. Safavi
Tutorial 8   Sample Project Demo                              A. Mobasheri
Tutorial 9   Diffusion Models II                              M. Safavi
Tutorial 10  Advances and Practical Considerations            M. Safavi