Schedule

  • Event
    Date
    Day
    Description
  • Session
    05/05/2026 13:00
    Tuesday
    First Lecture
  • Lecture
    05/05/2026
    Tuesday
    Lecture 0: Course Overview and Logistics

    Lecture Notes:

  • Lecture
    05/05/2026
    Tuesday
    Lecture 1: Language Modeling

    Lecture Notes:

    Further Reads:

    • Tokenization: Chapter 2 of [JM]
    • Embedding: Chapter 6 of [JM]
    • Original BPE Algorithm: Article A New Algorithm for Data Compression by Philip Gage published in 1994 that proposed byte pair encoding (BPE)
    • BPE for Tokenization: Paper Neural Machine Translation of Rare Words with Subword Units by Rico Sennrich, Barry Haddow, and Alexandra Birch presented at ACL 2016 that adapted BPE for NLP
    • LMs: Section 12.2 in Chapter 12 of [BB]
    • N-Gram LMs: Section 3.1 in Chapter 3 of [JM]
    • Maximum Likelihood: Section 2.3 in Chapter 2 of [BB]
    • Recurrent LMs: Chapter 8 of [JM]
    • LSTM LMs: Paper Regularizing and Optimizing LSTM Language Models by Stephen Merity, Nitish Shirish Keskar, and Richard Socher presented at ICLR 2018 enabling LSTMs to perform strongly on word-level language modeling
    • High-Rank Recurrent LMs: Paper Breaking the Softmax Bottleneck: A High-Rank RNN Language Model by Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, and William W. Cohen presented at ICLR 2018 proposing Mixture of Softmaxes (MoS) and achieving state-of-the-art results at the time
  • Lecture
    05/12/2026
    Tuesday
    Lecture 2: Transformer-based Language Models

    Lecture Notes:

  • Lecture
    05/12/2026
    Tuesday
    Lecture 3: Large Language Models

    Lecture Notes:

    Further Reads:

    GPT Papers:

    • GPT-1: Paper Improving Language Understanding by Generative Pre-Training by Alec Radford et al. (OpenAI, 2018) that introduced GPT-1 and revived the idea of pretraining transformers as LMs followed by supervised fine-tuning
    • GPT-2: Paper Language Models are Unsupervised Multitask Learners by Alec Radford et al. (OpenAI, 2019) that introduces GPT-2, a 1.5B-parameter transformer LM trained on web text
    • GPT-3: Paper Language Models are Few-Shot Learners by Tom B. Brown et al. (OpenAI, 2020) that introduces GPT-3, a 175B-parameter transformer LM
    • GPT-4: GPT-4 Technical Report by OpenAI (2023) that provides an overview of GPT-4’s capabilities

    Data for LLMs:

    • The Pile: Paper The Pile: An 800GB Dataset of Diverse Text for Language Modeling by Leo Gao et al. published in 2020 introducing The Pile dataset
    • Documentation Debt: Paper Addressing “Documentation Debt” in Machine Learning Research: A Retrospective Datasheet for BookCorpus by Jack Bandy and Nicholas Vincent published in 2021 discussing the efficiency and legality of data collection by looking into BookCorpus

    Fine-tuning:

    • SSL: Paper Semi-supervised Sequence Learning by Andrew M. Dai et al. published in 2015 that explores unsupervised pretraining followed by supervised fine-tuning; this was an early, solid work advocating the pretraining idea for LMs
    • LoRA: Paper LoRA: Low-Rank Adaptation of Large Language Models by Edward J. Hu et al. presented at ICLR in 2022 introducing LoRA

    Prompt Design:

    • Prefix-Tuning: Paper Prefix-Tuning: Optimizing Continuous Prompts for Generation by Xiang Lisa Li et al. presented at ACL in 2021 proposing the prefix-tuning approach for prompting
    • Prompt-Tuning: Paper The Power of Scale for Parameter-Efficient Prompt Tuning by B. Lester et al. presented at EMNLP in 2021 proposing the prompt tuning idea, i.e., learning to prompt
    • Zero-Shot LLMs: Paper Large Language Models are Zero-Shot Reasoners by T. Kojima et al. presented at NeurIPS in 2022 studying zero-shot learning with LLMs
  • Assignment
    05/14/2026
    Thursday
    Assignment #1 - Language Modeling released!
  • Assignment
    05/19/2026
    Tuesday
    Project Proposal released!
  • Session
    05/26/2026 13:00
    Tuesday
    Guest Lecture

    Erik Saarenvirta from Google will give a talk on Building AI Supercomputers on Google Cloud.

  • Due
    05/28/2026 23:30
    Thursday
    Assignment #1 due
  • Due
    06/12/2026 23:30
    Friday
    Project Proposal due



Overall Course Calendar

Week  Topic                                Assignment  Project         Exam    Submission
1     Language Modeling
2     LLMs                                 Assgn 1
3     Fundamentals of Generative Learning
4     Guest Lecture                                                            Assgn 1
5     Autoregressive Models                Assgn 2
6     Energy-based Models                              Proposal                Proposal
7     Normalizing Flow                                                 Exam 1
8     Generative Adversarial Networks      Assgn 3                             Assgn 2
9     Holiday
10    Variational Inference                Assgn 4                             Assgn 3
11    VAEs
12    Score-based Diffusion                Assgn 5                             Assgn 4
13    DPMs                                                             Exam 2
14    Multimodality and Conditioning                                           Assgn 5
15    Final Lecture - Reserved                         Presentation            Presentation
16    No Lecture - Reserved                            Code and Paper          Code and Paper




Tutorial Schedule

Date      Topic                                            Tutorial
May 14    PyTorch Overview -- Tokenization and Embedding   Amir Hossein
May 21    Transformer-based Language Models                Amir Hossein
May 28    Generative vs Discriminative Learning            Mohammadreza
June 4    Autoregressive Models                            Mohammadreza
June 11   Energy-based Models                              Amir Hossein
June 18   Midterm
June 25   Normalizing Flow                                 Mohammadreza
July 2    GAN                                              Amir Hossein
July 9    Sample Project
July 16   VAE and Q-VAE                                    Amir Hossein
July 23   Score-based Diffusion                            Mohammadreza
July 30   Midterm
August 6  DDPM and DDIM                                    Mohammadreza