Schedule
-
Event | Date | Description
-
Session | 05/05/2026 13:00 (Tuesday) | First Lecture
Lecture | 05/05/2026 (Tuesday) | Lecture 0: Course Overview and Logistics
Lecture Notes:
-
Lecture | 05/05/2026 (Tuesday) | Lecture 1: Language Modeling
Lecture Notes:
Further Reads:
- Tokenization: Chapter 2 of [JM]
- Embedding: Chapter 6 of [JM]
- Original BPE Algorithm: The original byte-pair encoding (BPE) compression algorithm proposed by Philip Gage in 1994
- BPE for Tokenization: Paper Neural machine translation of rare words with subword units by Rico Sennrich, Barry Haddow, and Alexandra Birch, presented at ACL 2016, which adapted BPE for NLP
- LMs: Chapter 12 of [BB] Section 12.2
- N-Gram LMs: Chapter 3 of Speech and Language Processing; Section 3.1 on N-gram LM
- Maximum Likelihood: Chapter 2 of [BB] Section 2.3
- Recurrent LMs: Chapter 8 of [JM]
- LSTM LMs: Paper Regularizing and Optimizing LSTM Language Models by Stephen Merity, Nitish Shirish Keskar, and Richard Socher, published at ICLR 2018, which enabled LSTMs to perform strongly on word-level language modeling
- High-Rank Recurrent LMs: Paper Breaking the Softmax Bottleneck: A High-Rank RNN Language Model by Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, and William W. Cohen presented at ICLR 2018 proposing Mixture of Softmaxes (MoS) and achieving state-of-the-art results at the time
-
Lecture | 05/12/2026 (Tuesday) | Lecture 2: Transformer-based Language Models
Lecture Notes:
- Chapter 1 - Section 2
Further Reads:
- Transformer Paper: Paper Attention Is All You Need, published in 2017, which marked a turning point in sequence processing
- Transformers: Chapter 9 of [JM]
- Transformers: Chapter 12 of [BB] Section 12.1
- LLMs via Transformers: Chapter 10 of [JM]
-
Lecture | 05/12/2026 (Tuesday) | Lecture 3: Large Language Models
Lecture Notes:
Further Reads:
GPT Papers:
- GPT-1: Paper Improving Language Understanding by Generative Pre-Training by Alec Radford et al. (OpenAI, 2018) that introduced GPT-1 and revived the idea of pretraining transformers as LMs followed by supervised fine-tuning
- GPT-2: Paper Language Models are Unsupervised Multitask Learners by Alec Radford et al. (OpenAI, 2019) that introduces GPT-2, a 1.5B-parameter LM trained on web text
- GPT-3: Paper Language Models are Few-Shot Learners by Tom B. Brown et al. (OpenAI, 2020) that introduces GPT-3, a 175B-parameter transformer LM
- GPT-4: GPT-4 Technical Report by OpenAI (2023) that provides an overview of GPT-4’s capabilities
Data for LLMs:
- The Pile: Paper The Pile: An 800GB Dataset of Diverse Text for Language Modeling by Leo Gao et al., released in 2020, introducing The Pile dataset
- Documentation Debt: Paper Addressing “Documentation Debt” in Machine Learning Research: A Retrospective Datasheet for BookCorpus by Jack Bandy and Nicholas Vincent published in 2021 discussing the efficiency and legality of data collection by looking into BookCorpus
Fine-tuning:
- SSL: Paper Semi-supervised Sequence Learning by Andrew M. Dai et al. published in 2015 that explores unsupervised pretraining followed by supervised fine-tuning; this was an early, solid work advocating the pre-training idea for LMs
- LoRA: Paper LoRA: Low-Rank Adaptation of Large Language Models by Edward J. Hu et al. presented at ICLR in 2022 introducing LoRA
Prompt Design:
- Prefix-Tuning: Paper Prefix-Tuning: Optimizing Continuous Prompts for Generation by Xiang Lisa Li et al. presented at ACL in 2021 proposing the prefix-tuning approach for prompting
- Prompt-Tuning: Paper The Power of Scale for Parameter-Efficient Prompt Tuning by B. Lester et al. presented at EMNLP in 2021 proposing the prompt tuning idea, i.e., learning to prompt
- Zero-Shot LLMs: Paper Large Language Models are Zero-Shot Reasoners by T. Kojima et al. presented at NeurIPS in 2022 studying zero-shot learning with LLMs
-
Assignment | 05/14/2026 (Thursday) | Assignment #1 - Language Modeling released!
Assignment | 05/19/2026 (Tuesday) | Project Proposal released!
Session | 05/26/2026 13:00 (Tuesday) | Guest Lecture: Erik Saarenvirta from Google will give a talk on Building AI Supercomputers on Google Cloud.
Due | 05/28/2026 23:30 (Thursday) | Assignment #1 due
Due | 06/12/2026 23:30 (Friday) | Project Proposal due
Overall Course Calendar
| Week | Topic | Assignment | Project | Exam | Submission |
|---|---|---|---|---|---|
| 1 | Language Modeling | | | | |
| 2 | LLMs | Assgn 1 | | | |
| 3 | Fundamentals of Generative Learning | | | | |
| 4 | Guest Lecture | | | | Assgn 1 |
| 5 | Autoregressive Models | Assgn 2 | | | |
| 6 | Energy-based Models | | Proposal | | Proposal |
| 7 | Normalizing Flow | | | Exam 1 | |
| 8 | Generative Adversarial Networks | Assgn 3 | | | Assgn 2 |
| 9 | Holiday | | | | |
| 10 | Variational Inference | Assgn 4 | | | Assgn 3 |
| 11 | VAEs | | | | |
| 12 | Score-based Diffusion | Assgn 5 | | | Assgn 4 |
| 13 | DPMs | | | Exam 2 | |
| 14 | Multimodality and Conditioning | | | | Assgn 5 |
| 15 | Final Lecture - Reserved | | Presentation | | Presentation |
| 16 | No Lecture - Reserved | | Code and Paper | | Code and Paper |
Tutorial Schedule
| Date | Topic | Tutorial |
|---|---|---|
| May 14 | PyTorch Overview -- Tokenization and Embedding | Amir Hossein |
| May 21 | Transformer-based Language Models | Amir Hossein |
| May 28 | Generative vs Discriminative Learning | Mohammadreza |
| June 4 | Autoregressive Models | Mohammadreza |
| June 11 | Energy-based Models | Amir Hossein |
| June 18 | Midterm | |
| June 25 | Normalizing Flow | Mohammadreza |
| July 2 | GAN | Amir Hossein |
| July 9 | Sample Project | |
| July 16 | VAE and Q-VAE | Amir Hossein |
| July 23 | Score-based Diffusion | Mohammadreza |
| July 30 | Midterm | |
| August 6 | DDPM and DDIM | Mohammadreza |
