Lecture Videos
Lectures
Here, you can find the recordings of the lecture videos.
-
Lecture 1: Language Modeling
LMs - Part 1: We start with LMs and understand how we can feed a text into it by doing the so-called "Tokenization" and "Embedding". We build simple language model called Bi-Gram and understand its limitations. This motivates us to build a context-aware LM. We look into the basic context-aware LM design via RNNs. Unfortunately, the audio recording is not good due to the issue with the recording device.
[link]
Lecture Notes:
Further Reads:
- Tokenization: Chapter 2 of [JM]
- Embedding: Chapter 6 of [JM]
- Original BPE Algorithm: Original BPE Algorithm proposed by Philip Gage in 1994
- BPE for Tokenization: Paper Neural machine translation of rare words with subword units by Rico Sennrich, Barry Haddow, and Alexandra Birch presented in ACL 2016 that adapted BPE for NLP
- LMs: Chapter 12 of [BB] Section 12.2
- N-Gram LMs: Chapter 3 of Speech and Language Processing; Section 3.1 on N-gram LM
- Maximum Likelihood: Chapter 2 of [BB] Section 2.3
- Recurrent LMs: Chapter 8 of [JM]
- LSTM LMs: Paper Regularizing and Optimizing LSTM Language Models by Stephen Merity, Nitish Shirish Keskar, and Richard Socher published in ICLR 2018 enabling LSTMs to perform strongly on word-level language modeling
- High-Rank Recurrent LMs: Paper Breaking the Softmax Bottleneck: A High-Rank RNN Language Model by Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, and William W. Cohen presented at ICLR 2018 proposing Mixture of Softmaxes (MoS) and achieving state-of-the-art results at the time
-
Lecture 2: Transformer-based Language Models
Transformer LMs: In this lecture we use self-attention mechanism to extract context from a token sequence. Introduction to self-attention is given through the lecture. We then used this mechanism to build a LM architecture which is used nowadays in practice.
[link]
Lecture Notes:
- Chapter 1 - Section 2 Further Reads:
- Transformer Paper: Paper Attention Is All You Need! published in 2017 that made a great turn in sequence processing
- Transformers: Chapter 9 of [JM]
- Transformers: Chapter 12 of [BB] Section 12.1
- LLMs via Transformers: Chapter 10 of [JM]
-
Lecture 3: Large Language Models
LLMs: We study LLMs which are Large LMs trained on large corpora. We see how they can be evaluated, fine-tuned, and deployed via prompt design.
[link]
Lecture Notes:
Further Reads:
GPT Papers:
- GPT-1: Paper Improving Language Understanding by Generative Pre-Training by Alec Radford et al. (OpenAI, 2018) that introduced GPT-1 and revived the idea of pretraining transformers as LMs followed by supervised fine-tuning
- GPT-2: Paper Language Models are Unsupervised Multitask Learners by Alec Radford et al. (OpenAI, 2019) that introduces GPT-2 with 1.5B parameter trained on web text
- GPT-3: Paper Language Models are Few-Shot Learners by Tom B. Brown et al. (OpenAI, 2020) that introduces GPT-3, a 175B-parameter transformer LM
- GPT-4: GPT-4 Technical Report by OpenAI (2023) that provides an overview of GPT-4’s capabilities
Data for LLMs:
- The Pile: Paper The Pile: An 800GB Dataset of Diverse Text for Language Modeling by Leo Gao et al. presented in 2020 introductin dataset The Pile
- Documentation Debt: Paper Addressing “Documentation Debt” in Machine Learning Research: A Retrospective Datasheet for BookCorpus by Jack Bandy and Nicholas Vincent published in 2021 discussing the efficiency and legality of data collection by looking into BookCorpus
Fine-tuning:
- SSL: Paper Semi-supervised Sequence Learning by Andrew M. Dai et al. published in 2015 that explores using unsupervised pretraining followed by supervised fine-tuning; this was an early solid work advocating pre-training idea for LMs
- LoRA: Paper LoRA: Low-Rank Adaptation of Large Language Models by Edward J. Hu et al. presented at ICLR in 2022 introducing LoRA
Prompt Design:
- Prefix-Tuning: Paper Prefix-Tuning: Optimizing Continuous Prompts for Generation by Xiang Lisa Li et al. presented at ACL in 2021 proposing prefix-tuning approach for prompting
- Prompt-Tuning: Paper The Power of Scale for Parameter-Efficient Prompt Tuning by B. Lester et al. presented at EMNLP in 2021 proposing the prompt tuning idea, i.e., learning to prompt
- Zero-Shot LLMs: Paper Large Language Models are Zero-Shot Reasoners by T. Kojima et al. presented at NeurIPS in 2022 studying zero-shot learning with LLMs
Review Lectures
Here, you can find review lectures on some key deep learning topics. It is strongly suggested that you watch these videos to recap those key concepts, as they are frequently used in the course.
