CSE 561A: Large Language Models

Course Overview

This is an advanced, research-oriented course covering the fundamentals of large language models (model architectures and training frameworks) as well as their capabilities, applications, and open issues. We will present and discuss state-of-the-art papers on large language models.
Prerequisites: Students are expected to understand core concepts in machine learning (CSE 417T/517A).
Canvas: https://wustl.instructure.com/courses/129974
Piazza: https://piazza.com/class/lsf6np5hzai4rs

Course Grading

Class Presentation

Grading Criteria:

Each student is also required to submit a preview question for one of the assigned papers by the day before the presentation, three times over the semester (the three questions must be for three different class sessions, and not for the session in which you present). You are also encouraged to raise your question in class. Preview questions should not be superficial ones such as "What is the aim of the paper?" or "How does this method differ from traditional NLP methods?"

Final Project (2-3 students per group)

Project Requirement: There are typically two types of projects.

  1. Designing a novel algorithm to train a medium-sized language model (e.g., BERT or GPT-2) for a problem that you are interested in. Pretrained models are available on the Hugging Face Hub: https://huggingface.co/models (see the loading sketch after this list).
  2. Designing a novel algorithm for inference with large language models (white-box models such as the LLaMA 2 family, or black-box models such as GPT-4, Claude, etc.) to solve some type of complex problem, and analyzing its limitations (see the API sketch after this list). We may not be able to reimburse API costs, so you can choose to use free APIs such as Claude's. API references: https://platform.openai.com/docs/introduction and https://docs.anthropic.com/claude/reference/getting-started-with-the-api
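
For the first project type, here is a minimal sketch (not a course deliverable) of loading a medium-sized model such as GPT-2 from the Hugging Face Hub and running a quick sanity check. It assumes the `transformers` library and PyTorch are installed; the model name "gpt2" and the prompt are illustrative choices.

```python
# Minimal sketch: load GPT-2 from the Hugging Face Hub and generate a continuation.
# Assumes `pip install transformers torch`; "gpt2" is an illustrative model choice.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Encode a prompt and greedily generate a short continuation as a sanity check.
inputs = tokenizer("Large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

From here, a project would typically swap in its own training or fine-tuning loop on top of the loaded model.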
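
For the second project type, here is a minimal sketch of querying a black-box model through the OpenAI API. It assumes the `openai` Python package (v1 or later) and an OPENAI_API_KEY environment variable; the model name and prompt are illustrative placeholders.

```python
# Minimal sketch: query a black-box LLM through the OpenAI chat completions API.
# Assumes `pip install openai` (v1+) and the OPENAI_API_KEY environment variable;
# the model name and prompt below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Solve step by step: what is 17 * 24?"}],
)
print(response.choices[0].message.content)
```

The Anthropic API follows a similar request/response pattern; see the getting-started link above.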

Project Presentation: Dates: 4/18, 4/23, and 4/25. You will need to sign up for a time slot near the end of the semester. Students must submit feedback scores for the other groups' presentations (through a Google Form).

Office Hour

Office hours are held on demand: if you need to discuss course materials or have questions at any point, feel free to send an email requesting an office hour. Based on these requests, we will organize time slots for students to schedule appointments.

Syllabus (dates are tentative and may shift due to guest lectures)

Each entry below lists the date, topic, assigned readings, and slides.
1/16: Course Overview
Readings:
- Distributed Representations of Words and Phrases and their Compositionality (Word2Vec)
- Enriching Word Vectors with Subword Information
- Attention Is All You Need (Transformer)
Slides
1/18: Language Model Architectures
Readings:
- Language Models are Unsupervised Multitask Learners (GPT-2)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Slides
1/23: Large Language Model Training (I)
Readings:
- Multitask Prompted Training Enables Zero-Shot Task Generalization
- Cross-Task Generalization via Natural Language Crowdsourcing Instructions
- LIMA: Less Is More for Alignment
- Self-Instruct: Aligning Language Models with Self-Generated Instructions
- How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
Slides
1/25: Large Language Model Training (II)
Readings:
- Training language models to follow instructions with human feedback
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Slides
1/30: Parameter-Efficient Fine-Tuning
Readings:
- The Power of Scale for Parameter-Efficient Prompt Tuning
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
- LoRA: Low-Rank Adaptation of Large Language Models
- QLoRA: Efficient Finetuning of Quantized LLMs
Slides
-----Student Presentations Start-----
2/1: Prompting and In-Context Learning
Readings:
- Language Models are Few-Shot Learners (GPT-3)
- Emergent Abilities of Large Language Models
- Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
- Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers
Slides_1
Slides_2
2/6: Language Model Reasoning (I)
Readings:
- Chain of Thought Prompting Elicits Reasoning in Large Language Models
- Self-Consistency Improves Chain of Thought Reasoning in Language Models
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Slides_1
Slides_2
2/8: Language Model Reasoning (II)
Readings:
- STaR: Bootstrapping Reasoning With Reasoning
- Large Language Models Can Self-Improve
- Progressive-Hint Prompting Improves Reasoning in Large Language Models
- Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters
Slides_1
Slides_2
2/13: Language Model Calibration/Uncertainty
Readings:
- How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering
- Teaching models to express their uncertainty in words
- Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models
- Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation
Slides_1
Slides_2
2/15: Retrieval Augmentation and Parametric Knowledge
Readings:
- Generalization through Memorization: Nearest Neighbor Language Models
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- Language Models as Knowledge Bases?
- How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Slides_1
Slides_2
-----Project Proposal Deadline: 2/19 11:59pm-----
2/20: Decoding
Readings:
- Contrastive Decoding: Open-ended Text Generation as Optimization
- Don't throw away your value model! Making PPO even better via Value-Guided Monte-Carlo Tree Search decoding
Slides
2/22: Course Cancelled Due to Business Travel
2/27: Code Language Models
Readings:
- InCoder: A Generative Model for Code Infilling and Synthesis
- Code Llama: Open Foundation Models for Code
- Teaching Large Language Models to Self-Debug
- LEVER: Learning to Verify Language-to-Code Generation with Execution
Slides_1
Slides_2
2/29: Multimodal Language Models
Readings:
- Flamingo: a Visual Language Model for Few-Shot Learning
- VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
- Visual Instruction Tuning
- NExT-GPT: Any-to-Any Multimodal LLM
Slides_1
Slides_2
3/5: Language Models as Agents
Readings:
- ReAct: Synergizing Reasoning and Acting in Language Models
- Toolformer: Language Models Can Teach Themselves to Use Tools
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
- ART: Automatic multi-step reasoning and tool-use for large language models
Slides_1
Slides_2
3/7: Evaluation of Language Models
Readings:
- Proving Test Set Contamination in Black Box Language Models
- Holistic Evaluation of Language Models
Slides
-----Spring Break-----
-----Project Mid-Term Report Deadline: 3/18 11:59pm-----
3/19: Long-Context Language Models
Readings:
- Lost in the Middle: How Language Models Use Long Contexts
- Longformer: The Long-Document Transformer
- LongNet: Scaling Transformers to 1B Tokens
- Memorizing Transformers
Slides_1
Slides_2
3/21: Dynamic Architecture and Knowledge
Readings:
- Depth-Adaptive Transformer
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
- How is ChatGPT's behavior changing over time?
- Time is Encoded in the Weights of Finetuned Language Models
Slides_1
Slides_2
3/26: Language Model Bias
Readings:
- Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
- Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
- Whose Opinions Do Language Models Reflect?
- Red Teaming Language Models with Language Models
Slides_1
Slides_2
3/28: Guest Lecture: Chunyuan Li (Microsoft Research)
4/2: Privacy
Readings:
- Extracting Training Data from Large Language Models
- Large Language Models Can Be Strong Differentially Private Learners
- Quantifying Memorization Across Neural Language Models
- SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
Slides_1
Slides_2
4/4: Security
Readings:
- Universal and Transferable Adversarial Attacks on Aligned Language Models
- DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
- Poisoning Language Models During Instruction Tuning
- GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher
Slides_1
Slides_2
4/9: Guest Lecture: Sarah Wiegreffe (Allen Institute for AI)
4/11: Guest Lecture: Akari Asai (University of Washington)
4/16: Explorations of Large Language Models
Readings:
- Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
- Hungry Hungry Hippos: Towards Language Modeling with State Space Models
- PaLM-E: An Embodied Multimodal Language Model
- When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities
Slides_1
Slides_2
-----Project Presentation Deadline: 4/17 11:59pm-----
4/18: Final Project Presentation I
4/23: Final Project Presentation II
4/25: Final Project Presentation III
-----Project Final Report Deadline: 5/3 11:59pm-----