CSE 561A: Large Language Models (2024 Fall)

Course Overview

This is an advanced, research-oriented course covering the fundamentals of Large Language Models (model architectures and training frameworks) as well as their capabilities, applications, and open issues. We will present and discuss state-of-the-art papers on large language models.
Pre-requisites: Students are expected to understand core machine learning concepts (CSE 417T/517A).

Course Grading

Paper Presentation

Grading Criteria:

Each student is also required to submit a preview question for a paper, one day before its presentation, three times over the semester (the three questions must be for three different class sessions, none of which may be the day you present). You are also encouraged to raise your question in class. Preview questions cannot be trivial ones such as "what is the aim of the paper?" or "what is the difference between this method and traditional NLP methods?"

Final Project (2-3 students per group)

Project Requirement: There are typically two types of projects.

  1. Designing a novel algorithm to train a medium-sized language model (e.g., BERT, GPT-2) for a problem that interests you.
  2. Designing a novel algorithm for inference with large language models (white-box models such as LLaMA-2, or black-box models such as GPT-4, Claude, etc.) to solve some class of complex problems, and analyzing its limitations.

Project Presentation: Dates: 12/3 and 12/5. You will need to sign up for a time slot near the end of the semester. Students will also need to submit feedback scores for other groups' presentations (through a Google Form).

Office Hour

Office hours are held on demand: if you need to discuss course materials or have questions at any point, feel free to send an email requesting an office hour. Based on these requests, we will organize time slots for students to schedule appointments.

Teaching Assistant

Chengsong Huang (chengsong@wustl.edu)

Syllabus (Dates are tentative and may shift to accommodate guest lectures.)

Date | Topic | Readings | Slides
Large Language Model Basics
8/27 | Course Overview
Distributed Representations of Words and Phrases and their Compositionality (Word2Vec)
Enriching Word Vectors with Subword Information
Attention Is All You Need (Transformer)
Slides
8/29 | Language Model Architectures
Language Models are Unsupervised Multitask Learners (GPT-2)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Slides
9/3 | Prompting and In-Context Learning
Language Models are Few-Shot Learners (GPT-3)
Emergent Abilities of Large Language Models
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers
Slides
9/5 | Language Model Instruction Tuning
Multitask Prompted Training Enables Zero-Shot Task Generalization
Cross-Task Generalization via Natural Language Crowdsourcing Instructions
Self-Instruct: Aligning Language Models with Self-Generated Instructions
LIMA: Less Is More for Alignment
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
Slides
-----Student Presentation Starts-----
Large Language Model Capabilities
9/10 | Language Model Reasoning (I)
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Slides
9/12 | Language Model Reasoning (II)
Large Language Models Can Self-Improve
Progressive-Hint Prompting Improves Reasoning in Large Language Models
Large Language Models are Better Reasoners with Self-Verification
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
Slides
-----Project Proposal Deadline: 9/16 11:59pm-----
9/17 | Language Model Calibration
Teaching models to express their uncertainty in words
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models
Just Ask for Calibration
Slides
9/19 | LLM Hallucination and Solutions
Improving Factuality and Reasoning in Language Models through Multiagent Debate
How Language Model Hallucinations Can Snowball
Trusting Your Evidence: Hallucinate Less with Context-aware Decoding
Hallucination Detection for Generative Large Language Models by Bayesian Sequential Estimation
Slides
9/24 | Retrieval-Augmented Generation
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation
REPLUG: Retrieval-Augmented Black-Box Language Models
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Slides
9/26 | Reinforcement Learning from Human Feedback
Training language models to follow instructions with human feedback
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
SimPO: Simple Preference Optimization with a Reference-Free Reward
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Slides
Advanced Methods for Large Language Models
10/1 | Efficient Fine-Tuning
The Power of Scale for Parameter-Efficient Prompt Tuning
Parameter-Efficient Transfer Learning for NLP
LoRA: Low-Rank Adaptation of Large Language Models
DoRA: Weight-Decomposed Low-Rank Adaptation
Slides
10/3 | Efficient Inference
Fast Inference from Transformers via Speculative Decoding
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models
Adapting Language Models to Compress Contexts
Slides
10/8 | Fall Break (No Class)
10/10 | Long-Context Language Models
LongNet: Scaling Transformers to 1B Tokens
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Lost in the Middle: How Language Models Use Long Contexts
Memorizing Transformers
Slides
10/15 | Guest Lecture: "Effective Pretraining and Finetuning: Methods for optimizing your data" by Shayne Longpre (MIT)
Large Language Model Applications
10/17 | Code Language Models
Code Llama: Open Foundation Models for Code
Planning with Large Language Models for Code Generation
Teaching Large Language Models to Self-Debug
SelfEvolve: A Code Evolution Framework via Large Language Models
Slides
-----Project Mid-Term Report Deadline: 10/21 11:59pm-----
10/22 | Multimodal Language Models
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Visual Instruction Tuning
NExT-GPT: Any-to-Any Multimodal LLM
Evaluating Object Hallucination in Large Vision-Language Models
Slides
10/24 | Language Models as Agents
Toolformer: Language Models Can Teach Themselves to Use Tools
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
ART: Automatic multi-step reasoning and tool-use for large language models
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
Slides
10/29 | Language Models and Knowledge Graphs
GNN-LM: Language Modeling based on Global Contexts via GNN
G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering
KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning
Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs?
Slides
10/31 | Language Models for Specialized Domains
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
SciBERT: A Pretrained Language Model for Scientific Text
Large Language Models Encode Clinical Knowledge
Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models
Slides
Large Language Model Analysis
11/5 | Evaluation of Language Models
Proving Test Set Contamination in Black Box Language Models
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Large Language Models are not Fair Evaluators
Holistic Evaluation of Language Models
Slides
11/7 | Detection of LLM Generation
DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
GPT-who: An Information Density-based Machine-Generated Text Detector
A Watermark for Large Language Models
GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content
Slides
11/12 | Language Model Bias
Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
Whose Opinions Do Language Models Reflect?
“Kelly is a Warm Person, Joseph is a Role Model”: Gender Biases in LLM-Generated Reference Letters
Red Teaming Language Models with Language Models
Slides
11/14 | Language Model Privacy & Security
Multi-step Jailbreaking Privacy Attacks on ChatGPT
Jailbreaking Black Box Large Language Models in Twenty Queries
Quantifying Memorization Across Neural Language Models
Poisoning Language Models During Instruction Tuning
Slides
11/19 | Guest Lecture: "Breaking the Curse of Multilinguality in Language Models" by Terra Blevins (Incoming Asst. Prof. at Northeastern Univ.)
11/21 | No Class
11/26 | No Class
-----Project Presentation Deadline: 12/2 11:59pm-----
12/3 | Final Project Presentation I
12/5 | Final Project Presentation II
-----Project Final Report Deadline: 12/13 11:59pm-----