CSE 4061: Large Language Models (2025 Spring)

Course Overview

This is an advanced, research-oriented course that teaches the fundamentals of Large Language Models (model architecture and training frameworks) as well as their capabilities, applications, and open issues. We will present and discuss state-of-the-art papers on large language models.
Prerequisites: Students are expected to understand core concepts in machine learning (CSE 417T/517A).

Course Grading

Paper Presentation

Grading Criteria:

Each student is also required to submit a preview question for a paper, one day before the corresponding presentation, three times during the semester (the three questions must be for three different class sessions, none of which may be the session at which you present). You are also encouraged to raise your question in class. Preview questions cannot be simple ones like "What is the aim of the paper?" or "What is the difference between this method and traditional NLP methods?"

Final Project (2-3 students per group)

Project Requirement: There are typically two types of projects.

  1. Designing a novel algorithm to train a medium-sized language model (e.g., BERT or GPT-2) for a problem that you are interested in; a minimal fine-tuning sketch is shown after this list.
  2. Designing a novel algorithm for inference with large language models (white-box models such as the LLaMA 2 models, or black-box models such as GPT-4, Claude, etc.) to solve some class of complex problems, and analyzing their limitations.
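
For concreteness, here is a minimal sketch, assuming the Hugging Face transformers library and PyTorch, of a single fine-tuning step for a medium-sized model such as GPT-2. The toy corpus and hyperparameters are illustrative placeholders, not part of the project requirements.

    # Minimal sketch (not a required template): one fine-tuning step for GPT-2
    # using the Hugging Face transformers library. The toy corpus, model size,
    # and learning rate below are illustrative placeholders.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.train()

    # GPT-2 has no padding token by default; reuse the end-of-text token.
    tokenizer.pad_token = tokenizer.eos_token

    # Stand-in for whatever task-specific data your project uses.
    texts = [
        "Large language models can be adapted to new domains.",
        "Fine-tuning updates pretrained weights on task data.",
    ]
    batch = tokenizer(texts, return_tensors="pt", padding=True)

    # For causal language modeling, the labels are the input ids themselves;
    # the model shifts them internally when computing the loss. (A real project
    # would mask padded positions with -100 rather than training on them.)
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    print(f"one training step done, loss = {outputs.loss.item():.3f}")

A project of the first type would replace the toy batch with a task-specific dataset and wrap this step in a full training loop (data loading, evaluation, checkpointing), for example via the transformers Trainer or a custom loop.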

Project Presentation: Dates: 4/15 and 4/17 (see the syllabus below). You will need to sign up for a time slot near the end of the semester. Students will need to submit feedback scores for the other groups' presentations (through a Google Form).

Office Hour

Our office hours are held on demand: if you need to discuss course materials or have questions at any point, feel free to send an email requesting an office hour. Based on these requests, we will organize time slots for students to schedule appointments.

Teaching Assistant

Chengsong Huang (chengsong@wustl.edu)

Syllabus (The lecture dates are tentative due to guest lectures.)

Date | Topic | Readings | Slides
Large Language Model Basics
1/14 | Course Overview
Distributed Representations of Words and Phrases and their Compositionality (Word2Vec)
Enriching Word Vectors with Subword Information
Attention Is All You Need (Transformer)
Slides
1/16 | Pre-trained Language Models
Language Models are Unsupervised Multitask Learners (GPT-2)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Slides
1/21 | In-Context Learning and Emergent Abilities
Language Models are Few-Shot Learners (GPT-3)
Emergent Abilities of Large Language Models
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers
Slides
1/23 | Instruction Tuning
Multitask Prompted Training Enables Zero-Shot Task Generalization
Cross-Task Generalization via Natural Language Crowdsourcing Instructions
Self-Instruct: Aligning Language Models with Self-Generated Instructions
LIMA: Less Is More for Alignment
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
Slides
1/28 | Chain-of-Thought Reasoning
Chain of Thought Prompting Elicits Reasoning in Large Language Models
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Slides
1/30 | Self-Improvement and Advanced Prompting
Large Language Models Can Self-Improve
Progressive-Hint Prompting Improves Reasoning in Large Language Models
Large Language Models are Better Reasoners with Self-Verification
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
Slides
2/4 | Calibration and Uncertainty
Teaching models to express their uncertainty in words
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models
Just Ask for Calibration
Slides
2/6 | Hallucination
Improving Factuality and Reasoning in Language Models through Multiagent Debate
How Language Model Hallucinations Can Snowball
Trusting Your Evidence: Hallucinate Less with Context-aware Decoding
Hallucination Detection for Generative Large Language Models by Bayesian Sequential Estimation
Slides
2/11 | Retrieval-Augmented Generation
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation
REPLUG: Retrieval-Augmented Black-Box Language Models
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Slides
2/13 | Learning from Human Feedback
Training language models to follow instructions with human feedback
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
SimPO: Simple Preference Optimization with a Reference-Free Reward
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Slides
2/18 | Parameter-Efficient Fine-Tuning
The Power of Scale for Parameter-Efficient Prompt Tuning
Parameter-Efficient Transfer Learning for NLP
LoRA: Low-Rank Adaptation of Large Language Models
DoRA: Weight-Decomposed Low-Rank Adaptation
Slides
2/20 | Efficient Inference
Fast Inference from Transformers via Speculative Decoding
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models
Adapting Language Models to Compress Contexts
Slides
2/25 | Long-Context Language Models
LongNet: Scaling Transformers to 1B Tokens
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Lost in the Middle: How Language Models Use Long Contexts
Memorizing Transformers
Slides
2/27 | Guest Lecture: "Effective Pretraining and Finetuning: Methods for optimizing your data." by Shayne Longpre (MIT)
3/4 | Language Models for Code
Code Llama: Open Foundation Models for Code
Planning with Large Language Models for Code Generation
Teaching Large Language Models to Self-Debug
SelfEvolve: A Code Evolution Framework via Large Language Models
Slides
-----Project Mid-Term Report Deadline: 10/21 11:59pm-----
3/6 | Multimodal Language Models
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Visual Instruction Tuning
NExT-GPT: Any-to-Any Multimodal LLM
Evaluating Object Hallucination in Large Vision-Language Models
Slides
-----Spring Break-----
3/11 | Language Models as Agents
Toolformer: Language Models Can Teach Themselves to Use Tools
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
ART: Automatic multi-step reasoning and tool-use for large language models
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
Slides
3/13 | Language Models and Knowledge Graphs
GNN-LM: Language Modeling based on Global Contexts via GNN
G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering
KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning
Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs?
Slides
3/18 | Language Models for Specialized Domains
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
SciBERT: A Pretrained Language Model for Scientific Text
Large Language Models Encode Clinical Knowledge
Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models
Slides
Large Language Model Analysis
3/25 | Evaluation of Language Models
Proving Test Set Contamination in Black Box Language Models
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Large Language Models are not Fair Evaluators
Holistic Evaluation of Language Models
Slides
3/27 | Detection of LLM Generation
DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
GPT-who: An Information Density-based Machine-Generated Text Detector
A Watermark for Large Language Models
GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content
Slides
4/1 | Language Model Bias
Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
Whose Opinions Do Language Models Reflect?
“Kelly is a Warm Person, Joseph is a Role Model”: Gender Biases in LLM-Generated Reference Letters
Red Teaming Language Models with Language Models
Slides
4/3 | Language Model Privacy & Security
Multi-step Jailbreaking Privacy Attacks on ChatGPT
Jailbreaking Black Box Large Language Models in Twenty Queries
Quantifying Memorization Across Neural Language Models
Poisoning Language Models During Instruction Tuning
Slides
4/8 | Guest Lecture: "Breaking the Curse of Multilinguality in Language Models" by Terra Blevins (Incoming Asst. Prof. at Northeastern Univ.)
4/10 | Future Directions of Large Language Models
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
A Theory for Emergence of Complex Skills in Language Models
When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
-----Project Presentation Deadline: 4/14 11:59pm-----
4/15 | Final Project Presentation
4/17 | Final Project Presentation
-----Project Final Report Deadline: 5/2 11:59pm-----