CSE 4061: Large Language Models (2025 Spring)
Course Overview
This is an advanced, research-oriented course that covers the fundamentals of Large Language Models (model architectures and training frameworks) as well as their capabilities, applications, and issues. We will present and discuss state-of-the-art papers about large language models.
Prerequisites: Students are expected to understand concepts in machine learning (CSE 417T/517A).
Course Grading
- 15% Class Participation
  - Regular class participation and discussion (10%)
  - Preview question submissions (5%)
- 30% Paper Presentation
- 55% Final Project
  - 10% Project/Survey Proposal
  - 10% Mid-term Report
  - 10% Final Course Presentation (group-based)
  - 5% Feedback on other groups' final project presentations
  - 20% Final Project Report
Paper Presentation
Grading Criteria:
- Preparation: Whether the slides are sent over by the given deadline so the instructors can give feedback
  - For Tuesday classes, send your slides by 12:00 PM the preceding Friday
  - For Thursday classes, send your slides by 12:00 PM the preceding Monday
- Completeness: Whether the presentation covers the background and major contributions of the listed papers, and is delivered within the required timeframe
- Clarity: Whether the presenter clearly conveys the information in their slides
- Q&A: Whether the presenters can properly handle any questions raised by the audience
Each student is also required to submit a preview question for a paper one day before its presentation, three times in total (on three different class days, and not on the day you present). You are also encouraged to raise your question in class. Preview questions cannot be simple ones like "what is the aim of the paper?" or "what is the difference between this method and traditional methods in NLP?"
Final Project (2-3 students per group)
Project Requirement: There are typically two types of projects.
- Designing a novel algorithm to train a medium-sized language model (e.g., BERT, GPT-2) for a problem that you are interested in.
- Designing a novel algorithm for inference with large language models (white-box models such as the LLaMA-2 models, or black-box models such as GPT-4, Claude, etc.) to solve some class of complex problems, and analyzing its limitations; the sketch after this list illustrates the two access settings.
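As a rough illustration of the two access settings (a minimal sketch, not a required starter kit), the snippet below loads a white-box model through the Hugging Face transformers library and queries a black-box model through the OpenAI Python client; the model names, prompts, and printed quantities are illustrative placeholders, not course requirements.

```python
# Minimal sketch of white-box vs. black-box LLM access (illustrative only).

# White-box: weights, logits, and gradients are all inspectable and modifiable.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")        # placeholder medium-sized model
lm = AutoModelForCausalLM.from_pretrained("gpt2")
batch = tok("Large language models are", return_tensors="pt")
logits = lm(**batch).logits                        # full access to the distribution
print(tok.decode(logits[0, -1].argmax()))          # e.g., inspect the greedy next token

# Black-box: text in, text out; the model's internals are hidden behind an API.
from openai import OpenAI

client = OpenAI()                                  # assumes OPENAI_API_KEY is set
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
)
print(reply.choices[0].message.content)
```

White-box projects can intervene on training or decoding directly, while black-box projects must work through prompts and sampled outputs, which is why the two project types call for different kinds of algorithms and analyses.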
Project Presentation: Presentations will take place on 4/15 and 4/17 (see the syllabus below). You will need to sign up for a time slot near the end of the semester. Students will need to submit feedback scores for the other groups' presentations (through a Google Form).
Office Hour
Our office hours are held on demand: if you need to discuss course material or have questions at any point, feel free to send an email requesting an office hour. Based on these requests, we will organize time slots for students to schedule appointments.
Teaching Assistant
Chengsong Huang (chengsong@wustl.edu)
Syllabus (The dates below are tentative due to guest lectures.)
Date | Topic | Readings | Slides |
Large Language Model Basics |
1/14 | Course Overview | Distributed Representations of Words and Phrases and their Compositionality (Word2Vec); Enriching Word Vectors with Subword Information; Attention Is All You Need (Transformer) | Slides |
1/16 | Pre-trained Language Models | Language Models are Unsupervised Multitask Learners (GPT-2); BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding; ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators; BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | Slides |
1/21 | In-Context Learning and Emergent Abilities | Language Models are Few-Shot Learners (GPT-3); Emergent Abilities of Large Language Models; Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?; Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers | Slides |
1/23 | Instruction Tuning | Multitask Prompted Training Enables Zero-Shot Task Generalization; Cross-Task Generalization via Natural Language Crowdsourcing Instructions; Self-Instruct: Aligning Language Models with Self-Generated Instructions; LIMA: Less Is More for Alignment; How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources | Slides |
1/28 | Chain-of-Thought Prompting | Chain of Thought Prompting Elicits Reasoning in Large Language Models; Least-to-Most Prompting Enables Complex Reasoning in Large Language Models; Self-Consistency Improves Chain of Thought Reasoning in Language Models; Graph of Thoughts: Solving Elaborate Problems with Large Language Models | Slides |
1/30 | Reasoning: Self-Improvement and Self-Verification | Large Language Models Can Self-Improve; Progressive-Hint Prompting Improves Reasoning in Large Language Models; Large Language Models are Better Reasoners with Self-Verification; Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models | Slides |
2/4 | Calibration and Uncertainty | Teaching models to express their uncertainty in words; SLiC-HF: Sequence Likelihood Calibration with Human Feedback; Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models; Just Ask for Calibration | Slides |
2/6 | Hallucination | Improving Factuality and Reasoning in Language Models through Multiagent Debate; How Language Model Hallucinations Can Snowball; Trusting Your Evidence: Hallucinate Less with Context-aware Decoding; Hallucination Detection for Generative Large Language Models by Bayesian Sequential Estimation | Slides |
2/11 | Retrieval-Augmented Generation | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks; Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation; REPLUG: Retrieval-Augmented Black-Box Language Models; Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | Slides |
2/13 | Alignment: Learning from Human Feedback | Training language models to follow instructions with human feedback; Direct Preference Optimization: Your Language Model is Secretly a Reward Model; SimPO: Simple Preference Optimization with a Reference-Free Reward; Fine-Grained Human Feedback Gives Better Rewards for Language Model Training | Slides |
2/18 | Parameter-Efficient Fine-Tuning | The Power of Scale for Parameter-Efficient Prompt Tuning; Parameter-Efficient Transfer Learning for NLP; LoRA: Low-Rank Adaptation of Large Language Models; DoRA: Weight-Decomposed Low-Rank Adaptation | Slides |
2/20 | Efficient Inference and Context Compression | Fast Inference from Transformers via Speculative Decoding; Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads; Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models; Adapting Language Models to Compress Contexts | Slides |
2/25 | Long-Context Language Models | LongNet: Scaling Transformers to 1B Tokens; LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models; Lost in the Middle: How Language Models Use Long Contexts; Memorizing Transformers | Slides |
2/27 | Guest Lecture: "Effective Pretraining and Finetuning: Methods for optimizing your data." by Shayne Longpre (MIT) | | |
3/4 | Application: Code Generation | Code Llama: Open Foundation Models for Code; Planning with Large Language Models for Code Generation; Teaching Large Language Models to Self-Debug; SelfEvolve: A Code Evolution Framework via Large Language Models | Slides |
| -----Project Mid-Term Report Deadline: 10/21 11:59pm----- | | |
3/6 | Multimodal Language Models | VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks; Visual Instruction Tuning; NExT-GPT: Any-to-Any Multimodal LLM; Evaluating Object Hallucination in Large Vision-Language Models | Slides |
| -----Spring Break----- | | |
3/11 | Language Models as Agents | Toolformer: Language Models Can Teach Themselves to Use Tools; ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs; ART: Automatic multi-step reasoning and tool-use for large language models; LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | Slides |
3/13 | Language Models and Knowledge Graphs | GNN-LM: Language Modeling based on Global Contexts via GNN; G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering; KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning; Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs? | Slides |
3/18 | Language Models for Specialized Domains | Don't Stop Pretraining: Adapt Language Models to Domains and Tasks; SciBERT: A Pretrained Language Model for Scientific Text; Large Language Models Encode Clinical Knowledge; Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models | Slides |
Large Language Model Analysis |
3/25 | Evaluation of Language Models | Proving Test Set Contamination in Black Box Language Models; Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation; Large Language Models are not Fair Evaluators; Holistic Evaluation of Language Models | Slides |
3/27 | Detection of LLM Generation | DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature; GPT-who: An Information Density-based Machine-Generated Text Detector; A Watermark for Large Language Models; GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content | Slides |
4/1 | Language Model Bias | Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints; Whose Opinions Do Language Models Reflect?; “Kelly is a Warm Person, Joseph is a Role Model”: Gender Biases in LLM-Generated Reference Letters; Red Teaming Language Models with Language Models | Slides |
4/3 | Language Model Privacy & Security | Multi-step Jailbreaking Privacy Attacks on ChatGPT; Jailbreaking Black Box Large Language Models in Twenty Queries; Quantifying Memorization Across Neural Language Models; Poisoning Language Models During Instruction Tuning | Slides |
4/8 | Guest Lecture: "Breaking the Curse of Multilinguality in Language Models" by Terra Blevins (Incoming Asst. Prof. at Northeastern Univ.) | | |
4/10 | Future Directions of Large Language Models | Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision; A Theory for Emergence of Complex Skills in Language Models; When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities; Hungry Hungry Hippos: Towards Language Modeling with State Space Models | |
| -----Project Presentation Deadline: 4/14 11:59pm----- | | |
4/15 | Final Project Presentation | | |
4/17 | Final Project Presentation | | |
| -----Project Final Report Deadline: 5/2 11:59pm----- | | |