CSE 5610: Large Language Models (2025 Fall)

Course Overview

This is an advanced, research-oriented course that teaches and discusses frontier papers on Large Language Models (language model architectures and training frameworks) as well as Large Language Model capabilities, applications, and issues.
Please be aware that this is a fast-paced, research-driven course, not an introduction to LLMs. The curriculum is tailored to advanced students (e.g., PhD candidates) conducting state-of-the-art LLM-related research. Students without a strong machine learning research background will find the pace and technical depth of the material exceptionally challenging.

Course Grading

  • 20% Preview question submissions
  • 25% Paper Presentation
  • 55% Final Project
    • 10% Project/Survey Proposal
    • 10% Mid-term Report
    • 10% Final Project Presentation (Group-based)
    • 5% Feedback on other groups’ final project presentations
    • 20% Final Project Report
    Paper Presentation

    Grading Criteria:

    Preview Questions Submission

    Each student is required to submit a preview question on a paper to be presented, due one day before each class (except the class in which you present). You are also encouraged to raise your question in class. Preview questions cannot be simple ones like "what is the aim of the paper?" or "how does this method differ from previous methods?"

    Final Project (2-3 students per group)

    Project Requirement: There are typically two types of projects.

    1. Design a novel algorithm to train a medium-sized language model (e.g., BERT or GPT-2) for a problem that you are interested in.
    2. Design a novel algorithm for inference on large language models (white-box models such as the Qwen, Llama, and DeepSeek series, or black-box models such as GPT, Gemini, Claude, etc.) to solve some type of complex problem, and analyze its limitations.

    Project Presentation: Dates: 12/2 and 12/4. You will need to sign up for a time slot near the end of the semester. Students must submit feedback scores for other groups’ presentations (through a Google Form).

    Office Hour

    Office hours are held on demand: if you find yourself needing to discuss course material or have questions at any point, feel free to send an email requesting an office hour. Based on these requests, we will organize time slots for students to schedule appointments.

    Teaching Assistants

    Zheyuan Wu (w.zheyuan@wustl.edu)

    Isle Song (s.xiaodao@wustl.edu)

    Syllabus (Course dates are tentative due to guest lectures.)

    (Each entry lists the date and topic, followed by the readings and slides.)
    Large Language Model Basics
    8/26: Course Overview
    Distributed Representations of Words and Phrases and their Compositionality (Word2Vec)
    Enriching Word Vectors with Subword Information
    Attention Is All You Need (Transformer)
    Slides
    8/28: Language Model Pre-training
    Language Models are Unsupervised Multitask Learners (GPT-2)
    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
    Slides
    9/2: Scaling Laws and Emergent Behaviors
    Language Models are Few-Shot Learners (GPT-3)
    Emergent Abilities of Large Language Models
    Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
    Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers
    Slides
    9/4: Post-training (I): Instruction Tuning
    Multitask Prompted Training Enables Zero-Shot Task Generalization
    Cross-Task Generalization via Natural Language Crowdsourcing Instructions
    Self-Instruct: Aligning Language Models with Self-Generated Instructions
    LIMA: Less Is More for Alignment
    How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
    Slides
    -----Student Presentation Starts-----
    State-of-the-Art Reasoning and Post-training
    9/9: Language Model Reasoning (I): Chain of Thought + Inference-Time Scaling
    Chain of Thought Prompting Elicits Reasoning in Large Language Models
    Self-Consistency Improves Chain of Thought Reasoning in Language Models
    Self-Refine: Iterative Refinement with Self-Feedback
    Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
    9/11: Language Model Reasoning (II): Thinking in Latent Space
    Training Large Language Models to Reason in a Continuous Latent Space
    Compressed Chain of Thought: Efficient Reasoning Through Dense Representations
    Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space
    LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking
    -----Proposal Deadline: 9/15/2025-----
    9/16: Guest Lecture by Weijia Shi (University of Washington)
    9/18: Post-training (II): Reinforcement Learning from Human Feedback
    Training language models to follow instructions with human feedback
    Direct Preference Optimization: Your Language Model is Secretly a Reward Model
    SimPO: Simple Preference Optimization with a Reference-Free Reward
    Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
    9/23: Post-training (III): Reinforcement Learning from Verified Rewards
    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
    DAPO: An Open-Source LLM Reinforcement Learning System at Scale
    Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
    SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
    Efficient Methods for Large Language Models
    9/25: Efficient Fine-Tuning
    The Power of Scale for Parameter-Efficient Prompt Tuning
    Parameter-Efficient Transfer Learning for NLP
    LoRA: Low-Rank Adaptation of Large Language Models
    Text-to-LoRA: Instant Transformer Adaption
    9/30: Efficient RLVR (Data & Computation)
    Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts
    Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
    Spurious Rewards: Rethinking Training Signals in RLVR
    R-Zero: Self-Evolving Reasoning LLM from Zero Data
    10/2: Efficient Inference
    Fast Inference from Transformers via Speculative Decoding
    Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
    EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
    Learning Harmonized Representations for Speculative Sampling
    10/7: Fall Break (no class)
    10/9: No Class (TBD)
    10/14: Long-Context Language Models
    Lost in the Middle: How Language Models Use Long Contexts
    RoFormer: Enhanced Transformer with Rotary Position Embedding
    LongNet: Scaling Transformers to 1B Tokens
    RULER: What's the Real Context Size of Your Long-Context Language Models?
    Large Language Model Factuality
    10/16: LLM Hallucination and Solutions
    How Language Model Hallucinations Can Snowball
    Improving Factuality and Reasoning in Language Models through Multiagent Debate
    Trusting Your Evidence: Hallucinate Less with Context-aware Decoding
    Hallucination Detection for Generative Large Language Models by Bayesian Sequential Estimation
    -----Mid-Term Report Deadline: 10/20-----
    10/21: Language Model Calibration
    Just Ask for Calibration
    Teaching Models to Express Their Uncertainty in Words
    Taming Overconfidence in LLMs: Reward Calibration in RLHF
    Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models
    Large Language Model Applications
    10/23: Retrieval-Augmented Generation
    Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
    Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation
    REPLUG: Retrieval-Augmented Black-Box Language Models
    Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
    10/28: Language Models as Agents
    Toolformer: Language Models Can Teach Themselves to Use Tools
    ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
    ART: Automatic multi-step reasoning and tool-use for large language models
    A-MEM: Agentic Memory for LLM Agents
    10/30: Agentic RAG
    Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
    Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models
    Search-o1: Agentic Search-Enhanced Large Reasoning Models
    Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
    11/4: Multi-modal LLMs
    Learning Transferable Visual Models From Natural Language Supervision
    Visual Instruction Tuning
    NExT-GPT: Any-to-Any Multimodal LLM
    Evaluating Object Hallucination in Large Vision-Language Models
    Large Language Model Evaluation
    11/6: Evaluation of Language Models
    Proving Test Set Contamination in Black Box Language Models
    Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
    Large Language Models are not Fair Evaluators
    Holistic Evaluation of Language Models
    11/11: Detection of LLM Generation
    DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
    GPT-who: An Information Density-based Machine-Generated Text Detector
    A Watermark for Large Language Models
    GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content
    Other Topics
    11/13: Revisiting Other Language Model Architectures
    Mixtral of Experts
    Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
    RWKV: Reinventing RNNs for the Transformer Era
    Hierarchical Reasoning Model
    11/18: Language Model Bias
    Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
    Whose Opinions Do Language Models Reflect?
    “Kelly is a Warm Person, Joseph is a Role Model”: Gender Biases in LLM-Generated Reference Letters
    Red Teaming Language Models with Language Models
    11/20: Language Model Safety
    Multi-step Jailbreaking Privacy Attacks on ChatGPT
    Jailbreaking Black Box Large Language Models in Twenty Queries
    Quantifying Memorization Across Neural Language Models
    Poisoning Language Models During Instruction Tuning
    11/25: Guest Lecture by Bowen Jin (University of Illinois at Urbana-Champaign)
    11/27: No Class
    -----Project Presentation Deadline: 12/1-----
    12/2: Final Project Presentation I
    12/4: Final Project Presentation II
    -----Project Final Report Deadline: 12/12-----