CSE 5610: Large Language Models (2025 Fall)

Course Overview

This is an advanced, research-oriented course that covers the fundamentals of Large Language Models (model architectures and training frameworks) as well as their capabilities, applications, and issues. We will teach and discuss state-of-the-art papers on large language models.
Prerequisites: Students are expected to understand machine learning concepts (CSE 417T/517A).

Course Grading

  • 15% Preview question submissions
  • 30% Paper Presentation
  • 55% Final Project
    • 10% Project/Survey Proposal
    • 10% Mid-term Report
    • 10% Final Project Presentation (Group-based)
    • 5% Feedback on other groups’ final project presentations
    • 20% Final Project Report

Grading Criteria

Paper Presentation

Preview Questions Submission

Each student is required to submit a preview question about one of the papers to be presented, due one day before each class (except for the class in which you present). You are also encouraged to raise that question in class. Preview questions cannot be simple ones like "what is the aim of the paper?" or "what is the difference between this method and previous methods?"

Final Project (2-3 students per group)

Project Requirement: There are typically two types of projects.

1. Design a novel algorithm to train a medium-sized language model (e.g., BERT or GPT-2) for a problem that you are interested in (see the sketch after this list).
2. Design a novel inference-time algorithm for large language models (white-box models such as the Qwen, Llama, and DeepSeek series, or black-box models such as GPT, Gemini, Claude, etc.) to solve a class of complex problems, and analyze their limitations.
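
For project type 1, the sketch below shows roughly what a fine-tuning baseline could look like. It is only a minimal illustration, assuming the Hugging Face transformers library and the public "gpt2" checkpoint; the toy texts, learning rate, and step count are placeholders, not project requirements.

    # Minimal causal-LM fine-tuning sketch (illustrative only, not a project template).
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # Placeholder data; a real project needs a proper dataset, DataLoader, and evaluation.
    texts = ["an example training sentence.", "another example training sentence."]
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)

    labels = batch["input_ids"].clone()
    labels[batch["attention_mask"] == 0] = -100  # ignore padding positions in the loss

    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    model.train()
    for step in range(3):  # toy loop over a single tiny batch
        outputs = model(**batch, labels=labels)  # the model shifts labels internally for next-token prediction
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

A novel training algorithm for the project would replace or extend the plain next-token loss and training loop above, not just reuse this boilerplate.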

Project Presentation: Dates: 12/2 and 12/4. You will need to sign up for a time slot near the end of the semester. Students will need to submit feedback scores for other groups’ presentations (through a Google Form).

Office Hour

Office hours are held on demand: if you need to discuss course materials or have questions at any point, send an email requesting an office hour. Based on these requests, we will organize time slots for students to schedule appointments.

Teaching Assistants

Zheyuan Wu (w.zheyuan@wustl.edu)

Isle Song (s.xiaodao@wustl.edu)

Syllabus (The dates below are tentative due to guest lectures.)

Date / Topic / Readings

Large Language Model Basics

8/26  Course Overview
      • Distributed Representations of Words and Phrases and their Compositionality (Word2Vec)
      • Enriching Word Vectors with Subword Information
      • Attention Is All You Need (Transformer)

8/28  Language Model Pre-training
      • Language Models are Unsupervised Multitask Learners (GPT-2)
      • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
      • ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
      • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

9/2   Scaling Laws and Emergent Behaviors
      • Language Models are Few-Shot Learners (GPT-3)
      • Emergent Abilities of Large Language Models
      • Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
      • Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers

9/4   Post-training (I): Instruction Tuning
      • Multitask Prompted Training Enables Zero-Shot Task Generalization
      • Cross-Task Generalization via Natural Language Crowdsourcing Instructions
      • Self-Instruct: Aligning Language Models with Self-Generated Instructions
      • LIMA: Less Is More for Alignment
      • How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

----- Student Presentation Starts -----

State-of-the-Art Reasoning and Post-training

9/9   Language Model Reasoning (I): Chain of Thought + Inference-Time Scaling
      • Chain of Thought Prompting Elicits Reasoning in Large Language Models
      • Self-Consistency Improves Chain of Thought Reasoning in Language Models
      • Large Language Models Can Self-Improve
      • Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

9/11  Language Model Reasoning (II): Thinking in Latent Space
      • Training Large Language Models to Reason in a Continuous Latent Space
      • Compressed Chain of Thought: Efficient Reasoning Through Dense Representations
      • Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space
      • LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking

----- Proposal Deadline: 9/15/2025 -----

9/16  Post-training (II): Reinforcement Learning from Human Feedback
      • Training language models to follow instructions with human feedback
      • Direct Preference Optimization: Your Language Model is Secretly a Reward Model
      • SimPO: Simple Preference Optimization with a Reference-Free Reward
      • Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

9/18  Post-training (III): Reinforcement Learning from Verifiable Rewards
      • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
      • DAPO: An Open-Source LLM Reinforcement Learning System at Scale
      • Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
      • SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Efficient Methods for Large Language Models

9/23  Efficient Fine-Tuning
      • The Power of Scale for Parameter-Efficient Prompt Tuning
      • Parameter-Efficient Transfer Learning for NLP
      • LoRA: Low-Rank Adaptation of Large Language Models
      • DoRA: Weight-Decomposed Low-Rank Adaptation

9/25  Efficient RLVR (Data & Computation)
      • Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts
      • Spurious Rewards: Rethinking Training Signals in RLVR
      • The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning
      • R-Zero: Self-Evolving Reasoning LLM from Zero Data

9/30  Efficient Inference
      • Fast Inference from Transformers via Speculative Decoding
      • Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
      • Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models
      • Adapting Language Models to Compress Contexts

10/2  Long-Context Language Models
      • LongNet: Scaling Transformers to 1B Tokens
      • LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
      • Lost in the Middle: How Language Models Use Long Contexts
      • Memorizing Transformers

10/7  Fall Break (no class)

10/9  No class (TBD)

Large Language Model Factuality

10/14 LLM Hallucination and Solutions
      • Improving Factuality and Reasoning in Language Models through Multiagent Debate
      • How Language Model Hallucinations Can Snowball
      • Trusting Your Evidence: Hallucinate Less with Context-aware Decoding
      • Hallucination Detection for Generative Large Language Models by Bayesian Sequential Estimation

10/16 Language Model Calibration
      • Teaching models to express their uncertainty in words
      • SLiC-HF: Sequence Likelihood Calibration with Human Feedback
      • Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models
      • Just Ask for Calibration

----- Mid-Term Report Deadline: 10/20 -----

10/21 Retrieval-Augmented Generation
      • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
      • Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation
      • REPLUG: Retrieval-Augmented Black-Box Language Models
      • Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Large Language Model Applications

10/23 Language Models as Agents
      • Toolformer: Language Models Can Teach Themselves to Use Tools
      • ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
      • ART: Automatic multi-step reasoning and tool-use for large language models
      • LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

10/28 Agentic RAG
      • Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
      • Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models
      • Search-o1: Agentic Search-Enhanced Large Reasoning Models
      • Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

10/30 Multi-modal LLMs
      • VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
      • Visual Instruction Tuning
      • NExT-GPT: Any-to-Any Multimodal LLM
      • Evaluating Object Hallucination in Large Vision-Language Models

Large Language Model Evaluation

11/4  Evaluation of Language Models
      • Proving Test Set Contamination in Black Box Language Models
      • Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
      • Large Language Models are not Fair Evaluators
      • Holistic Evaluation of Language Models

11/6  Detection of LLM Generation
      • DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
      • GPT-who: An Information Density-based Machine-Generated Text Detector
      • A Watermark for Large Language Models
      • GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content

Other Topics

11/11 Revisiting Other Language Model Architectures
      • Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
      • RWKV: Reinventing RNNs for the Transformer Era
      • Hierarchical Reasoning Model
      • DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models

11/13 Language Model Bias
      • Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
      • Whose Opinions Do Language Models Reflect?
      • “Kelly is a Warm Person, Joseph is a Role Model”: Gender Biases in LLM-Generated Reference Letters
      • Red Teaming Language Models with Language Models

11/18 Language Model Safety
      • Multi-step Jailbreaking Privacy Attacks on ChatGPT
      • Jailbreaking Black Box Large Language Models in Twenty Queries
      • Quantifying Memorization Across Neural Language Models
      • Poisoning Language Models During Instruction Tuning

11/20 Guest Lecture

11/25 Guest Lecture

11/27 No Class

----- Project Presentation Deadline: 12/1 -----

12/2  Final Project Presentation I

12/4  Final Project Presentation II

----- Project Final Report Deadline: 12/12 -----