CSE 5610: Large Language Models (2025 Fall)
Course Overview
This is an advanced, research-oriented course that teaches the fundamentals of Large Language Models (language model architecture and training framework) as well as their capabilities, applications, and issues. We will teach and discuss state-of-the-art papers about large language models.
Prerequisites: Students are expected to understand concepts in machine learning (CSE 417T/517A).
Course Grading
- 10% Project/Survey Proposal
- 10% Mid-term Report
- 10% Final Project Presentation (Group-based)
- 5% Feedback on other groups’ final project presentations
- 20% Final Project Report
Paper Presentation
Grading Criteria:
- Preparation: Whether the slides are sent by the given deadline so that the instructors can give feedback
  - For Tuesday classes, send your slides before 12:00 PM on the previous Friday
  - For Thursday classes, send your slides before 12:00 PM on the previous Monday
- Completeness: Whether the presentation covers the background and major contribution of the listed papers, and is delivered within the required timeframe
- Clarity: Whether the presenter clearly conveys the information on their slides
- Q&A: Whether the presenter properly handles any questions raised by the audience
Preview Questions Submission
One day before every class, each student is required to submit a preview question about a paper to be presented in that class (except for the class in which you present). You are also encouraged to raise that question in class. Preview questions must not be trivial ones such as "what is the aim of the paper?" or "what is the difference between this method and previous methods?"
Final Project (2-3 students per group)
Project Requirement: Projects typically fall into one of two types.
- Designing a novel algorithm to train a medium-sized language model (e.g., BERT or GPT-2) for a problem that you are interested in.
- Designing a novel algorithm to do inference on large language models (white-box models such as the Qwen, Llama, and DeepSeek series, or black-box models such as GPT, Gemini, Claude, etc.) to solve some type of complex problem, and analyze their limitations.
Project Presentation: Dates: 12/2 and 12/4. You will need to sign up for a time slot near the end of the semester. Students will need to submit feedback scores for other groups’ presentations (through a Google Form).
Office Hours
Office hours are held on demand: if you need to discuss course materials or have questions at any point, feel free to send an email requesting an office hour. Based on these requests, we will organize time slots for students to schedule appointments.
Teaching Assistants
Zheyuan Wu (w.zheyuan@wustl.edu)
Isle Song (s.xiaodao@wustl.edu)