CSE 561A: Large Language Models
Course Overview
This is an advanced, research-oriented course that teaches the fundamentals of large language models (model architecture and training frameworks) as well as their capabilities, applications, and issues. We will present and discuss state-of-the-art papers on large language models.
Prerequisites: Students are expected to understand machine learning concepts (CSE 417T/517A).
Canvas: https://wustl.instructure.com/courses/129974
Piazza: https://piazza.com/class/lsf6np5hzai4rs
Course Grading
- 15% Class Participation
  - Regular class participation and discussion (10%)
  - Preview question submissions (5%)
- 30% Class Presentation
- 55% Final Project
  - 10% Project Proposal
  - 10% Mid-term Report
  - 10% Final Course Presentation (Group-based)
  - 5% Feedback on other groups’ final project presentations
  - 20% Final Project Report
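As a quick check of the weights above, the following minimal sketch confirms that the components add up to 100% and shows one way a weighted total could be computed. It assumes each component is scored as a fraction between 0 and 1 and that components are combined linearly; the syllabus does not state how scores are combined, and the names `WEIGHTS` and `course_score` are hypothetical.

```python
# Weights (in percent) copied from the grading list above.
# The dictionary keys are illustrative names, not official labels.
WEIGHTS = {
    "participation_discussion": 10,
    "preview_questions": 5,
    "class_presentation": 30,
    "project_proposal": 10,
    "midterm_report": 10,
    "final_presentation": 10,
    "peer_feedback": 5,
    "final_report": 20,
}


def course_score(scores):
    """Return a 0-100 total, assuming a simple linear weighting.

    `scores` maps each component name to the fraction earned (0.0 to 1.0).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Sanity checks: the sub-items match their parent categories.
assert WEIGHTS["participation_discussion"] + WEIGHTS["preview_questions"] == 15
assert sum(WEIGHTS.values()) == 100
```

Under these assumptions, a student with full marks on everything except 80% on the final report would lose 20% of that 20-point component, for a total of 96 out of 100.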
Class Presentation
Grading Criteria:
- Good Preparation: Whether the slides are sent to the instructors by the given deadline so that they can give feedback
  - For Tuesday classes, send your slides by 12:00 PM on the preceding Friday
  - For Thursday classes, send your slides by 12:00 PM on the preceding Monday
- Completeness: Whether the presentation covers the background and major contribution of the listed papers, and is delivered within the required timeframe
- Clarity: Whether the presenter clearly conveys the information on their slides
- Q&A: Whether the presenters properly handle any questions raised by the audience
Each student is also required to submit a preview question for a paper one day before its presentation, three times in total (for three different classes, none of which may be the class in which you present). You are also encouraged to raise your question in class. Preview questions should not be generic ones such as "What is the aim of the paper?" or "What is the difference between this method and traditional methods in NLP?"
Final Project (2-3 students per group)
Project Requirements: There are typically two types of projects.
- Designing a novel algorithm to train a medium-sized language model (e.g., BERT or GPT-2) for a problem that you are interested in.
  - https://huggingface.co/models
- Designing a novel algorithm to do inference on large language models (white-box models such as LLaMA2, or black-box models such as GPT-4, Claude, etc.) to solve a class of complex problems, and analyzing the models' limitations. (We may not be able to reimburse API costs, so you can choose to use free APIs such as Claude.)
  - https://platform.openai.com/docs/introduction
  - https://docs.anthropic.com/claude/reference/getting-started-with-the-api
Project Presentation: Dates: 4/18, 4/23, and 4/25. You will need to sign up for a time slot near the end of the semester. Students will need to submit feedback scores for other groups’ presentations (through a Google Form).
Office Hours
Office hours are held on demand: if you need to discuss course materials or have questions at any point, feel free to send an email requesting an office hour. Based on these requests, we will organize time slots for students to schedule appointments.