CSE 561A: Large Language Models (2024 Fall)
Course Overview
This is an advanced, research-oriented course that covers the fundamentals of large language models (model architecture and training frameworks) as well as their capabilities, applications, and issues. We will present and discuss state-of-the-art papers on large language models.
Prerequisites: Students are expected to understand core concepts in machine learning (CSE 417T/517A).
Course Grading
- 15% Class Participation
  - Regular class participation and discussion (10%)
  - Preview question submissions (5%)
- 30% Paper Presentation
- 55% Final Project
  - 10% Project/Survey Proposal
  - 10% Mid-term Report
  - 10% Final Course Presentation (Group-based)
  - 5% Feedback on other groups’ final project presentations
  - 20% Final Project Report
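As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale, which the syllabus does not specify; the `WEIGHTS` keys and the `course_score` function are hypothetical names chosen for this example and do not describe the actual grading procedure.

```python
# Weights copied from the grading breakdown above.
# The key names are illustrative, not official component names.
WEIGHTS = {
    "participation": 0.10,
    "preview_questions": 0.05,
    "paper_presentation": 0.30,
    "proposal": 0.10,
    "midterm_report": 0.10,
    "final_presentation": 0.10,
    "peer_feedback": 0.05,
    "final_report": 0.20,
}


def course_score(scores):
    """Return the weighted total, assuming every component is scored 0-100."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

Under these assumptions, the Final Project components together account for 55% of the total, so the final report alone (20%) carries twice the weight of the proposal.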
Paper Presentation
Grading Criteria:
- Preparation: Whether the slides are sent to the instructors by the given deadline so that they can give feedback
  - For Tuesday classes, send your slides before 12:00 PM on the previous Friday
  - For Thursday classes, send your slides before 12:00 PM on the previous Monday
- Completeness: Whether the presentation covers the background and major contributions of the listed papers, and is delivered within the required timeframe
- Clarity: Whether the presenter clearly conveys the information on their slides
- Q&A: Whether the presenter handles questions raised by the audience properly
Each student is also required to submit a preview question for a paper one day before its presentation, three times in total (the three submissions must be for three different classes, none of which is the class in which you present). You are also encouraged to raise your question in class. Preview questions cannot be simple ones such as "What is the aim of the paper?" or "What is the difference between this method and traditional methods in NLP?"
Final Project (2-3 students per group)
Project Requirement: There are typically two types of projects.
- Designing a novel algorithm to train a medium-sized language model (such as BERT or GPT-2) for a problem that interests you.
- Designing a novel algorithm for inference with large language models (white-box models such as the LLaMA2 models, or black-box models such as GPT-4, Claude, etc.) to solve some type of complex problem, and analyzing the limitations of these models.
Project Presentation: Presentations will be held on 12/3 and 12/5. You will need to sign up for a time slot near the end of the semester. Students will also need to submit feedback scores for other groups’ presentations (through a Google Form).
Office Hours
Office hours are held on demand: if you need to discuss course materials or have questions at any point, feel free to send an email requesting an office hour. Based on these requests, we will organize time slots for students to schedule appointments.
Teaching Assistant
Chengsong Huang (chengsong@wustl.edu)