CSE 561A: Large Language Models

Course Overview

This is an advanced, research-oriented course covering the fundamentals of large language models (model architectures and training frameworks) as well as their capabilities, applications, and open issues. We will present and discuss state-of-the-art papers on large language models.
Prerequisites: Students are expected to understand core concepts in machine learning (CSE 417T/517A).
Canvas: https://wustl.instructure.com/courses/129974
Piazza: https://piazza.com/class/lsf6np5hzai4rs

Course Grading

Class Presentation

Grading Criteria:

Each student is also required to submit a preview question for one of the assigned papers by the day before the presentation, three times over the semester (the three questions must be for three different class sessions, and not for the session in which you present). You are also encouraged to raise your question in class. Preview questions should not be superficial ones such as "What is the aim of the paper?" or "How does this method differ from traditional NLP methods?"

Final Project (2-3 students per group)

Project Requirement: There are typically two types of projects.

  1. Designing a novel algorithm to train a medium-sized language model (e.g., BERT or GPT-2) for a problem that you are interested in. Pretrained models are available on the Hugging Face Hub: https://huggingface.co/models (see the loading sketch after this list).
  2. Designing a novel algorithm for inference with large language models (white-box models such as the LLaMA 2 family, or black-box models such as GPT-4, Claude, etc.) to solve some type of complex problem, and analyzing its limitations (see the API sketch after this list). We may not be able to reimburse API costs, so you can choose to use free APIs such as Claude's. API references: https://platform.openai.com/docs/introduction and https://docs.anthropic.com/claude/reference/getting-started-with-the-api
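
For the first project type, here is a minimal sketch (not a course deliverable) of loading a medium-sized model such as GPT-2 from the Hugging Face Hub and running a quick sanity check. It assumes the `transformers` library and PyTorch are installed; the model name "gpt2" and the prompt are illustrative choices.

```python
# Minimal sketch: load GPT-2 from the Hugging Face Hub and generate a continuation.
# Assumes `pip install transformers torch`; "gpt2" is an illustrative model choice.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Encode a prompt and greedily generate a short continuation as a sanity check.
inputs = tokenizer("Large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

From here, a project would typically swap in its own training or fine-tuning loop on top of the loaded model.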
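
For the second project type, here is a minimal sketch of querying a black-box model through the OpenAI API. It assumes the `openai` Python package (v1 or later) and an OPENAI_API_KEY environment variable; the model name and prompt are illustrative placeholders.

```python
# Minimal sketch: query a black-box LLM through the OpenAI chat completions API.
# Assumes `pip install openai` (v1+) and the OPENAI_API_KEY environment variable;
# the model name and prompt below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Solve step by step: what is 17 * 24?"}],
)
print(response.choices[0].message.content)
```

The Anthropic API follows a similar request/response pattern; see the getting-started link above.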

Project Presentation: Dates: 4/18, 4/23, and 4/25. You will need to sign up for a time slot near the end of the semester. Students must submit feedback scores for the other groups' presentations (through a Google Form).

Office Hour

Office hours are held on demand: if you need to discuss course materials or have questions at any point, feel free to send an email requesting an office hour. Based on these requests, we will organize time slots for students to schedule appointments.

Syllabus (dates are tentative and may shift due to guest lectures)

Each entry below lists the date, topic, assigned readings, and slides.
1/16: Course Overview
Readings:
- Distributed Representations of Words and Phrases and their Compositionality (Word2Vec)
- Enriching Word Vectors with Subword Information
- Attention Is All You Need (Transformer)
Slides
1/18: Language Model Architectures
Readings:
- Language Models are Unsupervised Multitask Learners (GPT-2)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Slides
1/23: Large Language Model Training (I)
Readings:
- Multitask Prompted Training Enables Zero-Shot Task Generalization
- Cross-Task Generalization via Natural Language Crowdsourcing Instructions
- LIMA: Less Is More for Alignment
- Self-Instruct: Aligning Language Models with Self-Generated Instructions
- How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
Slides
1/25: Large Language Model Training (II)
Readings:
- Training language models to follow instructions with human feedback
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Slides
1/30: Parameter-Efficient Fine-Tuning
Readings:
- The Power of Scale for Parameter-Efficient Prompt Tuning
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
- LoRA: Low-Rank Adaptation of Large Language Models
- QLoRA: Efficient Finetuning of Quantized LLMs
Slides
-----Student Presentations Start-----
2/1: Prompting and In-Context Learning
Readings:
- Language Models are Few-Shot Learners (GPT-3)
- Emergent Abilities of Large Language Models
- Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
- Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers
Slides_1
Slides_2
2/6: Language Model Reasoning (I)
Readings:
- Chain of Thought Prompting Elicits Reasoning in Large Language Models
- Self-Consistency Improves Chain of Thought Reasoning in Language Models
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Slides_1
Slides_2
2/8: Language Model Reasoning (II)
Readings:
- STaR: Bootstrapping Reasoning With Reasoning
- Large Language Models Can Self-Improve
- Progressive-Hint Prompting Improves Reasoning in Large Language Models
- Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters
Slides_1
Slides_2
2/13: Language Model Calibration/Uncertainty
Readings:
- How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering
- Teaching models to express their uncertainty in words
- Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models
- Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation
Slides_1
Slides_2
2/15: Retrieval Augmentation and Parametric Knowledge
Readings:
- Generalization through Memorization: Nearest Neighbor Language Models
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- Language Models as Knowledge Bases?
- How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Slides_1
Slides_2
-----Project Proposal Deadline: 2/19 11:59pm-----
2/20: Decoding
Readings:
- Contrastive Decoding: Open-ended Text Generation as Optimization
- Don't throw away your value model! Making PPO even better via Value-Guided Monte-Carlo Tree Search decoding
Slides
2/22: Course Cancelled Due to Business Travel
2/27: Code Language Models
Readings:
- InCoder: A Generative Model for Code Infilling and Synthesis
- Code Llama: Open Foundation Models for Code
- Teaching Large Language Models to Self-Debug
- LEVER: Learning to Verify Language-to-Code Generation with Execution
Slides_1
Slides_2
2/29: Multimodal Language Models
Readings:
- Flamingo: a Visual Language Model for Few-Shot Learning
- VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
- Visual Instruction Tuning
- NExT-GPT: Any-to-Any Multimodal LLM
Slides_1
Slides_2
3/5: Language Models as Agents
Readings:
- ReAct: Synergizing Reasoning and Acting in Language Models
- Toolformer: Language Models Can Teach Themselves to Use Tools
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
- ART: Automatic multi-step reasoning and tool-use for large language models
Slides_1
Slides_2
3/7: Evaluation of Language Models
Readings:
- Proving Test Set Contamination in Black Box Language Models
- Holistic Evaluation of Language Models
Slides
-----Spring Break-----
-----Project Mid-Term Report Deadline: 3/18 11:59pm-----
3/19: Long-Context Language Models
Readings:
- Lost in the Middle: How Language Models Use Long Contexts
- Longformer: The Long-Document Transformer
- LongNet: Scaling Transformers to 1B Tokens
- Memorizing Transformers
Slides_1
Slides_2
3/21: Dynamic Architecture and Knowledge
Readings:
- Depth-Adaptive Transformer
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
- How is ChatGPT's behavior changing over time?
- Time is Encoded in the Weights of Finetuned Language Models
Slides_1
Slides_2
3/26: Language Model Bias
Readings:
- Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
- Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
- Whose Opinions Do Language Models Reflect?
- Red Teaming Language Models with Language Models
Slides_1
Slides_2
3/28: Guest Lecture: Chunyuan Li (Microsoft Research)
4/2: Privacy
Readings:
- Extracting Training Data from Large Language Models
- Large Language Models Can Be Strong Differentially Private Learners
- Quantifying Memorization Across Neural Language Models
- SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
Slides_1
Slides_2
4/4: Security
Readings:
- Universal and Transferable Adversarial Attacks on Aligned Language Models
- DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
- Poisoning Language Models During Instruction Tuning
- GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher
Slides_1
Slides_2
4/9: Guest Lecture: Sarah Wiegreffe (Allen Institute for AI)
4/11: Guest Lecture: Akari Asai (University of Washington)
4/16: Explorations of Large Language Models
Readings:
- Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
- Hungry Hungry Hippos: Towards Language Modeling with State Space Models
- PaLM-E: An Embodied Multimodal Language Model
- When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities
Slides_1
Slides_2
-----Project Presentation Deadline: 4/17 11:59pm-----
4/18: Final Project Presentation I
4/23: Final Project Presentation II
4/25: Final Project Presentation III
-----Project Final Report Deadline: 5/3 11:59pm-----