Lecturer: Binhang Yuan.
In recent years, foundation models have fundamentally transformed the state of the art in artificial intelligence. As a result, foundation model training and inference are arguably among the most important workloads running on modern computer systems. This course explains, from a systems perspective, how to deploy these workloads efficiently. Specifically, we will: i) explain how a modern machine learning system (specifically, PyTorch) works; ii) understand the performance bottlenecks of machine learning computation on modern hardware (e.g., Nvidia GPUs); iii) discuss the four main parallelism strategies in foundation model training (data, pipeline, tensor model, and optimizer parallelism); and iv) explore the real-world deployment of foundation models, including domain-specific adaptations.
Week - Dates | Topic |
---|---|
W1 - 09/03, 09/05 | Introduction and Logistics [Slides] & Stochastic Gradient Descent [Slides] |
W2 - 09/10, 09/12 | Auto-Differentiation [Slides] & Nvidia GPU Computation and Communication [Slides] |
W3 - 09/17, 09/19 | LLM Pretraining [Slides] & Data-, Pipeline- Parallelism [Slides] |
W4 - 09/24, 09/26 | Tensor Model-, Optimizer- Parallelism [Slides] & LLM Tuning and Utilization [Slides] |
W5 - 10/03 | Generative Inference Overview [Slides] |
W6 - 10/08, 10/10 | Algorithm Optimizations for Inference [Slides] & System Optimizations for Inference [Slides] |
W7 - 10/15, 10/17 | RAG and Domain Specific LLM Agent [Slides] & Course Review [Slides] |
W8 - 10/22, 10/24 | Presentation Sessions |
W9 - 10/29, 10/31 | Presentation Sessions |
W10 - 11/05, 11/07 | Presentation Sessions |
W11 - 11/12, 11/14 | Presentation Sessions |
W12 - 11/19, 11/21 | Presentation Sessions |
W13 - 11/26, 11/28 | Presentation Sessions |
- Course Report (70%):
  - Literature review (50%):
    - Cover the relevant techniques comprehensively. (10%)
    - Understand the relevant techniques correctly. (15%)
    - Organize the techniques into a clear categorization. (15%)
    - Write the report in professional academic English. (10%)
    - Page limit: 4 pages in the NeurIPS template (excluding references).
  - Research plan (20%):
    - The proposed research plan is executable. (10%)
    - The proposed research plan is novel and includes a concrete design. (10%)
    - Page limit: 4 pages in the NeurIPS template (excluding references).
- In-class Presentation (30%), covering the literature review only:
  - Clearly organize the material and present the problem definition, related work, and methodology appropriately. (20%)
  - Answer questions from the lecturers and other students appropriately. (5%)
  - Submit short feedback for all other presentation sessions. (5%)
  - (Feedback from other students determines 70% of the grade for this part.)