Frontiers in Foundation Models

Course Description: This seminar (Frontiers in Foundation Models) explores the frontier of Large Language Models (LLMs) and next-generation multimodal foundation models. We move beyond standard autoregressive language modeling to study how modern systems integrate text, vision, video, code, and action to enable grounded reasoning and real-world decision making. Topics include multimodal reasoning and evaluation, vision representation learning and segmentation, video generation, foundation models for robotics, and action-centric representation learning. We also cover emerging paradigms such as hierarchical and latent reasoning, diffusion-based language modeling, reinforcement learning for reasoning, and agentic workflows. Throughout the course, we emphasize both mechanistic understanding and practical safety alignment, connecting recent advances in interpretability, test-time compute scaling, and robust evaluation to hands-on paper presentations and open-ended research projects.

Course Slides

Class 1: Slides (PDF)

Who Should Enroll?

We welcome students from diverse backgrounds who are interested in learning about state-of-the-art (SOTA) foundation models.

For PhD Students: This is an excellent opportunity to read the latest literature and bring your own research into the course. You are encouraged to align the final project with your thesis or ongoing research topics.
For Graduate & Undergraduate Students: If you are interested in Deep Learning and Generative AI, this course will help you get up to speed with the state of the art.
Technical Background: Familiarity with Deep Learning basics (Python/PyTorch) is recommended to get the most out of the course projects, but we will support students in defining projects that match their skill levels (e.g., surveys, reproductions, or novel research).

Grading & Logistics

📍 Time & Location: Friday, Periods 3 & 4 (12:10 PM - 3:10 PM) in Hill 009

Grading Breakdown

Final Research Project and Presentation: 60%
Paper Presentation & Participation: 40%
- Slides (due 1 day before class): 10%
- Presentation Delivery: 20%
- Daily Participation: 10%

Schedule (Spring 2026)

Week	Topic & Readings
Week 1 (Jan 23)	Introduction & Interpretation: Overview of Foundation Models. Safety and Interpretation. Reading: SELFIE
Week 2 (Jan 30)	LLM Frontiers: DeepSeek-V3.2, Kimi-K2
Week 3 (Feb 6)	Part 1: Student Presentation. Paper: DINOv3. Part 2: Guest Lecture (1 hr), Xingyu Fu (Princeton). Topic: MLLM (benchmarks, thinking with images)
Week 4 (Feb 13)	Guest Lecture (starts 2:00 PM): Didac Suris (Meta Super Intelligence Lab). Topic: SAM 3 (Vision Foundation)
Week 5 (Feb 20)	Part 1: Student Presentation. Paper: Why Do Multi-Agent LLM Systems Fail? Part 2: Guest Lecture (starts 1:40 PM), Sachit Menon (Columbia University). Topic: Multimodal Reasoning with Code Generation
Week 6 (Feb 27)	Guest Lecture (starts 1:40 PM): Wenhao Ding (NVIDIA Scientist). Topic: Accelerating the Development and Deployment of Reasoning Models for Physical AI
Week 7 (Mar 6)	Part 1: Student Presentation (Aparajita). Paper: DataComp-LM: In search of the next generation of training sets for language models. Part 2: Guest Lecture (1 hr), Ruoshi Liu (Amazon FAR Scientist). Topic: Foundation Model for Robotics
Week 8 (Mar 13)	Guest Lecture (1 hr): Congyue Deng (MIT). Topic: Action Representation
Spring Recess (Mar 20)	NO CLASS: Rutgers Spring Recess (March 14 - March 22)
Week 9 (Mar 27)	Part 1: Student Presentation (Alborz). Paper: SAT Solvers in LLMs. Part 2: Guest Lecture (1 hr), Yunhao Ge (NVIDIA Research Scientist). Topic: World Action Models are Zero-Shot Policies
Week 10 (Apr 3)	Data & Models: OpenThoughts, DataComp-LM, s1
Week 11 (Apr 10)	Part 1: Student Presentation (Sinchona). Paper: Wan (arXiv:2503.20314). Part 2: Guest Lecture (1 hr), Yushi Hu (Meta FAIR). Topic: Multimodal
Week 12 (Apr 17)	Diffusion & Hybrid Models. Diffusion LM: LLaDA, Dream 7B. Hybrid: Transfusion, Diffusion Forcing
Week 13 (Apr 24)	Reasoning Paradigms: Parallel thinking, latent thinking, soft thinking
Week 14 (May 1)	Final Project Presentations (last day of regular classes)