
CS25: Transformers United V5

CS25 has become one of Stanford's hottest and most popular seminar courses, featuring top researchers at the forefront of Transformers research such as Geoffrey Hinton, Ashish Vaswani, and Andrej Karpathy. The class has been incredibly well received within and beyond Stanford, with millions of total views on YouTube. Each week, we dive into the latest breakthroughs in AI, from large language models like GPT to applications in art, biology, and robotics.

The only homework for students is weekly attendance at the talks/lectures. Anybody is free to audit in person or join our Zoom livestreams - you don't have to sign up or be affiliated with Stanford! (Please do not contact us about this). We also have a lively Discord community (over 5000 members) - feel free to join and chat with hundreds of others about Transformers!

Instructors

Steven Feng (https://styfeng.github.io)

Karan Singh (https://karanps.com)

Chelsea Zou (https://bosonphoton.github.io)

Jenny Duan (https://www.linkedin.com/in/jennysduan/)

Div Garg (https://divyanshgarg.com)

Christopher Manning (https://nlp.stanford.edu/~manning/)

Time and Location

Spring Quarter (April 1 - June 3)
Tuesdays 3:00 - 4:20 pm PDT
Gates B01   |   Zoom Link   |   Slido

Schedule

Date   |   Title   |   Description
April 1: Overview of Transformers [In-Person]

Speaker: Instructors

Recording Link
Brief intro and overview of the history of NLP, Transformers and how they work, and their impact. Discussion about recent trends, breakthroughs, applications, and current challenges/weaknesses.
Slides posted here.
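The core operation the overview lecture covers, self-attention, can be sketched in a few lines. This is an illustrative simplification (a single head, no learned projections, NumPy only), not code from the course:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays. Returns the attention output.
    Real Transformers add multi-head learned projections and masking."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # pairwise token similarities, scaled
    # Row-wise softmax turns scores into attention weights summing to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted mix of value rows

# Toy example: 3 tokens with 4-dimensional embeddings, attending to themselves
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4)
```

Each output token is a weighted average of all value vectors, with weights determined by query-key similarity; stacking this with feed-forward layers gives the full Transformer block.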
April 8: RL as a Co-Design of Product and Research [In-Person]

Speaker: Karina Nguyen (OpenAI)
The next generation of AI products will be born at the intersection of rigorous RL research and fearless product design. This talk explores how tight co‑design loops—where scientists prototype and users immediately probe—let us build evaluation metrics to measure real-world usability of AI systems, not just what traditional benchmarks record.
Drawing from her work on both Claude and ChatGPT, she'll share how she deepened her view of post-training as a blend of technical precision and creative intuition. This perspective becomes increasingly important as AI interactions grow increasingly multimodal, multi-agentic, and collaborative. At the heart of the discussion lies an interesting question: can we teach models to be truly creative and how do we design good evals for this?

To probe this question she'll briefly discuss Reinforcement Learning from AI Feedback (RLAIF): how synthetic data accelerates iteration, how the "asymmetric verification" paradigm—where checking is easier than generating—allows for new research methods, and what these advances reveal about fostering creative intelligence while keeping models aligned with human values.
April 15: The Advent of AGI [In-Person]

Speaker: Div Garg (AGI Inc)
As superintelligence seems just around the corner and frontier models continue to scale, a new wave of autonomous AI agents is emerging—systems capable of perceiving, reasoning, and acting in open-ended environments. These agents represent the first steps toward Artificial General Intelligence, promising to radically reshape how we interface with software and get things done in the world.
But the path to AGI is riddled with deep, unsolved challenges: brittle reasoning, drifting goals, shallow memory, and poor calibration under uncertainty. Real-world deployment quickly reveals how fragile today's agents truly are. Solving this isn't just about model improvements—it requires rethinking how we design, evaluate, and deploy intelligent systems, from rigorous evaluation metrics to tight user feedback loops, to build systems that can reason, remember, and recover.

In this talk, Div Garg explores a human-inspired approach to agent design—drawing from his work on frontier agent research and real-world product design. From new agent evaluation standards and online reinforcement learning training methodologies to agent-agent communication, this talk offers a glimpse into the emerging frontier: agents that don't just complete tasks, but coordinate, adapt, and evolve with users in the loop.

Speaker Bio: Div Garg is the founder and CEO of AGI, Inc, a new applied AI lab redefining AI-human interaction with the mission to bring AGI into everyday life. Div previously founded MultiOn, the first AI agent startup developing agents that can interact with computers and assist with everyday tasks, funded by top Silicon Valley VCs. Div has spent his career at the intersection of AI, research, and startups, and was previously a CS PhD student at Stanford focusing on RL before dropping out. His work spans various high-impact areas, including self-driving cars, robotics, computer control, and Minecraft AI agents.
April 22: We're All in this Together: Human Agency in an Era of Artificial Agents [In-Person]

Speaker: Eric Zelikman (xAI)
What does it mean to design agents that collaborate with us effectively and help empower us, even as they become more capable? What lessons can we learn from past advances and what role can academia play in understanding these dynamics and frameworks? Let's talk about how things have evolved so far and some ways they might still evolve.
Speaker Bio: Eric Zelikman has studied how we can improve AI reasoning, often with inspiration from human reasoning, and helped develop a now-popular algorithm for language models to learn from their own reasoning (self-taught reasoners/STaR). These days, he works on reasoning and agents at xAI, with a focus on how they can interact with us.
April 29: Large Language Model Reasoning [In-Person]

Speaker: Denny Zhou (Google Deepmind)
High-level overview of reasoning in large language models, focusing on motivations, core ideas, and current limitations. No prior background is required.
Speaker Bio: Denny Zhou founded the Reasoning Team at Google Brain, now part of Google DeepMind. His group is renowned for pioneering chain‑of‑thought prompting and self‑consistency, and for developing the mathematical foundations of in‑context learning and chain‑of‑thought reasoning. The team also created core techniques that power Gemini's reasoning capabilities. Denny co-founded the Conference on Language Modeling (COLM) and served as General Chair for COLM 2024.
May 6: Reasoning Models as Agents: Deliberative Alignment, Multimodal Intelligence, and Tool Use [In-Person]

Speaker: Hongyu Ren (OpenAI)
As large language models evolve into agents capable of complex reasoning, we will explore how modern models are taught not just to respond, but to reason and think -- orchestrating plans, decomposing problems, using tools, and understanding multimodal inputs like images and diagrams.
Through visual walkthroughs and code examples, we will reveal how these capabilities are reshaping AI's ability to reason about the world, and glimpse the future of models that are truly general problem-solvers. We will also consider the implications for alignment: how to encourage helpful, honest, and harmless behavior when models are increasingly agentic.

Speaker Bio: Hongyu Ren is a Member of Technical Staff at OpenAI, working on reasoning algorithms for large language models. He led the development of the o-mini series of models, and previously received his PhD in CS from Stanford.
May 13: On the Biology of a Large Language Model [In-Person]

Speaker: Josh Batson (Anthropic)
Large language models do many things, and it's not clear from black-box interactions how they do them. We will discuss recent progress in mechanistic interpretability, an approach to understanding models based on decomposing them into pieces, understanding the role of the pieces, and then understanding behaviors based on how those pieces fit together.
We will focus on the methods and findings of On the Biology of a Large Language Model, with some additional excursions and speculations. We hope to shed light on important behaviors like hallucination, planning, reasoning, (un)faithfulness, and emergent capabilities, and close with some suggestions for further research.

Speaker Bio: Joshua Batson leads the circuits effort of the Anthropic mechanistic interpretability team. Before Anthropic, he worked on viral genomics and computational microscopy at the Chan Zuckerberg Biohub. His academic training is in pure mathematics.
May 20: Multimodal World Models for Drug Discovery [In-Person]

Speaker: Eshed Margalit (Noetik.ai)
Where are all the cancer drugs? The past decade has seen astounding progress in machine learning, including the dominance of large transformer-based models in learning from massive datasets. At the same time, the field of cancer biology has enjoyed rapid improvement in the cost, speed, and resolution of once-futuristic measurement tools.
These advancements should go hand in hand, yet we still lack models that can tell us which biological targets to drug in which patient subpopulations. In this talk I'll describe one particularly promising approach to this problem: large multimodal world models of patient biology. The two core ingredients to this approach are quite general: 1) collecting a large dataset that spans many scales and modalities, and 2) training multimodal transformers that learn to fuse those data streams in a way that allows nuanced simulations with a "world model". I will give an accessible overview of these components, and share our progress in applying them to cancer immunotherapy.

Speaker Bio: Eshed is a neuroscientist and ML researcher working to understand biological systems with AI. He completed his PhD in neuroscience at Stanford, where he constructed self-supervised neural networks that incorporate biologically-inspired constraints to explain the structure, function, and development of primate visual cortex. Eshed is currently an ML scientist at Noetik, an AI-native biotech startup focused on curing cancer. In his work he develops novel transformer model architectures and tasks that learn from a large multi-modal dataset of patient tumor biology, and applies those models to drug discovery.
May 27: Transformers in Diffusion Models for Image Generation and Beyond [In-Person]

Speaker: Sayak Paul (Hugging Face)
Diffusion models have been all the rage in recent times when it comes to generating realistic yet synthetic continuous media content. This talk covers how Transformers are used in diffusion models for image generation and goes far beyond that.
We set the context by briefly discussing some preliminaries around diffusion models and how they are trained. We then cover the UNet-based network architecture that used to be the de facto choice for diffusion models. This helps motivate the introduction and rise of transformer-based architectures for diffusion.

We cover the fundamental blocks and the degrees of freedom one can ablate in the base architecture in different conditional settings. We then shift our focus to the different flavors of attention and other connected components that the community has been using in some of the SoTA open models for various use cases. We conclude by shedding light on some promising future directions around efficiency.

Speaker Bio: Sayak works on diffusion models at Hugging Face. His day-to-day includes contributing to the diffusers library, training and babysitting diffusion models, and working on applied ideas. He's interested in subject-driven generation, preference alignment, and evaluation of diffusion models. When he is not working, he can be found playing the guitar and binge-watching ICML tutorials and Suits.
June 3: Transformers for Video Generation [In-Person]

Speaker: Andrew Brown (Meta)
The progress in video generation models over just the past 2 to 3 years has been astounding. With a particular focus on Meta’s Movie Gen model, in this talk we will explore how we are now able to train generative models to output high-quality, realistic videos, and the key role that Transformers have played.
Speaker Bio: Andrew is a Research Scientist in Meta’s GenAI team, focusing on media generation. Over the past few years, his team has focused on publishing research papers that push the frontiers of video generative models, including Emu-Video and Movie Gen. Prior to working at Meta, Andrew completed his PhD at Oxford’s Visual Geometry Group (VGG) under the supervision of Professor Andrew Zisserman.