Learning From Experience in Large Language Models
Taiwei Shi
University of Southern California
The NLP Reading Group is excited to host Taiwei Shi who will present a talk on Learning From Experience in Large Language Models.
Logistics
Date: Friday, April 24
Time: 2 PM
Location: on Google Meet, to be screencast at Mila in room A14
Abstract
Large language models are primarily trained by imitating human-generated data, which limits performance to the quality and scope of available demonstrations and faces growing constraints as that data is exhausted. This motivates a shift toward learning directly from interaction, where models improve by leveraging feedback arising from their own behavior in the world. In this talk, I present two works that advance this paradigm. WildFeedback develops methods to extract and structure supervision signals from natural user interactions, enabling scalable alignment with real-world preferences. Experiential Reinforcement Learning formalizes learning from experience as an iterative process of action, feedback, reflection, and policy update, transforming sparse and delayed rewards into reusable learning signals. Empirical results across control and reasoning domains demonstrate improved sample efficiency and performance. Together, these approaches illustrate a path toward scaling learning through interaction, where experience, rather than static data, becomes the primary driver of progress.
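The action, feedback, reflection, and policy-update cycle described in the abstract can be sketched as a toy loop. This is an illustrative sketch only, not the talk's actual method: the `ExperientialAgent` class, the epsilon-greedy policy, and the toy `environment` function are all hypothetical names and simplifications chosen for this example.

```python
import random

class ExperientialAgent:
    """Toy agent illustrating an act -> feedback -> reflect -> update loop.

    This is an illustrative simplification, not the speaker's actual method.
    """

    def __init__(self, actions):
        self.values = {a: 0.0 for a in actions}  # estimated value per action
        self.counts = {a: 0 for a in actions}
        self.reflections = []  # stored experience, reusable as learning signal

    def act(self):
        # Epsilon-greedy action selection (explore 10% of the time).
        if random.random() < 0.1:
            return random.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def reflect(self, action, reward):
        # Turn a single sparse feedback event into a stored, reusable note.
        self.reflections.append(f"action={action} reward={reward:+.2f}")

    def update(self, action, reward):
        # Incremental-mean update of the policy's value estimate.
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (reward - self.values[action]) / n


def environment(action):
    # Toy feedback signal: action "b" is better.
    return 1.0 if action == "b" else 0.0


random.seed(0)
agent = ExperientialAgent(["a", "b"])
for _ in range(200):
    a = agent.act()
    r = environment(a)
    agent.reflect(a, r)
    agent.update(a, r)

# After interacting, the agent's value estimates favor the rewarded action.
print(agent.values)
```

In the actual work, the "reflection" step is far richer than a logged string, and the feedback comes from real interaction rather than a fixed reward function; the sketch only shows the shape of the loop.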
Speaker Bio
Taiwei Shi is a final-year PhD student at the University of Southern California, advised by Jieyu Zhao. He collaborates closely with Microsoft Research and the Microsoft Office of Applied Research. His research focuses on moving beyond static training on human-curated datasets toward learning paradigms where agents improve continuously through interaction, feedback, and self-generated data. His work develops methods that convert real-world experience into scalable training signals for alignment and reasoning, with the goal of enabling general-purpose AI systems that learn autonomously and reliably from experience.