Data Curation, Curriculum, and Cascade Serving for Better Small Language Models

Wanru Zhao

University of Cambridge / Mila

The NLP Reading Group is excited to host Wanru Zhao who will be presenting her recent work on “Data Curation, Curriculum, and Cascade Serving for Better Small Language Models”.

Logistics

Date: Friday, July 11
Time: 1 PM
Location: on Google Meet, screencast at Mila in room A14

Abstract

In this talk, I will present three recent projects aimed at improving the training and deployment of small language models through better data, smarter systems, and structured learning. I begin with CLUES, a collaborative data curation method that leverages training dynamics to identify high-quality examples across diverse data distributions. Next, CASCADIA introduces a modular cascade serving system that jointly optimizes request routing and deployment strategies to achieve Pareto-optimal trade-offs between accuracy and latency at inference time. Finally, I will share a project from my internship at MSR Montréal, which proposes a dataset curriculum approach: organizing training data so that small models progressively learn more complex reasoning skills. Together, these efforts point toward a unified perspective on building more efficient and accessible LLM systems: making them more data-driven, modular, and compositional.

Speaker Bio

Wanru Zhao is a third-year PhD student at the University of Cambridge, advised by Prof. Nic Lane, and is also a visiting researcher at the Vector Institute, working with Prof. Colin Raffel. Her research focuses on democratizing large model development through distributed learning, data curation, and modular model design. She develops algorithms and frameworks that enable organizations to jointly train and adapt models across data silos and heterogeneous compute environments, with the goal of making AI more accessible, efficient, and inclusive. She previously interned at AWS AI Lab and Microsoft Research. She co-organized the ICLR 2024 Workshop on Modularity for Collaborative, Decentralized, and Continual Deep Learning (MCDC). Prior to her PhD, she earned an M.Phil. in Advanced Computer Science from the University of Cambridge, graduating with Distinction.