Efficient Model Development in the Era of Large Language Models

Tu Vu

Virginia Tech

The NLP Reading Group is delighted to host Tu Vu, an assistant professor at Virginia Tech, who will speak remotely via Zoom (projected in Auditorium 2) at 11:30 AM on Friday, October 11th, on Efficient Model Development in the Era of Large Language Models.

Talk Description

The development of modern large language models typically involves a post-training stage in which a pretrained model is fine-tuned on a large multitask mixture to improve capabilities such as instruction following, reasoning, and coding. This raises two key research questions: 1) How can we efficiently determine the optimal dataset proportions for the mixture? 2) How can we efficiently integrate new capabilities into the model while maintaining performance on existing ones?

In this talk, I will present two of my recent research papers that shed light on these questions. First, I will discuss our work on developing Foundational Large Auto-rater Models (FLAMe), which introduces a novel tail-patch fine-tuning approach for evaluating the impact of each dataset on targeted distributions. This approach allows us to determine the optimal dataset proportions, yielding results competitive with example-proportional mixing while significantly reducing computational cost. Second, I will cover our latest work on large-scale model merging, which offers a promising solution for efficiently incorporating new capabilities: as model capability and capacity grow, merging preserves nearly all performance on existing capabilities while significantly enhancing zero-shot generalization to novel tasks, often matching or exceeding the performance of multitask training. Taken together, I hope these results will spur more fundamental research into the efficient development of large language models.
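
For attendees new to model merging, the sketch below illustrates the basic idea of weight-space merging: averaging the parameters of several fine-tuned checkpoints into a single model. This is a generic, minimal illustration only, not the specific recipe from the paper being presented; the toy models, the uniform averaging weights, and the function name merge_state_dicts are assumptions made for clarity.

    # Minimal sketch of model merging via uniform weight averaging (an assumption;
    # not the exact method from the talk's paper).
    import copy
    import torch
    import torch.nn as nn

    def merge_state_dicts(state_dicts, weights=None):
        # Average several compatible state dicts parameter by parameter.
        if weights is None:
            weights = [1.0 / len(state_dicts)] * len(state_dicts)
        merged = copy.deepcopy(state_dicts[0])
        for key in merged:
            merged[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts))
        return merged

    # Toy stand-ins for "expert" checkpoints; in practice these would be large
    # models fine-tuned on different task mixtures (e.g., new capabilities).
    base = nn.Linear(16, 4)
    expert_a, expert_b = copy.deepcopy(base), copy.deepcopy(base)
    with torch.no_grad():
        expert_a.weight.add_(0.01 * torch.randn_like(expert_a.weight))
        expert_b.weight.add_(0.01 * torch.randn_like(expert_b.weight))

    # The merged model carries a mix of both experts' parameters.
    merged = copy.deepcopy(base)
    merged.load_state_dict(merge_state_dicts([expert_a.state_dict(),
                                              expert_b.state_dict()]))

The appeal of this family of methods, as the talk will discuss at scale, is that it avoids re-running multitask training whenever a new capability is added.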

Speaker Bio

Tu Vu is an Assistant Professor in the Department of Computer Science at Virginia Tech. Prior to joining Virginia Tech, he held the position of Research Scientist at Google DeepMind for a year after receiving his PhD in Computer Science from the University of Massachusetts Amherst. Vu’s research aims to develop effective and efficient methods for advancing and democratizing artificial intelligence in the era of large language models. During his PhD, Vu worked as a student researcher at Google DeepMind, contributing to the development of models like Gemini and Flan-T5. His work on parameter-efficient transfer learning through soft prompts was widely adopted within Google for tuning large language models.

Logistics

Date: Friday, October 11th
Time: 11:30 AM
Location: Auditorium 2 or via Zoom (see email)