Towards a Generalist Web Agent

Boyuan Zheng

Ohio State University

The NLP Reading Group is delighted to host Boyuan Zheng, a PhD student at the Ohio State University, who will be speaking remotely on Zoom (also projected in A14) at 11:30 AM on Friday, November 8th, about "Towards a Generalist Web Agent".

Talk Description

Empowered by large multimodal models (LMMs), web agents are emerging as a new frontier for embodied agents, one that provides both the breadth and depth needed to drive agent development. In this talk, I will discuss the challenges and promise of building a generalist web agent. First, I will introduce Mind2Web, the first dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website. Then, I will introduce SeeAct, a generalist web agent that harnesses the power of LMMs for integrated visual understanding and acting on the web. Finally, I will discuss some promising future directions, including planning and agent safety.

Speaker Bio

Boyuan Zheng is a Ph.D. candidate in the OSUNLP group at The Ohio State University, advised by Prof. Yu Su. His research interests lie in natural language processing and artificial intelligence, with a passion for developing language agents that operate in real-world environments, such as websites. In pursuit of this goal, he has worked on various aspects of web agents, including evaluation (Mind2Web), multimodal agents (SeeAct), visual grounding (UGround), and planning (WebDreamer).

Logistics

Date: November 8th
Time: 11:30 AM
Location: A14 or via Zoom (see email)