Beyond the GUI: Toward Capable, Verifiable, and Safe Computer-Use Agents
Jieyu Zhao
University of Southern California
The NLP Reading Group is excited to host Prof. Jieyu Zhao who will present a talk on Beyond the GUI: Toward Capable, Verifiable, and Safe Computer-Use Agents.
Logistics
Date: Friday, May 8
Time: 2 PM
Location: online via Google Meet, with a screencast at Mila in room A14
Abstract
Computer-use agents are rapidly moving from research demos to systems that operate real desktops and applications, but progress in capability, evaluation, and safety has not advanced at the same pace. This talk presents three recent works that take on these dimensions together. I will first introduce CoAct-1, a multi-agent computer-use system that combines GUI operations with code generation through a planner–operator–programmer architecture, achieving state-of-the-art success on OSWorld with substantially fewer steps than prior GUI agents. I will then present ExeVRM, a video-based reward model that evaluates whether an agent’s execution video fulfills a user’s instruction, providing a scalable, model-agnostic evaluator across operating systems. Finally, I will discuss OS-BLIND, a benchmark that uncovers a critical safety blind spot in computer-use agents: harm that arises not from adversarial users but from benign instructions in risky contexts, where most frontier agents exceed a 90% attack success rate. Together, these works argue that capability, verifiability, and safety must advance together, and they outline a research agenda for computer-use agents that is honest about the trade-offs among them.
Speaker Bio
Jieyu Zhao (https://jyzhao.net) is a Gabilan Assistant Professor in the Thomas Lord Department of Computer Science at the University of Southern California, where she leads the LIME lab. Her research focuses on building trustworthy and socially responsible NLP systems, spanning efficient post-training and reinforcement learning from feedback; rigorous evaluation of model capabilities and limitations; mitigation of bias, hallucination, and safety risks in LLMs; and human–AI collaboration across diverse social and cultural contexts. Her work has been covered by news outlets such as Wired and The Daily Mail, and she has been invited by UN-WOMEN Beijing and Korea to speak on gender equality and social responsibility. Her papers have received the Best Long Paper Award, the SAC Highlight Award, and Top-10 Cited Paper recognition at top NLP conferences.