Data Visualization Understanding with Vision-Language Models (VLMs)

Ahmed Masry

York University

The NLP Reading Group is excited to host Ahmed Masry, who will give an in-person talk on “Data Visualization Understanding with Vision-Language Models (VLMs)”.

Talk Description

Data visualizations, such as charts, are central to how people analyze information and make informed decisions. Yet, automating their understanding (e.g., answering questions) remains challenging. While large vision-language models (VLMs) have advanced multimodal reasoning, they still underperform humans on data visualization tasks.

This talk explores our research journey in advancing multimodal chart understanding through benchmarks, training methods, and novel architectures. We introduce ChartQA and Chart-to-Text, our first large-scale benchmarks for question answering and summarization. Next, we present UniChart, a continual pretraining approach using chart-specific objectives and synthetic data to enhance visual math reasoning. Then, we reveal alignment gaps between vision and text modalities in VLMs and present AlignVLM, which bridges these gaps using a novel vision-text connector. We also discuss BigCharts, our synthetic chart generation technique that preserves real-world visual diversity. The talk concludes with open challenges and future directions for advancing VLMs’ understanding of data visualizations and, more broadly, visually situated language tasks.

Speaker Bio

Ahmed Masry is a PhD student in the Electrical Engineering and Computer Science program at York University, supervised by Prof. Enamul Hoque. His research focuses on multimodal vision-language models and benchmarks for chart, table, and document comprehension, with publications in top-tier conferences such as ACL, EMNLP, COLM, and NeurIPS. He is currently a visiting researcher at Mila, working under the supervision of Prof. Aaron Courville. His past experiences include an internship at ServiceNow Research through a Mitacs Accelerate Award. He is also a recipient of the NSERC CGS-D, Canada’s national doctoral scholarship, and the Google PaliGemma Academic Award.

Logistics

Date: December 5th
Time: 2:00 PM
Location: H04 or via Google Meet (see email)