Concept-Based Interpretability: Methods, Metrics, and Tools for NLP

Fanny Jourdan

IRT Saint Exupéry, Toulouse / Mila

The NLP Reading Group is excited to host Fanny Jourdan, who will give an in-person talk on “Concept-Based Interpretability: Methods, Metrics, and Tools for NLP.”

Talk Description

This talk focuses on concept-based explainability methods for NLP models, including mechanistic interpretability, and presents a unified framework that facilitates their use and evaluation. We will then introduce an evaluation metric designed to assess how effective these methods are. Finally, we will present Interpreto, an open-source library that brings concept-based and attribution methods together in a single, user-friendly toolkit compatible with Hugging Face NLP models. The goal is to give researchers and practitioners working on transparency, fairness, and interpretability in NLP a comprehensive and accessible framework.

Speaker Bio

Fanny Jourdan is a researcher specializing in explainability and bias evaluation for large language models. She holds a PhD in mathematics and computer science from the University of Toulouse, where her research focused on fairness and transparency in NLP systems. She is currently an Assistant Professor at IRT Saint Exupéry in Toulouse, France, and is spending 18 months as a visiting researcher at Mila. Her work centers on concept-based interpretability, representation analysis, and the development of open-source tools for better understanding the inner workings of NLP models.

Logistics

Date: November 28th
Time: 2:00 PM
Location: H04 or via Google Meet (See email)