Memorisation meets compositionality in natural language processing

Verna Dankers

McGill University / Mila

The NLP Reading Group is thrilled to host Verna Dankers, who will present highlights from her PhD thesis, followed by her current and future research directions at Mila.

Logistics

Date: Friday, July 18
Time: 1 PM
Location: on Google Meet, to be screencast at Mila in A14

Abstract

In deep learning, the perspective on memorisation of training examples is undergoing a paradigm shift. Traditionally linked to overfitting, memorisation is now seen as beneficial when it enhances models’ generalisation capabilities, and as concerning when it involves specific examples that should not be memorised. This shift raises questions about what models (should) memorise, how memorisation is implemented internally, and what benefits memorisation brings. These questions are particularly relevant for NLP: mastering language requires both generalisation and memorisation, because language comprises both compositional and non-compositional expressions. In this talk, I will introduce some of the work I published during my PhD. I will first discuss memorisation in generic terms. Using the task of neural machine translation (NMT), I place examples on a memorisation–generalisation continuum to explore which features predict high memorisation and to study the connection between memorisation and generalisation. Afterwards, I will discuss memorisation through the lens of compositionality, focusing on idioms as non-compositional phrases that require memorisation. For NMT transformers, I analyse how models memorise idiom translations over the course of training while also monitoring the models’ compositional generalisation abilities. I then examine the internal mechanisms that enable paraphrased translations of idioms, by analysing the role of the transformer’s (cross-)attention and the changes to hidden states across layers.

Speaker Bio

Verna Dankers is an incoming postdoc at Mila and McGill University, where she works with Professor Siva Reddy. She obtained her PhD from the University of Edinburgh, within the Centre for Doctoral Training in NLP, where she was supervised by Professor Ivan Titov. Her work seeks connections between interpretability, memorisation, figurative language processing, and (non-)compositional generalisation. She obtained a BSc and an MSc in AI from the University of Amsterdam, where she worked on metaphor processing, and she interned at Meta AI and Microsoft Research.