Large language models, large culture models? The role of language in the cultural values exhibited by LLMs
Ayla Rigouts Terryn
Université de Montréal
The NLP Reading Group is thrilled to have Ayla Rigouts Terryn, an Assistant Professor in the Department of Linguistics and Translation at Université de Montréal, who will be discussing the role of language in the cultural values exhibited by LLMs. The talk will take place on Zoom and in A14 on Friday, May 30th, at 1PM.
Talk Description
Large language models (LLMs) have, as is now well known, reached a very high adoption rate in a short time. Many of these LLMs are exclusively text-based, acquiring all their knowledge through vast corpora of written language. The best-performing and most popular LLMs are all trained on a disproportionate amount of English data. Nevertheless, the models work well enough in many other languages that users worldwide are increasingly adopting LLMs in their native tongues.
Now that we have these powerful models, based purely on language and adopted very widely and very quickly, a question arises: do these systems respond in the same way in different languages? We already know performance is often better in English, and some research even suggests that the models respond as if English were their “native language”, often “thinking” in English and translating to other languages. But how else does prompt language influence model output?
This question naturally leads us to explore cultural values, which differ among humans in various cultures and, therefore, often with different native languages. Do LLMs, when prompted in different languages, display variations in cultural values? Since models are trained on texts in different languages, have they learned to associate languages with the cultural values of the people writing texts in these languages? Or does the overrepresentation of English and fine-tuning by U.S. companies mean that American values dominate? And can you prompt the models to reflect values from a specific culture?
To test these questions, we posed 38 survey questions to 11 different models in 16 different languages. We compare replies when asking the same questions in different languages while prompting models to reply as humans, versus explicitly prompting models to reply from the perspective of humans in a specific culture. We also compare these results against replies from humans from different cultures answering the same questions.
Results show a clear impact of language but also emphasise how unpredictable models can be.
Speaker Bio
Ayla Rigouts Terryn is an Assistant Professor of Translation Technologies and AI in the Department of Linguistics and Translation at Université de Montréal and an Associate Academic Member at Mila. In her research, she focuses on natural language processing (NLP), specifically in multilingual and/or domain-specific settings.
She obtained a master’s degree in translation from Antwerp University and a PhD on automatic terminology extraction from Ghent University, and specialised in multilingual language technology as an assistant professor at KU Leuven. As an applied linguist in the field of AI, she is passionate about using linguistic insight to advance our understanding of language models.
Logistics
Date: Friday, May 30th
Time: 1PM
Location: A14 or Zoom (See email)