High-Quality Human Evaluation of Generated Texts

Ehud Reiter

University of Aberdeen

The NLP Reading Group is excited to host Professor Ehud Reiter, who will give a talk titled “High-Quality Human Evaluation of Generated Texts”.

Talk Description

Human evaluation is the best way to evaluate text generation systems if it is done rigorously, with good experimental design, execution, and reporting. Unfortunately, many human evaluations in NLP have quality problems, including poor experimental execution and questionable experimental design; many are also difficult to reproduce. I will summarise our recent work on developing high-quality human evaluation techniques and protocols, on analysing problems in experiments done elsewhere, and on reproducing human evaluations.

Speaker Bio

Ehud Reiter is a Professor of Computing Science at the University of Aberdeen in Scotland. He has worked on Natural Language Generation for over 30 years (in industry as well as academia), and in recent years has focused on evaluation, especially human evaluation, of generated texts. He also has a long-standing interest in medical applications of NLG. He has a Google Scholar H-index of 57 and writes a widely read blog on NLG (ehudreiter.com); he has received several academic awards for his research. His current funded research projects include ReproHum (reproducibility of human evaluations), NL4XAI (NLG in Explainable AI), and ASICA (helping skin cancer patients monitor their status while at home).

Logistics

Date: May 14th
Time: 10:00 AM
Location: H04 or via Zoom (see email)