FaithDial: A Faithful Benchmark for Information-Seeking Dialogue

TACL 2022

Nouha Dziri, Ehsan Kamalloo, Sivan Milton, Osmar Zaiane, Mo Yu, Edoardo Maria Ponti, Siva Reddy

The goal of information-seeking dialogue is to respond to user queries with natural language utterances that are grounded in knowledge sources. In our recent investigation, we show that existing knowledge-grounded benchmarks are fraught with hallucinations (>60% of responses). To mitigate this behavior, we adopt a data-centric solution and create FaithDial, a new benchmark for hallucination-free dialogues, by editing hallucinated responses in the Wizard of Wikipedia benchmark. FaithDial contains around 50K turns across 5.5K conversations. When trained on FaithDial, state-of-the-art dialogue models become significantly more faithful while also improving on other dialogue qualities such as cooperativeness, creativity, and engagement.
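
As a minimal sketch of how to load the benchmark, assuming it is published on the Hugging Face Hub under the id McGill-NLP/FaithDial with fields named history, knowledge, and response (check the dataset card for the canonical id and schema):

from datasets import load_dataset

# Load FaithDial (hub id assumed; see the dataset card for the exact id).
faithdial = load_dataset("McGill-NLP/FaithDial")

# Inspect one training example (field names assumed): each turn pairs a
# grounding knowledge snippet with an edited, hallucination-free response.
example = faithdial["train"][0]
print(example["history"])    # preceding dialogue turns
print(example["knowledge"])  # the grounding source snippet
print(example["response"])   # the faithful, edited response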

If you use FaithDial in your work, please cite:

@article{dziri2022faithdial,
  title = "FaithDial: A Faithful Benchmark for Information-Seeking Dialogue",
  author = "Dziri, Nouha and Kamalloo, Ehsan and Milton, Sivan and Zaiane, Osmar and Yu, Mo and Ponti, Edoardo Maria and Reddy, Siva",
  journal = "Transactions of the Association for Computational Linguistics",
  year = "2022",
  url = "https://arxiv.org/abs/2204.10757",
}

and the companion analysis paper:

@inproceedings{dziri2022origin,
  title = "On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?",
  author = "Dziri, Nouha and Milton, Sivan and Yu, Mo and Zaiane, Osmar and Reddy, Siva",
  booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
  year = "2022",
  pages = "5271--5285",
  address = "Seattle, United States",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2022.naacl-main.387",
}
