CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gökirmak, Anna Nedoluzhko, Silvie Cinková, Jaroslava Hlaváčová, Václava Kettnerová, Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi Kanayama, Valeria de Paiva, Kira Droganova, Héctor Martínez Alonso, Çağrı Çöltekin, Umut Sulubacak, Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Georg Rehm, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Jesse Kirchner, Héctor Fernández Alcalde, Jana Strnadová, Esha Banerjee, Ruli Manurung, Antonio Stella, Atsuko Shimada, Sookyoung Kwak, Gustavo Mendonça, Tatiana Lando, Rattima Nitisaroj, Josie Li


Abstract

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.
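The primary ranking metric of the shared task is the labeled attachment score (LAS): the proportion of words that receive both the correct head and the correct dependency relation. The sketch below, in Python, illustrates the computation over gold and system files in the CoNLL-U format used by Universal Dependencies. It is a simplification, not the official evaluation script: it assumes the two files share identical tokenization (the official evaluator additionally aligns mismatched word segmentations), and the file names at the bottom are placeholders.

# Minimal LAS sketch. Assumes gold and system CoNLL-U files have the
# same tokenization; the official scorer also handles segmentation
# mismatches, which this sketch does not.

def read_conllu(path):
    """Yield one sentence at a time as a list of (head, deprel) pairs."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                 # blank line ends a sentence
                if sentence:
                    yield sentence
                    sentence = []
                continue
            if line.startswith("#"):     # skip comment lines
                continue
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue                 # skip multiword-token ranges and empty nodes
            sentence.append((cols[6], cols[7]))  # HEAD and DEPREL columns
    if sentence:
        yield sentence

def las(gold_path, system_path):
    """Fraction of words with correct head AND correct relation label."""
    correct = total = 0
    for gold, system in zip(read_conllu(gold_path), read_conllu(system_path)):
        for (g_head, g_rel), (s_head, s_rel) in zip(gold, system):
            total += 1
            if g_head == s_head and g_rel == s_rel:
                correct += 1
    return correct / total if total else 0.0

# Placeholder file names for illustration:
print(f"LAS: {las('gold.conllu', 'system.conllu'):.2%}")

Counting only regular word lines (and ignoring multiword-token ranges such as "1-2") mirrors how attachment accuracy is conventionally computed on CoNLL-U data, since only syntactic words carry HEAD and DEPREL values.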