Universal Dependencies 2.0 – CoNLL 2017 Shared Task Development and Test Data

Joakim Nivre, Zeljko Agic, Lars Ahrenberg, Lene Antonsen, M. J. Aranzabe, Masayuki Asahara, Luma Ateyah, Mohammed Attia, Aitziber Atutxa, E. Badmaeva, Miguel Ballesteros, Esha Banerjee, S. Bank, John Bauer, K. Bengoetxea, R. Bhat, E. Bick, C. Bosco, G. Bouma, Samuel R. Bowman, A. Burchardt, Marie Candito, G. Caron, G. Eryigit, G. Celano, S. Çetin, Fabricio Chalub, Jinho Choi, Yongseok Cho, Silvie Cinková, Çagri Çöltekin, M. Connor, Marie-Catherine de Marneffe, Valeria C V de Paiva, A. D. Ilarraza, Kaja Dobrovoljc, Timothy Dozat, Kira Droganova, Marhaba Eli, A. Elkahky, T. Erjavec, Richárd Farkas, Héctor Fernández Alcalde, Jennifer Foster, Cláudia Freitas, K. Gajdošová, Daniel Galbraith, Marcos Garcia, Filip Ginter, Iakes Goenaga, Koldo Gojenola, Memduh Gökirmak, Yoav Goldberg, X. Guinovart, Berta Gonzáles Saavedra, M. Grioni, Normunds Gruzitis, B. Guillaume, Nizar Habash, Jan Hajic, L. My, K. Harris, D. Haug, B. Hladká, Jaroslava Hlavácová, Petter Hohle, Radu Ion, E. Irimia, Anders Johannsen, Fredrik Jørgensen, Hüner Kaşıkara, H. Kanayama, Jenna Kanerva, Tolga Kayadelen, Václava Kettnerová, Jesse Kirchner, N. Kotsyba, Simon Krek, Sookyoung Kwak, Veronika Laippala, Lorenzo Lambertino, T. Lando, Phương Lê Hồng, Alessandro Lenci, Saran Lertpradit, H. Leung, Cheuk Ying Li, Josie Li, N. Ljubešić, O. Loginova, O. Lyashevskaya, Teresa Lynn, Vivien Macketanz, Aibek Makazhanov, M. Mandl, Christopher D. Manning, R. Manurung, Catalina Maranduc, D. Mareček, Katrin Marheinecke, Héctor Martínez Alonso, André Martins, J. Masek, Yuji Matsumoto, Ryan T. McDonald, Gustavo Mendonça, Anna Missilä, V. Mititelu, Yusuke Miyao, S. Montemagni, A. More, L. Romero, Shunsuke Mori, Bohdan Moskalevskyi, K. Muischnek, N. Mustafina, Kaili Müürisep, Pinkey Nainwani, A. Nedoluzhko, Lương Nguyễn Thị, Huyen Nguyen Thi Minh, Vitaly Nikolaev, Rattima Nitisaroj, Hanna M. Nurmi, S. Ojala, P. Osenova, Lilja Øvrelid, E. Pascual, Marco Passarotti, Cenel-Augusto Perez, Guy Perrier, Slav Petrov, Jussi Piitulainen, Emily Pitler, Barbara Plank, M. Popel, L. Pretkalnina, Prokopis Prokopidis, Tiina Puolakainen, Sampo Pyysalo, Alexandre Rademaker, Livy Real, Siva Reddy, Georg Rehm, L. Rinaldi, Laura Rituma, Rudolf Rosa, D. Rovati, Shadi Saleh, M. Sanguinetti, Baiba Saulite, Yanin Sawanakunanon, Sebastian Schuster, Djamé Seddah, W. Seeker, M. Seraji, L. Shakurova, Mo Shen, A. Shimada, Muh Shohibussirri, Natalia Silveira, M. Simi, R. Simionescu, K. Simkó, M. Šimková, K. Simov, Aaron Smith, Antonio Stella, Jana Strnadová, A. Suhr, U. Sulubacak, Z. Szántó, Dima Taji, Takaaki Tanaka, Trond Trosterud, A. Trukhina, Reut Tsarfaty, Francis M. Tyers, Sumire Uematsu, Zdenka Uresová, L. Uria, H. Uszkoreit, Gertjan van Noord, Viktor Varga, V. Vincze, Jonathan North Washington, Zhuoran Yu, Z. Žabokrtský, Daniel Zeman, Hanzhi Zhu

CoNLL

Abstract

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).

This release contains the test data used in the CoNLL 2017 shared task on parsing Universal Dependencies. Due to the shared task the test data was held hidden and not released together with the training and development data of UD 2.0. Therefore this release complements the UD 2.0 release (http://hdl.handle.net/11234/1-1983) to a full release of UD treebanks. In addition, the present release contains 18 new parallel test sets and 4 test sets in surprise languages. The present release also includes the development data already released with UD 2.0. Unlike regular UD releases, this one uses the folder-file structure that was visible to the systems participating in the shared task.