Journal article
2020
APA
Guillaume, B., Ramisch, C., Waszczuk, J., Monti, J., di Buono, M. P., Sangati, F., … Dowling, M. (2020). Morpho-syntactically annotated corpora provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2).
Chicago/Turabian
Guillaume, Bruno, Carlos Ramisch, J. Waszczuk, J. Monti, Maria Pia di Buono, Federico Sangati, Giulia Speranza, et al. “Morpho-Syntactically Annotated Corpora Provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (Edition 1.2)” (2020).
MLA
Guillaume, Bruno, et al. Morpho-Syntactically Annotated Corpora Provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (Edition 1.2). 2020.
BibTeX
@article{bruno2020a,
title = {Morpho-syntactically annotated corpora provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)},
year = {2020},
author = {Guillaume, Bruno and Ramisch, Carlos and Waszczuk, J. and Monti, J. and di Buono, Maria Pia and Sangati, Federico and Speranza, Giulia and Carlino, Carola and Güngör, Tunga and Yirmibeşoğlu, Zeynep and Sak, Hasim and Saraçlar, M. and Giouli, Voula and Foufi, Vassiliki and Ramisch, Renata and Rademaker, Alexandre and Vale, Oto and Wilkens, Rodrigo and Candito, Marie and Crabbé, Benoît and Segonne, Vincent and Liebeskind, Chaya and Stymne, Sara and Hajic, Jan and Ginter, Filip and Luotolahti, Juhani and Straka, Milan and Zeman, Daniel and Mititelu, V. and Cristescu, Mihaela and Vaidya, Ashwini and Bhatia, Archna and Lichte, Timm and Ehren, Rafael and Jiang, M. and Xu, Hongzhi and Walsh, Abigail and Irimia, Elena and Dowling, Meghan}
}
This multilingual resource contains corpora for 14 languages, gathered on the occasion of edition 1.2 of the PARSEME Shared Task on Semi-Supervised Identification of Verbal MWEs (2020). These corpora were meant to serve as additional "raw" corpora, to help discover unseen verbal MWEs. The corpora are provided in the CoNLL-U format (https://universaldependencies.org/format.html). They contain morphosyntactic annotations: parts of speech, lemmas, morphological features, and syntactic dependencies. Depending on the language, this information comes from treebanks (mostly Universal Dependencies v2.x) or from automatic parsers trained on UD v2.x treebanks (e.g., UDPipe).

Verbal multiword expressions (VMWEs) include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do).

For the 1.2 shared task edition, the data covers 14 languages, for which VMWEs were annotated according to the universal guidelines. The annotated corpora are provided in the cupt format, which is inspired by the CoNLL-U format. Morphological and syntactic information, not necessarily using UD tagsets, is also provided, including parts of speech, lemmas, morphological features and/or syntactic dependencies. Depending on the language, this information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.2 (2020). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.2
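The description names the CoNLL-U and cupt formats without showing their layout. As an illustration only, the Python sketch below reads sentences in either format and groups the tokens of each annotated VMWE. It assumes the column layout of the public CoNLL-U specification (10 tab-separated columns) plus the PARSEME convention of an eleventh PARSEME:MWE column, in which `*` marks no VMWE, `_` marks an unannotated token, `1:LVC.full` opens VMWE 1 with its category, and a bare `1` marks a further token of VMWE 1. The functions `read_sentences` and `collect_vmwes` and the sample sentence are hypothetical; they are not part of the evaluation tools distributed with this item.

```python
# Minimal sketch (not the official PARSEME evaluation tools): read CoNLL-U/cupt
# sentences and group the tokens of each annotated VMWE.
# Assumed conventions: 10 tab-separated CoNLL-U columns; cupt adds an 11th
# PARSEME:MWE column where "*" = no VMWE, "_" = not annotated,
# "1:LVC.full" = first token of VMWE 1 with its category, "1" = another token of VMWE 1.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

CONLLU_COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                  "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]
CUPT_COLUMNS = CONLLU_COLUMNS + ["PARSEME:MWE"]

Token = Dict[str, str]


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield each sentence as a list of token dicts keyed by column name."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:  # a blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):  # sentence-level comments, e.g. "# text = ..."
            continue
        fields = line.split("\t")
        if "-" in fields[0] or "." in fields[0]:
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        columns = CUPT_COLUMNS if len(fields) == len(CUPT_COLUMNS) else CONLLU_COLUMNS
        sentence.append(dict(zip(columns, fields)))
    if sentence:  # the input may not end with a blank line
        yield sentence


def collect_vmwes(sentence: List[Token]) -> Dict[str, Tuple[str, List[str]]]:
    """Map each VMWE id to (category, lemmas of its tokens in sentence order)."""
    categories: Dict[str, str] = {}
    members: Dict[str, List[str]] = defaultdict(list)
    for token in sentence:
        code = token.get("PARSEME:MWE", "*")  # plain CoNLL-U has no VMWE column
        if code in ("*", "_"):
            continue
        for part in code.split(";"):  # a token may belong to several VMWEs
            mwe_id, _, category = part.partition(":")
            if category:
                categories[mwe_id] = category
            members[mwe_id].append(token["LEMMA"])
    return {i: (categories.get(i, "?"), lemmas) for i, lemmas in members.items()}


# Hypothetical one-sentence cupt sample containing a light-verb construction.
SAMPLE = [
    "# text = I made a decision",
    "1\tI\tI\tPRON\t_\t_\t2\tnsubj\t_\t_\t*",
    "2\tmade\tmake\tVERB\t_\t_\t0\troot\t_\t_\t1:LVC.full",
    "3\ta\ta\tDET\t_\t_\t4\tdet\t_\t_\t*",
    "4\tdecision\tdecision\tNOUN\t_\t_\t2\tobj\t_\t_\t1",
    "",
]

if __name__ == "__main__":
    for sent in read_sentences(SAMPLE):
        print(collect_vmwes(sent))  # {'1': ('LVC.full', ['make', 'decision'])}
```

Keying tokens by column name lets the same reader handle both the raw CoNLL-U corpora and cupt files; tokens read from plain CoNLL-U simply carry no VMWE annotation.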