Abstract
This work provides the first in-depth analysis of genre in Universal Dependencies (UD). In contrast to prior work on genre identification, which uses small sets of well-defined labels in mono-/bilingual setups, UD contains 18 genres with varying degrees of specificity spread across 114 languages. As most treebanks are labeled with multiple genres while lacking annotations about which instances belong to which genre, we propose four methods for predicting instance-level genre using weak supervision from treebank metadata. The proposed methods recover instance-level genre better than competitive baselines, as measured on a subset of UD with labeled instances, and adhere better to the global expected distribution. Our analysis sheds light on prior work using UD genre metadata for treebank selection, finding that metadata alone are a noisy signal and must be disentangled within treebanks before they can be universally applied.
Original language | English |
---|---|
Title of host publication | Proceedings of the 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021) |
Place of Publication | Sofia, Bulgaria |
Publisher | Association for Computational Linguistics |
Publication date | Dec 2021 |
Pages | 69-85 |
Publication status | Published - Dec 2021 |
Event | 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021), Sofia, Bulgaria; Duration: 21 Mar 2022 → 25 Mar 2022; Conference number: 20 |
Workshop
Workshop | 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021) |
---|---|
Number | 20 |
Country/Territory | Bulgaria |
City | Sofia |
Period | 21/03/2022 → 25/03/2022 |
Keywords
- Universal Dependencies
- Genre Identification
- Treebank Metadata
- Weak Supervision
- Instance-level Prediction
Fingerprint
Dive into the research topics of 'How Universal is Genre in Universal Dependencies?'. Together they form a unique fingerprint.
Projects
- Multi-Task Sequence Labeling Under Adverse Conditions (Finished)
Plank, B. (PI) & van der Goot, R. (CoI)
01/04/2019 → 31/08/2020
Project: Other