Abstract
Lexical normalization is the task of translating non-standard social media data to a standard form. Previous work has shown that this is beneficial for many downstream tasks in multiple languages. However, for Italian, there is no benchmark available for lexical normalization, despite the presence of many benchmarks for other tasks involving social media data. In this paper, we discuss the creation of a lexical normalization dataset for Italian. After two rounds of annotation, a Cohen’s kappa score of 78.64 is obtained. During this process, we also analyze the inter-annotator agreement for this task, which is only rarely done on datasets for lexical normalization, and when it is reported, the analysis usually remains shallow. Furthermore, we utilize this dataset to train a lexical normalization model and show that it can be used to improve dependency parsing of social media data. All annotated data and the code to reproduce the results are available at: http://bitbucket.org/robvanderg/normit.
Original language | English |
---|---|
Title of host publication | Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC 2020) |
Place of Publication | France |
Publisher | European Language Resources Association (ELRA) |
Publication date | May 2020 |
Pages | 6272–6278 |
Publication status | Published - May 2020 |
Event | LREC 2020 - Marseille, France Duration: 17 May 2020 → 22 May 2020 https://lrec2020.lrec-conf.org/en/ |
Conference
Conference | LREC 2020 |
---|---|
Country/Territory | France |
City | Marseille |
Period | 17/05/2020 → 22/05/2020 |
Internet address | https://lrec2020.lrec-conf.org/en/ |
Keywords
- lexical normalization
- social media data
- Italian language
- inter-annotator agreement
- dependency parsing
Fingerprint
Dive into the research topics of 'Norm It! Lexical Normalization for Italian and Its Downstream Effects for Dependency Parsing'. Together they form a unique fingerprint.
Projects
- 1 Finished
- Multi-Task Sequence Labeling Under Adverse Conditions
  Plank, B. (PI) & van der Goot, R. (CoI)
  01/04/2019 → 31/08/2020
  Project: Other