Abstract
Because of globalization, it is becoming more and more common to use multiple languages in a single utterance, a phenomenon called code-switching. This results in special linguistic structures and therefore poses many challenges for Natural Language Processing. Existing models for language identification in code-switched data are all supervised, requiring annotated training data, which is available for only a limited number of language pairs. In this paper, we explore semi-supervised approaches that exploit out-of-domain monolingual training data. We experiment with word unigrams, word n-grams, character n-grams, Viterbi Decoding, Latent Dirichlet Allocation, Support Vector Machine and Logistic Regression. The Viterbi model was the best semi-supervised model, achieving a weighted F1 score of 92.23%, whereas a fully supervised state-of-the-art BERT-based model scored 98.43%.
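As a rough illustration of how Viterbi decoding can assign a language to each word using only monolingual data, the sketch below scores words with smoothed unigram probabilities and penalizes language switches. It is not the authors' implementation: the toy counts in `COUNTS`, the smoothing constant `alpha`, the `stay_prob` transition parameter, and the function names are all assumptions made for this example.

```python
import math

# Toy monolingual word counts (illustrative assumption, not data from the paper).
COUNTS = {
    "en": {"i": 5, "want": 3, "to": 6, "thank": 2, "you": 5, "much": 2, "very": 3},
    "es": {"muchas": 3, "gracias": 4, "por": 5, "todo": 3, "quiero": 2, "yo": 3},
}
LANGS = list(COUNTS)
VOCAB = {word for counts in COUNTS.values() for word in counts}


def emission_logprob(word, lang, alpha=0.1):
    """Add-alpha smoothed unigram log-probability of `word` under `lang`."""
    counts = COUNTS[lang]
    total = sum(counts.values())
    # One extra vocabulary slot reserves probability mass for unseen words.
    return math.log((counts.get(word, 0) + alpha) / (total + alpha * (len(VOCAB) + 1)))


def transition_logprob(prev, cur, stay_prob):
    """Log-probability of moving from language `prev` to language `cur`."""
    if prev == cur:
        return math.log(stay_prob)
    return math.log((1.0 - stay_prob) / (len(LANGS) - 1))


def viterbi_language_tags(words, stay_prob=0.8):
    """Return the most likely language tag for each word (hypothetical helper)."""
    if not words:
        return []
    tokens = [w.lower() for w in words]
    # scores[lang] = best log-probability of any tag path ending in `lang`.
    scores = {
        lang: math.log(1.0 / len(LANGS)) + emission_logprob(tokens[0], lang)
        for lang in LANGS
    }
    backpointers = []
    for token in tokens[1:]:
        new_scores, pointers = {}, {}
        for cur in LANGS:
            best_prev = max(
                LANGS,
                key=lambda prev: scores[prev] + transition_logprob(prev, cur, stay_prob),
            )
            new_scores[cur] = (
                scores[best_prev]
                + transition_logprob(best_prev, cur, stay_prob)
                + emission_logprob(token, cur)
            )
            pointers[cur] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Backtrack from the best final state to recover the full tag sequence.
    path = [max(LANGS, key=scores.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    sentence = "I want to thank you muchas gracias por todo".split()
    print(list(zip(sentence, viterbi_language_tags(sentence))))
```

With these toy counts, the first five words are tagged `en` and the last four `es`. A higher `stay_prob` makes the decoder more reluctant to switch languages, which is the main design lever in this kind of model.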
| Original language | English |
|---|---|
| Title | Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching |
| Number of pages | 6 |
| Publisher | Association for Computational Linguistics |
| Publication date | Jun 2021 |
| Pages | 65 |
| Status | Published - Jun 2021 |
| Event | The 5th Workshop on Computational Approaches to Linguistic Code-Switching - Duration: 11 Jun 2021 → 11 Jun 2021; Conference number: 5 |
Conference
| Conference | The 5th Workshop on Computational Approaches to Linguistic Code-Switching |
|---|---|
| Number | 5 |
| Period | 11/06/2021 → 11/06/2021 |
| Name | Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching |
Keywords
- Globalization
- Code-switching
- Natural Language Processing
- Semi-supervised learning
- Language identification
Fingerprint
Dive into the research topics of 'Much Gracias: Semi-supervised Code-switch Detection for Spanish-English: How far can we get?'. Together they form a unique fingerprint.
Projects
- 1 Finished
- Multi-Task Sequence Labeling Under Adverse Conditions
  Plank, B. (PI) & van der Goot, R. (CoI)
  01/04/2019 → 31/08/2020
  Projects: Project › Other