Abstract
Automatic language identification is a challenging problem, and discriminating between closely related languages is especially difficult. This paper presents a machine learning approach to automatic language identification for the Nordic languages, which existing state-of-the-art tools often miscategorise. Concretely, we focus on discriminating between six Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese and Icelandic.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects |
| Publisher | Association for Computational Linguistics |
| Publication date | 20 Apr 2021 |
| Pages | 67–75 |
| Publication status | Published - 20 Apr 2021 |
| Event | Workshop on NLP for Similar Languages, Varieties and Dialects, VIRTUAL. Duration: 20 Apr 2021 → 20 Apr 2021. Conference number: 8 |
Workshop
| Workshop | Workshop on NLP for Similar Languages, Varieties and Dialects |
|---|---|
| Number | 8 |
| City | VIRTUAL |
| Period | 20/04/2021 → 20/04/2021 |
Keywords
- Automatic language identification
- Nordic languages
- Machine learning
- Language discrimination
- Natal differentiation
Fingerprint
Dive into the research topics of 'Discriminating Between Similar Nordic Languages'. Together they form a unique fingerprint.