Abstract
Automatic language identification is a challenging problem. Discriminating between closely related languages is especially difficult. This paper presents a machine learning approach for automatic language identification for the Nordic languages, which are often miscategorised by existing state-of-the-art tools. Concretely, we focus on discriminating between six Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese and Icelandic.
| Original language | English |
|---|---|
| Title | Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects |
| Publisher | Association for Computational Linguistics |
| Publication date | 20 Apr. 2021 |
| Pages | 67–75 |
| Status | Published - 20 Apr. 2021 |
Keywords
- Automatic language identification
- Nordic languages
- Machine learning
- Language discrimination
- Natal differentiation