Discriminating Between Similar Nordic Languages

René Haas, Leon Derczynski

Publication type: Conference article in proceedings or book/report chapter; conference contribution in proceedings; research; peer-reviewed

Abstract

Automatic language identification is a challenging problem, and discriminating between closely related languages is especially difficult. This paper presents a machine learning approach to automatic language identification for the Nordic languages, which existing state-of-the-art tools often misclassify. Concretely, we focus on discriminating between six Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese and Icelandic.
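The abstract does not describe the model itself, so the following is only a minimal sketch of one common baseline for this kind of task, a character-trigram multinomial naive Bayes classifier, and not the authors' method. The helper `char_ngrams`, the class `NgramNaiveBayes`, the ISO 639-1 labels and the six toy sentences are all assumptions made for illustration; the sentences are hand-written renderings of "I don't know what he says", not data from the paper.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Character n-grams of `text`, padded with spaces so word edges count."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical multinomial naive Bayes over character n-grams (add-one smoothing)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram frequencies
        self.totals = Counter()             # label -> total n-grams seen
        self.docs = Counter()               # label -> number of training texts
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        n_docs = sum(self.docs.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.counts:
            # log P(label) + sum over n-grams of log P(gram | label)
            score = math.log(self.docs[label] / n_docs)
            denom = self.totals[label] + v
            for g in grams:
                score += math.log((self.counts[label][g] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: one hand-written rendering of "I don't know what
# he says" per language, labelled with ISO 639-1 codes. Not the paper's data.
TRAIN = [
    ("jeg ved ikke hvad han siger", "da"),
    ("jag vet inte vad han säger", "sv"),
    ("jeg vet ikke hva han sier", "nb"),
    ("eg veit ikkje kva han seier", "nn"),
    ("eg veit ikki hvat hann sigur", "fo"),
    ("ég veit ekki hvað hann segir", "is"),
]

model = NgramNaiveBayes(n=3).fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("hvad siger han"))  # → da
```

One sentence per language is far too little data for real use; the point is only that character n-grams taken from spellings such as *ikke*, *ikkje*, *ikki*, *ekki* and *inte* carry the small orthographic differences that separate these closely related languages.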
Original language: English
Title of host publication: Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects
Publisher: Association for Computational Linguistics
Publication date: 20 Apr 2021
Pages: 67–75
Status: Published - 20 Apr 2021

Keywords

  • Automatic language identification
  • Nordic languages
  • Machine learning
  • Language discrimination
  • Natal differentiation
