Discriminating Between Similar Nordic Languages

René Haas, Leon Derczynski

Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review

Abstract

Automatic language identification is a challenging problem, and discriminating between closely related languages is especially difficult. This paper presents a machine learning approach to automatic language identification for the Nordic languages, which existing state-of-the-art tools often misclassify. Concretely, we focus on discriminating between six Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese and Icelandic.
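
The abstract does not describe the model itself, so the following is only a minimal sketch of one common baseline for this kind of task: a multinomial naive Bayes classifier over character n-grams. It is not presented as the authors' method. The class `NgramNaiveBayes`, its parameters (`n_min`, `n_max`, `alpha`) and the six toy sentences (one per language, labelled with ISO 639-1 codes) are hypothetical and written purely for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return every character n-gram of length n_min..n_max, padding with spaces."""
    padded = f" {text.lower()} "
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


class NgramNaiveBayes:
    """Hypothetical multinomial naive Bayes over character n-grams (add-alpha smoothing)."""

    def __init__(self, n_min=1, n_max=3, alpha=1.0):
        self.n_min, self.n_max, self.alpha = n_min, n_max, alpha
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.totals = Counter()             # label -> total n-gram tokens
        self.docs = Counter()               # label -> number of training texts
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n_min, self.n_max)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, text):
        """Log prior plus the sum of smoothed log likelihoods of the text's n-grams."""
        grams = char_ngrams(text, self.n_min, self.n_max)
        n_docs = sum(self.docs.values())
        v = len(self.vocab)
        scores = {}
        for label in self.counts:
            score = math.log(self.docs[label] / n_docs)
            denom = self.totals[label] + self.alpha * v
            for g in grams:
                score += math.log((self.counts[label][g] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data (illustrative only): "I can't understand what you say".
TEXTS = [
    "jeg kan ikke forstå hvad du siger",  # Danish
    "jag kan inte förstå vad du säger",   # Swedish
    "jeg kan ikke forstå hva du sier",    # Norwegian (Bokmål)
    "eg kan ikkje forstå kva du seier",   # Norwegian (Nynorsk)
    "eg skilji ikki hvat tú sigur",       # Faroese
    "ég skil ekki hvað þú segir",         # Icelandic
]
LABELS = ["da", "sv", "nb", "nn", "fo", "is"]

model = NgramNaiveBayes().fit(TEXTS, LABELS)
print(model.predict("hvað segir þú"))  # is
```

In this toy set the Danish and Bokmål sentences differ only in "hvad"/"hva" and "siger"/"sier", so the decision between them rests on a handful of n-grams. That illustrates why closely related pairs are hard to separate; a usable classifier would need far more training text per language than one sentence.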
Original language: English
Title of host publication: Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects
Publisher: Association for Computational Linguistics
Publication date: 20 Apr 2021
Pages: 67–75
Publication status: Published (20 Apr 2021)

Keywords

  • Automatic language identification
  • Nordic languages
  • Machine learning
  • Language discrimination
  • Natal differentiation