Discourse-Related Language Contrasts in English-Croatian Human and Machine Translation

Margita Šoštarić, Christian Hardmeier, Sara Stymne

Publication: Conference article in proceedings or book/report chapter; Conference contribution in proceedings; Research; Peer-reviewed

Abstract

We present an analysis of a number of coreference phenomena in English-Croatian human and machine translations. The aim is to shed light on the differences in the way these structurally different languages make use of discourse information and provide insights for discourse-aware machine translation system development. The phenomena are automatically identified in parallel data using annotation produced by parsers and word alignment tools, enabling us to pinpoint patterns of interest in both languages. We make the analysis more fine-grained by including three corpora pertaining to three different registers. In a second step, we create a test set with the challenging linguistic constructions and use it to evaluate the performance of three MT systems. We show that both SMT and NMT systems struggle with handling these discourse phenomena, even though NMT tends to perform somewhat better than SMT. By providing an overview of patterns frequently occurring in actual language use, as well as by pointing out the weaknesses of current MT systems that commonly mistranslate them, we hope to contribute to the effort of resolving the issue of discourse phenomena in MT applications.
Original language: English
Title: Proceedings of the Third Conference on Machine Translation: Research Papers
Publication date: 1 Nov. 2018
ISBN (Print): 978-1-948087-81-0
DOI
Status: Published - 1 Nov. 2018
Externally published: Yes
