Tunable Distortion Limits and Corpus Cleaning for SMT

Sara Stymne, Christian Hardmeier, Jörg Tiedemann, Joakim Nivre

Publication: Conference article in proceedings or book/report chapter › Conference contribution in proceedings › Research › Peer review

Abstract

We describe the Uppsala University system for English-to-German translation at WMT13. We use the Docent decoder, a local search decoder that translates at the document level. We add tunable distortion limits to Docent, that is, soft constraints on the maximum distortion allowed. We also investigate how to clean the noisy Common Crawl corpus, and show that alignment-based filtering gives good results. Finally, we investigate the effects of corpus selection for recasing.
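The abstract does not specify how a tunable distortion limit is computed. The following is a minimal sketch under stated assumptions, not Docent's implementation. It uses a common phrase-based definition of jump distance and contrasts a hard limit, which rejects any translation with a jump longer than the limit, with a soft version. In the soft version, the total amount by which jumps exceed the limit becomes a feature whose weight can be tuned together with the other model weights. The phrase representation and the names `jump_distances`, `hard_limit_ok`, `soft_limit_penalty` and `linear_score` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each phrase is the (start, end) span of source
# positions it covers (inclusive), listed in target (output) order.
Phrase = Tuple[int, int]


def jump_distances(phrases: List[Phrase]) -> List[int]:
    """Distortion of each phrase as |start_i - end_{i-1} - 1|, assuming a
    virtual previous phrase that ends at position -1 (the sentence start)."""
    distances = []
    prev_end = -1
    for start, end in phrases:
        distances.append(abs(start - prev_end - 1))
        prev_end = end
    return distances


def hard_limit_ok(phrases: List[Phrase], limit: int) -> bool:
    """Hard constraint: every jump must be at most `limit`."""
    return all(d <= limit for d in jump_distances(phrases))


def soft_limit_penalty(phrases: List[Phrase], limit: int) -> int:
    """Assumed soft constraint: sum of the amounts by which jumps exceed `limit`.
    A translation that respects the limit gets a penalty of 0."""
    return sum(max(0, d - limit) for d in jump_distances(phrases))


def linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Log-linear model score: weighted sum of feature values. The weight on the
    distortion-limit feature would be tuned like any other weight."""
    return sum(weights[name] * value for name, value in features.items())


if __name__ == "__main__":
    reordered = [(0, 1), (5, 6), (2, 4)]
    print(jump_distances(reordered))          # jumps for each phrase
    print(hard_limit_ok(reordered, 2))        # rejected under a hard limit of 2
    print(soft_limit_penalty(reordered, 2))   # penalized, not rejected
```

Under these assumptions, the reordered segmentation `[(0, 1), (5, 6), (2, 4)]` has jumps `[0, 3, 5]`. A hard limit of 2 rules it out, whereas the soft version assigns a penalty of 4. A tuned negative weight on that penalty then decides how much a long jump costs relative to the other features.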
Original language: English
Title: Proceedings of the Eighth Workshop on Statistical Machine Translation
Publication date: 9 Aug 2013
ISBN (Print): 978-1-937284-57-2
Status: Published - 9 Aug 2013
Published externally: Yes
