Tunable Distortion Limits and Corpus Cleaning for SMT

Research output: Conference Article in Proceeding or Book/Report chapter; Article in proceedings; Research; peer-review

Abstract

We describe the Uppsala University system for WMT13, for English-to-German translation. We use the Docent decoder, a local search decoder that translates at the document level. We add tunable distortion limits, that is, soft constraints on the maximum distortion allowed, to Docent. We also investigate cleaning of the noisy Common Crawl corpus. We show that we can use alignment-based filtering for cleaning with good results. Finally, we investigate effects of corpus selection for recasing.
Original language: English
Title of host publication: Proceedings of the Eighth Workshop on Statistical Machine Translation
Publication date: 9 Aug 2013
ISBN (Print): 978-1-937284-57-2
Publication status: Published - 9 Aug 2013
Externally published: Yes

Keywords

  • machine translation
  • document-level decoding
  • corpus cleaning
  • alignment-based filtering
  • recasing
