Abstract
We investigate the problem of predicting the quality of a given Machine Translation (MT) output segment as a binary classification task. In a study with four different data sets in two text genres and two language pairs, we show that the performance of a Support Vector Machine (SVM) classifier can be improved by extending the feature set with implicitly defined syntactic features in the form of tree kernels over syntactic parse trees. Moreover, we demonstrate that syntax tree kernels achieve surprisingly high performance levels even without additional features, which makes them suitable as a low-effort initial building block for an MT quality estimation system.
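
To make "implicitly defined syntactic features" concrete, the following is a minimal sketch of a tree kernel over parse trees. The abstract does not say which tree kernel, parser, or SVM toolkit was used, so the subset tree kernel of Collins and Duffy is swapped in here as an assumption. The tuple-based tree encoding, the function names, the decay parameter `lam`, and the toy sentences are all hypothetical and chosen only for illustration.

```python
import math

# Assumed encoding: a parse tree is a nested tuple (label, child, child, ...);
# a word (leaf) is a plain string. This encoding is illustrative only.

def child_label(child):
    """Label of a child, keeping words and nonterminals distinct."""
    return ("word", child) if isinstance(child, str) else ("node", child[0])

def production(node):
    """Grammar production at a node: its label plus its children's labels."""
    return (node[0], tuple(child_label(c) for c in node[1:]))

def nodes(tree):
    """Yield every non-leaf node of the tree."""
    yield tree
    for c in tree[1:]:
        if not isinstance(c, str):
            yield from nodes(c)

def common(n1, n2, lam):
    """Weighted count of tree fragments rooted at both n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        # Productions match, so c1 and c2 are either both words or both nodes.
        if not isinstance(c1, str):
            score *= 1.0 + common(c1, c2, lam)
    return score

def tree_kernel(t1, t2, lam=1.0):
    """Subset tree kernel: sum of common fragments over all node pairs."""
    return sum(common(a, b, lam) for a in nodes(t1) for b in nodes(t2))

def normalized_kernel(t1, t2, lam=1.0):
    """Kernel scaled to [0, 1] so that tree size does not dominate."""
    return tree_kernel(t1, t2, lam) / math.sqrt(
        tree_kernel(t1, t1, lam) * tree_kernel(t2, t2, lam)
    )

# Toy parse trees (hypothetical data, not taken from the paper).
t1 = ("S", ("NP", ("D", "the"), ("N", "dog")), ("VP", ("V", "barks")))
t2 = ("S", ("NP", ("D", "the"), ("N", "cat")), ("VP", ("V", "barks")))

if __name__ == "__main__":
    print(tree_kernel(t1, t1), tree_kernel(t1, t2), normalized_kernel(t1, t2))
```

The kernel never enumerates the fragments explicitly, which is what makes the feature space implicit. A matrix of such kernel values between all pairs of MT output segments could then be passed to an SVM that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`), possibly summed with a kernel over explicit features; whether the paper combines them this way is not stated in the abstract.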
Original language | English |
---|---|
Title | Proceedings of the 15th International Conference of the European Association for Machine Translation |
Publication date | 31 May 2011 |
Status | Published - 31 May 2011 |
Published externally | Yes |