Abstract
We compare the performance of the APT and AutoPRF metrics for pronoun translation against a manually annotated dataset of human judgements on the correctness of translations of the PROTEST test suite. Although the metrics correlate to some extent with the human judgements, a range of issues limits the performance of the automatic metrics. We therefore recommend semi-automatic metrics and test suites in place of fully automatic metrics.
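The abstract does not say which correlation statistic is used. One common way to quantify how well an automatic metric agrees with binary human judgements is the Pearson correlation between per-item metric scores and 0/1 correctness labels (equivalent to the point-biserial coefficient). The sketch below assumes that setup. The function name `pearson` and the sample values are hypothetical illustrations, not data or code from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the common 1/n factor cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical illustration only (not results from the paper):
# per-pronoun metric scores in [0, 1] and human judgements
# (1 = translation judged correct, 0 = judged incorrect).
metric_scores = [1.0, 0.0, 1.0, 1.0, 0.0, 0.5]
human_correct = [1, 0, 0, 1, 0, 1]
r = pearson(metric_scores, human_correct)
print(f"r = {r:.3f}")
```

A value of `r` near 1 would mean the metric ranks items much as the annotators do. A moderate value corresponds to the "some correlation" situation the abstract describes.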
Original language | English |
---|---|
Title | Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 |
Publication date | 2018 |
ISBN (Print) | 9781948087841 |
Status | Published - 2018 |
Published externally | Yes |