TY - GEN
T1 - Automatic reference-based evaluation of pronoun translation misses the point
AU - Guillou, Liane
AU - Hardmeier, Christian
PY - 2018
Y1 - 2018
AB - We compare the performance of the APT and AutoPRF metrics for pronoun translation against a manually annotated dataset comprising human judgements as to the correctness of translations of the PROTEST test suite. Although there is some correlation with the human judgements, a range of issues limit the performance of the automated metrics. Instead, we recommend the use of semiautomatic metrics and test suites in place of fully automatic metrics.
KW - pronoun translation evaluation
KW - APT
KW - AutoPRF
KW - PROTEST test suite
KW - semiautomatic metrics
UR - https://www.scopus.com/pages/publications/85081746897
U2 - 10.18653/v1/D18-1513
DO - 10.18653/v1/D18-1513
M3 - Article in proceedings
SN - 9781948087841
BT - Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
ER -