Abstract
We compare the performance of the APT and AutoPRF metrics for pronoun translation against a manually annotated dataset of human judgements on the correctness of translations from the PROTEST test suite. Although there is some correlation with the human judgements, a range of issues limits the performance of the automated metrics. We therefore recommend the use of semi-automatic metrics and test suites in place of fully automatic metrics.
| Field | Value |
|---|---|
| Original language | English |
| Title of host publication | Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 |
| Publication date | 2018 |
| ISBN (Print) | 9781948087841 |
| DOIs | |
| Publication status | Published - 2018 |
| Externally published | Yes |
Keywords
- pronoun translation evaluation
- APT
- AutoPRF
- PROTEST test suite
- semi-automatic metrics
Cite this: 'Automatic reference-based evaluation of pronoun translation misses the point'. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018.