Abstract
This paper discusses some central caveats of summarisation, incurred in the use of the ROUGE metric for evaluation, with respect to optimal solutions. The task is NP-hard, for which we give the first proof. Still, as we show empirically for three central benchmark datasets for the task, greedy algorithms seem to perform optimally according to the metric. Additionally, overall quality assurance is problematic: there is no natural upper bound on the quality of summarisation systems, and even humans are excluded from performing optimal summarisation.
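The greedy strategy the abstract refers to can be illustrated with a minimal sketch: repeatedly add whichever sentence most improves a ROUGE-1-recall-style score against a reference summary. The function names and toy data below are illustrative assumptions, not taken from the paper; exact maximisation of the objective is what the paper proves NP-hard, while this greedy heuristic is the kind of baseline it evaluates.

```python
from collections import Counter


def rouge1_recall(summary_sentences, reference):
    """Unigram recall of the selected sentences against a reference summary."""
    ref_counts = Counter(reference.lower().split())
    sum_counts = Counter(" ".join(summary_sentences).lower().split())
    overlap = sum(min(c, sum_counts[w]) for w, c in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0


def greedy_summary(sentences, reference, budget):
    """Greedily pick up to `budget` sentences, each time taking the one
    with the largest marginal gain in ROUGE-1 recall (a heuristic, not
    an exact solver for the NP-hard optimisation problem)."""
    chosen, remaining = [], list(sentences)
    while remaining and len(chosen) < budget:
        current = rouge1_recall(chosen, reference)
        best, best_gain = None, 0.0
        for s in remaining:
            gain = rouge1_recall(chosen + [s], reference) - current
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:  # no remaining sentence improves the score
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

For example, with `sentences = ["the cat sat on the mat", "dogs bark loudly", "the mat was red"]` and `reference = "the cat sat on the red mat"`, a budget of 2 selects the first and third sentences, reaching full unigram recall.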
Original language | English |
---|---|
Title of host publication | Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics |
Number of pages | 5 |
Volume | 2 |
Publisher | Association for Computational Linguistics |
Publication date | 2017 |
Pages | 41–45 |
ISBN (Print) | 978-1-945626-34-0 |
Publication status | Published - 2017 |
Event | The 15th Conference of the European Chapter of the Association for Computational Linguistics - Valencia, Spain Duration: 3 Apr 2017 → 7 Apr 2017 http://eacl2017.org/ |
Conference
Conference | The 15th Conference of the European Chapter of the Association for Computational Linguistics |
---|---|
Country/Territory | Spain |
City | Valencia |
Period | 03/04/2017 → 07/04/2017 |
Internet address | http://eacl2017.org/ |
Keywords
- Summarisation
- ROUGE metric
- NP-hard
- Greedy algorithms
- Quality assurance