ITU

DaNewsroom: A Large-scale Danish Summarisation Dataset

Research output: Conference Article in Proceeding or Book/Report chapter, Article in proceedings, Research, peer-review

Standard

DaNewsroom: A Large-scale Danish Summarisation Dataset. / Varab, Daniel; Schluter, Natalie.

Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). European Language Resources Association, 2020. p. 6731–6739.

Harvard

Varab, D & Schluter, N 2020, DaNewsroom: A Large-scale Danish Summarisation Dataset. in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). European Language Resources Association, pp. 6731–6739, LREC 2020, Marseille, France, 17/05/2020. <https://www.aclweb.org/anthology/2020.lrec-1.831/>

APA

Varab, D., & Schluter, N. (2020). DaNewsroom: A Large-scale Danish Summarisation Dataset. In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020) (pp. 6731–6739). European Language Resources Association. https://www.aclweb.org/anthology/2020.lrec-1.831/

Vancouver

Varab D, Schluter N. DaNewsroom: A Large-scale Danish Summarisation Dataset. In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). European Language Resources Association. 2020. p. 6731–6739

Author

Varab, Daniel ; Schluter, Natalie. / DaNewsroom: A Large-scale Danish Summarisation Dataset. Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). European Language Resources Association, 2020. pp. 6731–6739

Bibtex

@inproceedings{448828b40d7548e998419ad54c8b2727,
title = "DaNewsroom: A Large-scale Danish Summarisation Dataset",
abstract = "Dataset development for automatic summarisation systems is notoriously English-oriented. In this paper we present the first large-scale non-English language dataset specifically curated for automatic summarisation. The document-summary pairs are news articles and manually written summaries in the Danish language. There has previously been no work done to establish a Danish summarisation dataset, nor any published work on the automatic summarisation of Danish. We provide therefore the first automatic summarisation dataset for the Danish language (large-scale or otherwise). To support the comparison of future automatic summarisation systems for Danish, we include system performance on this dataset of strong well-established unsupervised baseline systems, together with an oracle extractive summariser, which is the first account of automatic summarisation system performance for Danish. Finally, we make all code for automatically acquiring the data freely available and make explicit how this technology can easily be adapted in order to acquire automatic summarisation datasets for further languages.",
author = "Daniel Varab and Natalie Schluter",
year = "2020",
month = apr,
language = "English",
pages = "6731--6739",
booktitle = "Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)",
publisher = "European Language Resources Association",
note = "LREC 2020 ; Conference date: 17-05-2020 Through 22-05-2020",
url = "https://lrec2020.lrec-conf.org/en/",

}

RIS

TY - GEN

T1 - DaNewsroom: A Large-scale Danish Summarisation Dataset

AU - Varab, Daniel

AU - Schluter, Natalie

PY - 2020/4

Y1 - 2020/4

N2 - Dataset development for automatic summarisation systems is notoriously English-oriented. In this paper we present the first large-scale non-English language dataset specifically curated for automatic summarisation. The document-summary pairs are news articles and manually written summaries in the Danish language. There has previously been no work done to establish a Danish summarisation dataset, nor any published work on the automatic summarisation of Danish. We provide therefore the first automatic summarisation dataset for the Danish language (large-scale or otherwise). To support the comparison of future automatic summarisation systems for Danish, we include system performance on this dataset of strong well-established unsupervised baseline systems, together with an oracle extractive summariser, which is the first account of automatic summarisation system performance for Danish. Finally, we make all code for automatically acquiring the data freely available and make explicit how this technology can easily be adapted in order to acquire automatic summarisation datasets for further languages.

AB - Dataset development for automatic summarisation systems is notoriously English-oriented. In this paper we present the first large-scale non-English language dataset specifically curated for automatic summarisation. The document-summary pairs are news articles and manually written summaries in the Danish language. There has previously been no work done to establish a Danish summarisation dataset, nor any published work on the automatic summarisation of Danish. We provide therefore the first automatic summarisation dataset for the Danish language (large-scale or otherwise). To support the comparison of future automatic summarisation systems for Danish, we include system performance on this dataset of strong well-established unsupervised baseline systems, together with an oracle extractive summariser, which is the first account of automatic summarisation system performance for Danish. Finally, we make all code for automatically acquiring the data freely available and make explicit how this technology can easily be adapted in order to acquire automatic summarisation datasets for further languages.

M3 - Article in proceedings

SP - 6731

EP - 6739

BT - Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)

PB - European Language Resources Association

T2 - LREC 2020

Y2 - 17 May 2020 through 22 May 2020

ER -
