Multi-CrossRE: A Multi-Lingual Multi-Domain Dataset for Relation Extraction

Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, Barbara Plank

Research output: Conference Article in Proceeding or Book/Report chapter; Article in proceedings; Research; peer-review

Abstract

Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources. We propose Multi-CrossRE, the broadest multi-lingual dataset for RE, including 26 languages in addition to English, and covering six text domains. Multi-CrossRE is a machine-translated version of CrossRE (Bassignana and Plank, 2022), with a sub-portion including more than 200 sentences in seven diverse languages checked by native speakers. We run a baseline model over the 26 new datasets and, as a sanity check, over the 26 back-translations to English. Results on the back-translated data are consistent with the ones on the original English CrossRE, indicating high quality of the translation and the resulting dataset.
Original language: English
Title of host publication: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Publisher: University of Tartu Library
Publication date: 2023
Pages: 80-85
Publication status: Published - 2023

Keywords

  • Relation Extraction
  • Multi-lingual Dataset
  • Machine Translation
  • CrossRE
  • Text Domains
