Cross-domain Relation Extraction

Research output: Book / Anthology / Report / Ph.D. thesis > Ph.D. thesis

Abstract

Language technologies are spreading across an increasingly diverse range of applications. As a result, the ability of computational systems to adapt easily to new, unseen situations is becoming more and more important.

In this thesis, we explore the task of Relation Extraction (RE) from a cross-domain perspective, in order to push the boundaries of model robustness across domains of application. RE is a key task in the automatic extraction of structured information from unstructured text. The goal of RE is the extraction of semantic triplets in which two entities mentioned in the input text are connected by a semantic relation. The main challenge to the robustness of RE across domains is that the relevant information to extract (i.e., the entities and the types of semantic connections between them) differs depending on the downstream application.

The work of this thesis covers the whole experimental pipeline for RE: First, given the lack of previous work in cross-domain RE, we outline several challenges characterizing the research area, from the scarcity of available resources for studying cross-domain RE, to the lack of standards in annotation guidelines and experimental settings. Second, to address the aforementioned challenges, we describe the creation of CrossRE, a multi-domain dataset for RE in English, and its subsequent expansion to 26 languages. Third, we propose two methodologies to boost the performance of RE in this multi-domain setup. Last, we present two frameworks for the analysis of the RE pipeline in terms of model performance and presence of socio-demographic biases.
Original language: English
Publisher: IT-Universitetet i København
Number of pages: 296
Publication status: Published - 2024
Series: ITU-DS
Number: 224
ISSN: 1602-3536
