ITU

Creating a Universal Dependencies Treebank of Spoken Frisian-Dutch Code-switched Data

Research output: Contribution to conference - NOT published in proceeding or journal; Conference abstract for conference; Research; peer-review

Standard

Creating a Universal Dependencies Treebank of Spoken Frisian-Dutch Code-switched Data. / Braggaar, Anouck; van der Goot, Rob.

2021. Abstract from RESOURCEFUL-2020, Gothenburg, Sweden.

Bibtex

@conference{61dcfef27d024ef0b490550df7b09123,
title = "Creating a Universal Dependencies Treebank of Spoken Frisian-Dutch Code-switched Data",
abstract = "This paper explores the difficulties of annotating transcribed spoken Dutch-Frisian codeswitch utterances into Universal Dependencies. We make use of data from the FAME! corpus, which consists of transcriptions and audio data. Besides the usual annotation difficulties, this dataset is extra challenging because of Frisian being low-resource, the informal nature of the data, code-switching and non-standard sentence segmentation. As a starting point, two annotators annotated 150 random utterances in three stages of 50 utterances. After each stage, disagreements were discussed and resolved. An increase of 7.8 UAS and 10.5 LAS points was achieved between the first and third round. This paper will focus on the issues that arise when annotating a transcribed speech corpus. To resolve these issues several solutions are proposed.",
author = "Anouck Braggaar and {van der Goot}, Rob",
year = "2021",
month = sep,
day = "25",
language = "English",
note = "RESOURCEFUL-2020: RESOURCEs and representations For Under-resourced Languages and domains, RESOURCEFUL; Conference date: 25-11-2020",
url = "https://gu-clasp.github.io/resourceful-2020/",

}

RIS

TY - ABST

T1 - Creating a Universal Dependencies Treebank of Spoken Frisian-Dutch Code-switched Data

AU - Braggaar, Anouck

AU - van der Goot, Rob

PY - 2021/9/25

Y1 - 2021/9/25

N2 - This paper explores the difficulties of annotating transcribed spoken Dutch-Frisian codeswitch utterances into Universal Dependencies. We make use of data from the FAME! corpus, which consists of transcriptions and audio data. Besides the usual annotation difficulties, this dataset is extra challenging because of Frisian being low-resource, the informal nature of the data, code-switching and non-standard sentence segmentation. As a starting point, two annotators annotated 150 random utterances in three stages of 50 utterances. After each stage, disagreements were discussed and resolved. An increase of 7.8 UAS and 10.5 LAS points was achieved between the first and third round. This paper will focus on the issues that arise when annotating a transcribed speech corpus. To resolve these issues several solutions are proposed.

AB - This paper explores the difficulties of annotating transcribed spoken Dutch-Frisian codeswitch utterances into Universal Dependencies. We make use of data from the FAME! corpus, which consists of transcriptions and audio data. Besides the usual annotation difficulties, this dataset is extra challenging because of Frisian being low-resource, the informal nature of the data, code-switching and non-standard sentence segmentation. As a starting point, two annotators annotated 150 random utterances in three stages of 50 utterances. After each stage, disagreements were discussed and resolved. An increase of 7.8 UAS and 10.5 LAS points was achieved between the first and third round. This paper will focus on the issues that arise when annotating a transcribed speech corpus. To resolve these issues several solutions are proposed.

M3 - Conference abstract for conference

T2 - RESOURCEFUL-2020

Y2 - 25 November 2020

ER -

ID: 85929423