Project details
Description
Despite unprecedented advances in Natural Language Understanding (NLU), our models still struggle to generalize to conditions that differ from those encountered during training. Such adverse conditions range from noisy domains to the extreme case of adaptation: new languages. Recent work on transfer learning, particularly Multi-Task Learning (MTL), offers great promise to remedy the problem. MTL has been applied successfully across NLU. However, most of this work has limited scope: e.g., sharing across only a few tasks or domains, and typically considering a single language. Little is known about when each type of sharing is most beneficial, especially if we want to scale NLU to dozens of languages or customer-specific domains. In this project, we focus on a core NLU problem, sequence tagging, and ask: How can we create the best sequence labelers at scale, under adverse conditions, when little to no annotated data exists? We propose to combine diverse sources of supervision to bridge the gap, while also learning what and how to share successfully in MTL, in order to derive a set of best practices and models that quickly scale to new conditions.
| Status | Finished |
|---|---|
| Effective start/end date | 01/04/2019 → 31/08/2020 |
Collaborative partners
- IT University of Copenhagen (lead)
- Amazon Development Center Aachen (Project partner)
Funding
- Amazon.com, Inc.: 524,000.00 DKK
Fingerprint
Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.
- Challenges in Annotating and Parsing Spoken, Code-switched, Frisian-Dutch Data
  Braggaar, A. & van der Goot, R., Apr. 2021, Proceedings of the Second Workshop on Domain Adaptation for NLP. Association for Computational Linguistics, pp. 50-58.
  Research output: Conference article in proceedings or book/report chapter › Article in proceedings › Research › peer-review
  Open access, File
- CL-MoNoise: Cross-lingual Lexical Normalization
  van der Goot, R., Oct. 2021, Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021). Association for Computational Linguistics, p. 510, 4 p.
  Research output: Conference article in proceedings or book/report chapter › Article in proceedings › Research › peer-review
  Open access
- Creating a Universal Dependencies Treebank of Spoken Frisian-Dutch Code-switched Data
  Braggaar, A. & van der Goot, R., 25 Sep. 2021.
  Research output: Conference contribution, not published in proceedings or journal › Conference abstract for conference › Research › peer-review
  Open access