Exploring Predictive Uncertainty and Calibration in NLP: A Study on the Impact of Method & Data Scarcity

Publication: Conference article in proceedings or book/report chapter > Conference contribution in proceedings > Research > Peer-reviewed

Abstract

We investigate the problem of determining the predictive confidence (or, conversely, uncertainty) of a neural classifier through the lens of low-resource languages. By training models on sub-sampled datasets in three different languages, we assess the quality of estimates from a wide array of approaches and their dependence on the amount of available data. We find that while approaches based on pre-trained models and ensembles achieve the best results overall, the quality of uncertainty estimates can surprisingly suffer with more data. We also perform a qualitative analysis of uncertainties on sequences, discovering that a model's total uncertainty seems to be influenced to a large degree by its data uncertainty, not model uncertainty. All model implementations are open-sourced in a software package.
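The distinction the abstract draws between total, data, and model uncertainty can be illustrated with a common entropy-based decomposition for ensembles. The sketch below is an assumption for illustration only, not the paper's released implementation: total uncertainty is taken as the entropy of the averaged prediction, data uncertainty as the average entropy of the individual members, and model uncertainty as the difference between the two. The names `entropy` and `decompose_uncertainty` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose_uncertainty(member_probs):
    """Split an ensemble's uncertainty for a single input into three parts.

    member_probs: one class distribution per ensemble member,
    e.g. [[0.9, 0.1], [0.8, 0.2]] for two members and two classes.
    (Hypothetical helper; assumes every member outputs a valid distribution.)
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    # Average the members' predictions class by class.
    mean_probs = [
        sum(member[c] for member in member_probs) / n_members
        for c in range(n_classes)
    ]
    total = entropy(mean_probs)  # entropy of the averaged prediction
    data = sum(entropy(m) for m in member_probs) / n_members  # mean member entropy
    model = total - data  # disagreement between members (mutual information)
    return {"total": total, "data": data, "model": model}


# All members agree that the input is ambiguous: uncertainty is data-driven.
agree = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])

# Members are individually confident but contradict each other:
# the same total uncertainty is now mostly model uncertainty.
disagree = decompose_uncertainty([[0.99, 0.01], [0.01, 0.99]])
```

Both toy inputs have the same total uncertainty, yet the split differs: in `agree` it is entirely data uncertainty, while in `disagree` most of it is model uncertainty. This is the kind of breakdown that the qualitative claim in the abstract refers to, under the assumptions stated above.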
Original language: English
Title: Findings of 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Publication date: 7 Dec. 2022
DOI
Status: Published - 7 Dec. 2022
Event: Empirical Methods in Natural Language Processing - Abu Dhabi National Exhibition Center (ADNEC), Abu Dhabi, United Arab Emirates
Duration: 7 Dec. 2022 to 11 Dec. 2022
https://2022.emnlp.org/

Conference

Conference: Empirical Methods in Natural Language Processing
Location: Abu Dhabi National Exhibition Center (ADNEC)
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 07/12/2022 to 11/12/2022
Internet address

Keywords

  • Predictive confidence
  • Neural classifier
  • Low-resource languages
  • Uncertainty estimation
  • Pre-trained models
  • Ensembles
  • Data uncertainty
  • Model uncertainty
  • Sequence analysis
  • Open-source software
