Calibrating Large Language Models Using Their Generations Only

Dennis Thomas Ulmer, Martin Gubri, Hwaran Lee, Sangdoo Yun, Seong Joon Oh

Publication: Conference article in proceedings or book/report chapter › Conference contribution in proceedings › Research › peer-reviewed

Abstract

As large language models (LLMs) are increasingly deployed in user-facing applications, building trust and maintaining safety by accurately quantifying a model's confidence in its prediction becomes even more important. However, finding effective ways to calibrate LLMs—especially when the only interface to the models is their generated text—remains a challenge. We propose APRICOT (Auxiliary prediction of confidence targets): a method to set confidence targets and train an additional model that predicts an LLM's confidence based on its textual input and output alone. This approach has several advantages: it is conceptually simple, does not require access to the target model beyond its output, does not interfere with language generation, and has a multitude of potential usages, for instance verbalizing the predicted confidence or using it to re-prompt the LLM to accurately reflect its uncertainty. We show how our approach performs competitively in terms of calibration error for white-box and black-box LLMs on closed-book question-answering to detect incorrect LLM answers.
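To make the core idea concrete, here is a minimal sketch of the setup the abstract describes: an auxiliary model trained to predict whether an LLM's answer is correct, using only the question and the generated answer text. This is not the authors' implementation; the TF-IDF + logistic regression classifier is a stand-in for whatever auxiliary architecture the paper actually uses, and the tiny graded dataset is hypothetical.

```python
# Sketch of the APRICOT idea from the abstract: predict an LLM's confidence
# from its textual input and output alone, with no access to model internals.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: each example concatenates the LLM's input
# (question) and output (generated answer). A real setup would collect
# generations from the target LLM and grade them against reference answers.
texts = [
    "Q: What is the capital of France? A: Paris.",
    "Q: What is the capital of Australia? A: Sydney.",
    "Q: Who wrote Hamlet? A: William Shakespeare.",
    "Q: What is 2 + 2? A: 5.",
]
# Confidence targets: 1 if the generated answer was judged correct, else 0.
correct = [1, 0, 1, 0]

# Auxiliary model: maps input/output text to a correctness probability,
# which serves as the confidence estimate for the (possibly black-box) LLM.
aux_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
aux_model.fit(texts, correct)

# At inference time, the predicted probability is the confidence score.
new_example = "Q: What is the capital of Denmark? A: Copenhagen."
confidence = aux_model.predict_proba([new_example])[0, 1]
print(f"Predicted confidence: {confidence:.2f}")
```

Because the auxiliary model only consumes text, the same recipe applies whether the target LLM exposes its logits or is reachable only through a generation API, which is the black-box setting the abstract highlights.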
Original language: English
Title: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Volume: Volume 1: Long Papers
Place of publication: Bangkok
Publisher: Association for Computational Linguistics
Publication date: Aug 2024
Pages: 15440–15459
DOI:
Status: Published - Aug 2024
