
Calibrating Long-form Generations From Large Language Models

Yukun Huang, Yixin Liu, Raghuveer Thirukovalluru, Arman Cohan, Bhuwan Dhingra


Abstract
To enhance the reliability of Large Language Models (LLMs), calibration is essential: a model's confidence scores should align with the likelihood that its responses are correct. However, traditional calibration methods typically rely on a binary true/false assessment of response correctness, which is unsuitable for long-form generations where an answer can be partially correct. Addressing this gap, we introduce a unified calibration framework in which both the correctness of an LLM's responses and its associated confidence levels are treated as distributions over a range of scores. We develop three metrics for assessing LLM calibration and propose confidence elicitation methods based on self-consistency and self-evaluation. Our experiments demonstrate that larger models do not necessarily yield better calibration, that the calibration metrics complement each other, and that self-consistency methods excel on factoid datasets. We also find that calibration can be improved through techniques such as fine-tuning and temperature scaling. Finally, we illustrate one application of long-form calibration: selective answering over long-form responses, which optimizes correctness within a constrained API budget.
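To make the self-consistency idea from the abstract concrete, here is a minimal, hypothetical Python sketch (not the authors' implementation): it treats agreement among sampled answers as a confidence score, which the paper generalizes from binary correctness to distributions over partial-credit scores for long-form responses. The `sample_answers` function is an assumed placeholder for an LLM API call.

```python
# Minimal sketch (not the authors' code) of self-consistency confidence
# elicitation: sample several answers and use answer agreement as confidence.
from collections import Counter

def sample_answers(question: str, n: int, temperature: float = 1.0) -> list[str]:
    """Hypothetical LLM sampling interface; swap in a real API client."""
    raise NotImplementedError

def self_consistency_confidence(question: str, n: int = 10) -> tuple[str, float]:
    """Return the majority answer and its sample frequency as a confidence score.

    For long-form outputs, the paper instead treats correctness and confidence
    as distributions over scores; this short-answer sketch only illustrates
    the underlying agreement idea.
    """
    answers = sample_answers(question, n=n)
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n
```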
Anthology ID: 2024.findings-emnlp.785
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 13441–13460
URL: https://aclanthology.org/2024.findings-emnlp.785/
DOI: 10.18653/v1/2024.findings-emnlp.785
Cite (ACL): Yukun Huang, Yixin Liu, Raghuveer Thirukovalluru, Arman Cohan, and Bhuwan Dhingra. 2024. Calibrating Long-form Generations From Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 13441–13460, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Calibrating Long-form Generations From Large Language Models (Huang et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.785.pdf