Computer Science > Machine Learning

arXiv:2409.19817 (cs)

[Submitted on 29 Sep 2024]

Title:Calibrating Language Models with Adaptive Temperature Scaling

Authors:Johnathan Xie, Annie S. Chen, Yoonho Lee, Eric Mitchell, Chelsea Finn

View PDF

Abstract:The effectiveness of large language models (LLMs) is not only measured by their ability to generate accurate outputs but also by their calibration-how well their confidence scores reflect the probability of their outputs being correct. While unsupervised pre-training has been shown to yield LLMs with well-calibrated conditional probabilities, recent studies have shown that after fine-tuning with reinforcement learning from human feedback (RLHF), the calibration of these models degrades significantly. In this work, we introduce Adaptive Temperature Scaling (ATS), a post-hoc calibration method that predicts a temperature scaling parameter for each token prediction. The predicted temperature values adapt based on token-level features and are fit over a standard supervised fine-tuning (SFT) dataset. The adaptive nature of ATS addresses the varying degrees of calibration shift that can occur after RLHF fine-tuning. ATS improves calibration by over 10-50% across three downstream natural language evaluation benchmarks compared to prior calibration methods and does not impede performance improvements from RLHF.

Comments:	EMNLP 2024
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2409.19817 [cs.LG]
	(or arXiv:2409.19817v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2409.19817

Submission history

From: Annie Chen [view email]
[v1] Sun, 29 Sep 2024 22:54:31 UTC (453 KB)

Computer Science > Machine Learning

Title:Calibrating Language Models with Adaptive Temperature Scaling

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:Calibrating Language Models with Adaptive Temperature Scaling

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators