Computer Science > Computation and Language

arXiv:2310.05782 (cs)

[Submitted on 9 Oct 2023 (v1), last revised 13 Jan 2024 (this version, v3)]

Title: Aligning Language Models with Human Preferences via a Bayesian Approach

Authors: Jiashuo Wang, Haozhao Wang, Shichao Sun, Wenjie Li

Abstract: In the quest to advance human-centric natural language generation (NLG) systems, ensuring alignment between NLG models and human preferences is crucial. For this alignment, current popular methods leverage a reinforcement learning (RL) approach with a reward model trained on feedback from humans. However, inherent disagreements due to the subjective nature of human preferences pose a significant challenge for training the reward model, resulting in a deterioration of NLG performance. To tackle this issue, previous approaches typically rely on majority voting or averaging to consolidate multiple inconsistent preferences into a merged one. Although straightforward to understand and execute, such methods cannot capture the nuanced degrees of disagreement among humans and may represent only a specialized subset of individuals, and therefore cannot quantitatively disclose the universality of human preferences. To address this challenge, this paper proposes a novel approach, named d-PM, which employs a Bayesian framework to account for the distribution of disagreements among human preferences when training a preference model. In addition, because RL training is inefficient and complex, we further propose using a contrastive learning strategy to train the NLG model with the preference scores derived from d-PM. Extensive experiments on two human-centric NLG tasks, i.e., emotional support conversation and integrity "Rule-of-Thumb" generation, show that our method consistently exceeds previous SOTA models in both automatic and human evaluations.
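
The contrast the abstract draws between majority voting and a distribution-aware estimate can be illustrated with a small sketch. This is not the d-PM model from the paper: it swaps in a plain Beta-Binomial posterior as a stand-in, and the function names, the "A"/"B" labels, and the uniform Beta(1, 1) prior are all assumptions made for illustration only.

```python
from collections import Counter


def majority_vote(labels):
    """Collapse annotator labels into the single most common label."""
    return Counter(labels).most_common(1)[0][0]


def beta_posterior_mean(labels, positive="A", alpha=1.0, beta=1.0):
    """Posterior mean of P(an annotator prefers `positive`).

    Assumes a Beta(alpha, beta) prior and treats each label as an
    independent Bernoulli draw (an illustrative assumption, not the
    paper's formulation).
    """
    k = sum(1 for label in labels if label == positive)
    n = len(labels)
    return (alpha + k) / (alpha + beta + n)


# Two hypothetical annotation sets for the same pair of candidate responses.
split = ["A", "A", "A", "B", "B"]   # annotators disagree
unanimous = ["A"] * 5               # annotators agree

# Majority voting gives the same merged label in both cases,
# so the degree of disagreement is lost.
print(majority_vote(split), majority_vote(unanimous))

# The posterior mean differs, so the disagreement stays visible.
print(beta_posterior_mean(split), beta_posterior_mean(unanimous))
```

In this toy setting both annotation sets merge to the same label under majority voting, while the posterior mean separates the contested case (4/7) from the unanimous one (6/7), which is the kind of information a merged label discards.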

Comments:	NeurIPS 2023
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2310.05782 [cs.CL]
	(or arXiv:2310.05782v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.05782

Submission history

From: Wang Jiashuo
[v1] Mon, 9 Oct 2023 15:15:05 UTC (3,435 KB)
[v2] Fri, 22 Dec 2023 13:04:48 UTC (3,435 KB)
[v3] Sat, 13 Jan 2024 11:37:57 UTC (3,435 KB)
