Computer Science > Computation and Language

arXiv:2404.10346 (cs)

[Submitted on 16 Apr 2024 (v1), last revised 3 Oct 2024 (this version, v4)]

Title: Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards

Authors: Hyeonbin Hwang, Doyoung Kim, Seungone Kim, Seonghyeon Ye, Minjoon Seo

Abstract: Training on large amounts of rationales (i.e., CoT fine-tuning) is effective at improving the reasoning capabilities of large language models (LLMs). However, acquiring human-authored rationales or augmenting rationales from proprietary models is costly and not scalable. In this paper, we study whether LLMs can self-improve their reasoning capabilities. To this end, we propose Self-Explore, where the LLM is tasked to explore the first wrong step (i.e., the first pit) within the rationale and use such signals as fine-grained rewards for further improvement. On the GSM8K and MATH test sets, Self-Explore achieves 11.57% and 2.89% improvement on average, respectively, across three LLMs compared to supervised fine-tuning (SFT). Our code is available at this https URL.
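The idea of locating the "first pit" can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how a wrong step is detected, so the sketch assumes a hypothetical callback, `can_reach_answer`, that reports whether the correct answer is still reachable from a given prefix of the rationale. The function name `find_first_pit`, the toy rationale, and the toy oracle are all illustrative assumptions.

```python
from typing import Callable, List, Optional


def find_first_pit(
    steps: List[str],
    can_reach_answer: Callable[[List[str]], bool],
) -> Optional[int]:
    """Return the index of the first step after which the correct answer
    is no longer reachable, or None if no such step exists.

    `can_reach_answer` is a hypothetical oracle (an assumption of this
    sketch); the abstract does not specify how this check is performed.
    """
    for i in range(len(steps)):
        prefix = steps[: i + 1]  # rationale up to and including step i
        if not can_reach_answer(prefix):
            return i
    return None


# Toy rationale whose second step contains an arithmetic error.
steps = ["3 + 4 = 7", "7 * 6 = 48", "48 - 2 = 46"]

# Toy oracle (assumption): a prefix is recoverable unless it contains a
# step already known to be wrong.
known_wrong = {"7 * 6 = 48"}
oracle = lambda prefix: not any(step in known_wrong for step in prefix)

first_pit = find_first_pit(steps, oracle)
print(first_pit)
```

In this toy setup the step at the returned index would serve as the step-level negative signal, while the steps before it remain usable context.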

Comments:	EMNLP Findings 2024 Camera Ready
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2404.10346 [cs.CL]
	(or arXiv:2404.10346v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2404.10346

Submission history

From: Hyeonbin Hwang
[v1] Tue, 16 Apr 2024 07:30:11 UTC (1,747 KB)
[v2] Mon, 6 May 2024 14:25:04 UTC (1,748 KB)
[v3] Thu, 16 May 2024 13:47:00 UTC (1,746 KB)
[v4] Thu, 3 Oct 2024 03:55:41 UTC (1,748 KB)
