Computer Science > Computation and Language

arXiv:2404.07965 (cs)

[Submitted on 11 Apr 2024 (v1), last revised 7 Dec 2024 (this version, v3)]

Title:Rho-1: Not All Tokens Are What You Need

Authors:Zhenghao Lin, Zhibin Gou, Yeyun Gong, Xiao Liu, Yelong Shen, Ruochen Xu, Chen Lin, Yujiu Yang, Jian Jiao, Nan Duan, Weizhu Chen

View PDF HTML (experimental)

Abstract:Previous language model pre-training methods have uniformly applied a next-token prediction loss to all training tokens. Challenging this norm, we posit that ''9l training''. Our initial analysis examines token-level training dynamics of language model, revealing distinct loss patterns for different tokens. Leveraging these insights, we introduce a new language model called Rho-1. Unlike traditional LMs that learn to predict every next token in a corpus, Rho-1 employs Selective Language Modeling (SLM), which selectively trains on useful tokens that aligned with the desired distribution. This approach involves scoring pretraining tokens using a reference model, and then training the language model with a focused loss on tokens with higher scores. When continual pretraining on 15B OpenWebMath corpus, Rho-1 yields an absolute improvement in few-shot accuracy of up to 30% in 9 math tasks. After fine-tuning, Rho-1-1B and 7B achieved state-of-the-art results of 40.6% and 51.8% on MATH dataset, respectively - matching DeepSeekMath with only 3% of the pretraining tokens. Furthermore, when continual pretraining on 80B general tokens, Rho-1 achieves 6.8% average enhancement across 15 diverse tasks, increasing both efficiency and performance of the language model pre-training.

Comments:	First two authors equal contribution
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2404.07965 [cs.CL]
	(or arXiv:2404.07965v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2404.07965

Submission history

From: Zhibin Gou [view email]
[v1] Thu, 11 Apr 2024 17:52:01 UTC (507 KB)
[v2] Thu, 23 May 2024 06:57:07 UTC (523 KB)
[v3] Sat, 7 Dec 2024 04:48:03 UTC (523 KB)

Computer Science > Computation and Language

Title:Rho-1: Not All Tokens Are What You Need

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:Rho-1: Not All Tokens Are What You Need

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators