Computer Science > Computation and Language

arXiv:2404.19553 (cs)

[Submitted on 30 Apr 2024]

Title: Extending Llama-3's Context Ten-Fold Overnight

Authors: Peitian Zhang, Ninglu Shao, Zheng Liu, Shitao Xiao, Hongjin Qian, Qiwei Ye, Zhicheng Dou

Abstract: We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is highly efficient, taking 8 hours on one 8xA800 (80G) GPU machine. The resulting model exhibits superior performance across a broad range of evaluation tasks, such as NIHS (needle-in-a-haystack), topic retrieval, and long-context language understanding; meanwhile, it also preserves the original capability over short contexts well. The dramatic context extension is mainly attributed to merely 3.5K synthetic training samples generated by GPT-4, which indicates the LLMs' inherent (yet largely underestimated) potential to extend their original context length. In fact, the context length could be extended far beyond 80K with more computation resources. Therefore, the team will publicly release the entire resources (including data, model, data generation pipeline, training code) so as to facilitate future research from the community: this https URL.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2404.19553 [cs.CL]
	(or arXiv:2404.19553v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2404.19553

Submission history

From: Peitian Zhang
[v1] Tue, 30 Apr 2024 13:25:20 UTC (38 KB)
