Add Deepseek R1 Distill Llama 8B/70B configs #1263

wizeng23 · 2025-01-27T22:06:39Z

Description

Adds train, eval, and inference configs for the Deepseek R1 Distill Llama 8B and 70B models. Qwen configs will follow. All configs were run to verify that they work.

Towards OPE-936

This PR only changes documentation. (You can ignore the following checks in that case)
Did you read the Pull Request guidelines in the contributor guide?
Did you link the issue(s) related to this PR in the section above?
Did you add / update tests where needed?

linear · 2025-01-27T22:06:41Z

wizeng23 added 2 commits January 27, 2025 11:01

Add Deepseek R1 Distill Llama 8B train configs

a718619

Add Llama 70B configs

a8a3b15

wizeng23 requested review from oelachqar, taenin and nikg4 January 27, 2025 22:06

wizeng23 added 3 commits January 27, 2025 14:07

a

7c953f2

a

f43017c

merge main

a30a956

wizeng23 marked this pull request as ready for review January 27, 2025 23:09

oelachqar approved these changes Jan 27, 2025

a

792627f

wizeng23 merged commit 357df73 into main Jan 27, 2025
1 check passed

wizeng23 deleted the wizeng/deepseek branch January 27, 2025 23:22