[NEW Model] add jamba #8517
Conversation
Thanks for your contribution!
Codecov Report

Coverage diff against develop:

@@            Coverage Diff             @@
##           develop    #8517     +/-   ##
===========================================
+ Coverage    54.96%   55.03%   +0.07%
===========================================
  Files          646      646
  Lines       103133   101970    -1163
===========================================
- Hits         56687    56124     -563
+ Misses       46446    45846     -600
ddp_find_unused_parameters (`bool`, *optional*):
    When using distributed training, the value of the flag `find_unused_parameters` passed to
    `paddle.DataParallel`. Will default to `False` if recompute is used, `True` otherwise.
Added the `ddp_find_unused_parameters` argument, keeping the logic consistent with huggingface.
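The default described in the docstring can be sketched as follows. This is a minimal illustration only; the helper name `resolve_ddp_find_unused_parameters` is hypothetical and not PaddleNLP's actual implementation.

```python
def resolve_ddp_find_unused_parameters(user_value, recompute):
    # Hypothetical helper mirroring the documented default:
    # an explicit user setting always wins; otherwise default to
    # False when recompute (activation checkpointing) is enabled,
    # because recompute re-traverses the graph and marking unused
    # parameters would conflict with it; True otherwise.
    if user_value is not None:
        return user_value
    return not recompute

print(resolve_ddp_find_unused_parameters(None, True))   # recompute on  -> False
print(resolve_ddp_find_unused_parameters(None, False))  # recompute off -> True
print(resolve_ddp_find_unused_parameters(True, True))   # explicit value wins -> True
```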
LGTM
* Add jamba
PR types
New features
PR changes
Models
Description
--max_grad_norm 1 --bf16 1 --fp16_opt_level O2 --tensor_parallel_degree 4 --recompute 1 --recompute_use_reentrant 0 --lora 1 --max_length 4096
Jamba: A Hybrid Transformer-Mamba Language Model
huggingface repository
huggingface_hub weights
run.py code
Launch with 2 GPUs (80G each):
python -u -m paddle.distributed.launch --gpus "6,7" run.py --output_dir debug --tensor_parallel_degree 2
Launch with 4 GPUs (40G each):
python -u -m paddle.distributed.launch --gpus "4,5,6,7" run.py --output_dir debug --tensor_parallel_degree 4
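The two launch configurations above trade GPU count against per-GPU memory via `--tensor_parallel_degree`. A back-of-envelope sketch of why both fit, assuming (not stated in this PR) roughly 52B total parameters for Jamba-v0.1 stored in bf16, and ignoring activations, optimizer state, and the LoRA adapters:

```python
def per_gpu_weight_gb(total_params_billion: float,
                      bytes_per_param: int,
                      tp_degree: int) -> float:
    """Weight memory per GPU when parameters are split evenly
    across tp_degree ranks by tensor parallelism."""
    return total_params_billion * bytes_per_param / tp_degree

# Assumed: ~52B params, bf16 = 2 bytes per parameter.
print(per_gpu_weight_gb(52, 2, 2))  # tensor_parallel_degree 2 -> 52.0 GB, under 80G
print(per_gpu_weight_gb(52, 2, 4))  # tensor_parallel_degree 4 -> 26.0 GB, under 40G
```

With LoRA (`--lora 1`), optimizer state is kept only for the small adapter weights, so base-weight memory dominates and this rough split is the binding constraint.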