[NEW Model] add jamba by JunnYu · Pull Request #8517 · PaddlePaddle/PaddleNLP · GitHub

[NEW Model] add jamba #8517


Merged

merged 38 commits into PaddlePaddle:develop on Aug 19, 2024

Conversation

@JunnYu (Member) commented on May 31, 2024

PR types

New features

PR changes

Models

Description

  • Add a ddp_find_unused_parameters argument, keeping the logic consistent with huggingface's.
  • Fix a bug where _tied_weights_keys was saved incorrectly when using unified checkpoint.
  • Add warning_once so that a given message is logged only once (see the sketch after this list).
  • Add jamba. DP training needs ddp_find_unused_parameters True; pure TP does not, and the current implementation still has problems with sharding. Flags used:
    --max_grad_norm 1 --bf16 1 --fp16_opt_level O2 --tensor_parallel_degree 4 --recompute 1 --recompute_use_reentrant 0 --lora 1 --max_length 4096
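
A minimal sketch of the warning_once idea, assuming a plain logging.Logger; the helper name and caching strategy here are illustrative, not necessarily PaddleNLP's actual implementation:

import functools
import logging

logger = logging.getLogger(__name__)

@functools.lru_cache(None)
def warning_once(msg: str):
    # lru_cache keys on the message string, so each distinct warning is
    # emitted only the first time it is requested; repeats are no-ops.
    logger.warning(msg)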

Jamba: A Hybrid Transformer-Mamba Language Model
huggingface repository
huggingface_hub weights

run.py code:

import paddle
from paddlenlp.transformers import JambaForCausalLM, JambaConfig, JambaTokenizer
from paddlenlp.trainer import TrainingArguments, PdArgumentParser
from paddlenlp.transformers.configuration_utils import LlmMetaConfig

# Parse TrainingArguments from the command line (e.g. --tensor_parallel_degree).
parser = PdArgumentParser((TrainingArguments,))
training_args = parser.parse_args_into_dataclasses()[0]

model_name_or_path = "ai21labs/Jamba-v0.1"
dtype = "bfloat16"

tokenizer = JambaTokenizer.from_pretrained(model_name_or_path)
config = JambaConfig.from_pretrained(
    model_name_or_path,
    dtype=dtype,
)
# Copy LLM-specific settings (tensor parallel degree, recompute, ...) from
# the training arguments onto the model config.
LlmMetaConfig.set_llm_config(config, training_args)

model = JambaForCausalLM.from_pretrained(model_name_or_path, config=config, low_cpu_mem_usage=True)
model.eval()

prompt = "In the recent Super Bowl LVIII, "
input_ids = tokenizer(prompt, return_tensors='pd').input_ids

with paddle.no_grad():
    outputs = model.generate(input_ids=input_ids, max_new_tokens=512)
    # generate returns the new tokens (without the prompt), so prepend it.
    for e in tokenizer.batch_decode(outputs[0], skip_special_tokens=True):
        print(prompt + e)
        print('-' * 100)

Launch on 2 x 80G GPUs:

python -u -m paddle.distributed.launch --gpus "6,7" run.py --output_dir debug --tensor_parallel_degree 2

Launch on 4 x 40G GPUs:

python -u -m paddle.distributed.launch --gpus "4,5,6,7" run.py --output_dir debug --tensor_parallel_degree 4

paddle-bot bot commented May 31, 2024

Thanks for your contribution!

codecov bot commented Jun 3, 2024

Codecov Report

Attention: Patch coverage is 74.01848% with 225 lines in your changes missing coverage. Please review.

Project coverage is 55.03%. Comparing base (12107af) to head (0a75927).
Report is 219 commits behind head on develop.

Files with missing lines                    Patch %   Missing lines
paddlenlp/transformers/jamba/modeling.py    72.87%    217 ⚠️
paddlenlp/trainer/trainer.py                 0.00%      7 ⚠️
paddlenlp/utils/llm_utils.py                 0.00%      1 ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8517      +/-   ##
===========================================
+ Coverage    54.96%   55.03%   +0.07%     
===========================================
  Files          646      646              
  Lines       103133   101970    -1163     
===========================================
- Hits         56687    56124     -563     
+ Misses       46446    45846     -600     

☔ View full report in Codecov by Sentry.

Comment on lines +345 to +347
ddp_find_unused_parameters (`bool`, *optional*):
When using distributed training, the value of the flag `find_unused_parameters` passed to
`paddle.DataParallel`. Will default to `False` if recompute is used, `True` otherwise.
@JunnYu (Member, Author): Add the ddp_find_unused_parameters argument, keeping the logic consistent with huggingface's.
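
A hedged sketch of how that default could be resolved before wrapping the model, following the docstring quoted above; wrap_data_parallel is a hypothetical helper, not the trainer's actual code:

import paddle

def wrap_data_parallel(model, ddp_find_unused_parameters=None, recompute=False):
    # Default per the docstring: disable unused-parameter scanning when
    # recompute is used, enable it otherwise (matching huggingface's logic).
    if ddp_find_unused_parameters is None:
        ddp_find_unused_parameters = not recompute
    return paddle.DataParallel(model, find_unused_parameters=ddp_find_unused_parameters)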

@ZHUI (Collaborator) left a comment:
LGTM

@ZHUI ZHUI merged commit 0ec78aa into PaddlePaddle:develop Aug 19, 2024
8 of 12 checks passed
Mangodadada pushed a commit to Mangodadada/PaddleNLP that referenced this pull request Sep 10, 2024