【inference】support load or save Llama2-7b in three patterns #8712
Conversation
Thanks for your contribution!
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files:

@@            Coverage Diff             @@
##           develop    #8712      +/-   ##
===========================================
- Coverage    55.74%   55.73%    -0.01%
===========================================
  Files          623      623
  Lines        97456    97459        +3
===========================================
  Hits         54323    54323
- Misses       43133    43136        +3

☔ View full report in Codecov by Sentry.
llm/predict/predictor.py
Outdated
- inference_config.disable_glog_info()
+ # inference_config.disable_glog_info()
Restore this.
Oh, right.
llm/predict/predictor.py
Outdated
inference_config.enable_new_executor()
Restore this.
llm/predict/predictor.py
Outdated
# if use optimized_model to inference
# inference_config.use_optimized_model(True)
Delete this.
llm/predict/predictor.py
Outdated
@@ -372,7 +379,6 @@ def __init__(self, config: PredictorArgument, tokenizer: PretrainedTokenizer = N
    def _preprocess(self, input_text: str | list[str]):
        inputs = super()._preprocess(input_text)
        inputs["max_new_tokens"] = np.array(self.config.max_length, dtype="int64")
Restore this.
LGTM
PR types
New features
PR changes
Others
Description
In addition, the model can be exported with either of the following commands:
python ./predict/export_model.py --model_name_or_path meta-llama/Llama-2-7b-chat --output_path ./inference --dtype float16
FLAGS_enable_pir_api=1 python ./predict/export_model.py --model_name_or_path meta-llama/Llama-2-7b-chat --output_path ./inference --dtype float16
Inference can be run in three patterns:
python ./predict/predictor.py --model_name_or_path ./llama2-7b --dtype float16 --mode static
FLAGS_enable_pir_in_executor=1 python ./predict/predictor.py --model_name_or_path ./llama2-7b --dtype float16 --mode static
FLAGS_enable_pir_api=1 python ./predict/predictor.py --model_name_or_path ./inference --dtype float16 --mode static
To test saving the optimized model and then running inference from it, set inference_config.use_optimized_model(True). The first run saves the optimized model; the second run loads the saved optimized model directly for inference.
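A minimal sketch of where this switch could sit when building the Paddle inference config (the model paths, GPU memory pool size, and device id below are illustrative assumptions, not values taken from this PR):

import paddle.inference as paddle_infer

# Hypothetical exported-model paths; in this PR the config is assembled inside
# llm/predict/predictor.py from the exported --output_path directory.
config = paddle_infer.Config("./inference/model.pdmodel", "./inference/model.pdiparams")
config.enable_use_gpu(8000, 0)  # assumed memory pool size (MB) and GPU id
config.enable_new_executor()

# First run: optimization passes are applied and the optimized model is saved.
# Later runs: the saved optimized model is loaded directly, skipping optimization.
config.use_optimized_model(True)

predictor = paddle_infer.create_predictor(config)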