Qwen2.5-0.5B-Instruct 8-bit quantization: garbled inference output · Issue #3091 · alibaba/MNN · GitHub
Qwen2.5-0.5B-Instruct 8-bit quantization: garbled inference output #3091
Closed
@jfduma

Description

Platform (include target platform as well if cross-compiling):

Ubuntu 20.04, CUDA

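Using the latest MNN release (3.0) to export the qwen2.5-0.5b model: 4-bit quantization works correctly, but 8-bit quantization produces garbled output, regardless of whether "precision": "fp16" is modified.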

########### 4bit ############
python mnn/transformers/llm/export/llmexport.py --path pretrained_model/Qwen2.5-0.5B-Instruct --export mnn --dst_path mnn-output/qwen2.5_0.5b_instruct_mnn --quant_bit 4 --mnnconvert mnn/build/MNNConvert

./mnn/build/llm_demo mnn-output/qwen2.5_0.5b_instruct_mnn/config.json
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0
config path is mnn-output/qwen2.5_0.5b_instruct_mnn/config.json
Can't open file:.tempcache
Load Cache file error.

is_single_ = 1

load tokenizer
tokenizer_type = 3
load tokenizer Done
load mnn-output/qwen2.5_0.5b_instruct_mnn/llm.mnn ... Load Module Done!
Clone Decode Module Done!
main, 180, cost time: 2222.191162 ms
Prepare for resize opt Begin
Prepare for resize opt End
Fix: 1070 - Total: 1070, rate = 1.000000
main, 184, cost time: 249.036011 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 188, cost time: 0.010000 ms

Q: hi

A: Hello! How can I assist you today? Is there something specific you would like to know or discuss about anything in particular? I'm here to help answer questions and provide information on various topics. Please feel free to ask me any questions, and I'll do my best to help you.

############# 8bit ################

python mnn/transformers/llm/export/llmexport.py --path pretrained_model/Qwen2.5-0.5B-Instruct --export mnn --dst_path mnn-output/qwen2.5_0.5b_instruct_mnn --quant_bit 8 --mnnconvert mnn/build/MNNConvert

./mnn/build/llm_demo mnn-output/qwen2.5_0.5b_instruct_mnn/config.json (same result whether or not "precision": "fp16" is modified)

The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0
config path is mnn-output/basemodel_0.5b_instruct_q88_300/config.json
Can't open file:.tempcache
Load Cache file error.

is_single_ = 1

load tokenizer
tokenizer_type = 3
load tokenizer Done
load mnn-output/basemodel_0.5b_instruct_q88_300/llm.mnn ... Load Module Done!
Clone Decode Module Done!
main, 180, cost time: 2159.822021 ms
Prepare for resize opt Begin
Prepare for resize opt End
Fix: 1070 - Total: 1070, rate = 1.000000
main, 184, cost time: 246.123016 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 188, cost time: 0.010000 ms

Q: hi

A: s

p

-ho P.

O
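
Both export commands above write to the same --dst_path, so the 4-bit and 8-bit models cannot be compared side by side without re-exporting. A minimal sketch of a helper that builds the two command lines with separate output directories might look like the following. The script path, model path, and flags are copied from the commands in this report; the `_q4` / `_q8` directory suffix and the function names are hypothetical conventions introduced here for illustration, not part of MNN.

```python
import shlex
from typing import List

# Paths and flags copied from the commands in the report.
EXPORT_SCRIPT = "mnn/transformers/llm/export/llmexport.py"
MODEL_PATH = "pretrained_model/Qwen2.5-0.5B-Instruct"
DST_BASE = "mnn-output/qwen2.5_0.5b_instruct_mnn"
MNNCONVERT = "mnn/build/MNNConvert"
LLM_DEMO = "./mnn/build/llm_demo"


def dst_path(quant_bit: int) -> str:
    """Per-bit output directory (hypothetical naming, an assumption)."""
    return f"{DST_BASE}_q{quant_bit}"


def export_command(quant_bit: int) -> List[str]:
    """Build the llmexport.py argument list for one quantization width."""
    return [
        "python", EXPORT_SCRIPT,
        "--path", MODEL_PATH,
        "--export", "mnn",
        "--dst_path", dst_path(quant_bit),
        "--quant_bit", str(quant_bit),
        "--mnnconvert", MNNCONVERT,
    ]


def demo_command(quant_bit: int) -> List[str]:
    """Build the llm_demo invocation for the matching export."""
    return [LLM_DEMO, f"{dst_path(quant_bit)}/config.json"]


if __name__ == "__main__":
    # Only prints the commands; nothing is executed.
    for bits in (4, 8):
        print(shlex.join(export_command(bits)))
        print(shlex.join(demo_command(bits)))
```

Running the printed commands in order should give two independent model directories, which makes it easier to confirm that each log really belongs to the export it is attributed to.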
