Description
Platform (include the target platform as well if cross-compiling):
Ubuntu 20.04, CUDA
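I exported the qwen2.5-0.5b model with the latest MNN 3.0 release. 4-bit quantization works correctly, but 8-bit quantization produces garbled output, regardless of whether I change "precision": "fp16" in the config.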
########### 4bit ############
python mnn/transformers/llm/export/llmexport.py --path pretrained_model/Qwen2.5-0.5B-Instruct --export mnn --dst_path mnn-output/qwen2.5_0.5b_instruct_mnn --quant_bit 4 --mnnconvert mnn/build/MNNConvert
./mnn/build/llm_demo mnn-output/qwen2.5_0.5b_instruct_mnn/config.json
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0
config path is mnn-output/qwen2.5_0.5b_instruct_mnn/config.json
Can't open file:.tempcache
Load Cache file error.
is_single_ = 1
load tokenizer
tokenizer_type = 3
load tokenizer Done
load mnn-output/qwen2.5_0.5b_instruct_mnn/llm.mnn ... Load Module Done!
Clone Decode Module Done!
main, 180, cost time: 2222.191162 ms
Prepare for resize opt Begin
Prepare for resize opt End
Fix: 1070 - Total: 1070, rate = 1.000000
main, 184, cost time: 249.036011 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 188, cost time: 0.010000 ms
Q: hi
A: Hello! How can I assist you today? Is there something specific you would like to know or discuss about anything in particular? I'm here to help answer questions and provide information on various topics. Please feel free to ask me any questions, and I'll do my best to help you.
############# 8bit ################
python mnn/transformers/llm/export/llmexport.py --path pretrained_model/Qwen2.5-0.5B-Instruct --export mnn --dst_path mnn-output/qwen2.5_0.5b_instruct_mnn --quant_bit 8 --mnnconvert mnn/build/MNNConvert
./mnn/build/llm_demo mnn-output/qwen2.5_0.5b_instruct_mnn/config.json [same result regardless of whether "precision": "fp16" is modified]
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0
config path is mnn-output/basemodel_0.5b_instruct_q88_300/config.json
Can't open file:.tempcache
Load Cache file error.
is_single_ = 1
load tokenizer
tokenizer_type = 3
load tokenizer Done
load mnn-output/basemodel_0.5b_instruct_q88_300/llm.mnn ... Load Module Done!
Clone Decode Module Done!
main, 180, cost time: 2159.822021 ms
Prepare for resize opt Begin
Prepare for resize opt End
Fix: 1070 - Total: 1070, rate = 1.000000
main, 184, cost time: 246.123016 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 188, cost time: 0.010000 ms
Q: hi
A: s
p
-ho P.
O
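
Both export commands above write to the same --dst_path, and the 8-bit log loads its config from a different directory than the one exported to. To rule out a mix-up between the two exports when reproducing, a small helper can build one export command per bit width with its own output directory. This is only a sketch: it uses the flags exactly as they appear in the commands above, and the `_q{bits}` directory suffix and the `build_export_cmd` name are hypothetical conventions of mine, not something llmexport.py requires.

```python
import shlex

# Paths copied from the commands in this report; adjust to your checkout.
EXPORT_SCRIPT = "mnn/transformers/llm/export/llmexport.py"
MODEL_PATH = "pretrained_model/Qwen2.5-0.5B-Instruct"
MNNCONVERT = "mnn/build/MNNConvert"


def build_export_cmd(quant_bit, dst_root="mnn-output"):
    """Return the llmexport.py argv for one quantization bit width.

    The per-bit-width directory name is a hypothetical convention so that
    the 4-bit and 8-bit exports never overwrite each other.
    """
    dst_path = f"{dst_root}/qwen2.5_0.5b_instruct_mnn_q{quant_bit}"
    return [
        "python", EXPORT_SCRIPT,
        "--path", MODEL_PATH,
        "--export", "mnn",
        "--dst_path", dst_path,
        "--quant_bit", str(quant_bit),
        "--mnnconvert", MNNCONVERT,
    ]


if __name__ == "__main__":
    for bits in (4, 8):
        # Print only; run the printed line by hand, or pass the list
        # to subprocess.run(..., check=True).
        print(shlex.join(build_export_cmd(bits)))
```

Running llm_demo against the config.json inside each printed --dst_path then makes it unambiguous which export produced which output.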