
Fine-tuning best practices for qwen2.5-72b-instruct and qwen2-vl-72b-instruct. #2064

Open · Jintao-Huang opened this issue Sep 18, 2024 · 19 comments
Labels: good first issue (Good for newcomers)

Comments

@Jintao-Huang (Collaborator) commented Sep 18, 2024

More docs:

qwen2-vl: https://github.com/modelscope/ms-swift/blob/main/docs/source/Multi-Modal/qwen2-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md

qwen1.5: https://github.com/modelscope/ms-swift/blob/main/docs/source/LLM/Qwen1.5%E5%85%A8%E6%B5%81%E7%A8%8B%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md

We use ms-swift to run self-cognition fine-tuning on qwen2.5 and image-OCR fine-tuning on qwen2-vl, and then perform inference with the fine-tuned models.

Before starting fine-tuning, please make sure your environment is set up correctly:

# Install ms-swift.
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e '.[llm]'

# qwen2-vl
# https://github.com/QwenLM/Qwen2-VL/issues/96
pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830
# vLLM acceleration
pip install 'vllm>=0.6.1'
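
Optionally, you can sanity-check the environment before launching (a quick sketch; these are just the packages installed above):

# Show installed versions of the key packages
pip show ms-swift transformers vllm | grep -E '^(Name|Version)'
# Confirm that PyTorch can see the GPUs
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"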

Typically, large models are fine-tuned on custom datasets. Here, we show a demo that can be run directly.

qwen2.5-72b-instruct

We perform self-cognition fine-tuning on Qwen2.5-72B-Instruct.

Self-cognition dataset: https://www.modelscope.cn/datasets/swift/self-cognition

General mixed datasets: qwen2-pro-en and qwen2-pro-zh (referenced in the script below).

Fine-tuning script:

# Experimental environment: 4 * A100
# GPU memory usage: 4 * 70GB
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type qwen2_5-72b-instruct \
    --model_id_or_path qwen/Qwen2.5-72B-Instruct \
    --dataset qwen2-pro-en#500 qwen2-pro-zh#500 self-cognition#500 \
    --logging_steps 5 \
    --learning_rate 1e-4 \
    --output_dir output \
    --lora_target_modules ALL \
    --model_name 小黄 'Xiao Huang' \
    --model_author 魔搭 ModelScope \
    --system "You are a helpful assistant." \
    --deepspeed default-zero3

# Example that runs on a single A10/3090 GPU (Qwen2.5-7B-Instruct)
# GPU memory usage: 24GB
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen2_5-7b-instruct \
    --model_id_or_path qwen/Qwen2.5-7B-Instruct \
    --dataset qwen2-pro-en#500 qwen2-pro-zh#500 self-cognition#500 \
    --logging_steps 5 \
    --max_length 2048 \
    --learning_rate 1e-4 \
    --output_dir output \
    --lora_target_modules ALL \
    --model_name 小黄 'Xiao Huang' \
    --model_author 魔搭 ModelScope \
    --system "You are a helpful assistant."

For the custom dataset documentation, see: https://github.com/modelscope/ms-swift/blob/main/docs/source/Instruction/%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.md
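
For reference, a tiny text-only custom dataset in the query/response jsonl format might look like the sketch below (the file name and sample rows are purely illustrative; see the docs above for the authoritative format):

# Write two illustrative samples to train.jsonl
cat > train.jsonl <<'EOF'
{"query": "Who are you?", "response": "I am Xiao Huang, an assistant developed by ModelScope."}
{"query": "What can you do?", "response": "I can answer questions and help with everyday tasks."}
EOF

# Then pass the file to swift sft, optionally mixed with the built-in datasets, e.g.:
#   --dataset train.jsonl qwen2-pro-en#500 qwen2-pro-zh#500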

GPU memory usage during fine-tuning: (see attached screenshot)

Training loss curve during fine-tuning: (see attached screenshot)

The post-fine-tuning inference script is shown below; change ckpt_dir to the last checkpoint folder produced by training. We can use vLLM to accelerate inference on the merged checkpoint:

# Direct inference
CUDA_VISIBLE_DEVICES=0,1 swift infer \
    --ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx

# merge-lora and use vLLM for accelerated inference
CUDA_VISIBLE_DEVICES=0,1 swift export \
    --ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx \
    --merge_lora true

CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer \
    --ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx-merged \
    --infer_backend vllm --max_model_len 8192 \
    --tensor_parallel_size 4
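
The merged checkpoint can also be served and called over HTTP. The sketch below assumes that swift deploy exposes an OpenAI-compatible endpoint on the default port 8000 and that the served model name matches the model_type; adjust both if your setup differs:

# Serve the merged checkpoint (assumed defaults: port 8000, OpenAI-compatible API)
CUDA_VISIBLE_DEVICES=0,1,2,3 swift deploy \
    --ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx-merged \
    --infer_backend vllm --max_model_len 8192 \
    --tensor_parallel_size 4

# Example request (model name assumed to equal the model_type)
curl http://localhost:8000/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "qwen2_5-72b-instruct", "messages": [{"role": "user", "content": "Who are you?"}]}'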

Example of the fine-tuned model running inference on the validation set: (see attached screenshot)

qwen2-vl-72b-instruct

We perform OCR fine-tuning on Qwen2-VL-72B-Instruct. For examples of grounding tasks and video fine-tuning, see the ms-swift docs: https://github.com/modelscope/ms-swift/blob/main/docs/source/Multi-Modal/qwen2-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md

Fine-tuning dataset: https://modelscope.cn/datasets/AI-ModelScope/LaTeX_OCR
Fine-tuning script:

# Experimental environment: 8 * A100
SIZE_FACTOR=8 MAX_PIXELS=602112 \
NPROC_PER_NODE=8 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift sft \
  --model_type qwen2-vl-72b-instruct \
  --model_id_or_path qwen/Qwen2-VL-72B-Instruct \
  --sft_type lora \
  --dataset latex-ocr-print#20000 \
  --deepspeed default-zero3

To use a custom dataset, simply specify it as follows (a combined invocation is sketched after the format example below):

# val_dataset is optional; if not specified, a portion of dataset is split off as the validation set
  --dataset train.jsonl \
  --val_dataset val.jsonl \

Custom dataset format:

{"query": "<image>55555", "response": "66666", "images": ["image_path"]}
{"query": "<image><image>eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]]}

GPU memory usage during fine-tuning: (see attached screenshot)

Training loss curve during fine-tuning (due to time constraints, only 250 steps were run): (see attached screenshot)

The post-fine-tuning inference script is shown below; change ckpt_dir to the last checkpoint folder produced by training. We can use vLLM to accelerate inference on the merged checkpoint:

# Direct inference
CUDA_VISIBLE_DEVICES=0,1 swift infer \
    --ckpt_dir output/qwen2-vl-72b-instruct/vx-xxx/checkpoint-xxx \
    --load_dataset_config true

# merge-lora and use vLLM for accelerated inference
CUDA_VISIBLE_DEVICES=0,1 swift export \
    --ckpt_dir output/qwen2-vl-72b-instruct/vx-xxx/checkpoint-xxx \
    --merge_lora true

CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer \
    --ckpt_dir output/qwen2-vl-72b-instruct/vx-xxx/checkpoint-xxx-merged \
    --load_dataset_config true --infer_backend vllm \
    --tensor_parallel_size 4 --max_model_len 16384

Example of the fine-tuned model running inference on the validation set: (see attached screenshot)

@llp1992 commented Sep 19, 2024

Does qwen2-vl support training on multi-image, multi-turn conversations?

@etemiz commented Sep 19, 2024

Can I train the 72B model with 2 * A6000 (2 * 48GB)?

@Jintao-Huang (Collaborator, Author)

Does qwen2-vl support training on multi-image, multi-turn conversations?

Yes, it is supported.

@Jintao-Huang (Collaborator, Author)

Can I train the 72B model with 2 * A6000 (2 * 48GB)?

Maybe with QLoRA:

# GPU Memory: 2 * 28GB
SIZE_FACTOR=8 MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
  --model_type qwen2-vl-72b-instruct-gptq-int4 \
  --model_id_or_path qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4 \
  --sft_type lora \
  --dataset latex-ocr-print#20000

@Jintao-Huang (Collaborator, Author)

lora & device_map

# GPU Memory: 2 * 75GB
SIZE_FACTOR=8 MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
  --model_type qwen2-vl-72b-instruct \
  --model_id_or_path qwen/Qwen2-VL-72B-Instruct \
  --sft_type lora \
  --dataset latex-ocr-print#20000

@ZhuJD-China

How can I train qwen2_5-72b-instruct with 8 * 4090 (24GB)? I get CUDA out of memory.

@xuezc commented Sep 26, 2024

How fast is training on A100 GPUs?

@MuyeHuang

Does qwen2-vl support training on multi-image, multi-turn conversations?

Yes, it is supported.

For this multi-turn, multi-image conversation training, does every assistant reply contribute to the loss, or only the last assistant reply?

@Jintao-Huang (Collaborator, Author)

All of them are included in the loss computation.

@Wangman1

Is pretraining supported for qwen2-vl?

@Labmem009

After reading the data, training stops immediately without reporting any error:

@MuyeHuang

A question about multi-turn, multi-image conversations: for {"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}, is the order of image_path1 and image_path2 determined by their order of appearance in history[0], history[1], ... and finally query, i.e. the order in which the images appear in the conversation?

@dhhcj1 commented Oct 24, 2024

Could you give an example of calling the model? There is only a deployment example.

@zhangfanTJU

Does this work on the 910B (Ascend NPU)?

@liujiachang

Following the example, I cannot load an mp4 file given as a URL; the network is fine, and the local video was downloaded directly with wget.

<<< <video>描述视频
Input a video path or URL <<< baby.mp4
[INFO:swift] Setting nframes: None. You can adjust this hyperparameter through the environment variable: `NFRAMES`.
[INFO:swift] Setting fps: None. You can adjust this hyperparameter through the environment variable: `FPS`.
[INFO:swift] Setting size_factor: 2. You can adjust this hyperparameter through the environment variable: `SIZE_FACTOR`.
[INFO:swift] Setting min_frames: 4. You can adjust this hyperparameter through the environment variable: `MIN_FRAMES`.
[INFO:swift] Setting max_frames: 768. You can adjust this hyperparameter through the environment variable: `MAX_FRAMES`.
[INFO:swift] Setting min_pixels: 100352. You can adjust this hyperparameter through the environment variable: `MIN_PIXELS`.
[INFO:swift] Setting total_pixels: 19267584. You can adjust this hyperparameter through the environment variable: `TOTAL_PIXELS`.
[INFO:swift] Setting max_pixels: None. You can adjust this hyperparameter through the environment variable: `MAX_PIXELS`.
[INFO:swift] Setting resized_height: None. You can adjust this hyperparameter through the environment variable: `RESIZED_HEIGHT`.
[INFO:swift] Setting resized_width: None. You can adjust this hyperparameter through the environment variable: `RESIZED_WIDTH`.
[W compiler_depend.ts:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
/root/.local/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py:349: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:74.)
  attention_mask[..., cu_seqlens[i - 1] : cu_seqlens[i], cu_seqlens[i - 1] : cu_seqlens[i]] = True
视频中展示了一个小孩在玩书。她穿着一件浅蓝色的背心和粉色的裤子,戴着一副黑色的眼镜,坐在床上,手里拿着一本打开的书。她先是用右手翻动书页,然后用左手扶着书,右手继续翻动书页。
--------------------------------------------------
<<< <video>描述视频
Input a video path or URL <<< https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4
Traceback (most recent call last):
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/cli/infer.py", line 5, in <module>
    infer_main()
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/utils/run_utils.py", line 32, in x_main
    result = llm_x(args, **kwargs)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/infer.py", line 414, in llm_infer
    for response, new_history in gen:
  File "/root/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/utils.py", line 711, in inference_stream
    inputs, tokenizer_kwargs, token_len, example = _prepare_inputs(
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/utils.py", line 629, in _prepare_inputs
    inputs, tokenizer_kwargs = template.encode(example)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/template.py", line 552, in encode
    res = _encode(example)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/template.py", line 1485, in _encode
    videos = load_batch(videos, load_video_qwen2)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/vision_utils.py", line 153, in load_batch
    res.append(load_func(path))
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/vision_utils.py", line 280, in load_video_qwen2
    video, _, info = io.read_video(
  File "/root/.local/lib/python3.10/site-packages/torchvision/io/video.py", line 271, in read_video
    raise RuntimeError(f"File not found: {filename}")
RuntimeError: File not found: https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4

Moreover, when I switch to another local video file of mine, it also throws an error.

<<< <video>描述视频
Input a video path or URL <<< test.mp4
moov atom not found
[INFO:swift] Setting nframes: None. You can adjust this hyperparameter through the environment variable: `NFRAMES`.
[INFO:swift] Setting fps: None. You can adjust this hyperparameter through the environment variable: `FPS`.
[INFO:swift] Setting size_factor: 2. You can adjust this hyperparameter through the environment variable: `SIZE_FACTOR`.
Traceback (most recent call last):
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/cli/infer.py", line 5, in <module>
    infer_main()
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/utils/run_utils.py", line 32, in x_main
    result = llm_x(args, **kwargs)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/infer.py", line 414, in llm_infer
    for response, new_history in gen:
  File "/root/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/utils.py", line 711, in inference_stream
    inputs, tokenizer_kwargs, token_len, example = _prepare_inputs(
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/utils.py", line 629, in _prepare_inputs
    inputs, tokenizer_kwargs = template.encode(example)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/template.py", line 552, in encode
    res = _encode(example)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/template.py", line 1485, in _encode
    videos = load_batch(videos, load_video_qwen2)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/vision_utils.py", line 153, in load_batch
    res.append(load_func(path))
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/vision_utils.py", line 293, in load_video_qwen2
    nframes = video.size(0) / info['video_fps'] * fps
KeyError: 'video_fps'

@junwenxiong

It can indeed be trained, but I ran into this problem; it turned out to be caused by gradient_checkpointing being set to true. The codebase may not yet have been updated to match qwen2-vl. The workaround is simply to set gradient_checkpointing=false.

(error screenshots attached)
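
A minimal sketch of the workaround, assuming the CLI flag mirrors the gradient_checkpointing argument:

# Disable gradient checkpointing as a workaround for the issue above
SIZE_FACTOR=8 MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
  --model_type qwen2-vl-72b-instruct \
  --model_id_or_path qwen/Qwen2-VL-72B-Instruct \
  --sft_type lora \
  --dataset latex-ocr-print#20000 \
  --gradient_checkpointing false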

@Labmem009

When fine-tuning the VL model, can image_path be a URL?

@fclearner

A question: shouldn't the theoretical memory footprint of a 72B model in bf16 be 72B * 2 bytes, i.e. about 144GB? Why is only 70GB used per GPU here? Is int4 being used, or is there some mechanism I am not aware of?

@bang123-box

Is pretraining supported for qwen2-vl?

I would like to ask the same thing. Did you find the qwen2-vl pretraining code?
