
Fine-tuning best practices for qwen2.5-72b-instruct and qwen2-vl-72b-instruct. #2064

Open · Jintao-Huang opened this issue Sep 18, 2024 · 19 comments
Labels: good first issue (Good for newcomers)

Comments

@Jintao-Huang (Collaborator) commented Sep 18, 2024

More docs:

qwen2-vl: https://github.com/modelscope/ms-swift/blob/main/docs/source/Multi-Modal/qwen2-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md

qwen1.5: https://github.com/modelscope/ms-swift/blob/main/docs/source/LLM/Qwen1.5%E5%85%A8%E6%B5%81%E7%A8%8B%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md

We use ms-swift to run self-cognition fine-tuning on qwen2.5 and image-OCR fine-tuning on qwen2-vl, and then perform inference with the fine-tuned models.

Before starting fine-tuning, please make sure your environment is set up correctly:

# Install ms-swift.
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e '.[llm]'

# qwen2-vl
# https://github.com/QwenLM/Qwen2-VL/issues/96
pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830
# vLLM acceleration
pip install 'vllm>=0.6.1'
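
Optionally, you can sanity-check the environment before launching (a quick sketch; these are just the packages installed above):

# Show installed versions of the key packages
pip show ms-swift transformers vllm | grep -E '^(Name|Version)'
# Confirm that PyTorch can see the GPUs
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"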

Typically, large models are fine-tuned on custom datasets. Here, we show a demo that can be run directly.

qwen2.5-72b-instruct

We perform self-cognition fine-tuning on Qwen2.5-72B-Instruct.

Self-cognition dataset: https://www.modelscope.cn/datasets/swift/self-cognition

General mixed datasets: qwen2-pro-en and qwen2-pro-zh (referenced in the script below).

Fine-tuning script:

# Experimental environment: 4 * A100
# GPU memory usage: 4 * 70GB
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type qwen2_5-72b-instruct \
    --model_id_or_path qwen/Qwen2.5-72B-Instruct \
    --dataset qwen2-pro-en#500 qwen2-pro-zh#500 self-cognition#500 \
    --logging_steps 5 \
    --learning_rate 1e-4 \
    --output_dir output \
    --lora_target_modules ALL \
    --model_name 小黄 'Xiao Huang' \
    --model_author 魔搭 ModelScope \
    --system "You are a helpful assistant." \
    --deepspeed default-zero3

# Example that runs on a single A10/3090 GPU (Qwen2.5-7B-Instruct)
# GPU memory usage: 24GB
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen2_5-7b-instruct \
    --model_id_or_path qwen/Qwen2.5-7B-Instruct \
    --dataset qwen2-pro-en#500 qwen2-pro-zh#500 self-cognition#500 \
    --logging_steps 5 \
    --max_length 2048 \
    --learning_rate 1e-4 \
    --output_dir output \
    --lora_target_modules ALL \
    --model_name 小黄 'Xiao Huang' \
    --model_author 魔搭 ModelScope \
    --system "You are a helpful assistant."

For the custom dataset documentation, see: https://github.com/modelscope/ms-swift/blob/main/docs/source/Instruction/%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.md
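
For reference, a tiny text-only custom dataset in the query/response jsonl format might look like the sketch below (the file name and sample rows are purely illustrative; see the docs above for the authoritative format):

# Write two illustrative samples to train.jsonl
cat > train.jsonl <<'EOF'
{"query": "Who are you?", "response": "I am Xiao Huang, an assistant developed by ModelScope."}
{"query": "What can you do?", "response": "I can answer questions and help with everyday tasks."}
EOF

# Then pass the file to swift sft, optionally mixed with the built-in datasets, e.g.:
#   --dataset train.jsonl qwen2-pro-en#500 qwen2-pro-zh#500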

GPU memory usage during fine-tuning: (see attached screenshot)

Training loss curve during fine-tuning: (see attached screenshot)

The post-fine-tuning inference script is shown below; change ckpt_dir to the last checkpoint folder produced by training. We can use vLLM to accelerate inference on the merged checkpoint:

# Direct inference
CUDA_VISIBLE_DEVICES=0,1 swift infer \
    --ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx

# merge-lora and use vLLM for accelerated inference
CUDA_VISIBLE_DEVICES=0,1 swift export \
    --ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx \
    --merge_lora true

CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer \
    --ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx-merged \
    --infer_backend vllm --max_model_len 8192 \
    --tensor_parallel_size 4
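
The merged checkpoint can also be served and called over HTTP. The sketch below assumes that swift deploy exposes an OpenAI-compatible endpoint on the default port 8000 and that the served model name matches the model_type; adjust both if your setup differs:

# Serve the merged checkpoint (assumed defaults: port 8000, OpenAI-compatible API)
CUDA_VISIBLE_DEVICES=0,1,2,3 swift deploy \
    --ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx-merged \
    --infer_backend vllm --max_model_len 8192 \
    --tensor_parallel_size 4

# Example request (model name assumed to equal the model_type)
curl http://localhost:8000/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "qwen2_5-72b-instruct", "messages": [{"role": "user", "content": "Who are you?"}]}'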

Example of the fine-tuned model running inference on the validation set: (see attached screenshot)

qwen2-vl-72b-instruct

We perform OCR fine-tuning on Qwen2-VL-72B-Instruct. For examples of grounding tasks and video fine-tuning, see the ms-swift docs: https://github.com/modelscope/ms-swift/blob/main/docs/source/Multi-Modal/qwen2-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md

Fine-tuning dataset: https://modelscope.cn/datasets/AI-ModelScope/LaTeX_OCR
Fine-tuning script:

# Experimental environment: 8 * A100
SIZE_FACTOR=8 MAX_PIXELS=602112 \
NPROC_PER_NODE=8 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift sft \
  --model_type qwen2-vl-72b-instruct \
  --model_id_or_path qwen/Qwen2-VL-72B-Instruct \
  --sft_type lora \
  --dataset latex-ocr-print#20000 \
  --deepspeed default-zero3

To use a custom dataset, simply specify it as follows (a combined invocation is sketched after the format example below):

# val_dataset is optional; if not specified, a portion of dataset is split off as the validation set
  --dataset train.jsonl \
  --val_dataset val.jsonl \

Custom dataset format:

{"query": "<image>55555", "response": "66666", "images": ["image_path"]}
{"query": "<image><image>eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]]}

GPU memory usage during fine-tuning: (see attached screenshot)

Training loss curve during fine-tuning (due to time constraints, only 250 steps were run): (see attached screenshot)

The post-fine-tuning inference script is shown below; change ckpt_dir to the last checkpoint folder produced by training. We can use vLLM to accelerate inference on the merged checkpoint:

# Direct inference
CUDA_VISIBLE_DEVICES=0,1 swift infer \
    --ckpt_dir output/qwen2-vl-72b-instruct/vx-xxx/checkpoint-xxx \
    --load_dataset_config true

# merge-lora and use vLLM for accelerated inference
CUDA_VISIBLE_DEVICES=0,1 swift export \
    --ckpt_dir output/qwen2-vl-72b-instruct/vx-xxx/checkpoint-xxx \
    --merge_lora true

CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer \
    --ckpt_dir output/qwen2-vl-72b-instruct/vx-xxx/checkpoint-xxx-merged \
    --load_dataset_config true --infer_backend vllm \
    --tensor_parallel_size 4 --max_model_len 16384

Example of the fine-tuned model running inference on the validation set: (see attached screenshot)

@llp1992 commented Sep 19, 2024

Does qwen2-vl support training on multi-image, multi-turn conversations?

@etemiz commented Sep 19, 2024

Can I train the 72B model with 2 * A6000 (2 * 48GB)?

@Jintao-Huang (Collaborator, Author)

Does qwen2-vl support training on multi-image, multi-turn conversations?

Yes, it is supported.

@Jintao-Huang (Collaborator, Author)

Can I train the 72B model with 2 * A6000 (2 * 48GB)?

Maybe with QLoRA:

# GPU Memory: 2 * 28GB
SIZE_FACTOR=8 MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
  --model_type qwen2-vl-72b-instruct-gptq-int4 \
  --model_id_or_path qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4 \
  --sft_type lora \
  --dataset latex-ocr-print#20000

@Jintao-Huang (Collaborator, Author)

lora & device_map

# GPU Memory: 2 * 75GB
SIZE_FACTOR=8 MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
  --model_type qwen2-vl-72b-instruct \
  --model_id_or_path qwen/Qwen2-VL-72B-Instruct \
  --sft_type lora \
  --dataset latex-ocr-print#20000

@ZhuJD-China

How can I train qwen2_5-72b-instruct with 8 * 4090 (24GB)? I get CUDA out of memory.

@xuezc commented Sep 26, 2024

How fast is training on A100 GPUs?

@MuyeHuang

Does qwen2-vl support training on multi-image, multi-turn conversations?

Yes, it is supported.

For this multi-turn, multi-image conversation training, does every assistant reply contribute to the loss, or only the last assistant reply?

@Jintao-Huang (Collaborator, Author)

All of them are included in the loss computation.

@Wangman1

Is pretraining supported for qwen2-vl?

@Labmem009

After reading the data, training stops immediately without reporting any error:

@MuyeHuang

A question about multi-turn, multi-image conversations: for {"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}, is the order of image_path1 and image_path2 determined by their order of appearance in history[0], history[1], ... and finally query, i.e. the order in which the images appear in the conversation?

@dhhcj1 commented Oct 24, 2024

Could you give an example of calling the model? There is only a deployment example.

@zhangfanTJU

Does this work on the 910B (Ascend NPU)?

@liujiachang

Following the example, I cannot load an mp4 file given as a URL; the network is fine, and the local video was downloaded directly with wget.

<<< <video>描述视频
Input a video path or URL <<< baby.mp4
[INFO:swift] Setting nframes: None. You can adjust this hyperparameter through the environment variable: `NFRAMES`.
[INFO:swift] Setting fps: None. You can adjust this hyperparameter through the environment variable: `FPS`.
[INFO:swift] Setting size_factor: 2. You can adjust this hyperparameter through the environment variable: `SIZE_FACTOR`.
[INFO:swift] Setting min_frames: 4. You can adjust this hyperparameter through the environment variable: `MIN_FRAMES`.
[INFO:swift] Setting max_frames: 768. You can adjust this hyperparameter through the environment variable: `MAX_FRAMES`.
[INFO:swift] Setting min_pixels: 100352. You can adjust this hyperparameter through the environment variable: `MIN_PIXELS`.
[INFO:swift] Setting total_pixels: 19267584. You can adjust this hyperparameter through the environment variable: `TOTAL_PIXELS`.
[INFO:swift] Setting max_pixels: None. You can adjust this hyperparameter through the environment variable: `MAX_PIXELS`.
[INFO:swift] Setting resized_height: None. You can adjust this hyperparameter through the environment variable: `RESIZED_HEIGHT`.
[INFO:swift] Setting resized_width: None. You can adjust this hyperparameter through the environment variable: `RESIZED_WIDTH`.
[W compiler_depend.ts:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
/root/.local/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py:349: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:74.)
  attention_mask[..., cu_seqlens[i - 1] : cu_seqlens[i], cu_seqlens[i - 1] : cu_seqlens[i]] = True
视频中展示了一个小孩在玩书。她穿着一件浅蓝色的背心和粉色的裤子,戴着一副黑色的眼镜,坐在床上,手里拿着一本打开的书。她先是用右手翻动书页,然后用左手扶着书,右手继续翻动书页。
--------------------------------------------------
<<< <video>描述视频
Input a video path or URL <<< https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4
Traceback (most recent call last):
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/cli/infer.py", line 5, in <module>
    infer_main()
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/utils/run_utils.py", line 32, in x_main
    result = llm_x(args, **kwargs)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/infer.py", line 414, in llm_infer
    for response, new_history in gen:
  File "/root/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/utils.py", line 711, in inference_stream
    inputs, tokenizer_kwargs, token_len, example = _prepare_inputs(
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/utils.py", line 629, in _prepare_inputs
    inputs, tokenizer_kwargs = template.encode(example)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/template.py", line 552, in encode
    res = _encode(example)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/template.py", line 1485, in _encode
    videos = load_batch(videos, load_video_qwen2)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/vision_utils.py", line 153, in load_batch
    res.append(load_func(path))
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/vision_utils.py", line 280, in load_video_qwen2
    video, _, info = io.read_video(
  File "/root/.local/lib/python3.10/site-packages/torchvision/io/video.py", line 271, in read_video
    raise RuntimeError(f"File not found: {filename}")
RuntimeError: File not found: https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4

Moreover, when I switch to another local video file of mine, it also throws an error.

<<< <video>描述视频
Input a video path or URL <<< test.mp4
moov atom not found
[INFO:swift] Setting nframes: None. You can adjust this hyperparameter through the environment variable: `NFRAMES`.
[INFO:swift] Setting fps: None. You can adjust this hyperparameter through the environment variable: `FPS`.
[INFO:swift] Setting size_factor: 2. You can adjust this hyperparameter through the environment variable: `SIZE_FACTOR`.
Traceback (most recent call last):
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/cli/infer.py", line 5, in <module>
    infer_main()
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/utils/run_utils.py", line 32, in x_main
    result = llm_x(args, **kwargs)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/infer.py", line 414, in llm_infer
    for response, new_history in gen:
  File "/root/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/utils.py", line 711, in inference_stream
    inputs, tokenizer_kwargs, token_len, example = _prepare_inputs(
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/utils.py", line 629, in _prepare_inputs
    inputs, tokenizer_kwargs = template.encode(example)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/template.py", line 552, in encode
    res = _encode(example)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/template.py", line 1485, in _encode
    videos = load_batch(videos, load_video_qwen2)
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/vision_utils.py", line 153, in load_batch
    res.append(load_func(path))
  File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/vision_utils.py", line 293, in load_video_qwen2
    nframes = video.size(0) / info['video_fps'] * fps
KeyError: 'video_fps'

@junwenxiong

It can indeed be trained, but I ran into this problem; it turned out to be caused by gradient_checkpointing being set to true. The codebase may not yet have been updated to match qwen2-vl. The workaround is simply to set gradient_checkpointing=false.

(error screenshots attached)
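
A minimal sketch of the workaround, assuming the CLI flag mirrors the gradient_checkpointing argument:

# Disable gradient checkpointing as a workaround for the issue above
SIZE_FACTOR=8 MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
  --model_type qwen2-vl-72b-instruct \
  --model_id_or_path qwen/Qwen2-VL-72B-Instruct \
  --sft_type lora \
  --dataset latex-ocr-print#20000 \
  --gradient_checkpointing false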

@Labmem009

When fine-tuning the VL model, can image_path be a URL?

@fclearner

A question: shouldn't the theoretical memory footprint of a 72B model in bf16 be 72B * 2 bytes, i.e. about 144GB? Why is only 70GB used per GPU here? Is int4 being used, or is there some mechanism I am not aware of?

@bang123-box

Is pretraining supported for qwen2-vl?

I would like to ask the same thing. Did you find the qwen2-vl pretraining code?
