Error when running the low-VRAM code · Issue #6 · Tencent-Hunyuan/HunyuanCustom · GitHub

Error when running the low-VRAM code #6

Open
colorAi opened this issue May 9, 2025 · 9 comments
Comments

@colorAi commented May 9, 2025

(base) root@H-TAO:~/HunyuanCustom# conda activate HunyuanCustom
(HunyuanCustom) root@H-TAO:~/HunyuanCustom# cd HunyuanCustom

export MODEL_BASE="./models"
export CPU_OFFLOAD=1
export PYTHONPATH=./
python hymm_sp/sample_gpu_poor.py \
    --input './assets/images/seg_woman_01.png' \
    --pos-prompt "Realistic, High-quality. A woman is drinking coffee at a café." \
    --neg-prompt "Aerial view, aerial view, overexposed, low quality, deformation, a poor composition, bad hands, bad teeth, bad eyes, bad limbs, distortion, blurring, text, subtitles, static, picture, black border." \
    --ckpt ${MODEL_BASE}"/hunyuancustom_720P/mp_rank_00_model_states_fp8.pt" \
    --video-size 720 1280 \
    --seed 1024 \
    --sample-n-frames 129 \
    --infer-steps 30 \
    --flow-shift-eval-video 13.0 \
    --save-path './results/cpu_720p' \
    --use-fp8 \
    --cpu-offload
-bash: cd: HunyuanCustom: No such file or directory
vae: cpu_offload=1, DISABLE_SP=0
text_encoder: cpu_offload=1
models: cpu_offload=1, DISABLE_SP=0
2025-05-09 17:00:25.936 | INFO | hymm_sp.inference:from_pretrained:59 - Got text-to-video model root path: ./models/hunyuancustom_720P/mp_rank_00_model_states_fp8.pt
2025-05-09 17:00:25.936 | INFO | hymm_sp.inference:from_pretrained:67 - Building model...
========================= build model =========================
/root/HunyuanCustom/hymm_sp/modules/fp8_optimization.py:88: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
fp8_map = torch.load(fp8_map_path, map_location=lambda storage, loc: storage)['module']
==================== load transformer to cpu
/root/HunyuanCustom/hymm_sp/inference.py:144: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state_dict = torch.load(ckpt_path, map_location=lambda storage, loc: storage)
========================= load vae =========================
2025-05-09 17:02:17.665 | INFO | hymm_sp.vae:load_vae:19 - Loading 3D VAE model (884-16c-hy0801) from: ./models/vae_3d/hyvae_v1_0801
/root/HunyuanCustom/hymm_sp/vae/__init__.py:25: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
ckpt = torch.load(Path(vae_path) / "pytorch_model.pt", map_location=vae.device)
2025-05-09 17:02:20.046 | INFO | hymm_sp.vae:load_vae:42 - VAE to dtype: torch.float16
========================= load llava =========================
2025-05-09 17:02:20.053 | INFO | hymm_sp.text_encoder:load_text_encoder:29 - Loading text encoder model (llava-llama-3-8b) from: ./models/llava-llama-3-8b-v1_1
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 4/4 [00:07<00:00, 1.84s/it]
2025-05-09 17:02:30.640 | INFO | hymm_sp.text_encoder:load_text_encoder:46 - Text encoder to dtype: torch.float16
2025-05-09 17:02:30.645 | INFO | hymm_sp.text_encoder:load_tokenizer:61 - Loading tokenizer (llava-llama-3-8b) from: ./models/llava-llama-3-8b-v1_1
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
2025-05-09 17:02:30.959 | INFO | hymm_sp.text_encoder:load_text_encoder:29 - Loading text encoder model (clipL) from: ./models/openai_clip-vit-large-patch14
2025-05-09 17:02:31.308 | INFO | hymm_sp.text_encoder:load_text_encoder:46 - Text encoder to dtype: torch.float16
2025-05-09 17:02:31.310 | INFO | hymm_sp.text_encoder:load_tokenizer:61 - Loading tokenizer (clipL) from: ./models/openai_clip-vit-large-patch14
load hunyuan model successful...
Traceback (most recent call last):
File "/root/HunyuanCustom/hymm_sp/sample_gpu_poor.py", line 98, in
main()
File "/root/HunyuanCustom/hymm_sp/sample_gpu_poor.py", line 36, in main
from diffusers.hooks import apply_group_offloading
ModuleNotFoundError: No module named 'diffusers.hooks'
(HunyuanCustom) root@H-TAO:~/HunyuanCustom#
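(The missing diffusers.hooks module, which the script imports apply_group_offloading from, points at an outdated diffusers install. As a quick check, something like the following one-liner prints the installed versions:

python -c "import diffusers, transformers; print(diffusers.__version__, transformers.__version__)"
)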

@colorAi (Author) commented May 9, 2025

The single-GPU script does run, it's just extremely slow, with almost no visible progress, on a 4090D.

@zhouzhengguang (Collaborator) commented

Please update to diffusers==0.33.0 and transformers==4.41.2; we made a mistake in the requirements.
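(For reference, the corresponding install command would be along these lines:

pip install diffusers==0.33.0 transformers==4.41.2
)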

@colorAi (Author) commented May 9, 2025

After the update it runs, but the speed is painful: about five minutes per step on a 4090D. VRAM usage looks like an EKG trace, emmm..

[two screenshots attached showing VRAM usage over time]

@justinjohn0306 commented

On WSL it worked after the requirements update, but after a few seconds my screen randomly died and I had to restart my computer. It was also super slow.

@colorAi (Author) commented May 9, 2025

On WSL it worked after the requirements update, but after a few seconds my screen randomly died and I had to restart my computer. It was also super slow.

Yes, it's very slow. I guess it might be a VRAM optimization issue because running Kijai's quantization in ComfyUI works well.

@zhouzhengguang (Collaborator) commented

After the update it runs, but the speed is painful: about five minutes per step on a 4090D. VRAM usage looks like an EKG trace, emmm..

Indeed. Our current implementation does frequent CPU offloading and clears the cache, so VRAM usage fluctuates quite a bit, and the speed really is slow. We have seen that the mmgp library is more efficient; we will explore a more efficient implementation later on.
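(To illustrate the pattern being described, here is a minimal sketch with hypothetical names, not HunyuanCustom's actual code: each block is moved to the GPU only for its own forward pass, then offloaded back and the allocator cache is emptied, which explains both the slow speed and the EKG-like VRAM curve.

import torch

def forward_with_offload(blocks, x):
    # Hypothetical sketch of frequent CPU offload plus cache clearing.
    x = x.to("cuda")
    for block in blocks:
        block.to("cuda")            # upload this block's weights
        with torch.no_grad():
            x = block(x)
        block.to("cpu")             # offload weights back to host RAM
        torch.cuda.empty_cache()    # free cached VRAM; usage dips here
    return x
)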

@colorAi (Author) commented May 9, 2025


Thank you for your hard work and dedication.

@FBAdmirer commented May 10, 2025

The run finally completed: a 2-second video, and the video quality is good.


I am able to run the sample_gpu_poor.py script on WSL; my gaming desktop has 192 GB of RAM and an RTX 5060 Ti 16 GB graphics card.

[screenshot attached]

export CUDA_VISIBLE_DEVICES=0
export MODEL_BASE="./models"
export CPU_OFFLOAD=1
export PYTHONPATH=./
python hymm_sp/sample_gpu_poor.py \
    --input './assets/images/seg_woman_01.png' \
    --pos-prompt "Realistic, High-quality. A woman is drinking coffee at a café." \
    --neg-prompt "Aerial view, aerial view, overexposed, low quality, deformation, a poor composition, bad hands, bad teeth, bad eyes, bad limbs, distortion, blurring, text, subtitles, static, picture, black border." \
    --ckpt ${MODEL_BASE}"/hunyuancustom_720P/mp_rank_00_model_states_fp8.pt" \
    --video-size 720 1280 \
    --seed 1024 \
    --sample-n-frames 65 \
    --infer-steps 30 \
    --flow-shift-eval-video 13.0 \
    --save-path './results/cpu_720p' \
    --use-fp8 \
    --cpu-offload
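(Worth noting: apart from setting CUDA_VISIBLE_DEVICES, the only substantive change from the earlier 4090D command is --sample-n-frames 65 instead of 129, roughly half the frames, which lines up with the reported 2-second clip and presumably with fitting the run into 16 GB of VRAM.)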

@zhouzhengguang (Collaborator) commented

Glad to hear that.
