Error when running the low-VRAM code · Issue #6 · Tencent-Hunyuan/HunyuanCustom · GitHub

Error when running the low-VRAM code #6

Open
colorAi opened this issue May 9, 2025 · 9 comments
Comments

@colorAi commented May 9, 2025

(base) root@H-TAO:~/HunyuanCustom# conda activate HunyuanCustom
(HunyuanCustom) root@H-TAO:~/HunyuanCustom# cd HunyuanCustom

export MODEL_BASE="./models"
export CPU_OFFLOAD=1
export PYTHONPATH=./
python hymm_sp/sample_gpu_poor.py \
    --input './assets/images/seg_woman_01.png' \
    --pos-prompt "Realistic, High-quality. A woman is drinking coffee at a café." \
    --neg-prompt "Aerial view, aerial view, overexposed, low quality, deformation, a poor composition, bad hands, bad teeth, bad eyes, bad limbs, distortion, blurring, text, subtitles, static, picture, black border." \
    --ckpt ${MODEL_BASE}"/hunyuancustom_720P/mp_rank_00_model_states_fp8.pt" \
    --video-size 720 1280 \
    --seed 1024 \
    --sample-n-frames 129 \
    --infer-steps 30 \
    --flow-shift-eval-video 13.0 \
    --save-path './results/cpu_720p' \
    --use-fp8 \
    --cpu-offload
-bash: cd: HunyuanCustom: No such file or directory
vae: cpu_offload=1, DISABLE_SP=0
text_encoder: cpu_offload=1
models: cpu_offload=1, DISABLE_SP=0
2025-05-09 17:00:25.936 | INFO | hymm_sp.inference:from_pretrained:59 - Got text-to-video model root path: ./models/hunyuancustom_720P/mp_rank_00_model_states_fp8.pt
2025-05-09 17:00:25.936 | INFO | hymm_sp.inference:from_pretrained:67 - Building model...
========================= build model =========================
/root/HunyuanCustom/hymm_sp/modules/fp8_optimization.py:88: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
fp8_map = torch.load(fp8_map_path, map_location=lambda storage, loc: storage)['module']
==================== load transformer to cpu
/root/HunyuanCustom/hymm_sp/inference.py:144: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state_dict = torch.load(ckpt_path, map_location=lambda storage, loc: storage)
========================= load vae =========================
2025-05-09 17:02:17.665 | INFO | hymm_sp.vae:load_vae:19 - Loading 3D VAE model (884-16c-hy0801) from: ./models/vae_3d/hyvae_v1_0801
/root/HunyuanCustom/hymm_sp/vae/__init__.py:25: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
ckpt = torch.load(Path(vae_path) / "pytorch_model.pt", map_location=vae.device)
2025-05-09 17:02:20.046 | INFO | hymm_sp.vae:load_vae:42 - VAE to dtype: torch.float16
========================= load llava =========================
2025-05-09 17:02:20.053 | INFO | hymm_sp.text_encoder:load_text_encoder:29 - Loading text encoder model (llava-llama-3-8b) from: ./models/llava-llama-3-8b-v1_1
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 4/4 [00:07<00:00, 1.84s/it]
2025-05-09 17:02:30.640 | INFO | hymm_sp.text_encoder:load_text_encoder:46 - Text encoder to dtype: torch.float16
2025-05-09 17:02:30.645 | INFO | hymm_sp.text_encoder:load_tokenizer:61 - Loading tokenizer (llava-llama-3-8b) from: ./models/llava-llama-3-8b-v1_1
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
2025-05-09 17:02:30.959 | INFO | hymm_sp.text_encoder:load_text_encoder:29 - Loading text encoder model (clipL) from: ./models/openai_clip-vit-large-patch14
2025-05-09 17:02:31.308 | INFO | hymm_sp.text_encoder:load_text_encoder:46 - Text encoder to dtype: torch.float16
2025-05-09 17:02:31.310 | INFO | hymm_sp.text_encoder:load_tokenizer:61 - Loading tokenizer (clipL) from: ./models/openai_clip-vit-large-patch14
load hunyuan model successful...
Traceback (most recent call last):
File "/root/HunyuanCustom/hymm_sp/sample_gpu_poor.py", line 98, in
main()
File "/root/HunyuanCustom/hymm_sp/sample_gpu_poor.py", line 36, in main
from diffusers.hooks import apply_group_offloading
ModuleNotFoundError: No module named 'diffusers.hooks'
(HunyuanCustom) root@H-TAO:~/HunyuanCustom#
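(The missing diffusers.hooks module, which the script imports apply_group_offloading from, points at an outdated diffusers install. As a quick check, something like the following one-liner prints the installed versions:

python -c "import diffusers, transformers; print(diffusers.__version__, transformers.__version__)"
)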

@colorAi (Author) commented May 9, 2025

The single-GPU script does run, it's just extremely slow, with almost no visible progress, on a 4090D.

@zhouzhengguang (Collaborator) commented

Please update to diffusers==0.33.0 and transformers==4.41.2; we made a mistake in the requirements.
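(For reference, the corresponding install command would be along these lines:

pip install diffusers==0.33.0 transformers==4.41.2
)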

@colorAi (Author) commented May 9, 2025

After the update it runs, but the speed is painful: about five minutes per step on a 4090D. VRAM usage looks like an EKG trace, emmm..

[two screenshots attached showing VRAM usage over time]

@justinjohn0306 commented

On WSL it worked after the requirements update, but after a few seconds my screen randomly died and I had to restart my computer. It was also super slow.

@colorAi (Author) commented May 9, 2025

On WSL it worked after the requirements update, but after a few seconds my screen randomly died and I had to restart my computer. It was also super slow.

Yes, it's very slow. I guess it might be a VRAM optimization issue because running Kijai's quantization in ComfyUI works well.

@zhouzhengguang (Collaborator) commented

After the update it runs, but the speed is painful: about five minutes per step on a 4090D. VRAM usage looks like an EKG trace, emmm..

Indeed. Our current implementation does frequent CPU offloading and clears the cache, so VRAM usage fluctuates quite a bit, and the speed really is slow. We have seen that the mmgp library is more efficient; we will explore a more efficient implementation later on.
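(To illustrate the pattern being described, here is a minimal sketch with hypothetical names, not HunyuanCustom's actual code: each block is moved to the GPU only for its own forward pass, then offloaded back and the allocator cache is emptied, which explains both the slow speed and the EKG-like VRAM curve.

import torch

def forward_with_offload(blocks, x):
    # Hypothetical sketch of frequent CPU offload plus cache clearing.
    x = x.to("cuda")
    for block in blocks:
        block.to("cuda")            # upload this block's weights
        with torch.no_grad():
            x = block(x)
        block.to("cpu")             # offload weights back to host RAM
        torch.cuda.empty_cache()    # free cached VRAM; usage dips here
    return x
)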

@colorAi (Author) commented May 9, 2025


Thank you for your hard work and dedication.

@FBAdmirer commented May 10, 2025

The run finally completed: a 2-second video, and the video quality is good.


I am able to run the sample_gpu_poor.py script on WSL; my gaming desktop has 192 GB of RAM and an RTX 5060 Ti 16 GB graphics card.

[screenshot attached]

export CUDA_VISIBLE_DEVICES=0
export MODEL_BASE="./models"
export CPU_OFFLOAD=1
export PYTHONPATH=./
python hymm_sp/sample_gpu_poor.py \
    --input './assets/images/seg_woman_01.png' \
    --pos-prompt "Realistic, High-quality. A woman is drinking coffee at a café." \
    --neg-prompt "Aerial view, aerial view, overexposed, low quality, deformation, a poor composition, bad hands, bad teeth, bad eyes, bad limbs, distortion, blurring, text, subtitles, static, picture, black border." \
    --ckpt ${MODEL_BASE}"/hunyuancustom_720P/mp_rank_00_model_states_fp8.pt" \
    --video-size 720 1280 \
    --seed 1024 \
    --sample-n-frames 65 \
    --infer-steps 30 \
    --flow-shift-eval-video 13.0 \
    --save-path './results/cpu_720p' \
    --use-fp8 \
    --cpu-offload
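(Worth noting: apart from setting CUDA_VISIBLE_DEVICES, the only substantive change from the earlier 4090D command is --sample-n-frames 65 instead of 129, roughly half the frames, which lines up with the reported 2-second clip and presumably with fitting the run into 16 GB of VRAM.)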

@zhouzhengguang (Collaborator) commented

Glad to hear that.
