[Bug] Training a Flux Kontext LoRA fails after enabling blocks_to_swap · Issue #32 · lrzjason/T2ITrainer · GitHub
[Bug] Training a Flux Kontext LoRA fails after enabling blocks_to_swap #32
Closed
@Wenaka2004

Description

Training environment:
Hardware: 4090D
nvidia-smi output:

Image

torch 2.7.1+cu126
torchaudio 2.7.1+cu126
torchvision 0.22.1+cu126
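
For anyone trying to reproduce this, the package versions listed above can be collected with a short script. This is only a minimal sketch using the Python standard library; the helper name `installed_versions` is hypothetical and is not part of T2ITrainer.

```python
from importlib import metadata


def installed_versions(packages):
    """Return a dict mapping each distribution name to its installed version.

    Packages that are not installed map to None. (Hypothetical helper,
    not part of T2ITrainer.)
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    # Print one "name version" line per package, matching the list above.
    report = installed_versions(["torch", "torchaudio", "torchvision"])
    for name, version in report.items():
        print(f"{name} {version or 'not installed'}")
```

Run it inside the same virtual environment that is used for training, so the reported versions match what the trainer actually loads.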

`Steps:   0%|                                                                                    | 0/20 [00:00<?, ?it/s]

Unhandled exception caught in c10/util/AbortHandler.h

00007FF892AF8E2400007FF892AD7670 torch_python.dll!THPGenerator_initDefaultGenerator [ @ ]

00007FF92721EE1200007FF92721EDF0 ucrtbase.dll!terminate [ @ ]

00007FF904551911 VCRUNTIME140_1.dll! [ @ ]

00007FF90455218F VCRUNTIME140_1.dll! [ @ ]

00007FF9045521E9 VCRUNTIME140_1.dll! [ @ ]

00007FF90455401900007FF904553F70 VCRUNTIME140_1.dll!_CxxFrameHandler4 [ @ ]

00007FF9297B5CEF00007FF9297B5BC0 ntdll.dll!_chkstk [ @ ]

00007FF92972E8C600007FF92972DE30 ntdll.dll!RtlFindCharInUnicodeString [ @ ]

00007FF92976499500007FF929764800 ntdll.dll!RtlRaiseException [ @ ]

00007FF926A500AC00007FF926A50040 KERNELBASE.dll!RaiseException [ @ ]

00007FF8911D526700007FF8911D51D0 VCRUNTIME140.dll!CxxThrowException [ @ ]

00007FF8F4E86C6E00007FF8F4E86C00 c10.dll!c10::detail::torchCheckFail [ @ ]

00007FF8F4DD384F00007FF8F4DD3550 c10_cuda.dll!c10::cuda::c10_cuda_check_implementation [ @ ]

00007FFFAA9AD36B00007FFFAA9AD310 torch_cuda.dll!at::cuda::CUDAEvent::createEvent [ @ ]

00007FFFAA9B0A3E00007FFFAA9B0A00 torch_cuda.dll!at::cuda::CUDAEvent::record [ @ ]

00007FFFAA9BE29900007FFFAA9BD580 torch_cuda.dll!at::cuda::getCachingHostAllocator [ @ ]

00007FFFAA9BCF5900007FFFAA9BB800 torch_cuda.dll!at::cuda::flush_icache [ @ ]

00007FF8F4E4B16C00007FF8F4E4B140 c10.dll!c10::StorageImpl::~StorageImpl [ @ ]

00007FFFE66F76F500007FFFE66F5F10 torch_cpu.dll!at::DynamicLibrary::sym [ @ ]

00007FF8F4E3F1D200007FF8F4E3F160 c10.dll!c10::ConstantSymNodeImpl::~ConstantSymNodeImpl [ @ ]

00007FF8F4E6F94500007FF8F4E6F8D0 c10.dll!c10::TensorImpl::~TensorImpl [ @ ]

00007FF8F4E6FC8300007FF8F4E6FB00 c10.dll!c10::impl::TorchDispatchModeTLS::operator= [ @ ]

00007FFFE669ECA800007FFFE669EC20 torch_cpu.dll!at::TensorBase::reset [ @ ]

00007FF892B0661500007FF892AF9840 torch_python.dll!initModule [ @ ]

00007FF892B99CE500007FF892B67DC0 torch_python.dll!THPPointer::THPPointer [ @ ]

00007FF890B17D8300007FF890AA27E0 python311.dll!Py_Get_Getpath_CodeObject [ @ ]

00007FF890B184D800007FF890AA27E0 python311.dll!Py_Get_Getpath_CodeObject [ @ ]

00007FF890D1FF2700007FF890D1F650 python311.dll!Py_InitializeMain [ @ ]

00007FF890D2106900007FF890D20E00 python311.dll!Py_FinalizeEx [ @ ]

00007FF890B2382500007FF890B237C0 python311.dll!Py_Main [ @ ]

00007FF61713149000007FF617131110 python.exe!OPENSSL_Applink [ @ ]

00007FF92877259D00007FF928772580 KERNEL32.DLL!BaseThreadInitThunk [ @ ]

00007FF92976AF7800007FF92976AF50 ntdll.dll!RtlUserThreadStart [ @ ]`
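The error above occurs only when blocks_to_swap is set. With it set to 0 (that is, disabled), training runs normally, but it spills into shared GPU memory and is very slow.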
