Does transformerEngine support 2080ti? · Issue #1680 · NVIDIA/TransformerEngine · GitHub

Does transformerEngine support 2080ti? #1680


Open
SeekPoint opened this issue Apr 14, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@SeekPoint

While running another project that uses TransformerEngine, it triggered this exception:
https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/attention.py
# raise exception if no backend is available
if sum([use_flash_attention, use_fused_attention, use_unfused_attention]) == 0:
    raise ValueError(
        "No dot product attention backend is available for the provided inputs. Please"
        " run with NVTE_DEBUG=1 NVTE_DEBUG_LEVEL=2 to find out the reasons for"
        " disabling all backends."
    )
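
A minimal sketch of exercising the same backend-selection path outside the full training run (the shapes, dtype, and head configuration below are illustrative assumptions, not taken from the failing workload):

import torch
from transformer_engine.pytorch import DotProductAttention

# Hypothetical toy configuration; adjust to match the real model.
attn = DotProductAttention(num_attention_heads=16, kv_channels=64)

# Default qkv_format is "sbhd": [sequence, batch, heads, head_dim].
q = torch.randn(128, 2, 16, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = attn(q, k, v)  # raises ValueError if TE disables every attention backend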

SeekPoint added the bug label on Apr 14, 2025
@ptrendx
Member
ptrendx commented Apr 14, 2025

@cyanguwa Could you take a look? 2080ti is Turing (sm75).
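
For reference, a quick way to confirm the compute capability of the installed card from PyTorch (a 2080 Ti should report (7, 5), i.e. sm75):

import torch

# Prints the device name and its compute capability tuple.
print(torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))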

@ptrendx
Member
ptrendx commented Apr 14, 2025

@SeekPoint could you give more details about the workload that triggered this - e.g. datatype, shapes of tensors? Also, if you could rerun with those suggested environment variables (NVTE_DEBUG=1 NVTE_DEBUG_LEVEL=2) and paste the output that would be super helpful.
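
For reference, a minimal sketch of enabling that logging from the launcher script itself (assuming a Python entry point; equivalently the two variables can be exported in the shell before the launch command):

import os

# Set in the environment of every rank before transformer_engine is imported,
# so the backend-selection log lines show up ahead of any traceback.
os.environ["NVTE_DEBUG"] = "1"
os.environ["NVTE_DEBUG_LEVEL"] = "2"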

@SeekPoint
Author
SeekPoint commented Apr 15, 2025

I have 4 cards, a special version of the 2080 Ti; each card has 22 GB of GPU memory.

The case comes from https://swift.readthedocs.io/zh-cn/latest/Instruction/Megatron-SWIFT%E8%AE%AD%E7%BB%83.html

Since the log is huge, I am only pasting the traceback.
Also, I found another report of the same error in NeMo: NVIDIA/NeMo#11218

The traceback:

[rank0]: Traceback (most recent call last):
[rank0]: File "/usr/local/lib/python3.11/site-packages/swift/cli/_megatron/sft.py", line 4, in
[rank0]: megatron_sft_main()
[rank0]: File "/usr/local/lib/python3.11/site-packages/swift/megatron/train/sft.py", line 90, in megatron_sft_main
[rank0]: return MegatronSft(args).main()
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/swift/llm/base.py", line 47, in main
[rank0]: result = self.run()
[rank0]: ^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/swift/megatron/train/sft.py", line 75, in run
[rank0]: pretrain(
[rank0]: File "/mnt/workspace/.cache/modelscope/hub/_github/Megatron-LM/megatron/training/training.py", line 408, in pretrain
[rank0]: iteration, num_floating_point_operations_so_far = train(
[rank0]: ^^^^^^
[rank0]: File "/mnt/workspace/.cache/modelscope/hub/_github/Megatron-LM/megatron/training/training.py", line 1493, in train
[rank0]: train_step(forward_step_func,
[rank0]: File "/mnt/workspace/.cache/modelscope/hub/_github/Megatron-LM/megatron/training/training.py", line 791, in train_step
[rank0]: losses_reduced = forward_backward_func(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/workspace/.cache/modelscope/hub/_github/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 471, in forward_backward_no_pipelining
[rank0]: output_tensor, num_tokens = forward_step(
[rank0]: ^^^^^^^^^^^^^
[rank0]: File "/mnt/workspace/.cache/modelscope/hub/_github/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 275, in forward_step
[rank0]: output_tensor, loss_func = forward_step_func(data_iterator, model)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/swift/megatron/train/utils.py", line 182, in forward_step
[rank0]: output_tensor = model(tokens, position_ids, attention_mask, labels=labels, packed_seq_params=packed_seq_params)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/workspace/.cache/modelscope/hub/_github/Megatron-LM/megatron/core/distributed/data_parallel_base.py", line 22, in forward
[rank0]: return self.module(*inputs, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/workspace/.cache/modelscope/hub/_github/Megatron-LM/megatron/legacy/model/module.py", line 189, in forward
[rank0]: outputs = self.module(*inputs, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/workspace/.cache/modelscope/hub/_github/Megatron-LM/megatron/core/models/gpt/gpt_model.py", line 264, in forward
[rank0]: hidden_states = self.decoder(
[rank0]: ^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/workspace/.cache/modelscope/hub/_github/Megatron-LM/megatron/core/transformer/transformer_block.py", line 549, in forward
[rank0]: hidden_states, context = layer(
[rank0]: ^^^^^^
[rank0]: File "/mnt/workspace/.cache/modelscope/hub/_github/Megatron-LM/megatron/core/transformer/transformer_layer.py", line 502, in __call__
[rank0]: return super(MegatronModule, self).__call__(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/workspace/.cache/modelscope/hub/_github/Megatron-LM/megatron/core/transformer/transformer_layer.py", line 390, in forward
[rank0]: attention_output_with_bias = self.self_attention(
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/workspace/.cache/modelscope/hub/_github/Megatron-LM/megatron/core/transformer/attention.py", line 450, in forward
[rank0]: core_attn_out = self._checkpointed_attention_forward(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/workspace/.cache/modelscope/hub/_github/Megatron-LM/megatron/core/transformer/attention.py", line 170, in _checkpointed_attention_forward
[rank0]: hidden_states = tensor_parallel.checkpoint(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/workspace/.cache/modelscope/hub/_github/Megatron-LM/megatron/core/tensor_parallel/random.py", line 431, in checkpoint
[rank0]: return CheckpointFunction.apply(function, distribute_saved_activations, *args)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/torch/autograd/function.py", line 575, in apply
[rank0]: return super().apply(*args, **kwargs) # type: ignore[misc]
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/workspace/.cache/modelscope/hub/_github/Megatron-LM/megatron/core/tensor_parallel/random.py", line 369, in forward
[rank0]: outputs = run_function(*args)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/workspace/.cache/modelscope/hub/_github/Megatron-LM/megatron/core/transformer/attention.py", line 156, in custom_forward
[rank0]: output_ = self.core_attention(
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/workspace/.cache/modelscope/hub/_github/Megatron-LM/megatron/core/extensions/transformer_engine.py", line 804, in forward
[rank0]: core_attn_out = super().forward(
[rank0]: ^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.11/site-packages/transformer_engine/pytorch/attention.py", line 7736, in forward
[rank0]: raise ValueError("No dot product attention support for the provided inputs!")
[rank0]: ValueError: No dot product attention support for the provided inputs!
[rank1]:[W415 15:38:12.811977692 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
[rank3]:[W415 15:38:12.832828834 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
W0415 15:38:14.751000 78 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 111 closing signal SIGTERM
W0415 15:38:14.753000 78 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 113 closing signal SIGTERM
E0415 15:38:14.792000 78 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 1 (pid: 112) of binary: /usr/local/bin/python
Traceback (most recent call last):

@ptrendx
Member
ptrendx commented Apr 25, 2025

Hi @SeekPoint, the part of the log that would help us narrow down the issue actually comes before the traceback - with the NVTE_DEBUG and NVTE_DEBUG_LEVEL environment variables set, there should be lines logging the reasons why no backend was chosen.
