Description
🐛 Describe the bug
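Calling torch.svd_lowrank on the same input tensor (same values, same seed) produces inconsistent U outputs on CPU and CUDA. In the output below, the second column of U has the opposite sign on the two devices, and its entries also differ in magnitude by up to about 0.04.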
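Singular vectors are only determined up to a sign flip applied to matching columns of U and V, so a raw comparison of U can overstate the discrepancy. One sign-invariant way to quantify it is to compare the rank-q reconstructions instead. The following is a minimal sketch and not part of the original report: the helper names `rank_q_reconstruction`, `lowrank_error`, and `truncated_svd_error` are hypothetical, the matrix construction mirrors the reproduction script, and the CUDA branch only runs when a GPU is available.

```python
import torch


def rank_q_reconstruction(U, S, V):
    """Rebuild U @ diag(S) @ V^T for batched factors.

    Flipping the sign of a column in both U and V leaves the result unchanged.
    """
    return (U * S.unsqueeze(-2)) @ V.transpose(-2, -1)


def lowrank_error(A, q=2, seed=0):
    """Frobenius error of the svd_lowrank reconstruction, one value per batch entry."""
    torch.manual_seed(seed)
    U, S, V = torch.svd_lowrank(A, q=q)
    return torch.linalg.matrix_norm(A - rank_q_reconstruction(U, S, V))


def truncated_svd_error(A, q=2):
    """Frobenius error of the exact rank-q truncation (the smallest possible rank-q error)."""
    U, S, Vh = torch.linalg.svd(A, full_matrices=False)
    A_q = (U[..., :q] * S[..., None, :q]) @ Vh[..., :q, :]
    return torch.linalg.matrix_norm(A - A_q)


# Same construction as in the reproduction script.
torch.manual_seed(0)
A = torch.rand(2, 3, 3)
A = (A + A.transpose(-2, -1)) / 2  # symmetrize each matrix
A[0, 2, 0] = 0
A[1, 0, 0] = 0

err_exact = truncated_svd_error(A)
err_cpu = lowrank_error(A)
print("exact rank-2 error:      ", err_exact)
print("svd_lowrank error (CPU): ", err_cpu)

if torch.cuda.is_available():
    err_cuda = lowrank_error(A.cuda()).cpu()
    print("svd_lowrank error (CUDA):", err_cuda)
```

If the CPU and CUDA errors are both close to the exact rank-2 error, the two devices are returning comparably good factorizations even though the individual entries of U differ.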
To Reproduce
import torch

if not torch.cuda.is_available():
    # Stop here; the CUDA call below would fail otherwise.
    raise SystemExit("CUDA is not available. Exiting test.")

torch.manual_seed(0)
batch_size = 2
n = 3
A = torch.rand(batch_size, n, n)
A = (A + A.transpose(-2, -1)) / 2  # symmetrize each matrix
A[0, 2, 0] = 0
A[1, 0, 0] = 0

torch.manual_seed(0)  # use the same seed
U_cpu, S_cpu, V_cpu = torch.functional.svd_lowrank(A, q=2)

torch.manual_seed(0)  # use the same seed
U_cuda, S_cuda, V_cuda = torch.functional.svd_lowrank(A.cuda(), q=2)

print("U_cpu:", U_cpu)
print("U_cuda:", U_cuda)
Output
U_cpu: tensor([[[-0.4947, -0.1896],
         [-0.6474, -0.5706],
         [-0.5797,  0.7990]],

        [[-0.3636,  0.8550],
         [-0.4345,  0.1954],
         [-0.8240, -0.4803]]])
U_cuda: tensor([[[-0.4954,  0.1663],
         [-0.6470,  0.5843],
         [-0.5796, -0.7943]],

        [[-0.3636, -0.8370],
         [-0.4345, -0.2359],
         [-0.8240,  0.4938]]], device='cuda:0')
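To separate the sign flip from the remaining numerical difference, the printed values can be sign-aligned column by column. This is a minimal sketch, assuming only the four-decimal values printed above; `align_signs` is a hypothetical helper, not a PyTorch function, and it flips U only (in a full comparison the same flips would also be applied to V).

```python
import torch

# Values copied from the printed output above (rounded to 4 decimals).
U_cpu = torch.tensor([[[-0.4947, -0.1896],
                       [-0.6474, -0.5706],
                       [-0.5797,  0.7990]],
                      [[-0.3636,  0.8550],
                       [-0.4345,  0.1954],
                       [-0.8240, -0.4803]]])
U_cuda = torch.tensor([[[-0.4954,  0.1663],
                        [-0.6470,  0.5843],
                        [-0.5796, -0.7943]],
                       [[-0.3636, -0.8370],
                        [-0.4345, -0.2359],
                        [-0.8240,  0.4938]]])


def align_signs(U_ref, U):
    """Flip each column of U so it points the same way as the matching column of U_ref."""
    # Column-wise dot products, shape (..., 1, q); a negative value means opposite sign.
    dots = (U_ref * U).sum(dim=-2, keepdim=True)
    signs = torch.where(dots < 0, -torch.ones_like(dots), torch.ones_like(dots))
    return U * signs


U_cuda_aligned = align_signs(U_cpu, U_cuda)
max_diff_raw = (U_cpu - U_cuda).abs().max().item()
max_diff_aligned = (U_cpu - U_cuda_aligned).abs().max().item()
print(f"max |U_cpu - U_cuda| before sign alignment: {max_diff_raw:.4f}")
print(f"max |U_cpu - U_cuda| after sign alignment:  {max_diff_aligned:.4f}")
```

On these values the sign flip accounts for most of the difference, but a residual of roughly 0.04 remains in the second column after alignment, so the mismatch is not purely a sign ambiguity.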
Versions
PyTorch version: 2.7.0+cu126
Is debug build: False
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 21.0.0 (++20250526042847+95756e67c230-1~exp1~20250526042959.2439)
CMake version: version 3.22.1
Libc version: glibc-2.35
Python version: 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-138-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA RTX A6000
Nvidia driver version: 570.133.20
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
cc @jianyuh @nikitaved @mruberry @walterddr @xwang233 @lezcano