For pypots.forecasting.fits and pypots.imputation.fits we have
E RuntimeError: Caught RuntimeError in replica 0 on device 1.
E Original Traceback (most recent call last):
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in _worker
E output = module(*input, **kwargs)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
E return self._call_impl(*args, **kwargs)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
E return forward_call(*args, **kwargs)
E File "/home/wdudu/PyPOTS_dev/pypots/forecasting/fits/core.py", line 68, in forward
E enc_out = self.backbone(enc_out)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
E return self._call_impl(*args, **kwargs)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
E return forward_call(*args, **kwargs)
E File "/home/wdudu/PyPOTS_dev/pypots/nn/modules/fits/backbone.py", line 63, in forward
E low_specxy_ = self.freq_upsampler(low_specx.permute(0, 2, 1))
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
E return self._call_impl(*args, **kwargs)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
E return forward_call(*args, **kwargs)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
E return F.linear(input, self.weight, self.bias)
E RuntimeError: t() expects a tensor with <= 2 dimensions, but self is 3D
For pypots.imputation.film we have
E RuntimeError: Caught RuntimeError in replica 0 on device 1.
E Original Traceback (most recent call last):
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in _worker
E output = module(*input, **kwargs)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
E return self._call_impl(*args, **kwargs)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
E return forward_call(*args, **kwargs)
E File "/home/wdudu/PyPOTS_dev/pypots/imputation/film/core.py", line 65, in forward
E backbone_output = self.backbone(X_embedding)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
E return self._call_impl(*args, **kwargs)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
E return forward_call(*args, **kwargs)
E File "/home/wdudu/PyPOTS_dev/pypots/nn/modules/film/backbone.py", line 65, in forward
E out1 = self.spec_conv_1[i](x_in_c)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
E return self._call_impl(*args, **kwargs)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
E return forward_call(*args, **kwargs)
E File "/home/wdudu/PyPOTS_dev/pypots/nn/modules/film/layers.py", line 128, in forward
E out_ft[:, :, :, : self.modes2] = torch.einsum("bjix,iox->bjox", a, self.weights1)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/functional.py", line 380, in einsum
E return _VF.einsum(equation, operands) # type: ignore[attr-defined]
E RuntimeError: einsum(): the number of subscripts in the equation (3) does not match the number of dimensions (4) for operand 1 and no ellipsis was given
For pypots.imputation.gpvae we have
E RuntimeError: Caught RuntimeError in replica 1 on device 2.
E Original Traceback (most recent call last):
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in _worker
E output = module(*input, **kwargs)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
E return self._call_impl(*args, **kwargs)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
E return forward_call(*args, **kwargs)
E File "/home/wdudu/PyPOTS_dev/pypots/imputation/gpvae/core.py", line 97, in forward
E elbo_loss = self.backbone(X, missing_mask)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
E return self._call_impl(*args, **kwargs)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
E return forward_call(*args, **kwargs)
E File "/home/wdudu/PyPOTS_dev/pypots/nn/modules/gpvae/backbone.py", line 157, in forward
E self.prior = self._init_prior(device=X.device)
E File "/home/wdudu/PyPOTS_dev/pypots/nn/modules/gpvae/backbone.py", line 137, in _init_prior
E prior = torch.distributions.MultivariateNormal(
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/distributions/multivariate_normal.py", line 177, in __init__
E super().__init__(batch_shape, event_shape, validate_args=validate_args)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/distributions/distribution.py", line 66, in __init__
E valid = constraint.check(value)
E File "/home/wdudu/.conda/envs/ml/lib/python3.10/site-packages/torch/distributions/constraints.py", line 557, in check
E return torch.linalg.cholesky_ex(value).info.eq(0)
E RuntimeError: lazy wrapper should be called at most once
For the other models, the error is: 'DataParallel' object has no attribute 'backbone'
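The `'DataParallel' object has no attribute 'backbone'` message is what `torch.nn.DataParallel` raises when code reads a submodule from the wrapper rather than from the wrapped model, which is kept under `.module`. A minimal sketch of that pattern, assuming a hypothetical `TinyModel` and `unwrap` helper (neither is PyPOTS code, and this is not necessarily how the affected models were fixed):

```python
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for a model that exposes a `backbone` submodule."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(4, 2)

    def forward(self, x):
        return self.backbone(x)


def unwrap(model):
    """Return the underlying model whether or not it is wrapped in DataParallel."""
    return model.module if isinstance(model, nn.DataParallel) else model


model = TinyModel()
wrapped = nn.DataParallel(model)  # constructing the wrapper does not need a GPU

try:
    wrapped.backbone  # the wrapper itself has no such attribute
    error_message = None
except AttributeError as e:
    error_message = str(e)

backbone = unwrap(wrapped).backbone  # the submodule lives under `.module`
```

Routing every attribute access through something like `unwrap` keeps the same code path working for both single-device and multi-device runs.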
All of these models run fine on a single GPU. If you hit any of the errors above, run the model on one GPU only, or on the CPU only.
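One way to apply this workaround is to hide all but one GPU from the process before PyTorch initializes CUDA. A minimal sketch using the standard `CUDA_VISIBLE_DEVICES` environment variable; the helper name `restrict_to_single_gpu` is hypothetical, and the variable has to be set before CUDA is first initialized (in practice, before importing `torch`):

```python
import os


def restrict_to_single_gpu(index=0):
    """Expose only one CUDA device to this process (hypothetical helper).

    Pass index=None to hide every GPU and fall back to the CPU.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "" if index is None else str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


visible = restrict_to_single_gpu(0)
# Import torch and build the model only after this point, for example:
# import torch
# assert torch.cuda.device_count() <= 1
```

Passing a single device such as `"cuda:0"` or `"cpu"` to the model's `device` argument, instead of a list of devices, should have the same effect, though that is an assumption about the PyPOTS constructor API and is not confirmed in this thread.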
Now that CRLI, Koopa, and USGAN are fixed in #633, I'm going to change this issue's title.
WenjieDu changed the title from "Some models fail when running on multiple CUDA devices" to "FITS/FILM/GP-VAE fail when running on multiple CUDA devices" on Mar 15, 2025
1. System Info
v0.11
2. Information
3. Reproduction
4. Expected behavior