I am currently using CPUOffloadPolicy. The model is created from a Hugging Face pretrained model, but I got the following error when calling clip_grad_norm_:
grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=self.grad_clip)
[rank4]: File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/utils/clip_grad.py", line 30, in _no_grad_wrapper
[rank4]: return func(*args, **kwargs)
[rank4]: File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/utils/clip_grad.py", line 105, in clip_grad_norm_
[rank4]: clip_coef = max_norm / (total_norm + 1e-6)
[rank4]: File "/root/miniconda3/lib/python3.10/site-packages/torch/_tensor.py", line 39, in wrapped
[rank4]: return f(*args, **kwargs)
[rank4]: File "/root/miniconda3/lib/python3.10/site-packages/torch/_tensor.py", line 1032, in __rdiv__
[rank4]: return self.reciprocal() * other
[rank4]: File "/root/miniconda3/lib/python3.10/site-packages/torch/_compile.py", line 32, in inner
[rank4]: return disable_fn(*args, **kwargs)
[rank4]: File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
[rank4]: return fn(*args, **kwargs)
[rank4]: File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/tensor/_api.py", line 340, in __torch_dispatch__
[rank4]: return DTensor._op_dispatcher.dispatch(
[rank4]: File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/tensor/_dispatch.py", line 181, in dispatch
[rank4]: self.redistribute_local_args(
[rank4]: File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/tensor/_dispatch.py", line 317, in redistribute_local_args
[rank4]: resharded_local_tensor = redistribute_local_tensor(
[rank4]: File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/tensor/_redistribute.py", line 195, in redistribute_local_tensor
[rank4]: new_local_tensor = partial_spec._reduce_value(
[rank4]: File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/tensor/_ops/_math_ops.py", line 126, in _reduce_value
[rank4]: reduced_tensor = super()._reduce_value(tensor, mesh, mesh_dim)
[rank4]: File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/tensor/placement_types.py", line 599, in _reduce_value
[rank4]: return funcol.all_reduce(
[rank4]: File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/_functional_collectives.py", line 175, in all_reduce
[rank4]: tensor = torch.ops._c10d_functional.all_reduce(self, reduceOp.lower(), group_name)
[rank4]: File "/root/miniconda3/lib/python3.10/site-packages/torch/_ops.py", line 1116, in __call__
[rank4]: return self._op(*args, **(kwargs or {}))
[rank4]: RuntimeError: No backend type associated with device type cpu
Is there anything wrong in my model init device?
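For reference, the same call works fine on plain CPU tensors outside of FSDP/DTensor, so the failure seems specific to how the sharded norm is reduced. A minimal non-distributed sanity check (ordinary model, no offload):

```python
import torch

# Minimal non-distributed sanity check: clip_grad_norm_ on a plain model.
model = torch.nn.Linear(8, 8)
model(torch.randn(4, 8)).sum().backward()

# Inflate the gradients so clipping actually triggers.
for p in model.parameters():
    p.grad.mul_(100.0)

# Returns the total norm *before* clipping and rescales grads in place.
grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

post = torch.linalg.vector_norm(
    torch.cat([p.grad.flatten() for p in model.parameters()]))
assert grad_norm > 1.0      # pre-clip norm was large
assert post <= 1.0 + 1e-4   # post-clip norm is bounded by max_norm
```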
The gradients are placed on CPU when doing the collectives. @weifengpy What are the common practices when users need to manipulate gradients when CPU offload is on?
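The error message itself says the process group has no backend registered for CPU tensors, which is what happens when the group is initialized with NCCL only and the norm all-reduce then runs on an offloaded (CPU-resident) gradient. One workaround (a sketch, not necessarily the official recommendation) is to register a CPU backend alongside NCCL at init time, e.g. backend="cuda:nccl,cpu:gloo". A single-process gloo-only illustration that a CPU collective then dispatches correctly:

```python
import os
import torch
import torch.distributed as dist

# Single-process illustration; in real training torchrun sets these env vars.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")

# Register a backend that can serve CPU tensors. On a GPU cluster you would
# pass backend="cuda:nccl,cpu:gloo" so both device types are covered.
dist.init_process_group(backend="gloo", rank=0, world_size=1)

t = torch.ones(3)   # a CPU tensor, like an offloaded gradient norm
dist.all_reduce(t)  # now has a (gloo) backend to dispatch to
print(t)            # world_size is 1, so the sum is unchanged: tensor([1., 1., 1.])

dist.destroy_process_group()
```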