Got "RuntimeError: y.get_desc().is_nhwc() INTERNAL ASSERT FAILED" while applying conv2d over a transposed tensor #80837

Closed
bincard opened this issue Jul 4, 2022 · 2 comments
Labels
module: mkldnn Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration
triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

bincard commented Jul 4, 2022

🐛 Describe the bug

I got this error on PyTorch version 1.12.0.
The following code reproduces it:

import torch
import torch.nn as nn


conv = nn.Conv2d(
    1,
    128,
    kernel_size=(5, 2),
    stride=(2, 1),
    padding=(0, 1),
    dilation=(1, 1),
    groups=1,
    bias=True,
    padding_mode='zeros')

t = torch.rand([1, 2, 321, 201, 1])
t = torch.transpose(t, 1, 4)  # shape [1, 1, 321, 201, 2]
t2 = t[..., 0]                # shape [1, 1, 321, 201], strides (129042, 1, 201, 1)
r = conv(t2)                  # raises the INTERNAL ASSERT on 1.12.0

The error message:

Traceback (most recent call last):
  File "/Users/bin.xue/Codes/iot/maasAEC/xxx.py", line 19, in <module>
    r = conv(t2)
  File "/Users/bin.xue/.pyenv/versions/3.7.3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/bin.xue/.pyenv/versions/3.7.3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/Users/bin.xue/.pyenv/versions/3.7.3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 454, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: y.get_desc().is_nhwc() INTERNAL ASSERT FAILED at "/Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mkldnn/Conv.cpp":143, please report a bug to PyTorch. 
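
A workaround that avoids the assert until a fix lands (a hedged sketch, not part of the original report): materialize the sliced tensor before the convolution, so it carries plain NCHW strides.

# Hypothetical workaround: .contiguous() copies t2 into plain NCHW strides,
# which keeps the mkldnn path on a layout it understands.
r = conv(t2.contiguous())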

Versions

PyTorch version: 1.12.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 12.4 (x86_64)
GCC version: Could not collect
Clang version: 13.1.6 (clang-1316.0.21.2.5)
CMake version: version 3.20.2
Libc version: N/A

Python version: 3.7.3 (default, May 29 2019, 18:19:34) [Clang 10.0.1 (clang-1001.0.46.4)] (64-bit runtime)
Python platform: Darwin-21.5.0-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.21.6
[pip3] torch==1.12.0
[pip3] torchaudio==0.12.0
[conda] Could not collect

cc @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @VitalyFedyunin

@XiaobingSuper XiaobingSuper added the module: mkldnn Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration label Jul 4, 2022
@XiaobingSuper XiaobingSuper added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jul 5, 2022
pytorchmergebot pushed a commit that referenced this issue Jul 17, 2022
Fixes #80837.
This PR disables use_mkldnn when the input is not contiguous, per oneDNN's requirement.

Pull Request resolved: #80864
Approved by: https://github.com/malfet
facebook-github-bot pushed a commit that referenced this issue Jul 18, 2022
…80864)

Summary:
Fixes #80837.
This PR disables use_mkldnn when the input is not contiguous, per oneDNN's requirement.

Pull Request resolved: #80864
Approved by: https://github.com/malfet

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/4655c3bace4d50b7d02fe0eb0e0fc2a792a518a7

Reviewed By: DanilBaibak

Differential Revision: D37919713

Pulled By: DanilBaibak

fbshipit-source-id: 40a7f5d802498d2a01a702e970982cc40b113e10
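
Rendered in Python, the guard both commit messages describe amounts to something like this (a sketch of the idea only; the actual change is in ATen's C++ dispatch, and can_use_mkldnn is a hypothetical name):

import torch

# Hypothetical sketch: take the mkldnn fast path only when the input is
# contiguous in a layout oneDNN can interpret from strides alone.
def can_use_mkldnn(x: torch.Tensor) -> bool:
    return (x.is_contiguous()
            or x.is_contiguous(memory_format=torch.channels_last))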
Collaborator
mingfeima commented Jul 28, 2022

The root cause is an ambiguity in what the two sides consider contiguous.

The given input has shape [1, 1, 321, 201] and strides [129042, 1, 201, 1]. PyTorch considers this contiguous (in the channels-last memory format),

but oneDNN does not consider it nhwc, because its check is:

    inline bool is_nhwc() const {
      if (!is_plain() || data.ndims != 4) return false;
      const auto &dims = data.dims;
      const auto &strides = blocking_strides();
      const auto n = 0, c = 1, h = 2, w = 3;
      return strides[n] == dims[h] * dims[w] * dims[c]
          && strides[h] == dims[w] * dims[c]
          && strides[w] == dims[c]
          && strides[c] == 1;
    };

The dim-0 stride does not match (129042 instead of dims[c] * dims[h] * dims[w] = 64521), but that actually should not matter: dim 0 has size 1, so its only valid index is 0.
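
The ambiguity is easy to observe from Python (a small check built on the repro above):

import torch

t2 = torch.rand([1, 2, 321, 201, 1]).transpose(1, 4)[..., 0]
print(t2.shape)    # torch.Size([1, 1, 321, 201])
print(t2.stride()) # (129042, 1, 201, 1)
# PyTorch skips the stride check on size-1 dims, so this counts as channels last:
print(t2.is_contiguous(memory_format=torch.channels_last))  # True
# oneDNN's strict is_nhwc() above instead requires strides[n] == 1 * 321 * 201 == 64521.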

We need to get rid of this ambiguity between PyTorch and oneDNN: decide use_channels_last on the PyTorch side and pass it down to oneDNN. The oneDNN side should make no judgement of its own and simply trust that PyTorch hands down tensors that are physically contiguous in the agreed memory format (and it always should).
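
As a rough illustration of that direction (a sketch only, reusing t2 from the repro and a hypothetical helper; the real fix lives in the ATen/ideep glue, not in Python):

import torch

# Hypothetical sketch: decide the memory format once on the PyTorch side ...
def pick_memory_format(x: torch.Tensor) -> torch.memory_format:
    if x.is_contiguous(memory_format=torch.channels_last):
        return torch.channels_last
    return torch.contiguous_format

# ... then hand oneDNN a tensor made physically contiguous in that format,
# along with the flag itself, instead of letting oneDNN re-derive it from strides.
x = t2.contiguous(memory_format=pick_memory_format(t2))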

@XiaobingSuper XiaobingSuper linked a pull request Aug 18, 2022 that will close this issue
pytorchmergebot pushed a commit that referenced this issue Aug 25, 2022
Fixes #82060 (N > 1 will call into the oneDNN path) and #80837. Both issues were introduced because the definition of channels last differs between the PyTorch framework side and the ideep side; this PR fixes that gap by having ideep use the format flag given by the framework side.

Pull Request resolved: #83653
Approved by: https://github.com/mingfeima, https://github.com/malfet
@XiaobingSuper
Collaborator

It has been fixed by #83653.
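
For completeness, a quick check that should pass on a build containing #83653 (the expected output shape follows from the conv hyper-parameters in the repro; treat it as a hedged expectation, not a recorded result):

r = conv(t2)     # no longer raises once the fix is in
print(r.shape)   # expected: torch.Size([1, 128, 159, 202])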
