Got "RuntimeError: y.get_desc().is_nhwc() INTERNAL ASSERT FAILED" while applying conv2d over a transposed tensor #80837

Closed
bincard opened this issue Jul 4, 2022 · 2 comments
Labels
module: mkldnn Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration
triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

bincard commented Jul 4, 2022

🐛 Describe the bug

I got this error on PyTorch version 1.12.0.
The following code reproduces it:

import torch
import torch.nn as nn


conv = nn.Conv2d(
    1,
    128,
    kernel_size=(5, 2),
    stride=(2, 1),
    padding=(0, 1),
    dilation=(1, 1),
    groups=1,
    bias=True,
    padding_mode='zeros')

t = torch.rand([1, 2, 321, 201, 1])
t = torch.transpose(t, 1, 4)  # shape [1, 1, 321, 201, 2]
t2 = t[..., 0]                # shape [1, 1, 321, 201], strides (129042, 1, 201, 1)
r = conv(t2)                  # raises the INTERNAL ASSERT on 1.12.0

The error message:

Traceback (most recent call last):
  File "/Users/bin.xue/Codes/iot/maasAEC/xxx.py", line 19, in <module>
    r = conv(t2)
  File "/Users/bin.xue/.pyenv/versions/3.7.3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/bin.xue/.pyenv/versions/3.7.3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/Users/bin.xue/.pyenv/versions/3.7.3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 454, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: y.get_desc().is_nhwc() INTERNAL ASSERT FAILED at "/Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mkldnn/Conv.cpp":143, please report a bug to PyTorch. 
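
A workaround that avoids the assert until a fix lands (a hedged sketch, not part of the original report): materialize the sliced tensor before the convolution, so it carries plain NCHW strides.

# Hypothetical workaround: .contiguous() copies t2 into plain NCHW strides,
# which keeps the mkldnn path on a layout it understands.
r = conv(t2.contiguous())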

Versions

PyTorch version: 1.12.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 12.4 (x86_64)
GCC version: Could not collect
Clang version: 13.1.6 (clang-1316.0.21.2.5)
CMake version: version 3.20.2
Libc version: N/A

Python version: 3.7.3 (default, May 29 2019, 18:19:34) [Clang 10.0.1 (clang-1001.0.46.4)] (64-bit runtime)
Python platform: Darwin-21.5.0-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.21.6
[pip3] torch==1.12.0
[pip3] torchaudio==0.12.0
[conda] Could not collect

cc @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @VitalyFedyunin

@XiaobingSuper XiaobingSuper added the module: mkldnn Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration label Jul 4, 2022
@XiaobingSuper XiaobingSuper added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jul 5, 2022
pytorchmergebot pushed a commit that referenced this issue Jul 17, 2022
Fixes #80837.
This PR disables use_mkldnn when the input is not contiguous, per oneDNN's requirement.

Pull Request resolved: #80864
Approved by: https://github.com/malfet
facebook-github-bot pushed a commit that referenced this issue Jul 18, 2022
…80864)

Summary:
Fixes #80837.
This PR disables use_mkldnn when the input is not contiguous, per oneDNN's requirement.

Pull Request resolved: #80864
Approved by: https://github.com/malfet

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/4655c3bace4d50b7d02fe0eb0e0fc2a792a518a7

Reviewed By: DanilBaibak

Differential Revision: D37919713

Pulled By: DanilBaibak

fbshipit-source-id: 40a7f5d802498d2a01a702e970982cc40b113e10
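
Rendered in Python, the guard both commit messages describe amounts to something like this (a sketch of the idea only; the actual change is in ATen's C++ dispatch, and can_use_mkldnn is a hypothetical name):

import torch

# Hypothetical sketch: take the mkldnn fast path only when the input is
# contiguous in a layout oneDNN can interpret from strides alone.
def can_use_mkldnn(x: torch.Tensor) -> bool:
    return (x.is_contiguous()
            or x.is_contiguous(memory_format=torch.channels_last))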
Collaborator
mingfeima commented Jul 28, 2022

The root cause is an ambiguity in what the two sides consider contiguous.

The given input has shape [1, 1, 321, 201] and strides [129042, 1, 201, 1]. PyTorch considers this contiguous (in the channels-last memory format),

but oneDNN does not consider it nhwc, because its check is:

    inline bool is_nhwc() const {
      if (!is_plain() || data.ndims != 4) return false;
      const auto &dims = data.dims;
      const auto &strides = blocking_strides();
      const auto n = 0, c = 1, h = 2, w = 3;
      return strides[n] == dims[h] * dims[w] * dims[c]
          && strides[h] == dims[w] * dims[c]
          && strides[w] == dims[c]
          && strides[c] == 1;
    };

The dim-0 stride does not match (129042 instead of dims[c] * dims[h] * dims[w] = 64521), but that actually should not matter: dim 0 has size 1, so its only valid index is 0.
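
The ambiguity is easy to observe from Python (a small check built on the repro above):

import torch

t2 = torch.rand([1, 2, 321, 201, 1]).transpose(1, 4)[..., 0]
print(t2.shape)    # torch.Size([1, 1, 321, 201])
print(t2.stride()) # (129042, 1, 201, 1)
# PyTorch skips the stride check on size-1 dims, so this counts as channels last:
print(t2.is_contiguous(memory_format=torch.channels_last))  # True
# oneDNN's strict is_nhwc() above instead requires strides[n] == 1 * 321 * 201 == 64521.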

We need to get rid of this ambiguity between PyTorch and oneDNN: decide use_channels_last on the PyTorch side and pass it down to oneDNN. The oneDNN side should make no judgement of its own and simply trust that PyTorch hands down tensors that are physically contiguous in the agreed memory format (and it always should).
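
As a rough illustration of that direction (a sketch only, reusing t2 from the repro and a hypothetical helper; the real fix lives in the ATen/ideep glue, not in Python):

import torch

# Hypothetical sketch: decide the memory format once on the PyTorch side ...
def pick_memory_format(x: torch.Tensor) -> torch.memory_format:
    if x.is_contiguous(memory_format=torch.channels_last):
        return torch.channels_last
    return torch.contiguous_format

# ... then hand oneDNN a tensor made physically contiguous in that format,
# along with the flag itself, instead of letting oneDNN re-derive it from strides.
x = t2.contiguous(memory_format=pick_memory_format(t2))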

@XiaobingSuper XiaobingSuper linked a pull request Aug 18, 2022 that will close this issue
pytorchmergebot pushed a commit that referenced this issue Aug 25, 2022
Fixes #82060 (N > 1 will call into the oneDNN path) and #80837. Both issues were introduced because the definition of channels last differs between the PyTorch framework side and the ideep side; this PR fixes that gap by having ideep use the format flag given by the framework side.

Pull Request resolved: #83653
Approved by: https://github.com/mingfeima, https://github.com/malfet
@XiaobingSuper
Collaborator

It has been fixed by #83653.
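
For completeness, a quick check that should pass on a build containing #83653 (the expected output shape follows from the conv hyper-parameters in the repro; treat it as a hedged expectation, not a recorded result):

r = conv(t2)     # no longer raises once the fix is in
print(r.shape)   # expected: torch.Size([1, 128, 159, 202])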
