failed to convert torch.jit.ScriptModule to ONNX (crash) · Issue #30512 · pytorch/pytorch · GitHub

failed to convert torch.jit.ScriptModule to ONNX (crash) #30512


Closed

lironmo opened this issue Nov 27, 2019 · 17 comments
Assignees
Labels
module: onnx Related to torch.onnx · oncall: jit Add this issue/PR to JIT oncall triage queue · triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@lironmo
lironmo commented Nov 27, 2019

🐛 Bug

When converting a torch.jit.ScriptModule to ONNX, the export crashes with the following exception:

Traceback (most recent call last):
  File "/home/liron/envs/detectron/lib/python3.6/site-packages/torch/onnx/utils.py", line 382, in _export
    fixed_batch_size=fixed_batch_size)
  File "/home/liron/envs/detectron/lib/python3.6/site-packages/torch/onnx/utils.py", line 262, in _model_to_graph
    fixed_batch_size=fixed_batch_size)
  File "/home/liron/envs/detectron/lib/python3.6/site-packages/torch/onnx/utils.py", line 132, in _optimize_graph
    graph = torch._C._jit_pass_onnx(graph, operator_export_type)
  File "/home/liron/envs/detectron/lib/python3.6/site-packages/torch/onnx/__init__.py", line 174, in _run_symbolic_function
    return utils._run_symbolic_function(*args, **kwargs)
  File "/home/liron/envs/detectron/lib/python3.6/site-packages/torch/onnx/utils.py", line 619, in _run_symbolic_function
    return op_fn(g, *inputs, **attrs)
  File "/home/liron/envs/detectron/lib/python3.6/site-packages/torch/onnx/symbolic_helper.py", line 124, in wrapper
    return fn(g, *args)
  File "/home/liron/envs/detectron/lib/python3.6/site-packages/torch/onnx/symbolic_opset9.py", line 862, in batch_norm
    if len(input_sizes) == 2:
TypeError: object of type 'NoneType' has no len() (occurred when translating batch_norm)

To Reproduce

load the attached torch script, and try to convert to onnx:

def convert(self):
    loaded = torch.jit.load(self._torch_script_path)
    # loaded.load_state_dict(self._model_state)
    dummy_input = torch.randn(1, 3, 224, 224)
    target = loaded(dummy_input)
    torch.onnx.export(loaded, dummy_input, self._out_onnx_path, verbose=True,
                      operator_export_type=torch.onnx.OperatorExportTypes.ONNX,
                      example_outputs=target)

cc @suo @houseroad @spandantiwari @lara-hdr @BowenBao @neginraoof

@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Nov 27, 2019
@lironmo
Author
lironmo commented Nov 27, 2019

file: https://filebin.net/scu91052e8txtl4r

Clean repro code:

import torch

loaded = torch.jit.load("/app_data/test_torch_script/torch_script_test.zip")
dummy_input = torch.randn(1, 3, 224, 224)
target = loaded(dummy_input)
torch.onnx.export(loaded, dummy_input, "out.onnx",  # output path (placeholder)
                  verbose=True,
                  operator_export_type=torch.onnx.OperatorExportTypes.ONNX,
                  example_outputs=target)

@ailzhang ailzhang added module: onnx Related to torch.onnx triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Nov 27, 2019
xuhdev added a commit to xuhdev/pytorch-xla that referenced this issue Nov 28, 2019
@lironmo lironmo changed the title failed to convert torch.jit.ScriptModule to ONNX failed to convert torch.jit.ScriptModule to ONNX (crash) Nov 28, 2019
@lironmo
Author
lironmo commented Nov 28, 2019

When I compare it to a normal export (creating the model from code and exporting it), I get the following:

input
133 defined in (%133 : Float(1, 64, 112, 112) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[7, 7], pads=[3, 3, 3, 3], strides=[2, 2]](%input.1, %conv1.weight), scope: ResNet/Conv2d[conv1] # /home/liron/envs/detectron/lib/python3.6/site-packages/torch/nn/modules/conv.py:342:0
)

input.type().sizes()
[1, 64, 112, 112]

whereas when I try to export the loaded torch script module:

input
114 defined in (%114 : Tensor = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[7, 7], pads=[3, 3, 3, 3], strides=[2, 2]](%input.1, %102) # code/torch/torch/nn/modules/container.py:213:13
)

and

input_sizes = input.type().sizes()
type(input_sizes)
<class 'NoneType'>
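
(For illustration: a minimal sketch of where the shape information comes from and where it gets lost, using torchvision's resnet18 as a stand-in for the attached model; the model choice and file name here are assumptions, not the original repro.)

import torch
import torchvision

model = torchvision.models.resnet18().eval()
x = torch.randn(1, 3, 224, 224)

# Tracing runs the model, so concrete shapes are recorded on every graph value.
traced = torch.jit.trace(model, x)
print(traced.graph)   # values typed like Float(1, 64, 112, 112)

# After save/load the graph is rebuilt from serialized code, and the
# recorded shapes are gone: values come back typed as plain Tensor.
traced.save("resnet18_ts.zip")
loaded = torch.jit.load("resnet18_ts.zip")
print(loaded.graph)   # values typed just Tensor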

dlibenzi pushed a commit to pytorch/xla that referenced this issue Nov 28, 2019
@lironmo
Author
lironmo commented Nov 28, 2019

@dlibenzi - can you explain why your commit is related to this issue?

@lironmo
Author
lironmo commented Dec 3, 2019

@dlibenzi
Copy link
Contributor
dlibenzi commented Dec 3, 2019

@dlibenzi - can you explain why your commit is related to this issue?

Should I know? 😄

@suo
Member
suo commented Dec 3, 2019

@houseroad who is the right person to look at this?

@lara-hdr
Contributor
lara-hdr commented Dec 3, 2019

@lironmo what version of PyTorch are you using?

@lironmo
Author
lironmo commented Dec 4, 2019

@lara-hdr 1.3.0, and also tested on 1.3.1

@lara-hdr
Contributor
lara-hdr commented Dec 4, 2019

@lironmo , the issue is that the shape information of the tensors is not always available when scripting. The ONNX exporter needs this information in certain cases where PyTorch and ONNX operators' behaviors don't align perfectly.

Batch_norm was recently updated to export without the shape information in PR #29458, so the error you are getting with batch_norm is now fixed on master.

However, when I tried exporting your model with PyTorch master, I got a similar error with flatten.
I submitted a PR with some improvements for flatten in opset 11 that solves the problem in your case: #30751.

Once that PR is merged, you should be able to export your model with opset_version=11 (pass opset_version=11 in the exporter API) using the PyTorch nightly; see the sketch below.
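
(For reference, a minimal sketch of the suggested call once the fix lands, assuming the PyTorch version from this thread; the input and output file names are placeholders.)

import torch

loaded = torch.jit.load("torch_script_test.zip")  # the attached ScriptModule (placeholder path)
dummy_input = torch.randn(1, 3, 224, 224)
target = loaded(dummy_input)

# opset_version=11 is the key parameter: the flatten fix in #30751 targets opset 11 only.
# example_outputs was required when exporting a ScriptModule in PyTorch of this era.
torch.onnx.export(loaded, dummy_input, "model.onnx",
                  verbose=True,
                  opset_version=11,
                  example_outputs=target)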

@lironmo
Author
lironmo commented Dec 5, 2019

@lara-hdr - thanks for your reply :),
I got the same error in the batch norm layer with the nightly build.
I used:
(convert) liron@liron-Latitude-5490:~/work/pixoneye/model_conversion$ pip freeze | grep -i torch
torch-nightly==1.2.0.dev20190805
torchvision==0.4.1

So do I need to wait for the next nightly build? I created the traced torch script module with the new nightly build - updated in the bin.

@lironmo
Author
lironmo commented Dec 5, 2019

@lara-hdr thanks for your reply :)

I also tried installing the nightly build, creating a new traced model, and converting it to ONNX, but I get the same problem with the batch norm layer (see the trace below).

(convert) liron@liron-Latitude-5490:~/work/pixoneye/model_conversion$ pip freeze | grep -i torch
torch-nightly==1.2.0.dev20190805
torchvision==0.4.1

So do I need to wait for the next nightly build?

I uploaded the new traced model to the bin (with the night_build suffix): https://filebin.net/scu91052e8txtl4r

About flatten, I will wait for the fix.

trace:
Traceback (most recent call last):
File "/home/liron/work/pixoneye/model_conversion/test.py", line 17, in
out_onnx_path=out_onnx_script, transform_yaml_path=transform_yaml)
File "/home/liron/work/pixoneye/model_conversion/edgify/convert_model.py", line 28, in convert_from_torch_script
cls._load_model_weights_and_export(model, model_state_dict, out_onnx_path, transform_yaml_path)
File "/home/liron/work/pixoneye/model_conversion/edgify/convert_model.py", line 76, in _load_model_weights_and_export
example_outputs=target)
File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/init.py", line 132, in export
strip_doc_string, dynamic_axes)
File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/utils.py", line 64, in export
example_outputs=example_outputs, strip_doc_string=strip_doc_string, dynamic_axes=dynamic_axes)
File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/utils.py", line 329, in _export
_retain_param_name, do_constant_folding)
File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/utils.py", line 225, in _model_to_graph
_disable_torch_constant_prop=_disable_torch_constant_prop)
File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/utils.py", line 127, in _optimize_graph
graph = torch._C._jit_pass_onnx(graph, operator_export_type)
File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/init.py", line 163, in _run_symbolic_function
return utils._run_symbolic_function(*args, kwargs)
File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/utils.py", line 564, in _run_symbolic_function
return op_fn(g, inputs, attrs)
File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/symbolic_helper.py", line 146, in wrapper
return fn(g, args)
File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/symbolic_opset9.py", line 876, in batch_norm
input_sizes = input.type().sizes()
RuntimeError: r INTERNAL ASSERT FAILED at /pytorch/aten/src/ATen/core/jit_type.h:155, please report a bug to PyTorch. (expect at /pytorch/aten/src/ATen/core/jit_type.h:155)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fb44d43e273 in /home/liron/envs/convert/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: std::shared_ptr<c10::CompleteTensorType> c10::Type::expect<c10::CompleteTensorType>() + 0x1d3 (0x7fb4adc20413 in /home/liron/envs/convert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #2: + 0x492412 (0x7fb4adc46412 in /home/liron/envs/convert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #3: + 0x1d5484 (0x7fb4ad989484 in /home/liron/envs/convert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: _PyCFunction_FastCallKeywords + 0x1eb (0x52393b in /home/liron/envs/convert/bin/python)
frame #5: /home/liron/envs/convert/bin/python() [0x57dc05]
frame #6: _PyEval_EvalFrameDefault + 0x45e (0x5762ae in /home/liron/envs/convert/bin/python)
frame #7: PyEval_EvalCodeEx + 0x285 (0x57dfd5 in /home/liron/envs/convert/bin/python)
frame #8: /home/liron/envs/convert/bin/python() [0x4fbb33]
frame #9: PyObject_Call + 0x3a (0x4e7cda in /home/liron/envs/convert/bin/python)
frame #10: _PyEval_EvalFrameDefault + 0x1a7c (0x5778cc in /home/liron/envs/convert/bin/python)
frame #11: PyEval_EvalCodeEx + 0x5b2 (0x57e302 in /home/liron/envs/convert/bin/python)
frame #12: /home/liron/envs/convert/bin/python() [0x4fbc34]
frame #13: PyObject_Call + 0x3a (0x4e7cda in /home/liron/envs/convert/bin/python)
frame #14: _PyEval_EvalFrameDefault + 0x1a7c (0x5778cc in /home/liron/envs/convert/bin/python)
frame #15: PyEval_EvalCodeEx + 0x5b2 (0x57e302 in /home/liron/envs/convert/bin/python)
frame #16: /home/liron/envs/convert/bin/python() [0x4fbc34]
frame #17: PyObject_Call + 0x3a (0x4e7cda in /home/liron/envs/convert/bin/python)
frame #18: _PyEval_EvalFrameDefault + 0x1a7c (0x5778cc in /home/liron/envs/convert/bin/python)
frame #19: PyEval_EvalCodeEx + 0x285 (0x57dfd5 in /home/liron/envs/convert/bin/python)
frame #20: /home/liron/envs/convert/bin/python() [0x4fbb33]
frame #21: PyObject_Call + 0x3a (0x4e7cda in /home/liron/envs/convert/bin/python)
frame #22: torch::jit::BlockToONNX(torch::jit::Block*, torch::jit::Block*, torch::onnx::OperatorExportTypes, std::unordered_map<torch::jit::Value*, torch::jit::Value*, std::hash<torch::jit::Value*>, std::equal_to<torch::jit::Value*>, std::allocator<std::pair<torch::jit::Value* const, torch::jit::Value*> > >) + 0x4b2 (0x7fb4adc0ee52 in /home/liron/envs/convert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #23: torch::jit::ToONNX(std::shared_ptr<torch::jit::Graph>&, torch::onnx::OperatorExportTypes) + 0x2c2 (0x7fb4adc106e2 in /home/liron/envs/convert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #24: + 0x4548d3 (0x7fb4adc088d3 in /home/liron/envs/convert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #25: + 0x1d5484 (0x7fb4ad989484 in /home/liron/envs/convert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #26: _PyCFunction_FastCallKeywords + 0x1eb (0x52393b in /home/liron/envs/convert/bin/python)
frame #27: /home/liron/envs/convert/bin/python() [0x57da79]
frame #28: _PyEval_EvalFrameDefault + 0x45e (0x5762ae in /home/liron/envs/convert/bin/python)
frame #29: /home/liron/envs/convert/bin/python() [0x57535f]
frame #30: /home/liron/envs/convert/bin/python() [0x57ebeb]
frame #31: /home/liron/envs/convert/bin/python() [0x57db5c]
frame #32: _PyEval_EvalFrameDefault + 0x11a5 (0x576ff5 in /home/liron/envs/convert/bin/python)
frame #33: /home/liron/envs/convert/bin/python() [0x57535f]
frame #34: /home/liron/envs/convert/bin/python() [0x57ebeb]
frame #35: /home/liron/envs/convert/bin/python() [0x57db5c]
frame #36: _PyEval_EvalFrameDefault + 0x45e (0x5762ae in /home/liron/envs/convert/bin/python)
frame #37: /home/liron/envs/convert/bin/python() [0x575716]
frame #38: /home/liron/envs/convert/bin/python() [0x57ebeb]
frame #39: /home/liron/envs/convert/bin/python() [0x57db5c]
frame #40: _PyEval_EvalFrameDefault + 0x11a5 (0x576ff5 in /home/liron/envs/convert/bin/python)
frame #41: /home/liron/envs/convert/bin/python() [0x57535f]
frame #42: /home/liron/envs/convert/bin/python() [0x57ebeb]
frame #43: /home/liron/envs/convert/bin/python() [0x57db5c]
frame #44: _PyEval_EvalFrameDefault + 0x45e (0x5762ae in /home/liron/envs/convert/bin/python)
frame #45: /home/liron/envs/convert/bin/python() [0x57535f]
frame #46: /home/liron/envs/convert/bin/python() [0x57ebeb]
frame #47: /home/liron/envs/convert/bin/python() [0x57db5c]
frame #48: _PyEval_EvalFrameDefault + 0x11a5 (0x576ff5 in /home/liron/envs/convert/bin/python)
frame #49: /home/liron/envs/convert/bin/python() [0x57eb2d]
frame #50: /home/liron/envs/convert/bin/python() [0x57db5c]
frame #51: _PyEval_EvalFrameDefault + 0x45e (0x5762ae in /home/liron/envs/convert/bin/python)
frame #52: /home/liron/envs/convert/bin/python() [0x57535f]
frame #53: /home/liron/envs/convert/bin/python() [0x57ebeb]
frame #54: /home/liron/envs/convert/bin/python() [0x57db5c]
frame #55: _PyEval_EvalFrameDefault + 0x11a5 (0x576ff5 in /home/liron/envs/convert/bin/python)
frame #56: /home/liron/envs/convert/bin/python() [0x57535f]
frame #57: PyEval_EvalCode + 0x23 (0x5750d3 in /home/liron/envs/convert/bin/python)
frame #58: /home/liron/envs/convert/bin/python() [0x5ea0c2]
frame #59: PyRun_FileExFlags + 0x9a (0x5ea52a in /home/liron/envs/convert/bin/python)
frame #60: PyRun_SimpleFileExFlags + 0x1a7 (0x5ea2e7 in /home/liron/envs/convert/bin/python)
frame #61: Py_Main + 0x623 (0x5ef7f3 in /home/liron/envs/convert/bin/python)
frame #62: main + 0xe9 (0x4d1f09 in /home/liron/envs/convert/bin/python)
frame #63: __libc_start_main + 0xf0 (0x7fb4b1aea830 in /lib/x86_64-linux-gnu/libc.so.6)

@lara-hdr
Contributor
lara-hdr commented Dec 6, 2019

@lironmo, torch nightly should be version 1.4.0.dev, not 1.2.0.
With the nightly you won't get the error with batch norm, but it will fail on flatten.
Once #30751 is merged, you can pull the latest nightly to export your model.

@lironmo
Author
lironmo commented Dec 15, 2019

@lara-hdr you are right, of course. Thanks!
I tried the latest nightly build (torch==1.4.0.dev20191214+cpu, torchvision==0.4.1) after the merge.
I created a new torch script module (https://filebin.net/scu91052e8txtl4r) and tried to convert it to ONNX, but I am getting the flatten error:

Traceback (most recent call last):
File "/home/liron/work/pixoneye/model_conversion/test.py", line 17, in
out_onnx_path=out_onnx_script, transform_yaml_path=transform_yaml)
File "/home/liron/work/pixoneye/model_conversion/edgify/convert_model.py", line 28, in convert_from_torch_script
cls._load_model_weights_and_export(model, model_state_dict, out_onnx_path, transform_yaml_path)
File "/home/liron/work/pixoneye/model_conversion/edgify/convert_model.py", line 76, in _load_model_weights_and_export
example_outputs=target)
File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/init.py", line 156, in export
custom_opsets)
File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/utils.py", line 67, in export
custom_opsets=custom_opsets)
File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/utils.py", line 466, in _export
fixed_batch_size=fixed_batch_size)
File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/utils.py", line 336, in _model_to_graph
fixed_batch_size=fixed_batch_size, params_dict=params_dict)
File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/utils.py", line 152, in _optimize_graph
graph = torch._C._jit_pass_onnx(graph, operator_export_type)
File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/init.py", line 187, in _run_symbolic_function
return utils._run_symbolic_function(*args, **kwargs)
File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/utils.py", line 710, in _run_symbolic_function
return op_fn(g, *inputs, **attrs)
File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/symbolic_helper.py", line 129, in wrapper
return fn(g, *args)
File "/home/liron/envs/convert/lib/python3.6/site-packages/torch/onnx/symbolic_opset9.py", line 1790, in flatten
end_dim = dim + end_dim
TypeError: unsupported operand type(s) for +: 'NoneType' and 'int' (occurred when translating flatten)

@lara-hdr
Contributor
lara-hdr commented Dec 16, 2019

@lironmo, as I explained above, the change for flatten could only be done for opset 11.
So you can export your model with opset_version=11; to do that, pass opset_version=11 to the exporter API with the PyTorch nightly: torch.onnx.export(..., opset_version=11).
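
(A quick way to sanity-check the exported file, assuming the onnx package is installed; "model.onnx" is a placeholder path.)

import onnx

model = onnx.load("model.onnx")
onnx.checker.check_model(model)                   # raises if the graph is malformed
print(onnx.helper.printable_graph(model.graph))   # human-readable dump of the exported graph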

@lironmo
Author
lironmo commented Dec 16, 2019

@lara-hdr, I verified the fix and it works. Thanks!

BowenBao pushed a commit to BowenBao/pytorch that referenced this issue Dec 17, 2019
Update ONNX Flatten to accept negative indices in opset 11 (#30751)

Summary:
Update ONNX Flatten to accept negative indices in opset 11.
With this change, some cases of flatten do not rely on the input rank being available.
Fixes : pytorch#30512 .
Pull Request resolved: pytorch#30751

Reviewed By: hl475

Differential Revision: D18946904

Pulled By: houseroad

fbshipit-source-id: a6fa30a9182fff92211e505a19325525c6112f19
wuhuikx pushed a commit to wuhuikx/pytorch that referenced this issue Jan 30, 2020
Update ONNX Flatten to accept negative indices in opset 11 (#30751)

Summary:
Update ONNX Flatten to accept negative indices in opset 11.
With this change, some cases of flatten do not rely on the input rank being available.
Fixes : pytorch#30512 .
Pull Request resolved: pytorch#30751

Reviewed By: hl475

Differential Revision: D18946904

Pulled By: houseroad

fbshipit-source-id: a6fa30a9182fff92211e505a19325525c6112f19
@RitchieHuang11

@lironmo , the issue is that the shape information of the tensors is not always available when scripting. The ONNX exporter needs this information in certain cases where PyTorch and ONNX operators' behaviors don't align perfectly.

Batch_norm was recently updated to export without the shape information in PR #29458, so the error you are getting with batch_norm is now fixed on master.

However, when I tried exporting your model with PyTorch master, I got a similar error with flatten.
I submitted a PR with some improvements for flatten in opset 11 that solves the problem in your case: #30751.

Once that PR is merged, you should be able to export your model with opset_version=11 (pass opset_version=11 in the exporter API) using the PyTorch nightly.

I got the same error; setting opset 11 did not work for me.

@deepindeed
Copy link
deepindeed commented Mar 25, 2020

@lironmo, as I explained above, the change for flatten could only be done for opset 11.
So you can export your model with opset_version=11; to do that, pass opset_version=11 to the exporter API with the PyTorch nightly: torch.onnx.export(..., opset_version=11).

I also got the same error when calling the torch.onnx.export API with opset_version=11 set.
