[jit] support tracing tensor __setitem__ with dynamic shape by ppwwyyxx · Pull Request #45828 · pytorch/pytorch · GitHub

[jit] support tracing tensor __setitem__ with dynamic shape #45828


Closed
wants to merge 3 commits into from

Conversation

ppwwyyxx
Collaborator
@ppwwyyxx ppwwyyxx commented Oct 5, 2020

Summary: fix #43548

Test Plan: buck test mode/dev-nosan //caffe2/test:jit -- 'test_trace_slice' --jobs 1

Differential Revision: D24106641

fbshipit-source-id: 240a6a69f44bccb277e21fd76406e31e48c8968b
@facebook-github-bot facebook-github-bot added the oncall: jit (Add this issue/PR to JIT oncall triage queue) and fb-exported labels Oct 5, 2020
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D24106641

@gmagogsfm
Contributor

@eellison could you help take a look? I am not familiar with this part of the code.

@eellison
Contributor
eellison commented Oct 9, 2020

Requesting review from PyTorch eager folks, since I'm also unfamiliar with this part of the code: @zou3519 @gchanan @ezyang

@ppwwyyxx ppwwyyxx requested review from zou3519, gchanan and ezyang and removed request for gmagogsfm and eellison October 9, 2020 19:54
@dr-ci
dr-ci bot commented Oct 12, 2020

💊 CI failures summary and remediations

As of commit 2719ba8 (more details on the Dr. CI page):


  • 4/4 failures introduced in this PR

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test2 (1/4)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Oct 13 19:22:26 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:11:3 in
Oct 13 19:22:26     #7 0x55bafc7b770b in PyEval_EvalCode /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:731 
Oct 13 19:22:26     #8 0x55bafc837573 in run_mod /tmp/build/80754af9/python_1599604603603/work/Python/pythonrun.c:1025 
Oct 13 19:22:26     #9 0x55bafc83760c in PyRun_StringFlags /tmp/build/80754af9/python_1599604603603/work/Python/pythonrun.c:949 
Oct 13 19:22:26     #10 0x55bafc83766e in PyRun_SimpleStringFlags /tmp/build/80754af9/python_1599604603603/work/Python/pythonrun.c:445 
Oct 13 19:22:26     #11 0x55bafc83b472 in run_command /tmp/build/80754af9/python_1599604603603/work/Modules/main.c:301 
Oct 13 19:22:26     #12 0x55bafc83b472 in Py_Main /tmp/build/80754af9/python_1599604603603/work/Modules/main.c:749 
Oct 13 19:22:26     #13 0x55bafc70543d in main /tmp/build/80754af9/python_1599604603603/work/Programs/python.c:69 
Oct 13 19:22:26     #14 0x7eff6c42383f in __libc_start_main /build/glibc-e6zv40/glibc-2.23/csu/../csu/libc-start.c:291 
Oct 13 19:22:26     #15 0x55bafc7e4d0a in _start /home/rdonnelly/mc/conda-bld/compilers_linux-64_1534865402226/work/.build/src/glibc-2.12.2/csu/../sysdeps/x86_64/elf/start.S:103 
Oct 13 19:22:26  
Oct 13 19:22:26 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:11:3 in  
Oct 13 19:22:26 + retcode=1 
Oct 13 19:22:26 + set -e 
Oct 13 19:22:26 + return 1 
Oct 13 19:22:26 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 == *-NO_AVX-* ]] 
Oct 13 19:22:26 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 == *-NO_AVX2-* ]] 
Oct 13 19:22:26 + '[' -n https://github.com/pytorch/pytorch/pull/45828 ']' 
Oct 13 19:22:26 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 != *coverage* ]] 
Oct 13 19:22:26 ++ mktemp 
Oct 13 19:22:26 + DETERMINE_FROM=/tmp/tmp.yeID1JkanU 
Oct 13 19:22:26 + file_diff_from_base /tmp/tmp.yeID1JkanU 

See CircleCI build pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc5_4_build (2/4)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/docker/common/install_conda.sh 
Auto-merging .circleci/docker/common/install_conda.sh 
CONFLICT (add/add): Merge conflict in .circleci/config.yml 
Auto-merging .circleci/config.yml 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/simple/ge_config_tests.py 
Auto-merging .circleci/cimodel/data/simple/ge_config_tests.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_definitions.py 
Auto-merging .circleci/cimodel/data/pytorch_build_definitions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_data.py 
Auto-merging .circleci/cimodel/data/pytorch_build_data.py 
Automatic merge failed; fix conflicts and then commit the result. 

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (3/4)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed with the same .circleci add/add conflicts as build 2/4; fix conflicts and then commit the result.

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (4/4)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed with the same .circleci add/add conflicts as build 2/4; fix conflicts and then commit the result.

This comment was automatically generated by Dr. CI (expand for details). Follow this link to opt-out of these comments for your Pull Requests.


This comment has been revised 8 times.

@ezyang ezyang requested a review from yf225 October 12, 2020 21:22
@ezyang
Contributor
ezyang commented Oct 12, 2020

I can believe that this "helps", but the fix is still subtly wrong. You are still baking in static sizes, by fastpathing to copy_. If at runtime someone gives you tensors whose sizes don't match anymore, this still won't work, as there needed to be a broadcast that isn't happening in the trace. So the "correct" way would be to replace the explicit size materialization with something traceable.

I don't have enough information to assess if the hack is worth it.
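As a minimal eager-mode illustration of the broadcast in question (shapes chosen for illustration, not taken from the PR): when the source is smaller than the destination, `__setitem__` implicitly expands it, and that expansion is exactly what a shape-generic trace would need to record.

```python
import torch

a = torch.zeros(4, 3)
b = torch.ones(3)     # rank-1 source: must broadcast to (4, 3)
a[...] = b            # eager __setitem__ broadcasts b across dim 0

# The PR's shortcut calls copy_ only when the sizes already match,
# so a trace taken with matching sizes never records the broadcast.
c = torch.zeros(4, 3)
c.copy_(torch.ones(4, 3))   # same-size copy: the fastpath case
```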

@ezyang
Contributor
ezyang commented Oct 12, 2020

@robieta I want to suggest that indexing is something we should consider putting under benchmarking; when @yf225 and I were working on this code originally this code was very sensitive to dynamic allocations.

@robieta
robieta commented Oct 12, 2020

@robieta I want to suggest that indexing is something we should consider putting under benchmarking; when @yf225 and I were working on this code originally this code was very sensitive to dynamic allocations.

I currently have x[1] (setup: y = torch.ones((10,))) in the ad-hoc callgrind benchmarking that I'm doing (and it will become CI). But of course that's not super comprehensive. What other cases would you recommend? (Slicing, other Tensors as indices, etc.)


@ppwwyyxx
Collaborator Author

Yeah, this is not perfect, and I think a perfect solution for tracing to support dynamic sizes everywhere is very hard.

However, the edge case that's left in this PR can be easily worked around: if one needs to do a[...] = b where b needs broadcasting, then to correctly support tracing (after this PR) they only have to do a[...] = b.reshape(xx.size(), ...), because .size() in Python can be correctly traced.
On the other hand, without this PR it is more difficult to work around such an indexing operation for tracing (at least I can't think of an easy workaround).
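A rough sketch of this kind of workaround (hedged: `expand` is used here in place of the comment's `reshape`, and `a.size()` stands in for the comment's `xx.size()`; the shapes are illustrative):

```python
import torch

def fill(a, b):
    # Expanding b via a.size() keeps the shape computation in the
    # trace instead of baking in a hard-coded constant size.
    a[...] = b.expand(a.size())
    return a

traced = torch.jit.trace(
    fill, (torch.zeros(4, 3), torch.ones(3)), check_trace=False
)
# A different leading dimension at runtime still works, because
# a.size() was traced rather than materialized as a constant.
out = traced(torch.zeros(2, 3), torch.ones(3))
```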

// A shortcut to avoid generating hard-coded constant sizes during tracing.
dst.copy_(src);
return;
}
Contributor
@yf225 yf225 Oct 12, 2020
We might need to test whether this would hurt indexing perf for the fallback path. It would be awesome to collect numbers like the following (copy-pasted from #32841 (comment)):

| Test | Old (us) | New (us) | % difference (negative: speedup) |
|---|---|---|---|
| v[0] | 1.68 | 1.69 | 0.60% |
| v[0] = 1 | 7.33 | 7.04 | -3.96% |
| v[:] | 1.67 | 1.69 | 1.20% |
| v[:] = 1 | 7.58 | 6.98 | -7.92% |
| v[...] | 1.29 | 1.29 | 0.00% |
| v[...] = 1 | 5.6 | 5.33 | -4.82% |
| v[None] | 1.67 | 1.68 | 0.60% |
| v[None] = 1 | 7.39 | 7.32 | -0.95% |
| v[False] | 9.37 | 9.03 | -3.63% |
| v[False] = 1 | 1.15 | 1 | -13.04% |
| v[True] | 12.3 | 11.7 | -4.88% |
| v[True] = 1 | 7.52 | 7.22 | -3.99% |
| v[0, 0] | 3.32 | 3.38 | 1.81% |
| v[0, 0] = 1 | 9 | 8.83 | -1.89% |
| a[b, None, ..., :, True] | 37.8 | 37.4 | -1.06% |
| a[b, None, ..., :, True] = 1 | 39.4 | 39.2 | -0.51% |

to better understand the impact on performance (note: we might need to change the test cases to make sure we are executing the fallback path).

Collaborator Author

Makes sense. Which code produces this table?

Contributor

I think this is what I used:

#!/bin/bash

# Declare an array of string with type
declare -a StmtArray=(
  "python -m timeit -s 'import torch; v = torch.randn(1,1,1);' 'v[0]'"
  "python -m timeit -s 'import torch; v = torch.randn(1,1,1);' 'v[0] = 1'"
  "python -m timeit -s 'import torch; v = torch.randn(1,1,1);' 'v[0, 0]'"
  "python -m timeit -s 'import torch; v = torch.randn(1,1,1);' 'v[0, 0] = 1'"
  "python -m timeit -s 'import torch; v = torch.randn(1,1,1);' 'v[...]'"
  "python -m timeit -s 'import torch; v = torch.randn(1,1,1);' 'v[...] = 1'"
  "python -m timeit -s 'import torch; v = torch.randn(1,1,1);' 'v[:]'"
  "python -m timeit -s 'import torch; v = torch.randn(1,1,1);' 'v[:] = 1'"
  "python -m timeit -s 'import torch; v = torch.randn(1,1,1);' 'v[None]'"
  "python -m timeit -s 'import torch; v = torch.randn(1,1,1);' 'v[None] = 1'"
  "python -m timeit -s 'import torch; v = torch.randn(1,1,1);' 'v[False]'"
  "python -m timeit -s 'import torch; v = torch.randn(1,1,1);' 'v[False] = 1'"
  "python -m timeit -s 'import torch; v = torch.randn(1,1,1);' 'v[True]'"
  "python -m timeit -s 'import torch; v = torch.randn(1,1,1);' 'v[True] = 1'"
  "python -m timeit -s 'import torch; a = torch.zeros(100, 100, 1, 1, 1); b = torch.arange(99, -1, -1).long(); ' 'a[b, None, ..., :, True]'"
  "python -m timeit -s 'import torch; a = torch.zeros(100, 100, 1, 1, 1); b = torch.arange(99, -1, -1).long(); ' 'a[b, None, ..., :, True] = 1'"
)

for (( i = 0; i < ${#StmtArray[@]} ; i++ )); do
  printf "\n**** Running: ${StmtArray[$i]} *****\n\n"
  for (( j = 0 ; j < 20 ; j++ )); do
    eval "${StmtArray[$i]}"
  done
done


The way to do it nowadays is something like:

from torch.utils.benchmark import Timer, Compare

SCALAR_SETUP = "v = torch.randn(1,1,1)"
VECTOR_SETUP = "a = torch.zeros(100, 100, 1, 1, 1); b = torch.arange(99, -1, -1).long()"
TASKS = (
    (SCALAR_SETUP, 'v[0]'),
    (SCALAR_SETUP, 'v[0] = 1'),
    (SCALAR_SETUP, 'v[0, 0]'),
    (SCALAR_SETUP, 'v[0, 0] = 1'),
    (SCALAR_SETUP, 'v[...]'),
    (SCALAR_SETUP, 'v[...] = 1'),
    (SCALAR_SETUP, 'v[:]'),
    (SCALAR_SETUP, 'v[:] = 1'),
    (SCALAR_SETUP, 'v[None]'),
    (SCALAR_SETUP, 'v[None] = 1'),
    (SCALAR_SETUP, 'v[False]'),
    (SCALAR_SETUP, 'v[False] = 1'),
    (SCALAR_SETUP, 'v[True]'),
    (SCALAR_SETUP, 'v[True] = 1'),
    (VECTOR_SETUP, 'a[b, None, ..., :, True]'),
    (VECTOR_SETUP, 'a[b, None, ..., :, True] = 1'),
)

results = []
for setup, stmt in TASKS:
    timer = Timer(
        stmt,
        setup=setup,
        description="HEAD",
    )
    results.append(timer.blocked_autorange(min_run_time=1))

cmp = Compare(results)
# cmp.trim_significant_figures()
cmp.print()
[--------------------  -------------------]
                                    |  HEAD
1 threads: --------------------------------
      v[0]                          |   1.7
      v[0] = 1                      |   7.2
      v[0, 0]                       |   3.3
      v[0, 0] = 1                   |   9.5
      v[...]                        |   1.3
      v[...] = 1                    |   6.0
      v[:]                          |   1.6
      v[:] = 1                      |   6.7
      v[None]                       |   1.6
      v[None] = 1                   |   7.0
      v[False]                      |   8.3
      v[False] = 1                  |   1.1
      v[True]                       |  11.3
      v[True] = 1                   |   7.0
      a[b, None, ..., :, True]      |  33.2
      a[b, None, ..., :, True] = 1  |  35.1

Times are in microseconds (us).

trim_significant_figures is currently a bit conservative with really fast ops, so I didn't use it here, but in the future it will be the recommended approach.

Collaborator Author
@ppwwyyxx ppwwyyxx Oct 13, 2020

Thanks! I added one more setindex test case with the same shape. These are the results after running both the old and new code twice.

$cat old*.txt
[---------------------  ---------------------]
                                    |    HEAD
1 threads: -----------------------------------
      v[0] = 1                      |   5830.6
      v[0, 0] = 1                   |   6796.2
      v[0, 0, 0] = 1                |   6797.3
      v[...] = 1                    |   4329.7
      v[:] = 1                      |   5482.9
      v[None] = 1                   |   5664.0
      v[False] = 1                  |    964.3
      v[True] = 1                   |   5707.1
      a[b, None, ..., :, True] = 1  |  30175.3

Times are in nanoseconds (ns).

[---------------------  ---------------------]
                                    |    HEAD
1 threads: -----------------------------------
      v[0] = 1                      |   5890.9
      v[0, 0] = 1                   |   7040.5
      v[0, 0, 0] = 1                |   7034.4
      v[...] = 1                    |   4518.2
      v[:] = 1                      |   5635.5
      v[None] = 1                   |   5664.1
      v[False] = 1                  |    937.6
      v[True] = 1                   |   5705.7
      a[b, None, ..., :, True] = 1  |  29280.1

Times are in nanoseconds (ns).

$cat new*.txt
[---------------------  ---------------------]
                                    |    HEAD
1 threads: -----------------------------------
      v[0] = 1                      |   5597.9
      v[0, 0] = 1                   |   6769.5
      v[0, 0, 0] = 1                |   5938.4
      v[...] = 1                    |   4368.9
      v[:] = 1                      |   5516.4
      v[None] = 1                   |   5608.9
      v[False] = 1                  |    978.5
      v[True] = 1                   |   5643.1
      a[b, None, ..., :, True] = 1  |  32076.3

Times are in nanoseconds (ns).

[---------------------  ---------------------]
                                    |    HEAD
1 threads: -----------------------------------
      v[0] = 1                      |   5842.8
      v[0, 0] = 1                   |   6934.9
      v[0, 0, 0] = 1                |   5923.4
      v[...] = 1                    |   4381.3
      v[:] = 1                      |   5548.5
      v[None] = 1                   |   5705.6
      v[False] = 1                  |    972.3
      v[True] = 1                   |   5658.0
      a[b, None, ..., :, True] = 1  |  30968.8

Times are in nanoseconds (ns).

The new test case v[0, 0, 0] = 1 is considerably faster. It's hard to obtain signals from the others, as they seem to be within noise range. Maybe the last one gets slower? I can test more if needed.

Contributor

@robieta To answer your question further above, this benchmark is the one that I'd expect to see added.

@vadimkantorov
Contributor
vadimkantorov commented Oct 12, 2020

I propose some verbose mode for tracing that helps uncover potentially incorrect traces (or at least a warning) with respect to indexing. It's better than silently recording static shapes and then failing at runtime with implicitly divergent behavior.

It would also be cool to allow adding some "shape hints" to help shape inference, which would e.g. guarantee dimensionality, some static shapes, or the equality of a dim of one tensor to another (or to some known "dim symbol").

@ezyang
Contributor
ezyang commented Oct 13, 2020

However, the edge case that's left in this PR can be easily worked around: if one needs to do a[...] = b where b needs broadcasting, then to correctly support tracing (after this PR) they only have to do a[...] = b.reshape(xx.size(), ...), because .size() in Python can be correctly traced.
On the other hand, without this PR it is more difficult to work around such an indexing operation for tracing (at least I can't think of an easy workaround).

OK, this is sufficient justification to me.

Contributor
@ezyang ezyang left a comment

Can you bulk up the comment on the shortcut to include the discussion that we had here, including the suggested workaround?

@vadimkantorov's comment seems relevant in the long term; maybe file a follow-up issue about this too.

Contributor
@facebook-github-bot facebook-github-bot left a comment

@ppwwyyxx has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ppwwyyxx
Collaborator Author

@vadimkantorov do you want to open a follow-up issue and elaborate more on the ideas?

@facebook-github-bot
Contributor

@ppwwyyxx merged this pull request in b64cf93.

@fmassa
Member
fmassa commented Oct 14, 2020

Are we still developing tracing functionality? I thought we would be moving away from tracing to instead rely on scripting the model.

@vadimkantorov
Contributor
vadimkantorov commented Oct 14, 2020 via email

@vadimkantorov
Contributor
vadimkantorov commented Oct 23, 2020

@ppwwyyxx @ezyang I had two proposals:

  1. some verbose flag to surface potential warnings
  2. expressing shaping information, somewhat similar to #40373 ("[discussion] Expressing tensor dimension semantics / constraints through typing / constraints blocks"). E.g. allowing tracing some constraints like assert size(X, 1) == size(Y, 2) and size(X, 2) == 64 and ndim(Z) == 3, and then using them for compilation/optimization (in my example in #40373 I annotate tensors with strings like BTC and BC and then check that similarly named dimensions of different tensors agree)
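A pure-Python sketch of that second proposal (the helper `check_shape_hints` and its interface are hypothetical, not an existing PyTorch API): annotate each tensor shape with a dimension-name string and verify that identically named dimensions agree.

```python
def check_shape_hints(annotated):
    """Check named-dimension consistency across tensor shapes.

    `annotated` is a list of (shape, names) pairs, e.g.
    ((8, 100, 64), "BTC") for batch, time, channel.
    Raises ValueError if two tensors disagree on a named dim;
    returns the resolved {name: size} mapping.
    """
    sizes = {}
    for shape, names in annotated:
        if len(shape) != len(names):
            raise ValueError(f"rank {len(shape)} != annotation {names!r}")
        for size, name in zip(shape, names):
            # setdefault records the first size seen for this name;
            # any later mismatch is a constraint violation.
            if sizes.setdefault(name, size) != size:
                raise ValueError(
                    f"dim {name!r}: expected {sizes[name]}, got {size}"
                )
    return sizes

# BTC activations and a BC projection sharing batch and channel dims:
hints = check_shape_hints([((8, 100, 64), "BTC"), ((8, 64), "BC")])
```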

@ezyang
Contributor
ezyang commented Oct 26, 2020

@vadimkantorov wanna stick these in some fresh issues?

@vadimkantorov
Contributor
vadimkantorov commented Oct 26, 2020

#40373 covers it, doesn't it? There's no substantial development there, but the motivation is well-discussed, no?

@ezyang
Contributor
ezyang commented Oct 26, 2020

Maybe? That issue as a whole is not so likely to get resolved in the near future, but more focused fixes for tracing might be easier to tackle.

@vadimkantorov
Contributor

I added an UPD there and will rename the issue as well!

@vadimkantorov
Contributor

Okay, I'll make a new issue as well

Labels
fb-exported · Merged · oncall: jit (Add this issue/PR to JIT oncall triage queue)

Successfully merging this pull request may close these issues.

JIT tracing incorrectly records some slice bounds
9 participants