free up dispatch key space (in C++) #72827
Conversation
Reland of D34034848
Differential Revision: [D34227616](https://our.internmc.facebook.com/intern/diff/D34227616/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D34227616/)!
[ghstack-poisoned]
💊 CI failures summary: as of commit 1c367d7, 4 new failures were recognized by patterns; they do not appear to be due to upstream breakages.
DispatchKey::SparseCsrCPU,
DispatchKey::SparseCsrCUDA,
We'll likely need to detach CPU/CUDA from CSR dispatch keys as well, similar to Sparse:

Suggested change:
- DispatchKey::SparseCsrCPU,
- DispatchKey::SparseCsrCUDA,
+ DispatchKey::SparseCsr,
I'll probably do that in a followup. For context: there actually isn't a hard requirement on which way to represent SparseCsr - it's more about whether or not we want to trade off dispatch key slots for space in the runtime operator table. We could either:

(1) Make it a "per-backend functionality key". There would be one dispatch key for SparseCsr, and any backend (CPU, CUDA, etc.) could implement its own kernels for it separately. Advantage: we save some key space, and give every backend the flexibility to override its meaning (instead of being limited to just CPU and CUDA). Minor disadvantage: we blow up the size of the runtime operator table a bit, since we now need to store an extra (# backends) number of slots for SparseCsr, for every operator. (From what you said, this is also probably logically a bit cleaner.)

(2) Leave the existing SparseCsrCPU/SparseCsrCUDA keys, which saves some runtime operator table space.

It sounds like we should eventually do (1).
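For readers following along, here is a minimal standalone sketch of the trade-off in (1) vs. (2). Every name below (DedicatedKey, Functionality, PerBackendOperatorTable, etc.) is an illustrative assumption, not PyTorch's actual DispatchKey or operator-table implementation:

```cpp
// Illustrative sketch only: option (2) spends dedicated enum slots per backend,
// option (1) spends per-operator table slots indexed by (functionality, backend).
#include <array>
#include <cstddef>
#include <cstdint>
#include <iostream>

using KernelFn = void (*)();

// Option (2): a dedicated runtime dispatch key per (functionality, backend) pair.
// Each extra pair consumes a slot in the (bounded) key enum.
enum class DedicatedKey : std::uint8_t { CPU, CUDA, SparseCsrCPU, SparseCsrCUDA, NumKeys };

// Option (1): a single "per-backend functionality key" (SparseCsr) that any backend
// can implement; the runtime table is indexed by (functionality, backend) instead.
enum class Functionality : std::uint8_t { Dense, SparseCsr, NumFunctionalities };
enum class Backend : std::uint8_t { CPU, CUDA, NumBackends };

constexpr std::size_t kNumFunctionalities =
    static_cast<std::size_t>(Functionality::NumFunctionalities);
constexpr std::size_t kNumBackends = static_cast<std::size_t>(Backend::NumBackends);

// Per-operator kernel table for option (1): (# functionalities) x (# backends) slots,
// paid once per operator, in exchange for fewer entries in the key enum.
struct PerBackendOperatorTable {
  std::array<KernelFn, kNumFunctionalities * kNumBackends> kernels{};
  std::size_t index(Functionality f, Backend b) const {
    return static_cast<std::size_t>(f) * kNumBackends + static_cast<std::size_t>(b);
  }
};

int main() {
  PerBackendOperatorTable table;
  std::cout << "option (2) enum slots used: "
            << static_cast<int>(DedicatedKey::NumKeys) << "\n";
  std::cout << "option (1) per-operator table slots: " << table.kernels.size() << "\n";
}
```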
Summary: X-link: pytorch/pytorch#72827
Reland of D34034848
ghstack-source-id: 152161452
Reviewed By: ezyang
Differential Revision: D34227616
fbshipit-source-id: 6d1dd0fd8144dfbd9e194cd7564cce017e7db968
This pull request has been reverted by 9872a06. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).
…pace (in C++)"""

This PR is a re-land of #69633 (this is the second re-land attempt; the first one is at #72827). The original PR had a memory corruption bug that only surfaced on mobile builds.

*Background: Existing Mobile Optimization*

PyTorch mobile builds have an existing optimization ([here](https://github.com/pytorch/pytorch/blob/cc23725e89713138aa1c81ce5fb4a8dbcd440ccf/c10/core/DispatchKey.h#L382) and [here](https://github.com/pytorch/pytorch/blob/cc23725e89713138aa1c81ce5fb4a8dbcd440ccf/aten/src/ATen/core/dispatch/OperatorEntry.h#L214)), which works as follows: every operator in PyTorch has a "dispatch table" of function pointers, corresponding to all of the (up to 64) different kernels that we might dispatch to when we run an operator (autograd, CPU, CUDA, complex number support, etc.). In mobile builds, the size of that table is shrunk from 64 to 8 to save space, because mobile doesn't end up using the functionality associated with most dispatch keys.

The dispatcher also has a notion of "fallback kernels": kernels that you register to a particular dispatch key, but that should be able to work for "any operator". The array of fallback kernels is defined [here](https://github.com/pytorch/pytorch/blob/cc23725e89713138aa1c81ce5fb4a8dbcd440ccf/aten/src/ATen/core/dispatch/Dispatcher.h#L294). The mobile optimization currently does not extend to this array (it wouldn't be that useful anyway, because there is only one array of fallback kernels globally, vs. a separate dispatch table of function pointers per operator).

*The Bug*

This PR actually made it difficult to enable that optimization separately for the per-operator arrays vs. the fallback array, and incidentally shrunk the size of the fallback array from 64 to 8 for mobile (that happened on [this](https://github.com/pytorch/pytorch/pull/69633/files#diff-f735cd7aa68f15b624100cbc4bb3b5ea76ffc7c9d3bec3b0ccabaa09609e5319R294) line). That isn't a problem by itself, since mobile doesn't actually use any of the fallbacks that can no longer be stored. However, PyTorch core still registers all of those fallback kernels on startup in mobile builds, even if they aren't used. When one of those fallbacks was registered on startup, the kernel would be written somewhere in memory past the bounds of the (now smaller) array inside the Dispatcher object, backendFallbackKernels_.

**NOTE FOR REVIEWERS**: This PR has internal Facebook-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35222806/)!

[ghstack-poisoned]
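A minimal standalone sketch of the failure mode described above. The constants, the FallbackKernel struct, and registerFallback are illustrative assumptions, not the actual Dispatcher code; the point is that the fallback array is sized for the shrunk mobile key space while startup registration still walks the full key space:

```cpp
// Illustrative sketch: the fallback array is sized with the mobile-shrunk key
// count, but registration at startup still iterates over the full key count.
#include <array>
#include <cstddef>
#include <iostream>

constexpr std::size_t kFullNumDispatchKeys = 64;   // desktop/server key space
constexpr std::size_t kMobileNumDispatchKeys = 8;  // shrunk key space on mobile builds

struct FallbackKernel { void (*fn)() = nullptr; };

// The array only has room for the mobile key space...
std::array<FallbackKernel, kMobileNumDispatchKeys> backendFallbackKernels_;

void registerFallback(std::size_t dispatch_key_index, void (*fn)()) {
  // ...so registering a fallback for any key index >= kMobileNumDispatchKeys
  // would write past the end of the array (the original code had no guard,
  // which is the memory corruption described above; this sketch just reports it).
  if (dispatch_key_index >= backendFallbackKernels_.size()) {
    std::cerr << "would write out of bounds at index " << dispatch_key_index << "\n";
    return;
  }
  backendFallbackKernels_[dispatch_key_index].fn = fn;
}

int main() {
  // Startup registers fallbacks for the full key space, even on mobile.
  for (std::size_t k = 0; k < kFullNumDispatchKeys; ++k) {
    registerFallback(k, [] {});
  }
}
```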
Reland of #69633, after fixing some internal mobile failures.
Stack from ghstack (oldest at bottom):
Differential Revision: D34227616
NOTE FOR REVIEWERS: This PR has internal Facebook-specific changes or comments, please review them on Phabricator!