free up dispatch key space (in C++) by bdhirsh · Pull Request #72827 · pytorch/pytorch · GitHub

free up dispatch key space (in C++) #72827


Closed
wants to merge 8 commits

Conversation

bdhirsh
Contributor
@bdhirsh bdhirsh commented Feb 15, 2022

Reland of #69633, after fixing some internal mobile failures.

Stack from ghstack (oldest at bottom):

Reland of D34034848

Differential Revision: [D34227616](https://our.internmc.facebook.com/intern/diff/D34227616/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D34227616/)!

[ghstack-poisoned]
@pytorch-bot
pytorch-bot bot commented Feb 15, 2022
CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/fc9408d46c1c2cd42b8b72332ee340244b9ba2b1/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
linux-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-manywheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk, ciflow/xla ✅ triggered
linux-bionic-rocm4.5-py3.7 ciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
macos-arm64-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-arm64-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
macos-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
windows-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped

@facebook-github-bot
Contributor
facebook-github-bot commented Feb 15, 2022


💊 CI failures summary and remediations

As of commit 1c367d7 (more details on the Dr. CI page):


  • 5/5 failures introduced in this PR

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build pull / linux-bionic-rocm4.5-py3.7 / test (default, 2, 2, linux.rocm.gpu) (1/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-03-25T01:59:41.8035118Z FAIL [0.013s]: test_caching_pinned_memory (__main__.TestCuda)
2022-03-25T01:59:41.7965098Z   test_scatter_cpu_dim (__main__.TestCudaComm) ... skip (0.000s)
2022-03-25T01:59:41.7969837Z   test_scatter_cpu_neg_dim (__main__.TestCudaComm) ... skip (0.000s)
2022-03-25T01:59:41.7974797Z   test_scatter_cpu_sizes (__main__.TestCudaComm) ... skip (0.000s)
2022-03-25T01:59:41.7980961Z   test_scatter_gpu (__main__.TestCudaComm) ... skip (0.001s)
2022-03-25T01:59:41.7986414Z   test_scatter_gpu_dim (__main__.TestCudaComm) ... skip (0.000s)
2022-03-25T01:59:41.7991869Z   test_scatter_gpu_neg_dim (__main__.TestCudaComm) ... skip (0.000s)
2022-03-25T01:59:41.7997474Z   test_scatter_gpu_sizes (__main__.TestCudaComm) ... skip (0.001s)
2022-03-25T01:59:41.8027744Z   test_scatter_namedtuple (__main__.TestCudaComm) ... skip (0.003s)
2022-03-25T01:59:41.8028325Z 
2022-03-25T01:59:41.8034182Z ======================================================================
2022-03-25T01:59:41.8035118Z FAIL [0.013s]: test_caching_pinned_memory (__main__.TestCuda)
2022-03-25T01:59:41.8037302Z ----------------------------------------------------------------------
2022-03-25T01:59:41.8038159Z Traceback (most recent call last):
2022-03-25T01:59:41.8039496Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 1754, in wrapper
2022-03-25T01:59:41.8040595Z     method(*args, **kwargs)
2022-03-25T01:59:41.8041385Z   File "test_cuda.py", line 1370, in test_caching_pinned_memory
2022-03-25T01:59:41.8042770Z     self.assertNotEqual(t.data_ptr(), ptr, msg='allocation re-used too soon')
2022-03-25T01:59:41.8044311Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 2160, in assertNotEqual
2022-03-25T01:59:41.8045402Z     self.assertEqual(x, y, msg, atol=atol, rtol=rtol, **kwargs)
2022-03-25T01:59:41.8046576Z AssertionError: AssertionError not raised : allocation re-used too soon
2022-03-25T01:59:41.8047123Z 

See GitHub Actions build pull / linux-docs / build-docs (python) (2/4)

Step: "Build python docs" (full log | diagnosis details | 🔁 rerun)

2022-03-24T22:22:57.3040994Z ImportError: canno...da/lib/python3.7/site-packages/jinja2/__init__.py)
2022-03-24T22:22:57.3037378Z   File "/opt/conda/lib/python3.7/site-packages/sphinx/cmd/build.py", line 25, in <module>
2022-03-24T22:22:57.3037669Z     from sphinx.application import Sphinx
2022-03-24T22:22:57.3038038Z   File "/opt/conda/lib/python3.7/site-packages/sphinx/application.py", line 43, in <module>
2022-03-24T22:22:57.3038356Z     from sphinx.registry import SphinxComponentRegistry
2022-03-24T22:22:57.3038721Z   File "/opt/conda/lib/python3.7/site-packages/sphinx/registry.py", line 24, in <module>
2022-03-24T22:22:57.3039003Z     from sphinx.builders import Builder
2022-03-24T22:22:57.3039524Z   File "/opt/conda/lib/python3.7/site-packages/sphinx/builders/__init__.py", line 26, in <module>
2022-03-24T22:22:57.3039858Z     from sphinx.util import import_object, logging, progress_message, rst, status_iterator
2022-03-24T22:22:57.3040260Z   File "/opt/conda/lib/python3.7/site-packages/sphinx/util/rst.py", line 21, in <module>
2022-03-24T22:22:57.3040557Z     from jinja2 import Environment, environmentfilter
2022-03-24T22:22:57.3040994Z ImportError: cannot import name 'environmentfilter' from 'jinja2' (/opt/conda/lib/python3.7/site-packages/jinja2/__init__.py)
2022-03-24T22:22:57.3334424Z Makefile:38: recipe for target 'html' failed
2022-03-24T22:22:57.3334726Z make[1]: *** [html] Error 1
2022-03-24T22:22:57.3335123Z make[1]: Leaving directory '/var/lib/jenkins/workspace/docs'
2022-03-24T22:22:57.3335675Z Makefile:27: recipe for target 'html-stable' failed
2022-03-24T22:22:57.3335944Z make: *** [html-stable] Error 2
2022-03-24T22:22:57.3337557Z ++ code=2
2022-03-24T22:22:57.3338027Z ++ '[' 2 -ne 0 ']'
2022-03-24T22:22:57.3338200Z ++ set +x
2022-03-24T22:22:57.3338376Z =========================
2022-03-24T22:22:57.3348359Z =========================

See GitHub Actions build pull / linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge) (3/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-03-24T22:22:23.1522646Z ImportError: canno...da/lib/python3.7/site-packages/jinja2/__init__.py)
2022-03-24T22:22:23.1516012Z   File "/opt/conda/lib/python3.7/site-packages/sphinx/cmd/build.py", line 25, in <module>
2022-03-24T22:22:23.1516574Z     from sphinx.application import Sphinx
2022-03-24T22:22:23.1517248Z   File "/opt/conda/lib/python3.7/site-packages/sphinx/application.py", line 43, in <module>
2022-03-24T22:22:23.1517829Z     from sphinx.registry import SphinxComponentRegistry
2022-03-24T22:22:23.1518761Z   File "/opt/conda/lib/python3.7/site-packages/sphinx/registry.py", line 24, in <module>
2022-03-24T22:22:23.1519293Z     from sphinx.builders import Builder
2022-03-24T22:22:23.1519932Z   File "/opt/conda/lib/python3.7/site-packages/sphinx/builders/__init__.py", line 26, in <module>
2022-03-24T22:22:23.1520563Z     from sphinx.util import import_object, logging, progress_message, rst, status_iterator
2022-03-24T22:22:23.1521303Z   File "/opt/conda/lib/python3.7/site-packages/sphinx/util/rst.py", line 21, in <module>
2022-03-24T22:22:23.1521860Z     from jinja2 import Environment, environmentfilter
2022-03-24T22:22:23.1522646Z ImportError: cannot import name 'environmentfilter' from 'jinja2' (/opt/conda/lib/python3.7/site-packages/jinja2/__init__.py)
2022-03-24T22:22:23.1799933Z Makefile:38: recipe for target 'doctest' failed
2022-03-24T22:22:23.1800218Z make: *** [doctest] Error 1
2022-03-24T22:22:23.1800698Z + cleanup
2022-03-24T22:22:23.1800893Z + retcode=2
2022-03-24T22:22:23.1801065Z + set +x
2022-03-24T22:22:23.1804405Z + cleanup
2022-03-24T22:22:23.1804678Z + retcode=2
2022-03-24T22:22:23.1804922Z + set +x
2022-03-24T22:22:23.1845397Z ##[error]Process completed with exit code 2.
2022-03-24T22:22:23.1880041Z ##[group]Run pytorch/pytorch/.github/actions/chown-workspace@master

See GitHub Actions build pull / linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge) (4/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-03-24T22:22:40.9890897Z The PR is introduc...m to confirm whether this change is wanted or not.
2022-03-24T22:22:40.9876829Z processing existing schema:  text(__torch__.torch.classes.profiling.SourceRef _0) -> (str _0)
2022-03-24T22:22:40.9878494Z processing existing schema:  count(__torch__.torch.classes.profiling.InstructionStats _0) -> (int _0)
2022-03-24T22:22:40.9879403Z processing existing schema:  duration_ns(__torch__.torch.classes.profiling.InstructionStats _0) -> (int _0)
2022-03-24T22:22:40.9881219Z processing existing schema:  source(__torch__.torch.classes.profiling.SourceStats _0) -> (__torch__.torch.classes.profiling.SourceRef _0)
2022-03-24T22:22:40.9883014Z processing existing schema:  line_map(__torch__.torch.classes.profiling.SourceStats _0) -> (Dict(int, __torch__.torch.classes.profiling.InstructionStats) _0)
2022-03-24T22:22:40.9884065Z processing existing schema:  __init__(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-03-24T22:22:40.9885521Z processing existing schema:  enable(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-03-24T22:22:40.9886466Z processing existing schema:  disable(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-03-24T22:22:40.9888871Z processing existing schema:  _dump_stats(__torch__.torch.classes.profiling._ScriptProfile _0) -> (__torch__.torch.classes.profiling.SourceStats[] _0)
2022-03-24T22:22:40.9890429Z processing existing schema:  __init__(__torch__.torch.classes.dist_rpc.WorkerInfo _0, str _1, int _2) -> (NoneType _0)
2022-03-24T22:22:40.9890897Z The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not. 
2022-03-24T22:22:40.9890925Z 
2022-03-24T22:22:40.9891202Z Broken ops: [
2022-03-24T22:22:40.9891512Z 	aten::to_sparse_csr(Tensor self) -> (Tensor)
2022-03-24T22:22:40.9891633Z ]
2022-03-24T22:22:41.0879109Z + cleanup
2022-03-24T22:22:41.0879186Z + retcode=1
2022-03-24T22:22:41.0879257Z + set +x
2022-03-24T22:22:41.0926406Z ##[error]Process completed with exit code 1.
2022-03-24T22:22:41.0963554Z ##[group]Run pytorch/pytorch/.github/actions/chown-workspace@master
2022-03-24T22:22:41.0963637Z env:

1 failure not recognized by patterns:

Job: GitHub Actions pull / linux-docs / build-docs (cpp) · Step: Unknown

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Comment on lines 32 to 33
DispatchKey::SparseCsrCPU,
DispatchKey::SparseCsrCUDA,
Collaborator


We'll likely need to detach CPU/CUDA from CSR dispatch keys as well, similar to Sparse:

Suggested change:
- DispatchKey::SparseCsrCPU,
- DispatchKey::SparseCsrCUDA,
+ DispatchKey::SparseCsr,

Contributor Author
@bdhirsh bdhirsh commented Mar 23, 2022

I'll probably do that in a follow-up. For context: there actually isn't a hard requirement on which way to represent SparseCsr - it's more about whether we want to trade off dispatch key slots for space in the runtime operator table. We could either:

(1) Make it a "per-backend functionality key". There would be one dispatch key for SparseCsr, and any backend (CPU, CUDA, etc.) could implement its own kernels for it separately. Advantage: we save some key space, and we give every backend the flexibility to override its meaning (instead of being limited to just CPU and CUDA). Minor disadvantage: we blow up the size of the runtime operator table a bit, since we now need to store an extra (# backends) slots for SparseCsr for every operator. (From what you said, this is also probably logically a bit cleaner.)

(2) Leave the existing SparseCsrCPU/CUDA keys, which saves some runtime operator table space.

It sounds like we should eventually do (1).

@bdhirsh bdhirsh requested review from ezyang and albanD March 25, 2022 13:08
facebook-github-bot pushed a commit to pytorch/nestedtensor that referenced this pull request Mar 25, 2022
Summary:
X-link: pytorch/pytorch#72827

Reland of D34034848
ghstack-source-id: 152161452

Reviewed By: ezyang

Differential Revision: D34227616

fbshipit-source-id: 6d1dd0fd8144dfbd9e194cd7564cce017e7db968
facebook-github-bot pushed a commit that referenced this pull request Mar 25, 2022
Summary:
Pull Request resolved: #72827

Reland of D34034848 (6690256)
ghstack-source-id: 152161452

Test Plan: Confirm that Milan tests are passing

Reviewed By: ezyang

Differential Revision: D34227616

fbshipit-source-id: 6d1dd0fd8144dfbd9e194cd7564cce017e7db968
@facebook-github-bot facebook-github-bot deleted the gh/bdhirsh/183/head branch March 29, 2022 14:17
@facebook-github-bot
Contributor

This pull request has been reverted by 9872a06. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).

bdhirsh added a commit that referenced this pull request Mar 31, 2022
…pace (in C++)"""

This PR is a re-land of  #69633 (this is the second re-land attempt, the first one is at #72827). The original PR had a memory corruption bug that only surfaced on mobile builds.


*Background: Existing Mobile Optimization*

Pytorch mobile builds have an existing optimization ([here](https://github.com/pytorch/pytorch/blob/cc23725e89713138aa1c81ce5fb4a8dbcd440ccf/c10/core/DispatchKey.h#L382) and [here](https://github.com/pytorch/pytorch/blob/cc23725e89713138aa1c81ce5fb4a8dbcd440ccf/aten/src/ATen/core/dispatch/OperatorEntry.h#L214)), which works as follows:

- Every operator in pytorch has a "dispatch table" of function pointers, corresponding to all of the (up to 64) different kernels that we might dispatch to when we run an operator in pytorch (autograd, cpu, cuda, complex number support, etc.).
- In mobile builds, the size of that table is shrunk from 64 to 8 to save space, because mobile doesn't end up using the functionality associated with most dispatch keys (see the sketch after this list).
- The dispatcher also has a notion of "fallback kernels": kernels that you can register to a particular dispatch key but that should work for "any operator". The array of fallback kernels is defined [here](https://github.com/pytorch/pytorch/blob/cc23725e89713138aa1c81ce5fb4a8dbcd440ccf/aten/src/ATen/core/dispatch/Dispatcher.h#L294).
- The mobile optimization currently does not extend to this array (it wouldn't be that useful anyway, because there is only one array of fallback kernels globally, vs. a separate dispatch table of function pointers per operator).


*The Bug*

This PR actually makes it difficult to enable that optimization separately for the per-operator arrays vs. the fallback array, and it incidentally shrank the size of the fallback array from 64 to 8 for mobile (that happened on [this](https://github.com/pytorch/pytorch/pull/69633/files#diff-f735cd7aa68f15b624100cbc4bb3b5ea76ffc7c9d3bec3b0ccabaa09609e5319R294) line).
That isn't a problem by itself, since mobile doesn't actually use any of the fallbacks that can no longer be stored. However, pytorch core still registers all of those fallback kernels on startup in mobile builds, even if they aren't used. When one of those fallbacks was registered on startup, the kernel would be written past the bounds of the (now smaller) backendFallbackKernels_ array inside the Dispatcher object.


**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35222806/)!

[ghstack-poisoned]
bdhirsh added a commit that referenced this pull request Mar 31, 2022