
move Functionalize dispatch key closer to backends #77132


Closed
wants to merge 21 commits

Conversation

@bdhirsh bdhirsh commented May 10, 2022

facebook-github-bot commented May 10, 2022

✅ No Failures (3 Pending)

As of commit 1d4e8fa (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

This comment was automatically generated by Dr. CI.

bdhirsh added a commit that referenced this pull request May 10, 2022
ghstack-source-id: 9a0e787
Pull Request resolved: #77132
bdhirsh added a commit that referenced this pull request May 10, 2022
ghstack-source-id: 5429f44
Pull Request resolved: #77132
bdhirsh added a commit that referenced this pull request May 11, 2022
ghstack-source-id: c78ae84
Pull Request resolved: #77132
bdhirsh added a commit that referenced this pull request May 13, 2022
ghstack-source-id: 10b59d7
Pull Request resolved: #77132
@bdhirsh bdhirsh requested review from ezyang and zou3519 May 25, 2022 18:15
@zou3519 zou3519 left a comment

lgtm, but please check that functorch functionalize tests pass after this change

Need this to get functionalize to work with backends (LTC/XLA). Now that we can kill the `DECOMPOSE_FUNCTIONAL` code in functorch (see pytorch/functorch#814), this should be ok to land once that PR merges.
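
As background for the functionalize + backends motivation above, here is a minimal sketch of what functionalization does from the Python side. It is illustrative only and not part of this PR; the import paths are an assumption, since the entry point has moved between releases (functorch.experimental vs. torch.func).

```
# Illustrative sketch, not from this PR. The functionalize entry point has
# moved across releases, so both import paths are tried.
import torch
try:
    from torch.func import functionalize              # newer PyTorch
except ImportError:
    from functorch.experimental import functionalize  # functorch circa 2022

def f(x):
    y = x.clone()
    y.add_(1)      # in-place mutation: awkward for functional-graph backends
    return y * 2

# functionalize(f) runs f under the Functionalize dispatch key, replacing the
# in-place op with an out-of-place equivalent; numerics are unchanged.
g = functionalize(f)
x = torch.randn(2, 3)
assert torch.allclose(f(x), g(x))
```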




@facebook-github-bot facebook-github-bot deleted the gh/bdhirsh/230/head branch May 30, 2022 14:17
zou3519 added a commit to pytorch/functorch that referenced this pull request May 31, 2022
…ont,Back}

zou3519 added a commit to pytorch/functorch that referenced this pull request May 31, 2022
…ont,Back}

zou3519 added a commit to pytorch/functorch that referenced this pull request May 31, 2022
…ont,Back} (#843)

Fixes #842

The Diagnosis
=============
As Brian pointed out:

For jvp(sub, ...), the chain of dispatch should be:

```
DynamicLayerFrontMode -> at::sub autograd kernel -> DynamicLayerBackMode
```

Instead, what we're doing today is
```
JVP dynamic layer -> at::sub autograd kernel -> at::sub zero_kernel
```
(the zero_tensor kernel errors out, because the inputs are
BatchedTensorImpl objects)
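
For reference, a sketch of the kind of call the diagnosis above concerns; the actual reproducer is in pytorch/functorch#842, and this snippet is an illustrative assumption rather than a copy of it.

```
# Hypothetical sketch of the jvp(sub, ...) path discussed above; the real
# reproducer lives in pytorch/functorch#842.
import torch
from functorch import jvp

x, y = torch.randn(3), torch.randn(3)
tx, ty = torch.randn(3), torch.randn(3)

# Dispatch should flow DynamicLayerFrontMode -> at::sub autograd kernel ->
# DynamicLayerBackMode. Before the fix described below, dispatch could fall
# through to at::sub's ZeroTensor kernel, which errors on BatchedTensorImpl
# inputs.
out, tangent_out = jvp(torch.sub, (x, y), (tx, ty))
assert torch.allclose(tangent_out, tx - ty)  # d/dt (x - y) = tx - ty
```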

The Problem
=============

functorch's behavior on dispatch keys between DynamicLayerFrontMode and
DynamicLayerBackMode should be:
- upon entering a dynamic layer (a.k.a. an Interpreter), we zero out all
dispatch keys* between FrontMode and BackMode;
- then, the dynamic layer (Interpreter) decides to re-enable some
dispatch keys. For example, the JVPInterpreter re-enables the
autograd keys;
- next, we do a dispatcher call, which ends up hitting one of the
Autograd keys (in the JVPInterpreter case).

The bug is that functorch has a hardcoded list of dispatch keys that it
zeros out. This list does not include ZeroTensor, because before
pytorch/pytorch#77132, the ZeroTensor key was
not between DynamicLayer{Front,Back}Mode.

*There is an exception for Autocast and VmapMode, described in the next section.

The Solution
============

Change functorch to programmatically zero out keys between
DynamicLayerBackMode and DynamicLayerFrontMode, with the exception of
Autocast and VmapMode.
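
A toy model of the change described above, in plain Python rather than the real c10/functorch code: the key names below are real DispatchKey names, but their ordering and the set operations are simplified assumptions, purely to contrast the stale hardcoded list with the programmatic masking.

```
# Toy model only; not the actual c10::DispatchKeySet / functorch implementation.
# The ordering below is a simplified assumption for illustration.
ASSUMED_KEY_ORDER = [
    "DynamicLayerFrontMode",
    "AutogradOther",   # stand-in for the autograd keys
    "ZeroTensor",      # sits between the markers after pytorch/pytorch#77132
    "Autocast",
    "VmapMode",
    "DynamicLayerBackMode",
]

# Old approach: a hardcoded list of keys to zero out, which silently went stale.
HARDCODED_ZERO_LIST = {"AutogradOther"}   # ZeroTensor is missing -> the bug

def keys_zeroed_old():
    return set(HARDCODED_ZERO_LIST)

# New approach: zero out everything strictly between the two markers,
# except Autocast and VmapMode, no matter which keys live there.
def keys_zeroed_new():
    between = ASSUMED_KEY_ORDER[1:-1]
    return {k for k in between if k not in {"Autocast", "VmapMode"}}

# An interpreter (e.g. the JVPInterpreter) then re-enables the keys it needs,
# such as the autograd keys, before making the dispatcher call.
def keys_active_for_jvp():
    return set(ASSUMED_KEY_ORDER) - keys_zeroed_new() | {"AutogradOther"}

assert "ZeroTensor" in keys_zeroed_new() - keys_zeroed_old()
assert "AutogradOther" in keys_active_for_jvp()
```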

This means that in the future, if someone adds a dispatch key between
DynamicLayerBackMode and DynamicLayerFrontMode, we will (probably) be
handling it "correctly": the model for dispatch is:
- [functorch] -> [regular pytorch dispatcher]
- a key like ZeroTensor gets handled in the [regular pytorch dispatcher]
section.
- functorch transforms get handled in the [functorch] section.

We do not change the autocast <-> functorch interaction in this PR
(i.e. functorch does not zero it out) because I'm not sure what the
correct thing to do here is.

We do not change how kVmapMode works because... it needs to be active
to ban random operations in transforms later down the line :/

Test Plan
============
Wait for tests
facebook-github-bot pushed a commit that referenced this pull request May 31, 2022
Summary:
Pull Request resolved: #77132

Approved by: https://github.com/ezyang, https://github.com/zou3519

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/7ff091fc4e66f18c2fd463ca038688b67548a6b0

Reviewed By: seemethere

Differential Revision: D36783103

Pulled By: bdhirsh

fbshipit-source-id: 4b25d31257384588b4b1644f7d45adff683eb025
zou3519 added a commit to zou3519/pytorch that referenced this pull request Jul 20, 2022
…amicLayer{Front,Back} (pytorch/functorch#843)

bigfootjon pushed a commit that referenced this pull request Jul 21, 2022
…amicLayer{Front,Back} (pytorch/functorch#843)
