kl_div: fix for grads wrt target, double backward, forward-over-reverse AD support. #79007
Conversation
✅ Dr. CI: No failures (0 pending) as of commit 35e4465.
Thanks for the fixes! I'll suggest alternatively tightening up the `input.shape == target.shape` check if possible (see comment here). If there are actual use cases that depend on the broadcasting (unlikely for loss calculation, slightly more likely if `kl_div` is used in a non-loss context), then the approach taken here makes sense.
else {
  g = at::where(target > 0, -grad * grad_output, at::zeros({}, grad.options()));
  if (reduction == at::Reduction::Mean) {
    g = areAnyTensorSubclassLike({g}) ? g / input.numel() : g.mul_(input.numel());
This doesn't seem correct to me - should the in-place version multiply by `1 / input.numel()` instead?
Good catch! Should be `div_` instead of `mul_` indeed.
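For context, here is a minimal sketch (not from the PR itself) of the kind of numerical check that should expose a wrong mean-reduction scaling: gradgradcheck compares the analytic double backward against finite differences, so multiplying by `input.numel()` where a division is needed would show up as a mismatch.

```python
import torch
import torch.nn.functional as F

inp = torch.randn(3, 4, dtype=torch.double, requires_grad=True)
# Keep target away from 0 so finite differences of log(target) stay well conditioned.
tgt = (torch.rand(3, 4, dtype=torch.double) * 0.9 + 0.1).requires_grad_()

def fn(i, t):
    return F.kl_div(i, t, reduction="mean")

# First- and second-order gradients wrt both input and target.
torch.autograd.gradcheck(fn, (inp, tgt))
torch.autograd.gradgradcheck(fn, (inp, tgt))
```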
Interesting that the tests didn't catch this - do we need to improve coverage somewhere?
cc @soulitzer. I think this path should have been chosen here... So I wonder just the same. The coverage is otherwise there.
I checked that we are indeed going into the `isTensorSubclass(g) == true` path. See #79079
@mruberry, could you please tell whether OpInfo tests these paths?
I think OpInfo intends to test both the `isTensorSubclass(g) == true` and `false` paths (test_fn_gradgrad runs gradgradcheck, which checks with both batched and non-batched tensors); it's just that, due to a bug, only the `true` path is being reached currently.
I'd definitely feel better about correctness if that bug was fixed :p Can we at least run it manually before merging this PR?
I have tested these two paths separately, both work.
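For what it's worth, a hedged sketch of how the two paths can be exercised by hand: the batched checks route the double backward through the tensor-subclass (BatchedTensor) branch, while the plain checks take the ordinary-tensor branch; `check_batched_grad` is the public gradcheck knob for this.

```python
import torch
import torch.nn.functional as F

inp = torch.randn(2, 3, dtype=torch.double, requires_grad=True)
tgt = (torch.rand(2, 3, dtype=torch.double) * 0.9 + 0.1).requires_grad_()

def fn(i, t):
    return F.kl_div(i, t, reduction="mean")

# Ordinary-tensor branch only.
torch.autograd.gradgradcheck(fn, (inp, tgt), check_batched_grad=False)
# Additionally runs the batched (vmap/BatchedTensor subclass) branch.
torch.autograd.gradgradcheck(fn, (inp, tgt), check_batched_grad=True)
```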
Interesting question about OpInfo coverage -- I don't think we currently test with a tensor subclass in any systematic way, but I could be mistaken. @ezyang would probably know better
@jbschlosser, thank you for your comments! I have addressed them. I have decided to keep broadcasting as
Ah interesting - can you point me to that warning please? I only see a warning regarding
Sorry, I confused
Okay awesome. I see the discussion in #16045 regarding broadcasting for loss functions that led to a warning instead of a hard error. Following the historical precedent set there, I think we should move forward with your broadcasting fix. Maybe in a future PR, it'd also make sense to add a similar warning to
@@ -2869,7 +2869,8 @@ def kl_div(
    else:
        reduction_enum = _Reduction.get_enum(reduction)

    reduced = torch.kl_div(input, target, reduction_enum, log_target=log_target)
    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
I see that this matches what was done for e.g. `nn.MSELoss`. With manual broadcasting here, do we still need the kernel changes?
No, not really. But I like to keep it consistent, since forward is using `input.numel`, not `target.numel`.
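As an illustration (not the exact diff), the manual broadcasting in the Python functional amounts to expanding both tensors to a common shape before the kernel call, so the `input.numel()` used by the mean reduction refers to the broadcasted size.

```python
import torch
import torch.nn.functional as F

inp = torch.randn(4, 1, 5)   # log-probabilities
tgt = torch.rand(1, 3, 5)    # probabilities

# Both operands are expanded to the common shape (4, 3, 5) up front,
# mirroring what the functional wrapper now does.
expanded_input, expanded_target = torch.broadcast_tensors(inp, tgt)
out = F.kl_div(expanded_input, expanded_target, reduction="mean")
```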
JIT issues seem relevant.
@@ -61,12 +61,14 @@ void binary_cross_entropy_backward_out_kernel(Tensor& grad_input, const Tensor&
namespace at { namespace native {

Tensor kl_div_backward_cuda(const Tensor& grad, const Tensor& input, const Tensor& target, int64_t reduction, bool log_target) {
  auto grad_input = at::empty_like(input);
  const auto grad_expanded = grad.expand_as(input);
Hmm backward wasn't failing previously right?
No, it was not, because the tests used `target.shape() == input.shape()` inputs.
Oh, I was thinking that because grad has the same size as the potentially reduced output, we'd need to do some kind of expanding always, and whether target was broadcasted to input or not doesn't matter. But maybe TensorIterator is doing something?
TI was complaining about shape mismatch in some cases, and in some it would segfault. Probably worth investigating...
Yeah tensor iterator is supposed to handle broadcasting, but maybe there is an issue specifically with this case?
Either that, or the grad forwarding. I will try to isolate the issue, maybe it is something worth reporting about.
My understanding is that, before the fix, TI didn't know to broadcast because it was handed an unbroadcasted `target` and `grad`.
TI should have broadcasted inputs with shapes input=(5, 5, 5) and output=(5, 5) (where it complains about an output-input mismatch), and shapes input=(5, 5, 5), output=(5,) should not have caused any segfaults. Back then TI used to broadcast inputs and outputs to the same shape, but apparently this behavior might have changed, CC @ngimel.
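To make the expansion point concrete, here is a hypothetical Python rendition (not the CUDA kernel) of why `grad` has to be expanded: with a reduction the output, and hence the incoming `grad`, is a scalar, while the gradient wrt `input` has `input`'s shape.

```python
import torch
import torch.nn.functional as F

def kl_div_backward_sketch(grad, input, target):
    # Scalar grad (from a reduced output) is broadcast to input's shape,
    # analogous to grad.expand_as(input) in the CUDA backward.
    grad_expanded = grad.expand_as(input)
    # d/d_input of target * (log(target) - input) is -target; zero where target == 0.
    return torch.where(target > 0, -target, torch.zeros_like(input)) * grad_expanded

inp = torch.randn(5, 5, requires_grad=True)
tgt = torch.rand(5, 5)
loss = F.kl_div(inp, tgt, reduction="sum")
manual = kl_div_backward_sketch(torch.tensor(1.0), inp, tgt)
auto, = torch.autograd.grad(loss, inp)
assert torch.allclose(manual, auto)
```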
Thanks!
@jbschlosser, do you have more comments and/or suggestions? Otherwise, if it is fine with you, I will merge it.
@nikitaved I'm good with it, but is there any way to manually test the case that @soulitzer pointed out is wrongly being skipped?
@jbschlosser, I can confirm gradcheck is happy with both paths.
@nikitaved good to merge on my end then!
@pytorchbot merge
Benchmarks: #80334 (comment)
Fixes #80158 Fixes #78867 Fixes #69230
Supersedes #79007 Supersedes #69212 Supersedes #19659
Pull Request resolved: #80334
Approved by: https://github.com/ezyang

Summary: as above. Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/828c787ea98da39eb786925eedcb8527aae07153
Reviewed By: mehtanirav
Differential Revision: D37604775
Pulled By: mehtanirav
fbshipit-source-id: b188d47df5a3a820e5c15d9ce18b1a2c3f31f287
Summary: as above. Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/b5b9db9f844f4f100651c6afa57124fa5851edec
Reviewed By: DanilBaibak
Differential Revision: D37847477
Pulled By: DanilBaibak
fbshipit-source-id: a04919bbd2b746c30c654b971efcf76ef27ac5a6
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as
Superseded by #80334
Fixes #78867, fixes #65466. Adds forward-over-reverse AD support.
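As a rough illustration of what forward-over-reverse means here (using the later torch.func API rather than anything from this PR): forward-mode AD is pushed through the reverse-mode gradient of `kl_div` wrt the target, which is exactly the composition that requires `kl_div`'s backward to have a forward-AD formula.

```python
import torch
import torch.nn.functional as F
from torch.func import grad, jvp

log_probs = torch.log_softmax(torch.randn(3, 4), dim=-1)

def kl_loss(target):
    return F.kl_div(log_probs, target, reduction="sum")

primal = torch.softmax(torch.randn(3, 4), dim=-1)   # a valid probability target
tangent = torch.randn(3, 4)

# Forward-mode AD (jvp) applied to the reverse-mode gradient wrt target:
# the "forward-over-reverse" composition, yielding a Hessian-vector product.
grad_wrt_target, hvp = jvp(grad(kl_loss), (primal,), (tangent,))
```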