torch.nn.functional.kl_div fails gradgradcheck if the target requires a gradient #65466
Comments
It is not a bug I guess, the issue is that tensors of type …
My bad, I need to recheck why the gradcheck tests in …
You were right about that. The tests in …
OK, looks like the backward points to pytorch/tools/autograd/derivatives.yaml (line 1937 at 9324d68), which, I guess, is wrong, as log is infinitely-many times differentiable with a non-zero value for positive arguments. @pmeier, unless you want to have a look into this, I could try to resolve it.
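A quick numerical check of that remark (a sketch derived from the pointwise definition loss(x, t) = t * (log(t) - x) used by kl_div with log_target=False; the snippet itself is not from the issue):

```python
import torch

x = torch.rand(5, requires_grad=True)
t = torch.rand(5, requires_grad=True)

# Pointwise definition (log_target=False): loss(x, t) = t * (log(t) - x)
loss = torch.nn.functional.kl_div(x, t, reduction="sum")

# First derivative w.r.t. the target: d loss / d t = log(t) - x + 1
(gt,) = torch.autograd.grad(loss, t, create_graph=True)
print(torch.allclose(gt, t.log() - x + 1))  # True

# Differentiating once more gives d^2 loss / d t^2 = 1 / t, which is non-zero
# for t > 0, so the backward w.r.t. target is itself differentiable.
```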
Note: Interestingly enough, …
In order to fix that we need to implement …
@albanD what do you think?
I can see individual formulas being wrong, but I don't think it is a problem to have the formulas separated out. For example, if you do

```python
import torch
# !pip install torchviz
import torchviz

a = torch.rand(1, 10, requires_grad=True)
t = torch.rand(1, 10, requires_grad=True)

loss = torch.kl_div(a, t).sum()
ga, gt = torch.autograd.grad(loss, (a, t), create_graph=True)

torchviz.make_dot((loss, ga, gt), params={k: v for k, v in locals().items() if isinstance(v, torch.Tensor)})
```
@albanD, when only partial derivatives are defined and I want to double backward, the effects of the differentiated backwards will be accumulated, right? Judging from how …
@nikitaved each backward formula is only responsible for providing the gradient flowing through the forward function it defines. If some other function also uses target, then that other function is responsible for computing that part of the gradient.
Yes, if a variable is re-used multiple times, the engine will make sure that the gradients from all of its uses are added together before processing further.
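A minimal illustration of that accumulation behavior (a sketch with arbitrary functions, not specific to kl_div):

```python
import torch

x = torch.rand(3, requires_grad=True)

# x is used by two different functions; each backward formula only returns
# the gradient flowing through its own function.
y = x.sin() + x.exp()
y.sum().backward()

# The engine sums the two contributions before moving on.
print(torch.allclose(x.grad, x.cos() + x.exp()))  # True
```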
Adds forward-over-reverse AD support (#79007). Summary: Fixes #78867, fixes #65466. Pull Request resolved: #79007. Approved by: https://github.com/soulitzer, https://github.com/jbschlosser. Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/72ad222cff59cbe730a49dd828cb0a25d2a18417. Reviewed By: osalpekar. Differential Revision: D37058939. Pulled By: osalpekar. fbshipit-source-id: 28ee709c47bc5fcb82ae31dd4a30e9ecac573709
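For context, a sketch of what "forward-over-reverse" exercises, written with torch.func on a recent PyTorch build (the shapes and reduction are illustrative, not taken from the PR's test plan):

```python
import torch
from torch.func import grad, jvp

def loss(inp, tgt):
    # kl_div expects log-probabilities as input
    return torch.nn.functional.kl_div(inp, tgt, reduction="sum")

inp = torch.rand(3, 5).log()
tgt = torch.softmax(torch.rand(3, 5), dim=-1)
v = torch.rand(3, 5)

# Forward-over-reverse: push a forward-mode tangent through the
# reverse-mode gradient of the loss w.r.t. the target.
_, out = jvp(grad(loss, argnums=1), (inp, tgt), (torch.zeros_like(inp), v))

# Analytically, d loss / d tgt = log(tgt) - inp + 1, so the JVP is v / tgt.
print(torch.allclose(out, v / tgt))  # True
```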
🐛 Bug
torch.nn.functional.kl_div fails gradgradcheck if the target requires a gradient.

To Reproduce
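A minimal sketch of a reproduction (the exact snippet from the report is not preserved here; shapes and the reduction mode are assumptions):

```python
import torch
from torch.autograd import gradgradcheck

inp = torch.rand(3, 5, dtype=torch.double, requires_grad=True)
target = torch.rand(3, 5, dtype=torch.double, requires_grad=True)

# Passes when target.requires_grad is False; raises on torch==1.8.1 when the
# target requires a gradient, because the backward w.r.t. target is not
# differentiable there.
gradgradcheck(
    lambda i, t: torch.nn.functional.kl_div(i, t, reduction="sum"),
    (inp, target),
)
```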
The error only shows up if the target requires a gradient.

Additional context
torch==1.8.1. OpInfo for kl_div in #65469 (add OpInfo for torch.nn.functional.kl_div).

cc @ezyang @albanD @zou3519 @gqchen @pearu @nikitaved @soulitzer @lezcano @Varal7 @mruberry @jbschlosser @walterddr