Use SimpleDistributedPerLayerClipping optimizer in hooks mode #750


Closed
wants to merge 2 commits

Conversation

iden-kalemaj
Contributor

Summary:
We use SimpleDistributedPerLayerOptimizer instead of DistributedPerLayerOptimizer.

The latter causes an issue when switching to `register_full_backward_hook`.

The issue arises because DistributedPerLayerOptimizer registers per-parameter hooks on top of the per-module hooks. During the backward pass, the per-parameter hooks fire before the per-module hooks. Per-sample gradients are computed when the per-module hooks fire, so an error occurs when the per-parameter hooks try to access per-sample gradients that have not yet been computed. PyTorch does not provide a way to force the order in which hooks are called.
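To make the ordering issue concrete, here is a minimal, self-contained PyTorch sketch (not the Opacus implementation; the toy `nn.Linear` module and the print statements are illustrative assumptions). It registers a per-parameter hook and a per-module full backward hook on the same module and prints the order in which they fire during the backward pass:

```python
import torch
import torch.nn as nn

linear = nn.Linear(4, 2)

# Per-parameter hook: fires when the gradient for this specific parameter is computed.
linear.weight.register_hook(lambda grad: print("per-parameter hook (weight) fired"))

# Per-module hook: fires once gradients w.r.t. the module's inputs are available.
linear.register_full_backward_hook(
    lambda module, grad_input, grad_output: print("per-module full backward hook fired")
)

x = torch.randn(3, 4, requires_grad=True)
linear(x).sum().backward()
```

Per the description above, with `register_full_backward_hook` the per-parameter hook fires first, before the per-module hook has had a chance to compute per-sample gradients.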

Differential Revision: D72420168
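For context, a hedged sketch of how the distributed per-layer clipping path in hooks mode is typically reached from user code. The toy model, data, and hyperparameters below are illustrative assumptions, and the snippet assumes a `torch.distributed` process group has already been initialized (e.g. via `torchrun`):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

from opacus import PrivacyEngine
from opacus.distributed import DifferentiallyPrivateDistributedDataParallel as DPDDP

# Wrap the model for differentially private distributed training.
model = DPDDP(nn.Linear(16, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data_loader = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))), batch_size=8
)

privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,
    max_grad_norm=[1.0, 1.0],  # one clipping norm per trainable parameter
    clipping="per_layer",      # per-layer clipping
    grad_sample_mode="hooks",  # the hooks-mode case changed by this PR
)
# With this change, the optimizer returned for this configuration is a
# SimpleDistributedPerLayerOptimizer rather than a DistributedPerLayerOptimizer.
```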

…rch#720)

Summary:
Pull Request resolved: pytorch#720

`register_backward_hook` is deprecated and may lead to errors in gradient calculation. We switch to the supported `register_full_backward_hook`.

Differential Revision: D68562558

Reviewed By: HuanyuZhang
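A minimal before/after sketch of that hook registration change (the toy module and hook body are illustrative assumptions, not the actual GradSampleModule code):

```python
import torch
import torch.nn as nn

def capture_backprops(module, grad_input, grad_output):
    # In Opacus, a module backward hook like this is where per-sample
    # gradient computation is triggered.
    print(f"backward hook fired for {type(module).__name__}")

layer = nn.Linear(4, 2)

# Deprecated; can report incorrect grad_input for some modules:
# layer.register_backward_hook(capture_backprops)

# Supported replacement:
layer.register_full_backward_hook(capture_backprops)

x = torch.randn(3, 4, requires_grad=True)
layer(x).sum().backward()
```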
@facebook-github-bot added the CLA Signed label on Apr 3, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D72420168

iden-kalemaj added a commit to iden-kalemaj/opacus that referenced this pull request Apr 3, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D72420168

@facebook-github-bot
Contributor

This pull request has been merged in 58f11ec.

Labels
CLA Signed, fb-exported, Merged