[PyTorch] Add lr scheduler #1305
Conversation
@@ -356,15 +441,14 @@ In some cases initializing the parameters is not sufficient to guarantee a good
A rather simple fix for this dilemma is to use a warmup period during which the learning rate *increases* to its initial maximum, and then to cool the rate down until the end of the optimization process. For simplicity one typically uses a linear increase for this purpose. This leads to a schedule of the form indicated below.
```{.python .input}
-scheduler = lr_scheduler.CosineScheduler(20, warmup_steps=5, base_lr=0.5,
+scheduler = lr_scheduler.CosineScheduler(20, warmup_steps=5, base_lr=0.3,
```
why do we change base_lr?
The PyTorch implementation was somehow not converging reliably when base_lr was 0.5, so I changed it across frameworks for consistency.
That's ok. Just give me a heads up if you change original mx code :)
Sure! I mentioned it here. Will make it more explicit in the future if mxnet code is changed.
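For anyone reading along, here is a rough sketch of the warmup-plus-cosine schedule being configured in the diff above. It is not the code in this PR: the argument names mirror the call in the diff (max_update, base_lr, warmup_steps), while final_lr and warmup_begin_lr are assumed defaults.

```python
import math

class CosineScheduler:
    """Sketch only: linear warmup to base_lr, then cosine decay to final_lr."""
    def __init__(self, max_update, base_lr=0.01, final_lr=0,
                 warmup_steps=0, warmup_begin_lr=0):
        self.base_lr_orig = base_lr          # peak learning rate after warmup
        self.max_update = max_update         # step at which the decay ends
        self.final_lr = final_lr             # floor reached at max_update
        self.warmup_steps = warmup_steps
        self.warmup_begin_lr = warmup_begin_lr
        self.max_steps = max_update - warmup_steps

    def get_warmup_lr(self, epoch):
        # Linear increase from warmup_begin_lr to base_lr over warmup_steps.
        increase = ((self.base_lr_orig - self.warmup_begin_lr)
                    * float(epoch) / float(self.warmup_steps))
        return self.warmup_begin_lr + increase

    def __call__(self, epoch):
        if epoch < self.warmup_steps:
            return self.get_warmup_lr(epoch)
        if epoch <= self.max_update:
            # Cosine decay from base_lr down to final_lr.
            return self.final_lr + (self.base_lr_orig - self.final_lr) * (
                1 + math.cos(math.pi * (epoch - self.warmup_steps)
                             / self.max_steps)) / 2
        return self.final_lr

scheduler = CosineScheduler(20, warmup_steps=5, base_lr=0.3, final_lr=0.01)
```

In the PyTorch tab such a plain callable is usually applied by hand, e.g. `for param_group in trainer.param_groups: param_group['lr'] = scheduler(epoch)` at the start of each epoch, where `trainer` is assumed to be the `torch.optim` optimizer.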
```
-#@tab tensorflow
-scheduler = SquareRootScheduler(1.0)
+#@tab all
+scheduler = SquareRootScheduler(lr=0.1)
```
Here's another example of a modification to the original code. Please make sure that similar results are obtained.
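For context, the square-root scheduler in this hunk decays the rate roughly as lr / sqrt(t). A minimal sketch consistent with the new call (the +1 offset is an assumption about how the first step is handled, not necessarily the file's exact code):

```python
class SquareRootScheduler:
    """Sketch only: learning rate decays proportionally to 1 / sqrt(t)."""
    def __init__(self, lr=0.1):
        self.lr = lr

    def __call__(self, num_update):
        # Offset by 1 so the schedule is well defined at the first update.
        return self.lr * pow(num_update + 1.0, -0.5)

scheduler = SquareRootScheduler(lr=0.1)
```

Under this sketch's definition, dropping the base rate from 1.0 to 0.1 scales the whole curve down by a factor of 10, which is why it is worth confirming that the results stay comparable.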
Should we merge this now?
Description of changes:
By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.