8000 Transformers trainer compatibility by mkurman · Pull Request #7 · evanatyourservice/kron_torch · GitHub

Transformers trainer compatibility #7


Merged · 1 commit into evanatyourservice:main on Mar 24, 2025

Conversation

@mkurman (Contributor) commented Mar 24, 2025

Hey, I'm using your optimizer when training with the Transformers library. My PR makes the optimizer state picklable so optimizer checkpoints can be saved, eliminating this error:

AttributeError: Can't pickle local object 'precond_update_prob_schedule.<locals>._schedule'
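For context, this error arises because `pickle` (which `torch.save` uses) cannot serialize a function defined inside another function. A common fix, and the general shape of this PR, is to replace the closure with a module-level callable class that stores the same parameters. The sketch below is illustrative: the parameter names and the exact decay formula are assumptions, not kron_torch's actual schedule.

```python
import math
import pickle


def precond_update_prob_schedule_closure(max_prob=1.0, min_prob=0.03,
                                         decay=0.001, flat_start=500):
    """Original pattern: returns a local function, which pickle cannot serialize."""
    def _schedule(n):
        # Hypothetical decay curve for illustration only.
        prob = max_prob * math.exp(-decay * (n - flat_start))
        return max(min(prob, max_prob), min_prob)
    return _schedule


class PrecondUpdateProbSchedule:
    """Picklable alternative: a top-level callable class holding the same state."""

    def __init__(self, max_prob=1.0, min_prob=0.03, decay=0.001, flat_start=500):
        self.max_prob = max_prob
        self.min_prob = min_prob
        self.decay = decay
        self.flat_start = flat_start

    def __call__(self, n):
        prob = self.max_prob * math.exp(-self.decay * (n - self.flat_start))
        return max(min(prob, self.max_prob), self.min_prob)


# The closure fails to pickle with the AttributeError above;
# the class instance round-trips cleanly.
try:
    pickle.dumps(precond_update_prob_schedule_closure())
except AttributeError as e:
    print("closure:", e)

restored = pickle.loads(pickle.dumps(PrecondUpdateProbSchedule()))
print(restored(1000))
```

Because the class is defined at module level, pickle can locate it by qualified name, so `torch.save(optimizer.state_dict(), ...)` no longer fails when the schedule is stored in the optimizer state.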

@evanatyourservice (Owner)

This is great, thanks for the contribution! Would you mind copying it into the two distributed implementations too, and then I'll go ahead and merge? Also, let me know if you have any questions about hyperparameters for training; happy to help.

@evanatyourservice evanatyourservice merged commit fb188e0 into evanatyourservice:main Mar 24, 2025
@evanatyourservice (Owner)

Actually, never mind the copying; I'll do it right now, since I have some other things to take care of anyway. Thank you for the PR, and let me know if you have any questions!

@mkurman (Contributor, Author) commented Mar 24, 2025

Awesome, thank you! I didn't have much time to go through the entire code, so I'm not entirely sure how it works. However, it seems to have unblocked my SLM pre-training, where I had hit a wall using AdamW and Muon. Great job!
