Disable automatic output model snapshot tracking · Issue #146 · clearml/clearml


Closed
rorph opened this issue Jun 16, 2020 · 3 comments

Comments

rorph commented Jun 16, 2020

Hello,

Is there an option to disable output model snapshots while still tracking the training framework?

E.g. I want to save the best-scoring model as the output model; however, because snapshot tracking is enabled there's a race condition: when a new snapshot comes in, it overwrites the best model I set.

Thanks

Great project BTW

bmartinn (Member) commented Jun 16, 2020

Thank you @rorph, kind words are always appreciated :)
I think I'm missing some information on what you are looking for, but it looks similar to one of the latest additions :)
See the PR for PyTorch-Ignite here and the discussion.
Is this the use case?

With trains 0.15.1rc0 we added a few callbacks that let you interfere with model registration;
you can see the use case here.

You can also disable the tracking automagic with:

from trains import Task

# Disabling the PyTorch binding turns off automatic model snapshot tracking for that framework
Task.init('examples', 'no tracking', auto_connect_frameworks={'pytorch': False})

And then manually log a model:

from trains import OutputModel

# Manually register the chosen weights file as the task's output model
OutputModel().update_weights('my_best_model.bin')

rorph (Author) commented Jun 17, 2020

Hey @bmartinn, thanks for the reply.
I saw the disable-automagic options in the documentation; however, I only want to disable the snapshot feature specifically, since all the other metrics produced by the automagic are still pertinent.

Looking at the PR, I think I can probably circumvent this issue by setting an event that never saves it.

Just to put this in context: there are 2 scripts running in parallel, one training and one measuring the iterations. Once a new iteration is created it's graded, and if it finds a new best score I run trains.update_output_model to point to the best model.
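
For illustration, a minimal sketch of such a grading script is below. The helper callables, project/task names, and polling interval are hypothetical placeholders, not part of this thread; it reuses the OutputModel.update_weights call shown above to re-point the task's output model at the best checkpoint.

import time
from trains import Task, OutputModel

def grade_checkpoints(get_latest_checkpoint, score, poll_seconds=60):
    # Hypothetical evaluation loop: poll for new snapshots, grade each one,
    # and keep only the best-scoring checkpoint as the task's output model.
    task = Task.init('examples', 'checkpoint grader')
    output_model = OutputModel(task=task)
    best_score = float('-inf')
    while True:
        checkpoint = get_latest_checkpoint()  # local path to the newest snapshot, or None
        if checkpoint is not None:
            new_score = score(checkpoint)
            if new_score > best_score:
                best_score = new_score
                # Re-point the task's output model at the best checkpoint so far
                output_model.update_weights(checkpoint)
        time.sleep(poll_seconds)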

rorph closed this as completed Jun 17, 2020
bmartinn (Member) commented Jun 17, 2020

Looking at the PR, I think I can probably circumvent this issue by setting an event that never saves it.

FYI, if the "pre_callback" returns None, that specific model save will not be tracked :)
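
For illustration, a rough sketch of such a pre-callback is below. It assumes the WeightsFileHandler.add_pre_callback hook; the import path and callback signature here follow later clearml releases and may differ in trains 0.15.x:

from trains import Task
# Import path is an assumption based on later clearml releases
from trains.binding.frameworks import WeightsFileHandler

task = Task.init('examples', 'no snapshot tracking')

def skip_snapshot(operation_type, model_info):
    # Returning None tells the framework binding not to register this model save;
    # return model_info (optionally modified) to keep the default tracking.
    return None

WeightsFileHandler.add_pre_callback(skip_snapshot)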

Just to put this in context: there are 2 scripts running in parallel, one training and one measuring the iterations. Once a new iteration is created it's graded, and if it finds a new best score I run trains.update_output_model to point to the best model.

This is very cool! (In theory you could make it a distributed process: launch a Task to do the validation on another machine, e.g. clone & enqueue a base Task that does inference, then plug the results back into the training Task.)
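
A rough sketch of that clone & enqueue idea (the project, task, and queue names are placeholders):

from trains import Task

# Template task that runs the validation/inference step
base_task = Task.get_task(project_name='examples', task_name='validate checkpoint')

# Clone it and send the clone to an agent queue running on another machine
validation_task = Task.clone(source_task=base_task, name='validate latest checkpoint')
Task.enqueue(validation_task, queue_name='validation')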

@rorph out of curiosity, which framework are you using?
