Disable automatic output model snapshot tracking · Issue #146 · clearml/clearml


Closed
rorph opened this issue Jun 16, 2020 · 3 comments

Comments

rorph commented Jun 16, 2020

Hello,

Is there an option to disable output model snapshots while still tracking the training framework?

E.g. I want to save the best-scoring model as the output model; however, because snapshot tracking is enabled there's a race condition: when a new snapshot comes in, it overwrites the best model I set.

Thanks

Great project BTW

bmartinn (Member) commented Jun 16, 2020

Thank you @rorph, kind words are always appreciated :)
I think I'm missing some information on what you are looking for, but it looks similar to one of the latest additions :)
See the PR for PyTorch-Ignite here and the discussion.
Is this the use case?

With trains 0.15.1rc0 we added a few callbacks that let you interfere with model registration;
you can see the use case here.

You can also disable the tracking automagic with:

from trains import Task

# Disabling the PyTorch binding turns off automatic model snapshot tracking for that framework
Task.init('examples', 'no tracking', auto_connect_frameworks={'pytorch': False})

And then manually log a model:

from trains import OutputModel

# Manually register the chosen weights file as the task's output model
OutputModel().update_weights('my_best_model.bin')

rorph (Author) commented Jun 17, 2020

Hey @bmartinn, thanks for the reply.
I saw the disable-automagic options in the documentation; however, I only want to disable the snapshot feature specifically, since all the other metrics produced by the automagic are still pertinent.

Looking at the PR, I think I can probably circumvent this issue by setting an event that never saves it.

Just to put this in context: there are 2 scripts running in parallel, one training and one measuring the iterations. Once a new iteration is created it's graded, and if it finds a new best score I run trains.update_output_model to point to the best model.
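
For illustration, a minimal sketch of such a grading script is below. The helper callables, project/task names, and polling interval are hypothetical placeholders, not part of this thread; it reuses the OutputModel.update_weights call shown above to re-point the task's output model at the best checkpoint.

import time
from trains import Task, OutputModel

def grade_checkpoints(get_latest_checkpoint, score, poll_seconds=60):
    # Hypothetical evaluation loop: poll for new snapshots, grade each one,
    # and keep only the best-scoring checkpoint as the task's output model.
    task = Task.init('examples', 'checkpoint grader')
    output_model = OutputModel(task=task)
    best_score = float('-inf')
    while True:
        checkpoint = get_latest_checkpoint()  # local path to the newest snapshot, or None
        if checkpoint is not None:
            new_score = score(checkpoint)
            if new_score > best_score:
                best_score = new_score
                # Re-point the task's output model at the best checkpoint so far
                output_model.update_weights(checkpoint)
        time.sleep(poll_seconds)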

rorph closed this as completed Jun 17, 2020
bmartinn (Member) commented Jun 17, 2020

Looking at the PR, I think I can probably circumvent this issue by setting an event that never saves it.

FYI, if the "pre_callback" returns None, that specific model save will not be tracked :)
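
For illustration, a rough sketch of such a pre-callback is below. It assumes the WeightsFileHandler.add_pre_callback hook; the import path and callback signature here follow later clearml releases and may differ in trains 0.15.x:

from trains import Task
# Import path is an assumption based on later clearml releases
from trains.binding.frameworks import WeightsFileHandler

task = Task.init('examples', 'no snapshot tracking')

def skip_snapshot(operation_type, model_info):
    # Returning None tells the framework binding not to register this model save;
    # return model_info (optionally modified) to keep the default tracking.
    return None

WeightsFileHandler.add_pre_callback(skip_snapshot)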

Just to put this in context: there are 2 scripts running in parallel, one training and one measuring the iterations. Once a new iteration is created it's graded, and if it finds a new best score I run trains.update_output_model to point to the best model.

This is very cool! (In theory you could make it a distributed process: launch a Task to do the validation on another machine, e.g. clone & enqueue a base Task that does inference, then plug the results back into the training Task.)
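
A rough sketch of that clone & enqueue idea (the project, task, and queue names are placeholders):

from trains import Task

# Template task that runs the validation/inference step
base_task = Task.get_task(project_name='examples', task_name='validate checkpoint')

# Clone it and send the clone to an agent queue running on another machine
validation_task = Task.clone(source_task=base_task, name='validate latest checkpoint')
Task.enqueue(validation_task, queue_name='validation')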

@rorph out of curiosity, which framework are you using?
