Retaining context across forward passes · Issue #57 · IDSIA/brainstorm

Open
flukeskywalker opened this issue Oct 18, 2015 · 5 comments

@flukeskywalker (Collaborator)

We should have a context_reset_rate parameter (subject to renaming) in the trainer which is set by the train function. Using this, the context should be reset (cleared) if current_update_nr % context_reset_rate == 0, otherwise it should be retained.
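A minimal sketch of the proposed check (the names context_reset_rate, current_update_nr, and clear_context here are illustrative, not existing brainstorm API):

```python
# Hypothetical sketch of the proposed behaviour; all names are
# illustrative, not the actual brainstorm API.
def maybe_reset_context(net, current_update_nr, context_reset_rate):
    if current_update_nr % context_reset_rate == 0:
        net.clear_context()  # discard the retained recurrent state
    # otherwise the context from the previous forward pass is kept
```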

@flukeskywalker (Collaborator, Author)

Additionally, this information needs to be provided to anything else that runs forward passes on the network, including the evaluate tool and the hooks that use it.

@Qwlouse (Collaborator) commented Oct 19, 2015

What would be the goal of this? It seems like a hack that only works in certain cases. I think if we tackle the issue of retaining context we should do it properly, such that you can specify exactly when it should be reset.

Additionally, the trainer never actually calls the forward pass of the network. That is done by the steppers and the hooks. So the trainer would only distribute the information.

Maybe it would be better to make the network responsible. We could have a special input like the mask (say reset = ('B', 1)), and the network would reset the context only if that input contains at least one 1 in the current minibatch.
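A rough sketch of how the network could act on such an input (clear_context and the forward_pass signature are hypothetical here):

```python
import numpy as np

# Sketch only: 'reset' is the hypothetical extra input of shape ('B', 1),
# one flag per sequence in the minibatch, provided alongside the data.
def forward_with_reset(net, input_data, reset):
    if np.any(reset == 1):   # at least one sequence flagged for reset
        net.clear_context()  # hypothetical: discard retained state
    return net.forward_pass(input_data)  # signature illustrative
```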

The issue might be more complicated if we allow steppers that call the forward pass multiple times, and it also obviously doesn't play well with shuffling.

@flukeskywalker (Collaborator, Author)

This is a pretty basic requirement for language modeling (or modeling any other kind of data), so we need it ASAP.

Having a special input for such a mundane case is a bit annoying (but I wouldn't rule it out).

It's true that the trainer (or evaluate()) would simply pass around this information. I thought of storing this in the network, but having the network essentially count how many forward passes have been done on it also seems kinda hacky.

flukeskywalker changed the title from "Retaining context during training" to "Retaining context across forward passes" on Oct 19, 2015
@Qwlouse (Collaborator) commented Oct 19, 2015

How about putting the network in a special keep-context mode, and then having a hook call clear_context on it when needed? That would work for training, but not so well for evaluation (possibly inside another hook). Hmm, now that I think about it: maybe not. So back to putting it alongside the data...
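For reference, the keep-context-mode idea would look roughly like this (the hook interface and clear_context are hypothetical):

```python
# Sketch of the keep-context-mode idea: the network retains context by
# default and a hook clears it on a schedule. Hypothetical API throughout.
class ClearContextHook:
    def __init__(self, interval):
        self.interval = interval  # clear once every `interval` updates

    def __call__(self, net, update_nr):
        if update_nr % self.interval == 0:
            net.clear_context()   # hypothetical method: reset retained state
```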

IMHO the default behaviour should remain to always discard context though.

@flukeskywalker (Collaborator, Author)

This is why I thought (from the user's perspective) that giving this info to the train and evaluate functions (actually, to the hooks that call evaluate) makes sense. Internally it is just passed along, but it seemed clearer, with little chance of confusion and no extra data required.

I agree about the default behavior.
