Retaining context across forward passes #57
Comments
Additionally, this information also needs to be provided to anyone else that calls forward_passes on the network, including the |
What would be the goal of this? It seems like a hack that only works in certain cases. I think if we tackle the issue of retaining context we should do it properly, such that you can specify exactly when it should be reset. Additionally, the trainer never actually calls the forward pass of the network. That is done by the steppers and the hooks. So the trainer would only distribute the information. Maybe it would be better to have the network be responsible. We could have a special input like the mask (say The issue might be more complicated if we allow steppers that call the forward pass multiple times, and it also obviously doesn't play well together with shuffling. |
This feature is a pretty basic requirement for language (or any kind of data) modeling, so we need to have this feature ASAP. Having a special input for such a mundane case is a bit annoying (but I wouldn't rule it out). It's true that the trainer (or |
How about putting the network in a special keep-context mode, and then having a hook call clear context on it when needed? That would work for training but not so well for evaluation (possibly inside another hook). Hmm, now that I think about it: maybe not. So back to putting it alongside the data... IMHO the default behaviour should remain to always discard context though.
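A minimal sketch of the keep-context-mode idea, for illustration only. All names here (`ContextNetwork`, `keep_context`, `clear_context`, `ClearContextHook`, `run`) are hypothetical and not taken from the project's API, and the "context" is reduced to a running sum so the effect is visible.

```python
class ContextNetwork:
    """Toy stand-in for a recurrent network (hypothetical, not the real class)."""

    def __init__(self):
        self.keep_context = False  # default: always discard context
        self.context = 0

    def clear_context(self):
        self.context = 0

    def forward_pass(self, x):
        # Outside keep-context mode, every forward pass starts from scratch.
        if not self.keep_context:
            self.clear_context()
        self.context += x
        return self.context


class ClearContextHook:
    """Hypothetical hook that clears the context whenever a predicate says so."""

    def __init__(self, should_clear):
        self.should_clear = should_clear  # callable: update_nr -> bool

    def __call__(self, net, update_nr):
        if self.should_clear(update_nr):
            net.clear_context()


def run(net, inputs, hook=None):
    """Call the hook (if any) before each forward pass and collect the outputs."""
    outputs = []
    for update_nr, x in enumerate(inputs):
        if hook is not None:
            hook(net, update_nr)
        outputs.append(net.forward_pass(x))
    return outputs


# Default mode: context is discarded on every pass.
default_out = run(ContextNetwork(), [1, 1, 1, 1, 1])

# Keep-context mode, with a hook that clears at updates 0 and 3.
net = ContextNetwork()
net.keep_context = True
hooked_out = run(net, [1, 1, 1, 1, 1], ClearContextHook(lambda nr: nr in {0, 3}))
```

As noted above, the weak point is evaluation: any other caller of the forward pass (for example a monitoring hook) would have to install an equivalent clearing rule itself.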
This is why I thought (from the user's perspective) that giving this info to the I agree about the default behavior. |
We should have a `context_reset_rate` parameter (subject to renaming) in the trainer which is set by the `train` function. Using this, the context should be reset (cleared) if `current_update_nr % context_reset_rate == 0`, otherwise it should be retained.
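The reset rule proposed here can be sketched as follows. This is an illustration under assumptions, not the project's implementation: `should_reset_context`, `ToyNetwork`, and `run_updates` are hypothetical names, update numbers are assumed to start at 0, and the context is modelled as a running sum.

```python
def should_reset_context(current_update_nr, context_reset_rate=1):
    """Return True if the context should be cleared before this update.

    With context_reset_rate=1 the context is cleared on every update,
    which matches the "always discard" default discussed in the thread.
    """
    if context_reset_rate < 1:
        raise ValueError("context_reset_rate must be a positive integer")
    return current_update_nr % context_reset_rate == 0


class ToyNetwork:
    """Toy stand-in for a network whose context persists between passes."""

    def __init__(self):
        self.context = 0

    def clear_context(self):
        self.context = 0

    def forward_pass(self, x):
        self.context += x
        return self.context


def run_updates(net, inputs, context_reset_rate=1):
    """Apply the modulo rule before each forward pass and collect outputs."""
    outputs = []
    for current_update_nr, x in enumerate(inputs):
        if should_reset_context(current_update_nr, context_reset_rate):
            net.clear_context()
        outputs.append(net.forward_pass(x))
    return outputs


always_reset = run_updates(ToyNetwork(), [1, 1, 1, 1, 1], context_reset_rate=1)
every_second = run_updates(ToyNetwork(), [1, 1, 1, 1, 1], context_reset_rate=2)
every_third = run_updates(ToyNetwork(), [1, 1, 1, 1, 1], context_reset_rate=3)
```

A fixed rate like this only lines up with sequence boundaries when consecutive batches really are consecutive chunks of the same sequences, which is the shuffling concern raised in the comments.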