Retaining context across forward passes · Issue #57 · IDSIA/brainstorm

Open
flukeskywalker opened this issue Oct 18, 2015 · 5 comments

@flukeskywalker (Collaborator)

We should have a context_reset_rate parameter (subject to renaming) in the trainer which is set by the train function. Using this, the context should be reset (cleared) if current_update_nr % context_reset_rate == 0, otherwise it should be retained.
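A minimal sketch of the proposed check (the names context_reset_rate, current_update_nr, and clear_context here are illustrative, not existing brainstorm API):

```python
# Hypothetical sketch of the proposed behaviour; all names are
# illustrative, not the actual brainstorm API.
def maybe_reset_context(net, current_update_nr, context_reset_rate):
    if current_update_nr % context_reset_rate == 0:
        net.clear_context()  # discard the retained recurrent state
    # otherwise the context from the previous forward pass is kept
```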

@flukeskywalker (Collaborator, Author)

Additionally, this information needs to be provided to anything else that runs forward passes on the network, including the evaluate tool and the hooks that use it.

@Qwlouse (Collaborator) commented Oct 19, 2015

What would be the goal of this? It seems like a hack that only works in certain cases. I think if we tackle the issue of retaining context we should do it properly, such that you can specify exactly when it should be reset.

Additionally, the trainer never actually calls the forward pass of the network. That is done by the steppers and the hooks. So the trainer would only distribute the information.

Maybe it would be better to make the network responsible. We could have a special input like the mask (say reset = ('B', 1)), and the network would reset the context only if that input contains at least one 1 in the current minibatch.
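A rough sketch of how the network could act on such an input (clear_context and the forward_pass signature are hypothetical here):

```python
import numpy as np

# Sketch only: 'reset' is the hypothetical extra input of shape ('B', 1),
# one flag per sequence in the minibatch, provided alongside the data.
def forward_with_reset(net, input_data, reset):
    if np.any(reset == 1):   # at least one sequence flagged for reset
        net.clear_context()  # hypothetical: discard retained state
    return net.forward_pass(input_data)  # signature illustrative
```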

The issue might be more complicated if we allow steppers that call the forward pass multiple times, and it also obviously doesn't play well with shuffling.

@flukeskywalker (Collaborator, Author)

This is a pretty basic requirement for language modeling (or modeling any other kind of data), so we need it ASAP.

Having a special input for such a mundane case is a bit annoying (but I wouldn't rule it out).

It's true that the trainer (or evaluate()) would simply pass around this information. I thought of storing this in the network, but having the network essentially count how many forward passes have been done on it also seems kinda hacky.

flukeskywalker changed the title from "Retaining context during training" to "Retaining context across forward passes" on Oct 19, 2015
@Qwlouse (Collaborator) commented Oct 19, 2015

How about putting the network in a special keep-context mode, and then having a hook call clear_context on it when needed? That would work for training, but not so well for evaluation (possibly inside another hook). Hmm, now that I think about it: maybe not. So back to putting it alongside the data...
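For reference, the keep-context-mode idea would look roughly like this (the hook interface and clear_context are hypothetical):

```python
# Sketch of the keep-context-mode idea: the network retains context by
# default and a hook clears it on a schedule. Hypothetical API throughout.
class ClearContextHook:
    def __init__(self, interval):
        self.interval = interval  # clear once every `interval` updates

    def __call__(self, net, update_nr):
        if update_nr % self.interval == 0:
            net.clear_context()   # hypothetical method: reset retained state
```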

IMHO the default behaviour should remain to always discard context though.

@flukeskywalker (Collaborator, Author)

This is why I thought (from the user's perspective) that giving this info to the train and evaluate functions (actually, to the hooks that call evaluate) makes sense. Internally it is just passed along, but it seemed clearer, with little chance of confusion and no extra data required.

I agree about the default behavior.
