Better support for initializing variables that depend on each other #4920
BTW, here's a work-around using @purpledog's graph editor + … |
@yaroslavvb Would you mind if I fix the bug? I'm working around the … |
I'm not on the TensorFlow team, but since the issue is marked "Contributions Welcome" it sounds like they'd be open to a fix. Maybe a way to start would be with … |
@yaroslavvb You said the initializer of … |
Thanks for your advice on … |
Which version?

import tensorflow as tf
…
Out[1]: [1.5315183, -0.65960622, 1.5315183]
|
Did you try the … |
Correct, this issue only happens when initialization has to be split over several .run calls. Such a use case arises when you need to run some data through parts of your graph in order to determine the initial value for variables in another part (i.e., data-dependent initialization). |
But I think TF should support separate initializations for every |
Yes, I think it would help. |
Are there any PRs associated with this? I've written code several times that does something like:

x = tf.placeholder(tf.float32, [100])
v0 = tf.get_variable('v0', initializer=tf.zeros_like(x))
z = x * v0
v1 = tf.get_variable('v1', initializer=tf.zeros_like(z))

This cannot be initialized with … It seems like the clean solution is to do a toposort on the graph consisting of all of the predecessors of any initializer. Executing the "initializer graph" nodes in this toposorted order will guarantee safe initialization. |
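As a rough illustration of the toposort idea above (an editor's sketch, not code from this thread; it assumes TF 1.x graph mode with ref variables, and the helper name initialize_in_dependency_order is made up):

import tensorflow as tf

def initialize_in_dependency_order(sess, variables, feed_dict=None):
    # Map variable-op names to variables so the graph walk can stop at them.
    var_ops = {v.op.name: v for v in variables}

    def upstream(v):
        # Names of other variables that v's initial value depends on.
        deps, seen, stack = set(), set(), [v.initial_value.op]
        while stack:
            op = stack.pop()
            if op.name in seen:
                continue
            seen.add(op.name)
            if op.name in var_ops and op.name != v.op.name:
                deps.add(op.name)  # stop the walk at other variables
                continue
            stack.extend(inp.op for inp in op.inputs)
        return deps

    deps = {v.op.name: upstream(v) for v in variables}
    done = set()
    while len(done) < len(variables):
        ready = [v for v in variables
                 if v.op.name not in done and deps[v.op.name] <= done]
        if not ready:
            raise ValueError("cyclic initializer dependencies")
        for v in ready:
            sess.run(v.initializer, feed_dict=feed_dict)
            done.add(v.op.name)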
Hi @yaroslavvb, I'm using smart_initialize, but I found that it re-does the initialization each time you call session.run. It seems an edge or something similar is not removed. I checked that the control_inputs are actually cleared, but there must be something else. I'm not very familiar with TensorFlow. Do you have any idea what is going on? Thanks. |
Can you provide a reproducible example? I'm assuming you are referring to smart_initialize from the gist: https://gist.github.com/yaroslavvb/d592394c0cedd32513f8fbb87ca05938 |
Hi, here is an example:
It will output:
|
@ZhimingZhou thanks for the easy-to-reproduce test case. It seems the semantics of variables changed. I would add a dependency on the "variable/read" node to force the conditional initialization subgraph to run when "variable" is being read, but now this no longer works. I tried adding a dependency on var.read_value(), but that creates new nodes on each evaluation. @alextp is there some way to force some op to run when a variable is being read? |
Use an identity node with control dependencies instead of the value of the
variable itself.
|
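A minimal sketch of the identity-node idea @alextp describes above (editor's illustration, assuming TF 1.x ref variables; guarded_read and maybe_init_op are made-up names):

import tensorflow as tf

def guarded_read(var, maybe_init_op):
    # Return a read of `var` that cannot execute before `maybe_init_op` has run;
    # downstream ops should consume this tensor instead of `var` itself.
    with tf.control_dependencies([maybe_init_op]):
        return var.read_value()

# e.g. z = x * guarded_read(v0, v0_conditional_init)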
@ZhimingZhou here's a fixed version using @alextp's suggestion -- http://pastebin.com/5zqcc5Q9 This needs a bit more thought though; I don't like the idea of users needing to keep separate tensors for variable write and variable read. Ideally one would be able to do … |
@yaroslavvb I think the nicest solution here is having a toposorted list of the variables, and then making a Session.run call for each variable's initializer. It might be possible to avoid the overhead of a bunch of Session.run calls by making a chain of control dependencies between initializers, but I doubt it since it appears data dependencies aren't obeyed during initialization. I was starting to write a toposort routine, but I realized the graph is constructed with ancestors first and the TF collections are appended to, which means tf.global_variables is already toposorted. This is a little bit of a hack since it's not specified that TF collections are appended to, but I think it will always work to do something like:

for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES):
    sess.run(v.initializer)
I'm running TensorFlow 0.10, but here's a non-trivial dependent initialization that works with this approach:
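The code block from this comment was not preserved in this copy of the thread; the following is an editor's sketch of the kind of data-dependent initialization being described (the names and the moments-based init are assumptions), initialized one variable at a time in collection order:

import numpy as np
import tensorflow as tf

data = tf.placeholder(tf.float32, [None, 64])
w = tf.get_variable('w', initializer=tf.random_normal([64, 32]))
h = tf.matmul(data, w)
# Data-dependent init: start `scale` at the inverse std-dev of the activations.
_, var_h = tf.nn.moments(h, axes=[0])
scale = tf.get_variable('scale', initializer=1.0 / tf.sqrt(var_h + 1e-8))
out = h * scale

sess = tf.Session()
feed = {data: np.random.randn(128, 64).astype(np.float32)}
# tf.global_variables() is in construction order, so `w` is initialized before
# `scale`, whose initializer reads `w` and the fed-in batch.
for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES):
    sess.run(v.initializer, feed_dict=feed)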
It seems that the larger problem (why these workarounds are needed) is that … |
If you want variables' initializations to depend on each other you can use
var1.initialized_value in var2's initializer and this should work with the
normal tf.global_variables_initializer
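For example (editor's illustration of this pattern; the values are made up):

import tensorflow as tf

var1 = tf.Variable(tf.random_normal([10]))
var2 = tf.Variable(var1.initialized_value() * 2.0)  # init depends on var1's init

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # initializes both in one call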
|
@alextp the problem with … A different snag is this situation -- your code does calculations based on Tensor … @eamartin -- I think that's a reasonable replacement. First of all, the overhead of … |
@alextp That breaks the (useful) abstraction of not having to worry about whether a tensor is a variable or just a usual tensor. Using initialized_value isn't very nice for cases like

x = tf.Variable(5.0)
y = x + 1.0
z = tf.Variable(tf.zeros_like(y))

Beyond this, you can imagine dependent initialization spread out much further across a graph. I can't set y = x.initialized_value() + 1.0 because I want y = x + 1.0 on later graph runs (after I mutate x). Using initialized_value would require separate code paths (/ subgraphs) for initializing and running the graph.

I think I understand the problem with the default behavior. In my case, z's initializer has a data dependency that goes back to x. However, x is still uninitialized at this time, which causes an error. I attempted to work around this by feeding in the initialized_value for variable.value(), but this isn't a workaround because you can't always evaluate initialized_value if the initializer depends on other non-initialized variables. If there were a mode of Session.run which would default to initial variable values, then this strategy could work. |
You can also use variable.initial_value, it returns the tensor which will
be used to initialize the variable.
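For instance (editor's illustration):

x = tf.Variable(5.0)
y = tf.Variable(x.initial_value + 1.0)  # reuses x's initial-value tensor,
                                        # so it doesn't require x to be initialized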
|
@eamartin actually, maybe an easier solution than toposort with multiple session.run calls is to have a wrapper around the variable that recovers the previous behavior (a separate variable/read op which you can use to trigger initialization on read). Then you could do var = wrap_variable(var), and that would give you an op that runs variable initialization the first time it's read, using tf.cond(var.is_initialized) |
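A sketch of what such a wrapper might look like (editor's illustration, not the actual proposal; assumes TF 1.x ref variables). Note that the initial-value tensor is still evaluated on every step because it lives outside the cond branches, echoing the caveat discussed later in the thread:

import tensorflow as tf

def wrap_variable(var):
    # Return a tensor that reads `var`, assigning its initial value first
    # if (and only if) the variable is not yet initialized.
    def init_then_read():
        assign = tf.assign(var, var.initial_value)
        return tf.identity(assign)
    return tf.cond(tf.is_variable_initialized(var),
                   lambda: var.read_value(),  # already initialized: just read
                   init_then_read)            # otherwise assign, then read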
@yaroslavvb Thanks for the new version. But I found it still re-does the initialization. There still seem to be some modifications to the graph.
@alextp @eamartin @yaroslavvb
initial_value or initialized_value() doesn't seem suitable when data manipulation outside the variable initializer is involved, but this is essential when people want to do data-dependent initialization. The following works, but it could be too slow: it reruns the graph for each variable.
As you mentioned, @eamartin @yaroslavvb, I guess a simple workaround is to run the variable initializations in order with one sess.run(). I tried but failed to get it working. For example, I tried adding all the variables in order, but it doesn't seem to work. Could you help? Thanks. |
I tried something like that strategy using tf.select (on a bool placeholder indicating whether or not to initialize). This failed because the select op wasn't happy with a non-initialized variable as input. Can tf.cond handle a non-initialized variable input? I like this approach (assuming it works) as it uses a single sess.run call. The downside is that it requires a custom variable creation function, and initialization is implicit rather than explicit.
|
Please take a look at the new behavior of Variable.initialized_value (introduced in b25d1c7). Now it doesn't force initialization, so it can be safely used when initializing variables from other variables. The only overhead of using this instead of the raw Variable is that it will trigger recomputing the initializer at the beginning of the step even if the variable has already been initialized (this is mildly annoying to fix, but it's doable). So it should be kosher to use both in a single and in multiple session.run calls to initialize stuff. |
Progress! BTW, we use Variables that depend on each other, and some are initialized from placeholders. Recomputing the initializer means you have to feed placeholder values that aren't getting used. As a work-around I've been recommending the pattern below which uses graph_editor + Switch + Merge to rewrite the graph to make initializer execution lazy. https://gist.github.com/yaroslavvb/d67410e240369736fc4ba0267250ef27 |
Interesting. FWIW, replacing placeholder with constant should allow you to
not rerun initialization (at the expense of abandoning the error message).
|
@yaroslavvb Is this resolved as of b25d1c7? |
Closing for now since it looks like yes, but happy to reopen if not. |
I think this should be reopened. Initializing variables from other variables that depend on placeholders is still really difficult. It would be ideal if it were achievable in a single run call, without depending on the graph editor method that @yaroslavvb provided. Although the behavior of initialized_value changed so that it uses tf.cond to decide whether or not to run the initializer, this is still not ideal for cases such as below:
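The code block referenced here was not preserved; an editor's reconstruction of the kind of setup being described (the placeholder, names, and shapes are assumptions) might look like:

import numpy as np
import tensorflow as tf

data = tf.placeholder(tf.float32, [100, 10])
w = tf.Variable(tf.reduce_mean(data, axis=0))   # initial value depends on the placeholder
b = tf.Variable(w.initialized_value() + 1.0)    # initial value depends on w

sess = tf.Session()
# Whether this succeeds can depend on the order the parallel initializers run.
sess.run(tf.global_variables_initializer(),
         feed_dict={data: np.zeros([100, 10], np.float32)})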
That code, seemingly randomly, sometimes fails and sometimes works. Using initial_value doesn't help, because then the placeholder would always have to be filled in subsequent runs, and using initialized_value doesn't work for the same reason (also because, while it's OK to do the tf.cond call once for initialization, repeatedly calling it for each training step of a more complex model would be very slow). Or is there a new, better way that I am not aware of? |
Currently there's no easy way to properly initialize variables when some variables' initial values depend on other variables and initialization has to be split over several .run calls. This kind of initialization happens with data-dependent parameter init. This could be solved if there were something like var.initialized_value(), but which only runs the initializer if the variable hasn't been initialized already.

Example:
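The original example code was not preserved in this copy of the issue; an editor's reconstruction of the failure mode it describes (only the names a, b, d come from the text, the rest is assumed) might look like:

import tensorflow as tf

a = tf.Variable(tf.random_uniform([]))
b = tf.Variable(a.initialized_value())
d = tf.Variable(a.initialized_value())

sess = tf.Session()
sess.run(b.initializer)  # runs a's initializer as a side effect (first time)
sess.run(d.initializer)  # runs a's initializer again, so a changes
print(sess.run([a, b, d]))  # b no longer matches a and d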
Here, a and b end up with different values because the initializer for a has been run twice, which is counter-intuitive; the user expects a, b, d to have the same values.