functionalization fix for inplace comparison ops #77125
Conversation
```
set_storage_offset(value_.storage_offset());
if (dtype() != value_.unsafeGetTensorImpl()->dtype() || layout() != value_.unsafeGetTensorImpl()->layout()) {
  value_ = value_.to(c10::TensorOptions().dtype(dtype()).layout(layout()));
}
```
Right now I do this by making assumptions about which classes of metadata can be mutated by an operator (a small illustration follows the list):
- size/stride/storage offset can all be modified by an inplace op (e.g. out= ops that resize, or transpose_())
- dtype/layout are never modified by an operator, which lets me assume that the dtype/layout on the wrapper are "correct", and that I can propagate them to the inner tensor if they diverge
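To make those two buckets concrete, here is a minimal standalone Python sketch (not part of this PR; the tensors are hypothetical and only illustrate the assumption):

```
import torch

t = torch.zeros(2, 3)

# sizes/strides/storage offset can change under inplace and out= ops:
t.transpose_(0, 1)             # swaps sizes and strides in place
print(t.shape, t.stride())     # torch.Size([3, 2]) (1, 3)

out = torch.empty(0)
torch.add(torch.ones(4), 1, out=out)   # the out= op resizes `out` from 0 to 4 elements
print(out.shape)                       # torch.Size([4])

# dtype/layout are never mutated by an operator; even an inplace comparison keeps the input dtype:
t.ge_(0.5)
print(t.dtype)                 # torch.float32
```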
With this change (and some more further in the stack), I was able to get all of the LTC tests to pass, which makes me reasonably convinced that I haven't missed any weird cases.
Now that I think about it though, the absolutely correct thing to do would probably just be to run the original operator with meta tensors, and compare the output meta tensor's metadata to the functional op's output.
We can't actually do that, though, without full meta tensor coverage.
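A rough sketch of that meta-tensor idea, assuming the op has a meta kernel (the comparison ops are structured, so they should); meta tensors carry only metadata, so this never touches real data:

```
import torch

# Run both variants on meta tensors (shape/stride/dtype only, no data) and compare metadata.
a = torch.empty(4, 4, device="meta", dtype=torch.float32)
b = torch.empty(4, 4, device="meta", dtype=torch.float32)

functional_out = torch.ge(a, b)   # functional variant promotes to bool
inplace_out = a.ge_(b)            # inplace variant keeps a's dtype

print(functional_out.dtype)       # torch.bool
print(inplace_out.dtype)          # torch.float32 -- diverges from the functional output
```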
Do you need to do this with memory format too?
Yes, doing this with memory format will be good.
Another thought (that applies to all functionalized inplace ops): if `a` is broadcast, the inplace op will balk, but `copy_to` will happily proceed.
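A concrete example of that divergence in plain eager PyTorch (independent of this PR's code):

```
import torch

a = torch.zeros(3)       # shape (3,)
b = torch.zeros(2, 3)    # shape (2, 3)

torch.ge(a, b)           # fine: the functional op broadcasts to shape (2, 3)

# The inplace variant refuses, because `a` can't hold the broadcast result:
# RuntimeError: output with shape [3] doesn't match the broadcast shape [2, 3]
a.ge_(b)
```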
That's a good point. Maybe I just need to run the original op with meta tensors to faithfully get all the correct error messages.
I can do that at least for inplace ops, because they all have meta tensor impls today. The inplace ops that aren't structured yet all have a generated no-op meta tensor kernel, which is... technically wrong, since we won't error properly (but still better than nothing).
Is there... a good way to check if two tensors have the same memory format? (What if they're both channels last, but one is contiguous and the other isn't?)
I guess I was thinking something like this, which looks pretty awful:

```
auto other_chan_last = other.unsafeGetTensorImpl()->is_strides_like_channels_last();
auto same_channels_last = is_strides_like_channels_last() == other_chan_last;
auto other_chan_last_3d = other.unsafeGetTensorImpl()->is_strides_like_channels_last_3d();
auto same_channels_last_3d = is_strides_like_channels_last_3d() == other_chan_last_3d;
if (!same_channels_last || !same_channels_last_3d) {
  // Pick the memory format implied by the other tensor's strides, defaulting to Preserve.
  auto mem_format = other_chan_last ? MemoryFormat::ChannelsLast : other_chan_last_3d ? MemoryFormat::ChannelsLast3d : MemoryFormat::Preserve;
  value_ = value_.to(c10::TensorOptions().dtype(dtype()).layout(layout()).memory_format(mem_format));
}
```
Do they have the same sizes? One way is to compare strides.
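For the equal-sizes case, a quick Python sketch of the stride comparison (illustrative only; the tensors here are hypothetical):

```
import torch

a = torch.empty(2, 3, 4, 5, memory_format=torch.channels_last)
b = torch.empty(2, 3, 4, 5)       # default contiguous (NCHW) strides

# With identical sizes, comparing strides distinguishes the memory formats directly.
print(a.stride())                 # (60, 1, 15, 3)
print(b.stride())                 # (60, 20, 5, 1)
print(a.stride() == b.stride())   # False -> the layouts differ
```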
I think it will; we intend to model inplace operations as a safe copy to the out tensor after the operation is performed, which is kinda what functionalization wants to do in the first place? This is how we model out= today.
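A rough sketch of what "a safe copy to the out tensor" means, written as a hypothetical Python decomposition (the helper name is made up; the actual out= modeling lives in codegen):

```
import torch

def ge_out_decomposed(a, b, out):
    # Hypothetical decomposition of torch.ge(a, b, out=out):
    tmp = torch.ge(a, b)      # run the purely functional op
    out.resize_(tmp.shape)    # resize the destination if its shape doesn't match
    out.copy_(tmp)            # "safe copy" back; copy_ also handles dtype conversion
    return out

a, b = torch.zeros(3), torch.ones(3)
out = torch.empty(0, dtype=torch.bool)
print(ge_out_decomposed(a, b, out))   # tensor([False, False, False])
```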
Summary:
Pull Request resolved: #77125
Approved by: https://github.com/ezyang
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/7ddc1425ff1582ab72e635fa7c4ace52357cdfc3
Reviewed By: mehtanirav
Differential Revision: D36668394
Pulled By: bdhirsh
fbshipit-source-id: d24138514fad48a382557d154d24e011b58cc820
This is an interesting bug that surfaced from trying to integrate with LTC.

If you look at the `torch.ge` function, its dtype promotion logic diverges between the functional and inplace variants:
(1) `torch.ge(a, b)` always returns a `bool` tensor
(2) `a.ge_(b)` won't change the dtype of `a`, so the returned tensor has whatever dtype `a` has

This means that if a user calls `a.ge_(b)` and we want to functionalize it into an `at::ge(a, b)` call, then the metadata on the inner tensor inside of `FunctionalTensorWrapper` will be wrong! When we eventually pop out of functionalization (and return the inner tensor back to the user), it will have the wrong dtype.

That actually means that the "correct" transformation for `ge_` would be:

**Before**
```
$1 = torch._ops.aten.ge_.Scalar($0, 0)
```

**After**
Manually perform a dtype cast afterwards if the metadata from the functional op call is "wrong":
```
$1 = torch._ops.aten.ge.Scalar($0, 0)
$2 = torch._ops.aten._to_copy.default($1, dtype=6, layout=0)
```
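A small eager-mode sketch of that "After" sequence (illustrative only; `dtype=6` in the trace above corresponds to float32):

```
import torch

a = torch.zeros(3, dtype=torch.float32)

# What a.ge_(0) means to the user: `a` ends up float32.
# The functionalized trace has to reproduce that in two steps, roughly:
tmp = torch.ops.aten.ge.Scalar(a, 0)                                # -> bool tensor
fixed = torch.ops.aten._to_copy.default(tmp, dtype=torch.float32)   # cast back

print(tmp.dtype)    # torch.bool
print(fixed.dtype)  # torch.float32, matching what a.ge_(0) would have produced
```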