-
Notifications
You must be signed in to change notification settings - Fork 24.4k
[Profiler] Use parent time for implicitly finished Torch ops #80810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When we implicitly mark an event as finished (for instance for user annotations which are never closed but whose parent finishes) the end time is never set. This can cause issues if we try to compute durations because the uninitialized value is less than start time and thus it makes no sense to compute a duration. This PR addresses this by allowing Torch ops which have been implicitly finished to use their parent's end time. This issue was masked by the nanosecond to microsecond conversion: dividing the two times before subtracting keeps the numerics valid. (Even if the quantity itself is nonsensical.) It was only when I replaced ``` e->endTimeNS() / 1000 - e->start_time_ns_ / 1000 ``` with ``` (e->endTimeNS() - e->start_time_ns_) / 1000 ``` that UBSAN catches the issue. Differential Revision: [D37591996](https://our.internmc.facebook.com/intern/diff/D37591996/) [ghstack-poisoned]
🔗 Helpful links
✅ No Failures (0 Pending)As of commit 64c6c00 (more details on the Dr. CI page): Expand to see more💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Please report bugs/suggestions to the (internal) Dr. CI Users group. |
When we implicitly mark an event as finished (for instance for user annotations which are never closed but whose parent finishes) the end time is never set. This can cause issues if we try to compute durations because the uninitialized value is less than start time and thus it makes no sense to compute a duration. This PR addresses this by allowing Torch ops which have been implicitly finished to use their parent's end time. This issue was masked by the nanosecond to microsecond conversion: dividing the two times before subtracting keeps the numerics valid. (Even if the quantity itself is nonsensical.) It was only when I replaced ``` e->endTimeNS() / 1000 - e->start_time_ns_ / 1000 ``` with ``` (e->endTimeNS() - e->start_time_ns_) / 1000 ``` that UBSAN catches the issue. Differential Revision: [D37591996](https://our.internmc.facebook.com/intern/diff/D37591996/) [ghstack-poisoned]
When we implicitly mark an event as finished (for instance for user annotations which are never closed but whose parent finishes) the end time is never set. This can cause issues if we try to compute durations because the uninitialized value is less than start time and thus it makes no sense to compute a duration. This PR addresses this by allowing Torch ops which have been implicitly finished to use their parent's end time. This issue was masked by the nanosecond to microsecond conversion: dividing the two times before subtracting keeps the numerics valid. (Even if the quantity itself is nonsensical.) It was only when I replaced ``` e->endTimeNS() / 1000 - e->start_time_ns_ / 1000 ``` with ``` (e->endTimeNS() - e->start_time_ns_) / 1000 ``` that UBSAN catches the issue. Differential Revision: [D37591996](https://our.internmc.facebook.com/intern/diff/D37591996/) [ghstack-poisoned]
When we implicitly mark an event as finished (for instance for user annotations which are never closed but whose parent finishes) the end time is never set. This can cause issues if we try to compute durations because the uninitialized value is less than start time and thus it makes no sense to compute a duration. This PR addresses this by allowing Torch ops which have been implicitly finished to use their parent's end time. This issue was masked by the nanosecond to microsecond conversion: dividing the two times before subtracting keeps the numerics valid. (Even if the quantity itself is nonsensical.) It was only when I replaced ``` e->endTimeNS() / 1000 - e->start_time_ns_ / 1000 ``` with ``` (e->endTimeNS() - e->start_time_ns_) / 1000 ``` that UBSAN catches the issue. Differential Revision: [D37591996](https://our.internmc.facebook.com/intern/diff/D37591996/) [ghstack-poisoned]
When we implicitly mark an event as finished (for instance for user annotations which are never closed but whose parent finishes) the end time is never set. This can cause issues if we try to compute durations because the uninitialized value is less than start time and thus it makes no sense to compute a duration. This PR addresses this by allowing Torch ops which have been implicitly finished to use their parent's end time. This issue was masked by the nanosecond to microsecond conversion: dividing the two times before subtracting keeps the numerics valid. (Even if the quantity itself is nonsensical.) It was only when I replaced ``` e->endTimeNS() / 1000 - e->start_time_ns_ / 1000 ``` with ``` (e->endTimeNS() - e->start_time_ns_) / 1000 ``` that UBSAN catches the issue. Differential Revision: [D37591996](https://our.internmc.facebook.com/intern/diff/D37591996/) [ghstack-poisoned]
@pytorchbot revert -m "Broke iOS builds, see https://github.com/pytorch/pytorch/runs/7372398950?check_suite_focus=true" -c nosignal |
@pytorchbot successfully started a revert job. Check the current status here |
@robieta your PR has been successfully reverted. |
…80810)" This reverts commit eaf92d7. Reverted #80810 on behalf of https://github.com/malfet due to Broke iOS builds, see https://github.com/pytorch/pytorch/runs/7372398950?check_suite_focus=true
Summary: Pull Request resolved: #80810 When we implicitly mark an event as finished (for instance for user annotations which are never closed but whose parent finishes) the end time is never set. This can cause issues if we try to compute durations because the uninitialized value is less than start time and thus it makes no sense to compute a duration. This PR addresses this by allowing Torch ops which have been implicitly finished to use their parent's end time. This issue was masked by the nanosecond to microsecond conversion: dividing the two times before subtracting keeps the numerics valid. (Even if the quantity itself is nonsensical.) It was only when I replaced ``` e->endTimeNS() / 1000 - e->start_time_ns_ / 1000 ``` with ``` (e->endTimeNS() - e->start_time_ns_) / 1000 ``` that UBSAN catches the issue. ghstack-source-id: 161652342 Test Plan: Added a `mark_finished` method to replace `->finished_ = true` calls. `mark_finished` checks that the times are properly handled. It fails `test_profiler_experimental_tree_with_record_function` with the old approach and passes with the new one. Reviewed By: aaronenyeshi, mehtanirav Differential Revision: D37591996 fbshipit-source-id: 3d255235addeb5712d96435e0ddbb1eebdd4b6b1
Stack from ghstack (oldest at bottom):
collection.cpp
#80796When we implicitly mark an event as finished (for instance for user annotations which are never closed but whose parent finishes) the end time is never set. This can cause issues if we try to compute durations because the uninitialized value is less than start time and thus it makes no sense to compute a duration. This PR addresses this by allowing Torch ops which have been implicitly finished to use their parent's end time.
This issue was masked by the nanosecond to microsecond conversion: dividing the two times before subtracting keeps the numerics valid. (Even if the quantity itself is nonsensical.) It was only when I replaced
with
that UBSAN catches the issue.
Differential Revision: D37591996