8000 [Profiler] Use parent time for implicitly finished Torch ops by robieta · Pull Request #80810 · pytorch/pytorch · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[Profiler] Use parent time for implicitly finished Torch ops #80810

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

robieta
Copy link
@robieta robieta commented Jul 3, 2022

Stack from ghstack (oldest at bottom):

When we implicitly mark an event as finished (for instance for user annotations which are never closed but whose parent finishes) the end time is never set. This can cause issues if we try to compute durations because the uninitialized value is less than start time and thus it makes no sense to compute a duration. This PR addresses this by allowing Torch ops which have been implicitly finished to use their parent's end time.

This issue was masked by the nanosecond to microsecond conversion: dividing the two times before subtracting keeps the numerics valid. (Even if the quantity itself is nonsensical.) It was only when I replaced

e->endTimeNS() / 1000 - e->start_time_ns_ / 1000

with

(e->endTimeNS() - e->start_time_ns_) / 1000

that UBSAN catches the issue.

Differential Revision: D37591996

< 8000 /div>

When we implicitly mark an event as finished (for instance for user annotations which are never closed but whose parent finishes) the end time is never set. This can cause issues if we try to compute durations because the uninitialized value is less than start time and thus it makes no sense to compute a duration. This PR addresses this by allowing Torch ops which have been implicitly finished to use their parent's end time.

This issue was masked by the nanosecond to microsecond conversion: dividing the two times before subtracting keeps the numerics valid. (Even if the quantity itself is nonsensical.) It was only when I replaced
```
e->endTimeNS() / 1000 - e->start_time_ns_ / 1000
```
with
```
(e->endTimeNS() - e->start_time_ns_) / 1000
```
that UBSAN catches the issue.

Differential Revision: [D37591996](https://our.internmc.facebook.com/intern/diff/D37591996/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor
facebook-github-bot commented Jul 3, 2022

🔗 Helpful links

✅ No Failures (0 Pending)

As of commit 64c6c00 (more details on the Dr. CI page):

Expand to see more

💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

When we implicitly mark an event as finished (for instance for user annotations which are never closed but whose parent finishes) the end time is never set. This can cause issues if we try to compute durations because the uninitialized value is less than start time and thus it makes no sense to compute a duration. This PR addresses this by allowing Torch ops which have been implicitly finished to use their parent's end time.

This issue was masked by the nanosecond to microsecond conversion: dividing the two times before subtracting keeps the numerics valid. (Even if the quantity itself is nonsensical.) It was only when I replaced
```
e->endTimeNS() / 1000 - e->start_time_ns_ / 1000
```
with
```
(e->endTimeNS() - e->start_time_ns_) / 1000
```
that UBSAN catches the issue.

Differential Revision: [D37591996](https://our.internmc.facebook.com/intern/diff/D37591996/)

[ghstack-poisoned]
@robieta robieta requested review from chaekit and aaronenyeshi July 3, 2022 22:44
Taylor Robie added 2 commits July 6, 2022 10:08
When we implicitly mark an event as finished (for instance for user annotations which are never closed but whose parent finishes) the end time is never set. This can cause issues if we try to compute durations because the uninitialized value is less than start time and thus it makes no sense to compute a duration. This PR addresses this by allowing Torch ops which have been implicitly finished to use their parent's end time.

This issue was masked by the nanosecond to microsecond conversion: dividing the two times before subtracting keeps the numerics valid. (Even if the quantity itself is nonsensical.) It was only when I replaced
```
e->endTimeNS() / 1000 - e->start_time_ns_ / 1000
```
with
```
(e->endTimeNS() - e->start_time_ns_) / 1000
```
that UBSAN catches the issue.

Differential Revision: [D37591996](https://our.internmc.facebook.com/intern/diff/D37591996/)

[ghstack-poisoned]
When we implicitly mark an event as finished (for instance for user annotations which are never closed but whose parent finishes) the end time is never set. This can cause issues if we try to compute durations because the uninitialized value is less than start time and thus it makes no sense to compute a duration. This PR addresses this by allowing Torch ops which have been implicitly finished to use their parent's end time.

This issue was masked by the nanosecond to microsecond conversion: dividing the two times before subtracting keeps the numerics valid. (Even if the quantity itself is nonsensical.) It was only when I replaced
```
e->endTimeNS() / 1000 - e->start_time_ns_ / 1000
```
with
```
(e->endTimeNS() - e->start_time_ns_) / 1000
```
that UBSAN catches the issue.

Differential Revision: [D37591996](https://our.internmc.facebook.com/intern/diff/D37591996/)

[ghstack-poisoned]
When we implicitly mark an event as finished (for instance for user annotations which are never closed but whose parent finishes) the end time is never set. This can cause issues if we try to compute durations because the uninitialized value is less than start time and thus it makes no sense to compute a duration. This PR addresses this by allowing Torch ops which have been implicitly finished to use their parent's end time.

This issue was masked by the nanosecond to microsecond conversion: dividing the two times before subtracting keeps the numerics valid. (Even if the quantity itself is nonsensical.) It was only when I replaced
```
e->endTimeNS() / 1000 - e->start_time_ns_ / 1000
```
with
```
(e->endTimeNS() - e->start_time_ns_) / 1000
```
that UBSAN catches the issue.

Differential Revision: [D37591996](https://our.internmc.facebook.com/intern/diff/D37591996/)

[ghstack-poisoned]
@robieta robieta added the release notes: profiler release notes category label Jul 16, 2022
@malfet
Copy link
Contributor
malfet commented Jul 17, 2022

@pytorchbot revert -m "Broke iOS builds, see https://github.com/pytorch/pytorch/runs/7372398950?check_suite_focus=true" -c nosignal

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a revert job. Check the current status here

@pytorchmergebot
Copy link
Collaborator

@robieta your PR has been successfully reverted.

facebook-github-bot pushed a commit that referenced this pull request Jul 18, 2022
Summary:
Pull Request resolved: #80810

When we implicitly mark an event as finished (for instance for user annotations which are never closed but whose parent finishes) the end time is never set. This can cause issues if we try to compute durations because the uninitialized value is less than start time and thus it makes no sense to compute a duration. This PR addresses this by allowing Torch ops which have been implicitly finished to use their parent's end time.

This issue was masked by the nanosecond to microsecond conversion: dividing the two times before subtracting keeps the numerics valid. (Even if the quantity itself is nonsensical.) It was only when I replaced
```
e->endTimeNS() / 1000 - e->start_time_ns_ / 1000
```
with
```
(e->endTimeNS() - e->start_time_ns_) / 1000
```
that UBSAN catches the issue.
ghstack-source-id: 161652342

Test Plan: Added a `mark_finished` method to replace `->finished_ = true` calls. `mark_finished` checks that the times are properly handled. It fails `test_profiler_experimental_tree_with_record_function` with the old approach and passes with the new one.

Reviewed By: aaronenyeshi, mehtanirav

Differential Revision: D37591996

fbshipit-source-id: 3d255235addeb5712d96435e0ddbb1eebdd4b6b1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0