fix: Issue in non-Tensor Input Resolution by gs-olive · Pull Request #1617 · pytorch/TensorRT · GitHub

fix: Issue in non-Tensor Input Resolution #1617

Merged
gs-olive merged 5 commits into fix_loop_fallback from resolve_nontensor_inputs_bugfix
Feb 22, 2023

Conversation

gs-olive
Collaborator
@gs-olive gs-olive commented Jan 25, 2023

Description

  • In some TensorRT-executed blocks, multiple non-Tensor inputs originate from the same parent Tensor in another Torch block. A single pass of resolveTRTNonTensorInputs then fails to resolve all non-Tensor inputs, which causes compile-time failures
  • The issue was traced to the use of prim::Loop in the TensorRT block and is fixed in PR #1691 (fix: fix the prim::Loop fallback issue)
  • Add a function to SegmentedBlock that returns the ID of a block, so that a re-inserted block retains the ID of the block it replaces
  • Add a regression test case that reproduces the failure

Failure Case

Before Non-Tensor Input Resolution

DEBUG: [Torch-TensorRT - Debug Build] - Finalizing in progress TensorRT block
DEBUG: [Torch-TensorRT - Debug Build] - Segment Block @2:
    Target: TensorRT

    Graph: graph(%1 : int,
      %9 : int[],
      %12 : Tensor):

After 1 Round of Non-Tensor Input Resolution

GRAPH: [Torch-TensorRT - Debug Build] - Running shape analysis on block Segment Block @2:
    Target: TensorRT

    Graph: graph(%1 : int[],
      %3 : Tensor):

After 2 Rounds of Non-Tensor Input Resolution

INFO: [Torch-TensorRT - Debug Build] - Segment Block @2:
    Target: TensorRT

    Graph: graph(%1 : Tensor):

Fixes #1612
Related to: #1613

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • [ x ] My code follows the style guidelines of this project (You can use the linters)
  • [ x ] I have performed a self-review of my own code
  • [ x ] I have commented my code, particularly in hard-to-understand areas and hacks
  • [ x ] I have made corresponding changes to the documentation
  • [ x ] I have added tests to verify my fix or my feature
  • [ x ] New and existing unit tests pass locally with my changes
  • [ x ] I have added the relevant labels to my PR so that relevant reviewers are notified

@github-actions github-actions bot added the component: core and component: partitioning labels Jan 25, 2023
@github-actions github-actions bot requested a review from narendasan January 25, 2023 02:27
@gs-olive gs-olive requested review from bowang007 and removed request for narendasan January 25, 2023 02:28
@github-actions github-actions bot left a comment

Code conforms to C++ style guidelines

@github-actions github-actions bot left a comment

Code conforms to Python style guidelines

@github-actions github-actions bot left a comment

Code conforms to Python style guidelines

@github-actions github-actions bot left a comment

Code conforms to C++ style guidelines

@gs-olive gs-olive self-assigned this Jan 25, 2023
@github-actions github-actions bot added the component: tests label Jan 27, 2023
@github-actions github-actions bot left a comment

Code conforms to C++ style guidelines

@github-actions github-actions bot left a comment

Code conforms to Python style guidelines

@peri044
Collaborator
peri044 commented Feb 3, 2023

Expected this PR to make it into 23.03 release

@bowang007
Collaborator

This does not make sense to me.
First, even if two non-Tensor inputs come from the same node, that node will be added to the segmented block, so there is no need for multiple passes.
Second, I suspect the single pass fails because there is a Loop block in the TensorRT segment, which could introduce undefined behavior. I think the real issue is the Loop block: it shouldn't be in a TensorRT segment at all. In our last partitioning refactor, we aggressively set all nodes to run in TensorRT, so Loop nodes now default to running in TensorRT, which can cause issues like this one.
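To make the point about Loop nodes concrete, here is a minimal sketch of the idea, not the change made in #1691. It assumes a toy partitioner that defaults every node to TensorRT and then forces `prim::Loop` back to Torch. `ToyNode`, `Target`, and `assignTargets` are hypothetical names for illustration and are not part of Torch-TensorRT.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical execution targets for a node in a toy partitioner.
enum class Target { kTorch, kTensorRT };

// Hypothetical stand-in for a graph node: only its kind and assigned target.
struct ToyNode {
  std::string kind;
  Target target;
};

// Aggressive default: every node runs in TensorRT, except control-flow
// loops, which are forced to fall back to Torch.
void assignTargets(std::vector<ToyNode>& nodes) {
  for (auto& node : nodes) {
    node.target = (node.kind == "prim::Loop") ? Target::kTorch : Target::kTensorRT;
  }
}
```

With this rule a `prim::Loop` node can never end up inside a TensorRT segment, so the segment's inputs never depend on values computed inside a loop body.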

bowang007 and others added 4 commits February 21, 2023 18:43
- In certain TensorRT-executed blocks, multiple non-Tensor inputs
originate from the same parent Tensor in another Torch block, so a
single pass of `resolveTRTNonTensorInputs` does not resolve all
non-Tensor inputs, causing failures at compile-time
- Add do-while loop to ensure all input dependencies are resolved by
running dependency-resolution algorithm multiple times, if necessary
- Add function to `SegmentedBlock` to get the ID of a block, so
re-inserted blocks retain the same ID as the previous block being
replaced
- Added test case in partitioning to elicit bug arising from multiple
nonTensor inputs to TensorRT block
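The do-while approach described in this commit message can be sketched on a toy model. This is an illustration under stated assumptions, not the code in the PR: `ToyValue`, `ProducerMap`, `resolveOnce`, and `resolveUntilStable` are hypothetical names, and the producer map is assumed to be acyclic. One pass replaces each non-Tensor input with the inputs of the node that produces it, and the loop repeats until a pass changes nothing.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a graph value: a name and whether it is a Tensor.
struct ToyValue {
  std::string name;
  bool is_tensor;
};

// Hypothetical producer table: maps a value name to the inputs of the node
// that computes it. Values without an entry are treated as graph inputs.
using ProducerMap = std::map<std::string, std::vector<ToyValue>>;

// One resolution pass: every non-Tensor input that has a known producer is
// replaced by that producer's inputs (deduplicated by name).
// Returns true if at least one input was rewritten.
bool resolveOnce(std::vector<ToyValue>& inputs, const ProducerMap& producers) {
  bool changed = false;
  std::vector<ToyValue> next;
  std::set<std::string> seen;
  for (const auto& v : inputs) {
    auto it = producers.find(v.name);
    if (!v.is_tensor && it != producers.end()) {
      changed = true;
      for (const auto& p : it->second) {
        if (seen.insert(p.name).second) next.push_back(p);
      }
    } else if (seen.insert(v.name).second) {
      next.push_back(v);
    }
  }
  inputs = next;
  return changed;
}

// Repeat the pass until nothing changes; returns the number of passes that
// rewrote at least one input.
int resolveUntilStable(std::vector<ToyValue>& inputs, const ProducerMap& producers) {
  int passes = 0;
  bool changed = false;
  do {
    changed = resolveOnce(inputs, producers);
    if (changed) ++passes;
    assert(passes < 1000 && "producer map is assumed to be acyclic");
  } while (changed);
  return passes;
}
```

In this model, an `int` input derived from an `int[]` that is itself derived from a Tensor needs two passes, which mirrors the progression in the Failure Case logs from `(int, int[], Tensor)` to `(int[], Tensor)` to `(Tensor)`.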
@gs-olive gs-olive force-pushed the resolve_nontensor_inputs_bugfix branch from 62f59c6 to ac715b0 Compare February 22, 2023 04:04
@github-actions github-actions bot left a comment

Code conforms to C++ style guidelines

@github-actions github-actions bot left a comment

Code conforms to Python style guidelines

@gs-olive gs-olive changed the base branch from main to fix_loop_fallback February 22, 2023 04:06
@gs-olive gs-olive requested review from bowang007 and removed request for bowang007 February 22, 2023 04:09
@github-actions github-actions bot left a comment

Code conforms to C++ style guidelines

@github-actions github-actions bot left a comment

Code conforms to Python style guidelines

Comment on lines +289 to +290
cur_partitioned_block[i] =
    SegmentedBlock(cur_partitioned_block[i].get_id(), SegmentedBlock::kTensorRT, dependency_nodes);
Collaborator Author

Added to ensure the block IDs of segmented blocks stay in order despite resolution of non-Tensor inputs.
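As a rough illustration of why the ID is passed through, the sketch below rebuilds a block in place while keeping its original ID. `ToyBlock` and `replaceKeepingId` are hypothetical simplifications; only the `get_id()` accessor and the constructor argument order mirror the commented lines, and the real `SegmentedBlock` holds graph nodes rather than strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical execution targets, loosely mirroring SegmentedBlock::kTensorRT.
enum class Target { kTorch, kTensorRT };

// Hypothetical, heavily simplified stand-in for a segmented block.
class ToyBlock {
 public:
  ToyBlock(int id, Target target, std::vector<std::string> nodes)
      : id_(id), target_(target), nodes_(std::move(nodes)) {}

  int get_id() const { return id_; }
  Target target() const { return target_; }
  const std::vector<std::string>& nodes() const { return nodes_; }

 private:
  int id_;
  Target target_;
  std::vector<std::string> nodes_;
};

// Rebuild block i from a new node list, reusing the old block's ID so that
// IDs still match positions in the partition after the replacement.
void replaceKeepingId(std::vector<ToyBlock>& blocks, std::size_t i,
                      std::vector<std::string> new_nodes) {
  assert(i < blocks.size());
  blocks[i] = ToyBlock(blocks[i].get_id(), Target::kTensorRT, std::move(new_nodes));
}
```

If the replacement block were instead given a fresh ID, later log output and any logic keyed on block order would no longer line up with the original partition.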

Collaborator
@bowang007 bowang007 left a comment

LGTM, please squash your commits then merge.

@gs-olive gs-olive merged commit c0e82c8 into fix_loop_fallback Feb 22, 2023
Development

Successfully merging this pull request may close these issues.

🐛 [Bug] Node Inputs Cannot be Evaluated at Conversion Time
4 participants