`test_profiler_experimental_tree_cuda_detailed` fails with mismatches in the profile output · Issue #83381 · pytorch/pytorch · GitHub

test_profiler_experimental_tree_cuda_detailed fails with mismatches in the profile output #83381


Closed
ptrblck opened this issue Aug 13, 2022 · 3 comments
Labels
oncall: profiler profiler-related issues (cpu, gpu, kineto)

Comments

@ptrblck
Collaborator
ptrblck commented Aug 13, 2022

🐛 Describe the bug

#80797 added test_profiler_experimental_tree_cuda_detailed, which started failing directly after the merge with:

Traceback (most recent call last):
  File "test_profiler_tree.py", line 55, in begin_unit_test_marker
    out = f(self)
  File "test_profiler_tree.py", line 786, in test_profiler_experimental_tree_cuda_detailed
    self.assertTreesMatch(
  File "test_profiler_tree.py", line 190, in assertTreesMatch
    self.assertExpectedInline(actual, expected, skip=1)
  File "/opt/conda/lib/python3.8/site-packages/expecttest/__init__.py", line 262, in assertExpectedInline
    self.assertMultiLineEqualMaybeCppStack(expect, actual, msg=help_text)
  File "/opt/conda/lib/python3.8/site-packages/expecttest/__init__.py", line 281, in assertMultiLineEqualMaybeCppStack
    self.assertMultiLineEqual(expect, actual[:len(expect)], *args, **kwargs)
AssertionError: '    [1269 chars]     void ..._kernel<...>(...)\n              [7937 chars] ...' != '    [1269 chars]     std::enable_if<!(false), void>::type inte[7922 chars]    '
              test_profiler_tree.py(...): test_profiler_experimental_tree_cuda_detailed
                torch/profiler/profiler.py(...): __enter__
                  ...
                test_profiler_tree.py(...): step
                  <built-in method ones of type object at 0xXXXXXXXXXXXX>
                    aten::ones
                      aten::empty
                        [memory]
                      aten::fill_
                        cudaLaunchKernel
                          void at::native::vectorized_elementwise_kernel<...>(...)
                  nn.Module: Linear_0
                    <built-in method _get_tracing_state of PyCapsule object at 0xXXXXXXXXXXXX>
                    torch/nn/modules/linear.py(...): forward
                      torch/nn/modules/module.py(...): __getattr__
                      torch/nn/modules/module.py(...): __getattr__
                      <built-in function linear>
                        aten::linear
                          aten::t
                            aten::transpose
                              aten::as_strided
                          aten::addmm
                            cudaMemcpyAsync
                              Memcpy DtoD (Device -> Device)
                            cudaLaunchKernel
-                             void ..._kernel<...>(...)
+                             std::enable_if<!(false), void>::type internal::gemvx::kernel<int, int, float, float, float, float, false, true, false, false, 7, false, cublasGemvParams<cublasGemvTensorStridedBatched<float const>, cublasGemvTensorStridedBatched<float const>, cublasGemvTensorStridedBatched<float>, float> >(cublasGemvParams<cublasGemvTensorStridedBatched<float const>, cublasGemvTensorStridedBatched<float const>, cublasGemvTensorStridedBatched<float>, float>)
                            [memory]
                            aten::expand
                              aten::as_strided
                  torch/_tensor.py(...): backward
                    <built-in function _has_torch_function_unary>
                    torch/autograd/__init__.py(...): backward
                      <built-in function isinstance>
                      <built-in function isinstance>
                      <built-in function len>
                      torch/autograd/__init__.py(...): _tensor_or_tensors_to_tuple
                      torch/autograd/__init__.py(...): _make_grads
                        <built-in function isinstance>
                        <built-in method numel of Tensor object at 0xXXXXXXXXXXXX>
                        <built-in method ones_like of type object at 0xXXXXXXXXXXXX>
                          aten::ones_like
                            aten::empty_like
                              aten::empty_strided
                                [memory]
                            aten::fill_
                              cudaLaunchKernel
                                void at::native::vectorized_elementwise_kernel<...>(...)
                        <built-in method append of list object at 0xXXXXXXXXXXXX>
                      <built-in method run_backward of torch._C._EngineBase object at 0xXXXXXXXXXXXX>
                        autograd::engine::evaluate_function: AddmmBackward0
                          AddmmBackward0
                            aten::t
                              aten::transpose
                                aten::as_strided
                            aten::mm
                              cudaLaunchKernel
-                               void ..._kernel<...>(...)
+                               std::enable_if<!(false), void>::type internal::gemvx::kernel<int, int, float, float, float, float, false, true, false, false, 7, false, cublasGemvParams<cublasGemvTensorStridedBatched<float const>, cublasGemvTensorStridedBatched<float const>, cublasGemvTensorStridedBatched<float>, float> >(cublasGemvParams<cublasGemvTensorStridedBatched<float const>, cublasGemvTensorStridedBatched<float const>, cublasGemvTensorStridedBatched<float>, float>)
                              [memory]
                            aten::t
                              aten::transpose
                                aten::as_strided
                          aten::sum
                            aten::sum
                              cudaLaunchKernel
                                void at::native::reduce_kernel<...>(...)
                              [memory]
                          aten::view
                            aten::view
                        autograd::engine::evaluate_function: torch::autograd::AccumulateGrad
                          torch::autograd::AccumulateGrad
                            aten::add_
                              cudaLaunchKernel
                                void at::native::vectorized_elementwise_kernel<...>(...)
                            [memory]
                        autograd::engine::evaluate_function: TBackward0
                          TBackward0
                            aten::t
                              aten::transpose
                                aten::as_strided
                        autograd::engine::evaluate_function: torch::autograd::AccumulateGrad
                          torch::autograd::AccumulateGrad
                            aten::add_
                              cudaLaunchKernel
                                void at::native::vectorized_elementwise_kernel<...>(...)
                            [memory]
                    [memory]
                  torch/optim/optimizer.py(...): wrapper
                    <built-in method format of str object at 0xXXXXXXXXXXXX>
                    torch/autograd/profiler.py(...): __init__
                      <built-in method zeros of type object at 0xXXXXXXXXXXXX>
                        aten::zeros
                          aten::zeros
                            aten::empty
                              [memory]
                            aten::zero_
                    torch/autograd/profiler.py(...): __enter__
                      torch/_ops.py(...): __call__
                        <built-in method _record_function_enter of PyCapsule object at 0xXXXXXXXXXXXX>
                          Optimizer.step#SGD.step
                            aten::empty
                              [memory]
                            [memory]
                      [memory]
                    torch/optim/optimizer.py(...): _use_grad
                      <built-in function is_grad_enabled>
                      torch/autograd/grad_mode.py(...): __init__
                        <built-in function is_grad_enabled>
                        <built-in function _set_grad_enabled>
                      torch/optim/sgd.py(...): step
                        <built-in method append of list object at 0xXXXXXXXXXXXX>
                        <built-in method append of list object at 0xXXXXXXXXXXXX>
                        torch/_tensor.py(...): __hash__
                          <built-in function id>
                        <built-in method append of list object at 0xXXXXXXXXXXXX>
                        <built-in method append of list object at 0xXXXXXXXXXXXX>
                        <built-in method append of list object at 0xXXXXXXXXXXXX>
                        torch/_tensor.py(...): __hash__
                          <built-in function id>
                        <built-in method append of list object at 0xXXXXXXXXXXXX>
                        torch/optim/sgd.py(...): sgd
                          torch/optim/sgd.py(...): _single_tensor_sgd
                            <built-in method mul_ of Tensor object at 0xXXXXXXXXXXXX>
                              [memory]
                              aten::mul_
                                cudaLaunchKernel
                                  void at::native::vectorized_elementwise_kernel<...>(...)
                              [memory]
                            <built-in method add_ of Tensor object at 0xXXXXXXXXXXXX>
                              aten::add_
                                cudaLaunchKernel
                                  void at::native::vectorized_elementwise_kernel<...>(...)
                            <built-in method add_ of Tensor object at 0xXXXXXXXXXXXX>
                              aten::add_
                                cudaLaunchKernel
                                  void at::native::vectorized_elementwise_kernel<...>(...)
                            <built-in method mul_ of Tensor object at 0xXXXXXXXXXXXX>
                              [memory]
                              aten::mul_
                                cudaLaunchKernel
                                  void at::native::vectorized_elementwise_kernel<...>(...)
                              [memory]
                            <built-in method add_ of Tensor object at 0xXXXXXXXXXXXX>
                              aten::add_
                                cudaLaunchKernel
                                  void at::native::vectorized_elementwise_kernel<...>(...)
                            <built-in method add_ of Tensor object at 0xXXXXXXXXXXXX>
                              aten::add_
                                cudaLaunchKernel
                                  void at::native::vectorized_elementwise_kernel<...>(...)
                        torch/_tensor.py(...): __hash__
                          <built-in function id>
                        torch/_tensor.py(...): __hash__
-                         <built-in function id>
-                     torch/autograd/grad_mode.py(...): __init__
-                       <built-in function is_grad_enabled>
-                       <built-in function _set_grad_enabled>
-                   torch/autograd/profiler.py(...): __exit__
-                     torch/_ops.py(...): __call__
-                       <built-in method _record_function_exit of PyCapsule object at 0xXXXXXXXXXXXX>
-               [memory]
?               ---------
+               -               [memory]
-               torch/profiler/profiler.py(...): __exit__
-                 torch/profiler/profiler.py(...): stop
-                   torch/profiler/profiler.py(...): _transit_action
-                     <built-in method get of dict object at 0xXXXXXXXXXXXX>
-                       enum.py(...): __hash__
-                         <built-in function hash>
-                     ... : To accept the new output, re-run test with envvar EXPECTTEST_ACCEPT=1 (we recommend staging/committing your changes before doing this)
- 

To reproduce using a current master build:

/opt/pytorch/pytorch/test# python test_profiler_tree.py -v -k test_profiler_experimental_tree_cuda_detailed

It seems the `void ..._kernel<...>(...)` placeholder pattern does not match kernel names such as `std::enable_if<!(false), void>::type internal::gemvx::kernel<...>`, presumably because those cuBLAS gemv kernel names do not start with `void`.
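To illustrate the mismatch, here is a minimal, hypothetical sketch of the kind of normalization the test could be applying (the regex and function name are assumptions for illustration, not the actual code in test_profiler_tree.py): kernel names beginning with `void` get collapsed into a stable placeholder, while names beginning with `std::enable_if<...>` fall through unchanged and break the expected-output comparison.

```python
import re

# Hypothetical normalization: collapse CUDA kernel template names that start
# with "void" into a generic placeholder so the expected tree stays stable.
KERNEL_PATTERN = re.compile(r"^void [^(]+_kernel<[^(]*>\(")

def normalize(name: str) -> str:
    if KERNEL_PATTERN.match(name):
        return "void ..._kernel<...>(...)"
    return name

# An elementwise kernel name is collapsed...
print(normalize("void at::native::vectorized_elementwise_kernel<4, f>(args)"))
# ...but a cuBLAS gemv kernel starts with "std::enable_if<...>", not "void",
# so it falls through unchanged and the tree comparison fails.
print(normalize("std::enable_if<!(false), void>::type internal::gemvx::kernel<int>(x)"))
```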

CC @robieta

I cannot see the CI runs from your PR, so perhaps the failure depends on the actual environment?

Versions

Current master build with CUDA 11.7.

cc @robieta @chaekit @aaronenyeshi @ngimel @nbcsm @guotuofeng @guyang3532 @gaoteng-git @tiffzhaofb

@ptrblck
Collaborator Author
ptrblck commented Aug 14, 2022

Searching for this test a bit more, it seems it also fails in CI in:

@ezyang ezyang added the oncall: profiler profiler-related issues (cpu, gpu, kineto) label Aug 15, 2022
@ezyang
Contributor
ezyang commented Aug 15, 2022

This test is very wobbly; you should accept the new output with EXPECTTEST_ACCEPT=1.
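As a hedged sketch of that workflow (the command line mirrors the reproduction step earlier in this issue; the working directory is an example), one would set the environment variable and re-run the test so expecttest rewrites the inline expected tree in place:

```python
import os

# Build the environment and command used to regenerate the inline expected
# output. Stage/commit your changes first, since expecttest rewrites the
# test file in place when EXPECTTEST_ACCEPT=1 is set.
env = dict(os.environ, EXPECTTEST_ACCEPT="1")
cmd = ["python", "test_profiler_tree.py", "-v", "-k",
       "test_profiler_experimental_tree_cuda_detailed"]
# e.g. subprocess.run(cmd, env=env, cwd="/opt/pytorch/pytorch/test")
print(env["EXPECTTEST_ACCEPT"], " ".join(cmd))
```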

@sraikund16
Contributor

Let's use #83606 to track this issue.
