`test_profiler_experimental_tree_cuda_detailed` fails with mismatches in the profile output · Issue #83381 · pytorch/pytorch · GitHub

test_profiler_experimental_tree_cuda_detailed fails with mismatches in the profile output #83381


Closed
ptrblck opened this issue Aug 13, 2022 · 3 comments
Labels
oncall: profiler profiler-related issues (cpu, gpu, kineto)

Comments

@ptrblck
Collaborator
ptrblck commented Aug 13, 2022

🐛 Describe the bug

#80797 added test_profiler_experimental_tree_cuda_detailed, which started failing directly after the merge with:

Traceback (most recent call last):
  File "test_profiler_tree.py", line 55, in begin_unit_test_marker
    out = f(self)
  File "test_profiler_tree.py", line 786, in test_profiler_experimental_tree_cuda_detailed
    self.assertTreesMatch(
  File "test_profiler_tree.py", line 190, in assertTreesMatch
    self.assertExpectedInline(actual, expected, skip=1)
  File "/opt/conda/lib/python3.8/site-packages/expecttest/__init__.py", line 262, in assertExpectedInline
    self.assertMultiLineEqualMaybeCppStack(expect, actual, msg=help_text)
  File "/opt/conda/lib/python3.8/site-packages/expecttest/__init__.py", line 281, in assertMultiLineEqualMaybeCppStack
    self.assertMultiLineEqual(expect, actual[:len(expect)], *args, **kwargs)
AssertionError: '    [1269 chars]     void ..._kernel<...>(...)\n              [7937 chars] ...' != '    [1269 chars]     std::enable_if<!(false), void>::type inte[7922 chars]    '
              test_profiler_tree.py(...): test_profiler_experimental_tree_cuda_detailed
                torch/profiler/profiler.py(...): __enter__
                  ...
                test_profiler_tree.py(...): step
                  <built-in method ones of type object at 0xXXXXXXXXXXXX>
                    aten::ones
                      aten::empty
                        [memory]
                      aten::fill_
                        cudaLaunchKernel
                          void at::native::vectorized_elementwise_kernel<...>(...)
                  nn.Module: Linear_0
                    <built-in method _get_tracing_state of PyCapsule object at 0xXXXXXXXXXXXX>
                    torch/nn/modules/linear.py(...): forward
                      torch/nn/modules/module.py(...): __getattr__
                      torch/nn/modules/module.py(...): __getattr__
                      <built-in function linear>
                        aten::linear
                          aten::t
                            aten::transpose
                              aten::as_strided
                          aten::addmm
                            cudaMemcpyAsync
                              Memcpy DtoD (Device -> Device)
                            cudaLaunchKernel
-                             void ..._kernel<...>(...)
+                             std::enable_if<!(false), void>::type internal::gemvx::kernel<int, int, float, float, float, float, false, true, false, false, 7, false, cublasGemvParams<cublasGemvTensorStridedBatched<float const>, cublasGemvTensorStridedBatched<float const>, cublasGemvTensorStridedBatched<float>, float> >(cublasGemvParams<cublasGemvTensorStridedBatched<float const>, cublasGemvTensorStridedBatched<float const>, cublasGemvTensorStridedBatched<float>, float>)
                            [memory]
                            aten::expand
                              aten::as_strided
                  torch/_tensor.py(...): backward
                    <built-in function _has_torch_function_unary>
                    torch/autograd/__init__.py(...): backward
                      <built-in function isinstance>
                      <built-in function isinstance>
                      <built-in function len>
                      torch/autograd/__init__.py(...): _tensor_or_tensors_to_tuple
                      torch/autograd/__init__.py(...): _make_grads
                        <built-in function isinstance>
                        <built-in method numel of Tensor object at 0xXXXXXXXXXXXX>
                        <built-in method ones_like of type object at 0xXXXXXXXXXXXX>
                          aten::ones_like
                            aten::empty_like
                              aten::empty_strided
                                [memory]
                            aten::fill_
                              cudaLaunchKernel
                                void at::native::vectorized_elementwise_kernel<...>(...)
                        <built-in method append of list object at 0xXXXXXXXXXXXX>
                      <built-in method run_backward of torch._C._EngineBase object at 0xXXXXXXXXXXXX>
                        autograd::engine::evaluate_function: AddmmBackward0
                          AddmmBackward0
                            aten::t
                              aten::transpose
                                aten::as_strided
                            aten::mm
                              cudaLaunchKernel
-                               void ..._kernel<...>(...)
+                               std::enable_if<!(false), void>::type internal::gemvx::kernel<int, int, float, float, float, float, false, true, false, false, 7, false, cublasGemvParams<cublasGemvTensorStridedBatched<float const>, cublasGemvTensorStridedBatched<float const>, cublasGemvTensorStridedBatched<float>, float> >(cublasGemvParams<cublasGemvTensorStridedBatched<float const>, cublasGemvTensorStridedBatched<float const>, cublasGemvTensorStridedBatched<float>, float>)
                              [memory]
                            aten::t
                              aten::transpose
                                aten::as_strided
                          aten::sum
                            aten::sum
                              cudaLaunchKernel
                                void at::native::reduce_kernel<...>(...)
                              [memory]
                          aten::view
                            aten::view
                        autograd::engine::evaluate_function: torch::autograd::AccumulateGrad
                          torch::autograd::AccumulateGrad
                            aten::add_
                              cudaLaunchKernel
                                void at::native::vectorized_elementwise_kernel<...>(...)
                            [memory]
                        autograd::engine::evaluate_function: TBackward0
                          TBackward0
                            aten::t
                              aten::transpose
                                aten::as_strided
                        autograd::engine::evaluate_function: torch::autograd::AccumulateGrad
                          torch::autograd::AccumulateGrad
                            aten::add_
                              cudaLaunchKernel
                                void at::native::vectorized_elementwise_kernel<...>(...)
                            [memory]
                    [memory]
                  torch/optim/optimizer.py(...): wrapper
                    <built-in method format of str object at 0xXXXXXXXXXXXX>
                    torch/autograd/profiler.py(...): __init__
                      <built-in method zeros of type object at 0xXXXXXXXXXXXX>
                        aten::zeros
                          aten::zeros
                            aten::empty
                              [memory]
                            aten::zero_
                    torch/autograd/profiler.py(...): __enter__
                      torch/_ops.py(...): __call__
                        <built-in method _record_function_enter of PyCapsule object at 0xXXXXXXXXXXXX>
                          Optimizer.step#SGD.step
                            aten::empty
                              [memory]
                            [memory]
                      [memory]
                    torch/optim/optimizer.py(...): _use_grad
                      <built-in function is_grad_enabled>
                      torch/autograd/grad_mode.py(...): __init__
                        <built-in function is_grad_enabled>
                        <built-in function _set_grad_enabled>
                      torch/optim/sgd.py(...): step
                        <built-in method append of list object at 0xXXXXXXXXXXXX>
                        <built-in method append of list object at 0xXXXXXXXXXXXX>
                        torch/_tensor.py(...): __hash__
                          <built-in function id>
                        <built-in method append of list object at 0xXXXXXXXXXXXX>
                        <built-in method append of list object at 0xXXXXXXXXXXXX>
                        <built-in method append of list object at 0xXXXXXXXXXXXX>
                        torch/_tensor.py(...): __hash__
                          <built-in function id>
                        <built-in method append of list object at 0xXXXXXXXXXXXX>
                        torch/optim/sgd.py(...): sgd
                          torch/optim/sgd.py(...): _single_tensor_sgd
                            <built-in method mul_ of Tensor object at 0xXXXXXXXXXXXX>
                              [memory]
                              aten::mul_
                                cudaLaunchKernel
                                  void at::native::vectorized_elementwise_kernel<...>(...)
                              [memory]
                            <built-in method add_ of Tensor object at 0xXXXXXXXXXXXX>
                              aten::add_
                                cudaLaunchKernel
                                  void at::native::vectorized_elementwise_kernel<...>(...)
                            <built-in method add_ of Tensor object at 0xXXXXXXXXXXXX>
                              aten::add_
                                cudaLaunchKernel
                                  void at::native::vectorized_elementwise_kernel<...>(...)
                            <built-in method mul_ of Tensor object at 0xXXXXXXXXXXXX>
                              [memory]
                              aten::mul_
                                cudaLaunchKernel
                                  void at::native::vectorized_elementwise_kernel<...>(...)
                              [memory]
                            <built-in method add_ of Tensor object at 0xXXXXXXXXXXXX>
                              aten::add_
                                cudaLaunchKernel
                                  void at::native::vectorized_elementwise_kernel<...>(...)
                            <built-in method add_ of Tensor object at 0xXXXXXXXXXXXX>
                              aten::add_
                                cudaLaunchKernel
                                  void at::native::vectorized_elementwise_kernel<...>(...)
                        torch/_tensor.py(...): __hash__
                          <built-in function id>
                        torch/_tensor.py(...): __hash__
-                         <built-in function id>
-                     torch/autograd/grad_mode.py(...): __init__
-                       <built-in function is_grad_enabled>
-                       <built-in function _set_grad_enabled>
-                   torch/autograd/profiler.py(...): __exit__
-                     torch/_ops.py(...): __call__
-                       <built-in method _record_function_exit of PyCapsule object at 0xXXXXXXXXXXXX>
-               [memory]
?               ---------
+               -               [memory]
-               torch/profiler/profiler.py(...): __exit__
-                 torch/profiler/profiler.py(...): stop
-                   torch/profiler/profiler.py(...): _transit_action
-                     <built-in method get of dict object at 0xXXXXXXXXXXXX>
-                       enum.py(...): __hash__
-                         <built-in function hash>
-                     ... : To accept the new output, re-run test with envvar EXPECTTEST_ACCEPT=1 (we recommend staging/committing your changes before doing this)
- 

To reproduce using a current master build:

/opt/pytorch/pytorch/test# python test_profiler_tree.py -v -k test_profiler_experimental_tree_cuda_detailed

It seems the `void ..._kernel<...>(...)` placeholder pattern does not match kernel names such as `std::enable_if<!(false), void>::type internal::gemvx::kernel<...>`, presumably because those cuBLAS gemv kernel names do not start with `void`.
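To illustrate the mismatch, here is a minimal, hypothetical sketch of the kind of normalization the test could be applying (the regex and function name are assumptions for illustration, not the actual code in test_profiler_tree.py): kernel names beginning with `void` get collapsed into a stable placeholder, while names beginning with `std::enable_if<...>` fall through unchanged and break the expected-output comparison.

```python
import re

# Hypothetical normalization: collapse CUDA kernel template names that start
# with "void" into a generic placeholder so the expected tree stays stable.
KERNEL_PATTERN = re.compile(r"^void [^(]+_kernel<[^(]*>\(")

def normalize(name: str) -> str:
    if KERNEL_PATTERN.match(name):
        return "void ..._kernel<...>(...)"
    return name

# An elementwise kernel name is collapsed...
print(normalize("void at::native::vectorized_elementwise_kernel<4, f>(args)"))
# ...but a cuBLAS gemv kernel starts with "std::enable_if<...>", not "void",
# so it falls through unchanged and the tree comparison fails.
print(normalize("std::enable_if<!(false), void>::type internal::gemvx::kernel<int>(x)"))
```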

CC @robieta

I cannot see the CI runs from your PR, so perhaps the failure depends on the actual environment?

Versions

Current master build with CUDA 11.7.

cc @robieta @chaekit @aaronenyeshi @ngimel @nbcsm @guotuofeng @guyang3532 @gaoteng-git @tiffzhaofb

@ptrblck
Collaborator Author
ptrblck commented Aug 14, 2022

Searching for this test a bit more, it seems it also fails in CI in:

@ezyang ezyang added the oncall: profiler profiler-related issues (cpu, gpu, kineto) label Aug 15, 2022
@ezyang
Contributor
ezyang commented Aug 15, 2022

This test is very wobbly; you should accept the new output with EXPECTTEST_ACCEPT=1.
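As a hedged sketch of that workflow (the command line mirrors the reproduction step earlier in this issue; the working directory is an example), one would set the environment variable and re-run the test so expecttest rewrites the inline expected tree in place:

```python
import os

# Build the environment and command used to regenerate the inline expected
# output. Stage/commit your changes first, since expecttest rewrites the
# test file in place when EXPECTTEST_ACCEPT=1 is set.
env = dict(os.environ, EXPECTTEST_ACCEPT="1")
cmd = ["python", "test_profiler_tree.py", "-v", "-k",
       "test_profiler_experimental_tree_cuda_detailed"]
# e.g. subprocess.run(cmd, env=env, cwd="/opt/pytorch/pytorch/test")
print(env["EXPECTTEST_ACCEPT"], " ".join(cmd))
```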

@sraikund16
Contributor

Let's use #83606 to track this issue.
