Closed
Description
🐛 Describe the bug
Testing top-of-tree (ToT) Triton ahead of release/2.8 to assess the issues.
This test fails locally on AMD GPUs; it has not been confirmed whether the same failure occurs on CUDA.
Affected tests: 42
Reproducer: PYTORCH_OPINFO_SAMPLE_INPUT_INDEX=0 python test/inductor/test_torchinductor_opinfo.py TestInductorOpInfoCPU.test_comprehensive__batch_norm_with_update_cpu_float16
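To rerun the other failing tests listed further down, the reproducer above can be generalized. The following is a minimal sketch, not part of the test suite: `parse` and `repro_command` are hypothetical helpers, and the class name `TestInductorOpInfoCUDA` is an assumption extrapolated from `TestInductorOpInfoCPU` in the reproducer. The sample-input index is left out because it is only known for the one test shown in the sample error.

```python
import re
from collections import Counter

# A few names copied from the list of failing tests in this issue;
# paste the full list here to cover every failure.
FAILING = [
    "test_comprehensive__batch_norm_with_update_cpu_float16",
    "test_comprehensive_sub_cpu_float16",
    "test_comprehensive_nanquantile_cuda_float64",
    "test_comprehensive_remainder_cuda_float16",
]

# test_comprehensive_<op>_<device>_<dtype>; the op itself may contain underscores.
NAME_RE = re.compile(
    r"^test_comprehensive_(?P<op>.+)_(?P<device>cpu|cuda)_(?P<dtype>[a-z]+\d+)$"
)


def parse(name):
    """Split a test name into (op, device, dtype)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected test name: {name}")
    return match["op"], match["device"], match["dtype"]


def repro_command(name):
    """Build a command line modeled on the reproducer in this issue."""
    _, device, _ = parse(name)
    # Assumption: the test class suffix is the upper-cased device name.
    cls = f"TestInductorOpInfo{device.upper()}"
    return f"python test/inductor/test_torchinductor_opinfo.py {cls}.{name}"


if __name__ == "__main__":
    print(Counter(parse(name)[1] for name in FAILING))
    for name in FAILING:
        print(repro_command(name))
```

Grouping by device or dtype this way makes it easier to see whether the CPU float16 failures and the `cuda` failures share a cause.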
Sample error:
message:
Exception: Caused by sample input at index 0: SampleInput(input=Tensor[size=(5, 5, 5), device="cpu", dtype=torch.float16], args=(Tensor[size=(5,), device="cpu", dtype=torch.float16],Tensor[size=(5,), device="cpu", dtype=torch.float16],Tensor[size=(5,), device="cpu", dtype=torch.float16],Tensor[size=(5,), device="cpu", dtype=torch.float16],0.5,0.6), kwargs={}, broadcasts_input=False, name='')

To execute this test, run the following from the base repo dir:
    PYTORCH_OPINFO_SAMPLE_INPUT_INDEX=0 python test/inductor/test_torchinductor_opinfo.py TestInductorOpInfoCPU.test_comprehensive__batch_norm_with_update_cpu_float16

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

text:
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1135, in test_wrapper
    return test(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1434, in only_fn
    return fn(self, *args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2291, in wrapper
    fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1215, in dep_fn
    return fn(slf, *args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1215, in dep_fn
    return fn(slf, *args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1215, in dep_fn
    return fn(slf, *args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 1612, in wrapper
    fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 1534, in wrapper
    fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/mock.py", line 1379, in patched
    return func(*newargs, **newkeywargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/tmp/pytorch/test/inductor/test_torchinductor_opinfo.py", line 962, in inner
    raise e
  File "/tmp/pytorch/test/inductor/test_torchinductor_opinfo.py", line 954, in inner
    fn(self, device, dtype, op)
  File "/tmp/pytorch/test/inductor/test_torchinductor_opinfo.py", line 1207, in test_comprehensive
    raise e
  File "/tmp/pytorch/test/inductor/test_torchinductor_opinfo.py", line 1189, in test_comprehensive
    self.check_model(
  File "/tmp/pytorch/test/inductor/test_torchinductor.py", line 539, in check_model
    self.assertEqual(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4102, in assertEqual
    raise error_metas.pop()[0].to_error( # type: ignore[index]
AssertionError: Tensor-likes are not close!

Mismatched elements: 4 / 125 (3.2%)
Greatest absolute difference: 0.0010986328125 at index (3, 1, 4) (up to 1e-05 allowed)
Greatest relative difference: 0.00571441650390625 at index (3, 1, 4) (up to 0.001 allowed)

The failure occurred for item [0]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3154, in wrapper
    method(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 426, in instantiated_test
    result = test(self, **param_kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 1612, in wrapper
    fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1147, in test_wrapper
    raise e_tracked from e
Exception: Caused by sample input at index 0: SampleInput(input=Tensor[size=(5, 5, 5), device="cpu", dtype=torch.float16], args=(Tensor[size=(5,), device="cpu", dtype=torch.float16],Tensor[size=(5,), device="cpu", dtype=torch.float16],Tensor[size=(5,), device="cpu", dtype=torch.float16],Tensor[size=(5,), device="cpu", dtype=torch.float16],0.5,0.6), kwargs={}, broadcasts_input=False, name='')

To execute this test, run the following from the base repo dir:
    PYTORCH_OPINFO_SAMPLE_INPUT_INDEX=0 python test/inductor/test_torchinductor_opinfo.py TestInductorOpInfoCPU.test_comprehensive__batch_norm_with_update_cpu_float16

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
Full list of failing tests:
test_comprehensive__batch_norm_with_update_cpu_float16
test_comprehensive__native_batch_norm_legit_cpu_float16
test_comprehensive__softmax_backward_data_cpu_float16
test_comprehensive_addr_cpu_float16
test_comprehensive_complex_cpu_float16
test_comprehensive_cross_cpu_float16
test_comprehensive_histc_cpu_float16
test_comprehensive_linalg_cross_cpu_float16
test_comprehensive_linalg_vecdot_cpu_float16
test_comprehensive_log_softmax_cpu_float16
test_comprehensive_masked_log_softmax_cpu_float16
test_comprehensive_masked_var_cpu_float16
test_comprehensive_nanmean_cpu_float16
test_comprehensive_nansum_cpu_float16
test_comprehensive_native_batch_norm_cpu_float16
test_comprehensive_native_layer_norm_cpu_float16
test_comprehensive_nn_functional_batch_norm_cpu_float16
test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cpu_float16
test_comprehensive_nn_functional_cosine_embedding_loss_cpu_float16
test_comprehensive_nn_functional_cosine_similarity_cpu_float16
test_comprehensive_nn_functional_grid_sample_cpu_float16
test_comprehensive_nn_functional_hinge_embedding_loss_cpu_float16
test_comprehensive_nn_functional_huber_loss_cpu_float16
test_comprehensive_nn_functional_instance_norm_cpu_float16
test_comprehensive_nn_functional_interpolate_bicubic_cpu_float16
test_comprehensive_nn_functional_interpolate_linear_cpu_float16
test_comprehensive_nn_functional_interpolate_trilinear_cpu_float16
test_comprehensive_nn_functional_multilabel_soft_margin_loss_cpu_float16
test_comprehensive_nn_functional_soft_margin_loss_cpu_float16
test_comprehensive_sub_cpu_float16
test_comprehensive_trapezoid_cpu_float16
test_comprehensive_trapz_cpu_float16
test_comprehensive_view_as_complex_cpu_float16
test_comprehensive_div_floor_rounding_cuda_float16
test_comprehensive_div_trunc_rounding_cuda_float16
test_comprehensive_floor_divide_cuda_float16
test_comprehensive_max_pool2d_with_indices_backward_cuda_float16
test_comprehensive_nanquantile_cuda_float64
test_comprehensive_remainder_cuda_float16
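The sample error above is a float16 accuracy mismatch, so it may help to see how far the reported element is from the tolerance. The sketch below uses only the four numbers printed in the assertion message. It assumes the reported relative difference is the absolute difference divided by the magnitude of the reference value, and that the comparison uses the usual criterion `|actual - expected| <= atol + rtol * |expected|`; neither the actual nor the expected tensor values appear in the log, so the recovered magnitude is an estimate.

```python
import math

# Values copied from the assertion message in the sample error.
abs_diff = 0.0010986328125       # greatest absolute difference
rel_diff = 0.00571441650390625   # greatest relative difference
atol, rtol = 1e-05, 0.001        # tolerances reported as "allowed"

# Assumption: rel_diff == abs_diff / |expected|, so the magnitude of the
# reference value can be recovered from the two reported numbers.
expected_mag = abs_diff / rel_diff

# Closeness criterion: |actual - expected| <= atol + rtol * |expected|
allowed = atol + rtol * expected_mag
passes = abs_diff <= allowed

# Spacing between adjacent float16 values near expected_mag.
# float16 has 10 explicit mantissa bits; this holds for normal values only.
exponent = math.frexp(expected_mag)[1] - 1   # floor(log2(expected_mag))
ulp_fp16 = 2.0 ** (exponent - 10)
ulps_off = abs_diff / ulp_fp16

print(f"|expected| ~ {expected_mag:.4f}")
print(f"allowed = {allowed:.3e}, observed = {abs_diff:.3e}, passes = {passes}")
print(f"difference = {ulps_off:g} float16 ULPs")
```

Under these assumptions the observed difference is about five times the allowed bound and corresponds to 9 float16 ULPs, which is more than a single rounding step in the final result would explain.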
Versions
Upstream PyTorch + Triton commit: triton-lang/triton@2ec711b
cc @chauhang @penguinwu @bertmaher @int3 @davidberard98 @nmacchioni @chenyang78 @embg @peterbell10 @aakhundov