[chalf] enable testing for multiple ops by kshitij12345 · Pull Request #77405 · pytorch/pytorch · GitHub

[chalf] enable testing for multiple ops #77405


Closed

Conversation

kshitij12345
Collaborator
@kshitij12345 kshitij12345 commented May 13, 2022

Ref: #74537

Enable for permute, split, split_with_sizes, select, ravel, reshape, reshape_as, unfold, squeeze, unsqueeze, transpose
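
For context, torch.chalf is PyTorch's ComplexHalf dtype (an alias for torch.complex32), and the ops listed above are view/shape ops, so enabling chalf in their OpInfo entries mostly exercises dtype plumbing and metadata handling rather than new kernels. Below is a minimal, hedged sketch (not taken from this PR's diff) of the kind of chalf usage these tests exercise; it constructs the tensor in complex64 and casts, since chalf kernel coverage still varies by backend and version:

import torch

# Prefer CUDA when available; ComplexHalf coverage is broader there.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(2, 3, 4, dtype=torch.cfloat, device=device).to(torch.chalf)

y = x.permute(2, 0, 1)            # view op: only sizes/strides change
parts = x.split(2, dim=2)         # returns views over the same storage
z = x.reshape(6, 4).unsqueeze(0).squeeze(0).transpose(0, 1)
print(y.shape, [p.shape for p in parts], z.shape, z.dtype)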

@facebook-github-bot
Contributor
facebook-github-bot commented May 13, 2022

✅ No Failures (0 Pending)

As of commit f22d099 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

@kshitij12345 kshitij12345 requested a review from anjali411 May 13, 2022 12:51
@kshitij12345 kshitij12345 marked this pull request as ready for review May 13, 2022 12:51
@@ -18467,6 +18466,12 @@ def __init__(
    PythonRefInfo(
        "_refs.permute",
        torch_opinfo_name="permute",
        skips=(
            DecorateInfo(unittest.expectedFailure, 'TestCommon',
Collaborator

What's going on here?

Collaborator Author

Oops, forgot to add the error as a comment:

RuntimeError: "index_select_cuda" not implemented for 'ComplexHalf'
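
For reference, a hypothetical sketch of what the completed skip entry could look like once the error is recorded as a comment; the test name and dtype filter below are assumptions for illustration, not copied from the PR (DecorateInfo lives in torch/testing/_internal/common_methods_invocations.py, and torch.chalf is an alias for torch.complex32):

import unittest
import torch
from torch.testing._internal.common_methods_invocations import DecorateInfo

skips = (
    # RuntimeError: "index_select_cuda" not implemented for 'ComplexHalf'
    DecorateInfo(unittest.expectedFailure, 'TestCommon',
                 'test_noncontiguous_samples',
                 device_type='cuda', dtypes=(torch.complex32,)),
)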

Collaborator
@mruberry mruberry left a comment

Nice changes -- just add that comment, please

@kshitij12345
Collaborator Author

@pytorchbot merge this please

@github-actions
Contributor

Hey @kshitij12345.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@malfet
Contributor
malfet commented May 14, 2022

@pytorchbot revert this please, as it caused torch_nn to fail with SIGIOT, see https://hud.pytorch.org/pytorch/pytorch/commit/fff560cb6e4232778cefe9b1a6ed78463b4b9e54

pytorchmergebot added a commit that referenced this pull request May 14, 2022
@malfet
Contributor
malfet commented May 14, 2022

From the log, it looks like the failure was triggered by a SIGIOT raised while running test_reference_testing_linalg_tensorsolve_cuda_complex128:

2022-05-13T22:34:05.0889562Z   test_reference_testing_linalg_tensorsolve_cuda_complex128 (__main__.TestCommonCUDA) ... python: /opt/conda/conda-bld/magma-cuda113_1619629459349/work/interface_cuda/interface.cpp:901: void magma_queue_create_from_cuda_internal(magma_device_t, cudaStream_t, cublasHandle_t, cusparseHandle_t, magma_queue**, const char*, const char*, int): Assertion `queue->dCarray__ != __null' failed.
2022-05-13T22:34:06.9287544Z Traceback (most recent call last):
2022-05-13T22:34:06.9288191Z   File "test/run_test.py", line 1072, in <module>
2022-05-13T22:34:06.9291356Z     main()
2022-05-13T22:34:06.9291930Z   File "test/run_test.py", line 1050, in main
2022-05-13T22:34:06.9295361Z     raise RuntimeError(err_message)
2022-05-13T22:34:06.9295734Z RuntimeError: test_ops failed! Received signal: SIGIOT

And since coredumps are now preserved as artifacts, one can get a backtrace by installing the wheel package and running gdb as shown below:

$ gdb /opt/conda/bin/python core.936  -ex "bt"
GNU gdb (Ubuntu 8.2-0ubuntu1~16.04.1) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /opt/conda/bin/python...done.

warning: core file may not match specified executable file.
[New LWP 936]
[New LWP 939]
[New LWP 948]
[New LWP 942]
[New LWP 941]
[New LWP 940]
[New LWP 947]
[New LWP 943]
[New LWP 944]
[New LWP 945]
[New LWP 946]
[New LWP 949]
[New LWP 950]
[New LWP 951]
[New LWP 952]
[New LWP 953]
[New LWP 968]
[New LWP 969]
[New LWP 957]
[New LWP 961]
[New LWP 956]
[New LWP 967]
[New LWP 959]
[New LWP 954]
[New LWP 958]
[New LWP 1036]
[New LWP 963]
[New LWP 965]
[New LWP 966]
[New LWP 964]
[New LWP 970]
[New LWP 962]
[New LWP 1038]
[New LWP 960]
[New LWP 1037]
[New LWP 1040]
[New LWP 1041]
[New LWP 1039]
[New LWP 971]
[New LWP 955]
[New LWP 1042]

warning: Could not load shared library symbols for /usr/lib/x86_64-linux-gnu/libcuda.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/opt/conda/bin/python test_ops.py -v --import-slow-tests --import-disabled-test'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f9eb38d9438 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f9eb47ad700 (LWP 936))]
#0  0x00007f9eb38d9438 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f9eb38db03a in __GI_abort () at abort.c:89
#2  0x00007f9eb38d1be7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x7f9d6788a94a "queue->dCarray__ != __null", file=file@entry=0x7f9d6788a628 "/opt/conda/conda-bld/magma-cuda113_1619629459349/work/interface_cuda/interface.cpp", 
    line=line@entry=901, 
    function=function@entry=0x7f9d6788ab00 <magma_queue_create_from_cuda_internal::__PRETTY_FUNCTION__> "void magma_queue_create_from_cuda_internal(magma_device_t, cudaStream_t, cublasHandle_t, cusparseHandle_t, magma_queue**, const char*, const char*, int)")
    at assert.c:92
#3  0x00007f9eb38d1c92 in __GI___assert_fail (assertion=0x7f9d6788a94a "queue->dCarray__ != __null", file=0x7f9d6788a628 "/opt/conda/conda-bld/magma-cuda113_1619629459349/work/interface_cuda/interface.cpp", line=901, 
    function=0x7f9d6788ab00 <magma_queue_create_from_cuda_internal::__PRETTY_FUNCTION__> "void magma_queue_create_from_cuda_internal(magma_device_t, cudaStream_t, cublasHandle_t, cusparseHandle_t, magma_queue**, const char*, const char*, int)") at assert.c:101
#4  0x00007f9d6752432c in magma_queue_create_from_cuda_internal () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_linalg.so
#5  0x00007f9d674e2abb in at::native::MAGMAQueue::MAGMAQueue(long) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_linalg.so
#6  0x00007f9d674d9dfe in at::native::lazy_linalg::lu_solve_trans_dispatch(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::native::TransposeType) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_linalg.so
#7  0x00007f9e7936bcd7 in at::native::linalg_solve_out_info(at::Tensor&, at::Tensor&, at::Tensor const&, at::Tensor const&) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#8  0x00007f9e7936c202 in at::native::linalg_solve_out(at::Tensor const&, at::Tensor const&, at::Tensor&) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#9  0x00007f9e72040b8d in at::(anonymous namespace)::(anonymous namespace)::wrapper_out_linalg_solve_out_out(at::Tensor const&, at::Tensor const&, at::Tensor&) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so
#10 0x00007f9e79d5f562 in at::_ops::linalg_solve_out::call(at::Tensor const&, at::Tensor const&, at::Tensor&) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#11 0x00007f9e7935ced1 in at::native::linalg_solve(at::Tensor const&, at::Tensor const&) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#12 0x00007f9e720409f1 in at::(anonymous namespace)::(anonymous namespace)::wrapper__linalg_solve(at::Tensor const&, at::Tensor const&) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so
#13 0x00007f9e72040a53 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper__linalg_solve>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >, at::Tensor (at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) ()
   from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so
#14 0x00007f9e79d162f2 in at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&, at::Tensor const&)> const&, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) const [clone .isra.203] () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#15 0x00007f9e79d177e6 in at::_ops::linalg_solve::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#16 0x00007f9e7af6a670 in torch::autograd::VariableType::(anonymous namespace)::linalg_solve(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#17 0x00007f9e7af6b186 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&), &torch::autograd::VariableType::(anonymous namespace)::linalg_solve>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#18 0x00007f9e79d5d68f in at::_ops::linalg_solve::call(at::Tensor const&, at::Tensor const&) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#19 0x00007f9e794bcb3b in at::native::linalg_tensorsolve(at::Tensor const&, at::Tensor const&, c10::OptionalArrayRef<long>) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#20 0x00007f9e7a0af9bd in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, c10::OptionalArrayRef<long>), &at::(anonymous namespace)::(anonymous namespace)::wrapper__linalg_tensorsolve>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, c10::OptionalArrayRef<long> > >, at::Tensor (at::Tensor const&, at::Tensor const&, c10::OptionalArrayRef<long>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::OptionalArrayRef<long>) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#21 0x00007f9e79ba37b3 in at::_ops::linalg_tensorsolve::call(at::Tensor const&, at::Tensor const&, c10::OptionalArrayRef<long>) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#22 0x00007f9e85ca4dc4 in torch::autograd::THPVariable_linalg_tensorsolve(_object*, _object*, _object*) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#23 0x000056234c6fd078 in cfunction_call_varargs (kwargs=0x7f9d7648a4b0, args=0x7f9d7610d320, func=0x7f9e60d2e500) at /home/builder/tkoch/workspace/python_1648536129212/work/Objects/call.c:755
#24 PyCFunction_Call (kwargs=0x7f9d7648a4b0, args=0x7f9d7610d320, func=0x7f9e60d2e500) at /home/builder/tkoch/workspace/python_1648536129212/work/Objects/call.c:786
#25 do_call_core (kwdict=0x7f9d7648a4b0, callargs=0x7f9d7610d320, func=0x7f9e60d2e500) at /home/builder/tkoch/workspace/python_1648536129212/work/Python/ceval.c:4641
#26 _PyEval_EvalFrameDefault (f=0x7f9d75189590, throwflag=<optimized out>) at /home/builder/tkoch/workspace/python_1648536129212/work/Python/ceval.c:3191
#27 0x000056234c64be85 in PyEval_EvalFrameEx (throwflag=0, f=0x7f9d75189590) at /home/builder/tkoch/workspace/python_1648536129212/work/Python/ceval.c:547
#28 _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x7f9d71bb0748, kwargs=0x7f9d71bb0750, kwcount=2, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, 
    name=0x7f9eb45d11b0, qualname=0x7f9dd56bfa30) at /home/builder/tkoch/workspace/python_1648536129212/work/Python/ceval.c:3930
#29 0x000056234c64d83e in _PyFunction_FastCallDict (kwargs=<optimized out>, nargs=<optimized out>, args=0x7ffe43475540, func=<optimized out>) at /home/builder/tkoch/workspace/python_1648536129212/work/Objects/call.c:376
#30 _PyObject_FastCallDict (callable=<optimized out>, args=0x7ffe43475540, nargs=<optimized out>, kwargs=<optimized out>) at /home/builder/tkoch/workspace/python_1648536129212/work/Objects/call.c:98
#31 0x000056234c6b94bc in _PyObject_Call_Prepend (kwargs=0x7f9d7686fa50, args=0x7f9d70ccc410, obj=<optimized out>, callable=0x7f9d793de830) at /home/builder/tkoch/workspace/python_1648536129212/work/Objects/call.c:906
#32 slot_tp_call (self=<optimized out>, args=0x7f9d70ccc410, kwds=0x7f9d7686fa50) at /home/builder/tkoch/workspace/python_1648536129212/work/Objects/typeobject.c:6402
#33 0x000056234c64db94 in PyObject_Call (callable=0x7f9d7903cdd0, args=<optimized out>, kwargs=<optimized out>) at /home/builder/tkoch/workspace/python_1648536129212/work/Objects/call.c:245
#34 0x000056234c6f7c58 in do_call_core (kwdict=0x7f9d7686fa50, callargs=0x7f9d70ccc410, func=0x7f9d7903cdd0) at /home/builder/tkoch/workspace/python_1648536129212/work/Python/ceval.c:4645

Core file can be downloaded from https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/2321687135/1/coredumps-default-1-4-linux.4xlarge.nvidia.gpu/test/core.936 and the offending whl package from https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/2321687135/1/linux-xenial-cuda11.3-py3.7-gcc7/artifacts.zip; both are listed among the artifacts at https://hud.pytorch.org/pr/77405

@kshitij12345
Collaborator Author

This PR didn't touch the failing test. @lezcano, have you seen such a failure previously?

@lezcano
Collaborator
lezcano commented May 14, 2022

Magma strikes again, this time with a new one. cc @IvanYashchuk @xwang233. It looks like memory corruption or insufficient resources?

@malfet does the test fail consistently?

@kshitij12345
Collaborator Author

@malfet looks like it was a one-off issue.

Can you approve this again so that I can land it?

Thanks!

@anjali411
Contributor

As discussed above, the failure looks unrelated (but recurrent). Should we disable that test while we figure out the issue? @lezcano

@lezcano
Collaborator
lezcano commented May 17, 2022

Are these errors caused by this PR or are they coming from some flaky behaviour in CI?
If it's the latter one, I guess we can skip them for now, but we should look into what's causing them. Could it be something related to the removal of torch.solve? cc @IvanYashchuk who wrote the removal.
I wonder whether these still happen on top of #74046, which heavily simplifies the implementation of linalg.solve.

@kshitij12345
Collaborator Author

AFAIK, the failure isn't directly related to this PR as it doesn't touch that function or test. Seems to be a flaky case.

Will close this PR and open a new one with this branch for merging. (IIRC, reopening and remerging the same PR leads to issues internally).

pytorchmergebot pushed a commit that referenced this pull request May 18, 2022
Reland: #77405
Ref: #74537

Enable for `permute, split, split_with_sizes, select, ravel, reshape, reshape_as, unfold, squeeze, unsqueeze, transpose`
Pull Request resolved: #77656
Approved by: https://github.com/anjali411
facebook-github-bot pushed a commit that referenced this pull request May 20, 2022
Summary:
Reland: #77405
Ref: #74537

Enable for `permute, split, split_with_sizes, select, ravel, reshape, reshape_as, unfold, squeeze, unsqueeze, transpose`

Pull Request resolved: #77656
Approved by: https://github.com/anjali411

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/687ab97338c434f2d428325fd742ae7cd3042b53

Reviewed By: seemethere

Differential Revision: D36494122

Pulled By: seemethere

fbshipit-source-id: cb2803bf28c9be46547437c3b52e3dfb63b52336