core dumped (ver1.0.0) #16183

tagucci · 2019-01-19T04:14:41Z

On my first try with ver1.0.0, importing torch crashed with a core dump.

$ ipython
In [1]: import torch
[1]    45369 floating point exception (core dumped)  ipython

My setup is as follows:

PyTorch Version: 1.0
OS : Ubuntu 16.04.5 LTS
How you installed PyTorch: pip
Python version: 3.6.5
CUDA/cuDNN version: CUDA9.0/cudnn7.4.1
GPU model: Tesla V100

When I installed torch==0.4.1, it worked.
How can I correctly install and use ver1.0.0?
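
A quick way to confirm that the interpreter is being killed by a signal (rather than raising an ordinary ImportError) is to run the import in a child process and inspect its return code. The sketch below is only an illustration: the helper name crash_signal is made up here, and it relies on the POSIX convention that a negative return code means "killed by that signal number".

```python
import signal
import subprocess
import sys


def crash_signal(code: str):
    """Run `code` in a child interpreter.

    Returns the name of the signal that killed the child (e.g. "SIGFPE"),
    or None if the child exited on its own (cleanly or with an error).
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a return code of -N means the child was killed by signal N.
    if proc.returncode < 0:
        return signal.Signals(-proc.returncode).name
    return None


if __name__ == "__main__":
    name = crash_signal("import torch")
    # None also covers a plain non-zero exit such as ImportError.
    print("import torch:", name if name else "no fatal signal")
```

Because the import happens in a child process, the parent shell or notebook survives the crash, which makes it easier to compare several environments side by side.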

strobelTha · 2019-01-25T13:32:26Z

Hello,

The same problem occurs on my system. I have added the stack trace, a system description, and a listing of the conda environment used. As the stack trace shows, the exception is thrown in _GLOBAL__sub_I_jit_avx512_common_conv_kernel.cpp. Does anyone else have this problem or know how to solve it?

Setting:

PyTorch Version: 1.0
OS : Ubuntu 18.04 LTS
How you installed PyTorch: conda
Python version: 3.7.2
CUDA/cuDNN version: CUDA9.0/cudnn7.4.1
GPU model: Tesla V100

Conda environment:

Name	Version	Build	Channel
blas	1.0	mkl
ca-certificates	2018.12.5	0
certifi	2018.11.29	py37_0
cffi	1.11.5	py37he75722e_1
freetype	2.9.1	h8a8886c_1
intel-openmp	2019.1	144
jpeg	9b	h024ee3a_2
libedit	3.1.20181209	hc058e9b_0
libffi	3.2.1	hd88cf55_4
libgcc-ng	8.2.0	hdf63c60_1
libgfortran-ng	7.3.0	hdf63c60_0
libpng	1.6.36	hbc83047_0
libstdcxx-ng	8.2.0	hdf63c60_1
libtiff	4.0.10	h2733197_1001
mkl	2019.1	144
mkl_fft	1.0.10	py37ha843d7b_0
mkl_random	1.0.2	py37hd81dba3_0
ncurses	6.1	he6710b0_1
ninja	1.8.2	py37h6bb024c_1
numpy	1.15.4	py37h7e9f1db_0
numpy-base	1.15.4	py37hde5b4d6_0
olefile	0.46	py37_0
openssl	1.1.1a	h7b6447c_0
pillow	5.4.1	py37h34e0f95_0
pip	18.1	py37_0
pycparser	2.19	py37_0
python	3.7.2	h0371630_0
pytorch	1.0.0	py3.7_cuda9.0.176_cudnn7.4.1_1	pytorch
readline	7.0	h7b6447c_5
setuptools	40.6.3	py37_0
six	1.12.0	py37_0
sqlite	3.26.0	h7b6447c_0
tk	8.6.8	hbc83047_0
torchvision	0.2.1	py_2	pytorch
wheel	0.32.3	py37_0
xz	5.2.4	h14c3975_4
zlib	1.2.11	h7b6447c_3

Stacktrace:
Program received signal SIGFPE, Arithmetic exception.
0x00007fffb7e179c0 in _GLOBAL__sub_I_jit_avx512_common_conv_kernel.cpp () from /home/leo/miniconda3/envs/torch-dbg/lib/python3.6/site-packages/torch/lib/libmkldnn.so.0
(gdb) bt
#0 0x00007fffb7e179c0 in _GLOBAL__sub_I_jit_avx512_common_conv_kernel.cpp () from /home/leo/miniconda3/envs/torch-dbg/lib/python3.6/site-packages/torch/lib/libmkldnn.so.0
#1 0x00007ffff7de5733 in call_init (env=0x7fffffffe208, argv=0x7fffffffe1f8, argc=1, l=) at dl-init.c:72
#2 _dl_init (main_map=main_map@entry=0x555555a04090, argc=1, argv=0x7fffffffe1f8, env=0x7fffffffe208) at dl-init.c:119
#3 0x00007ffff7dea1ff in dl_open_worker (a=a@entry=0x7fffffffba80) at dl-open.c:522
#4 0x00007ffff792c2df in __GI__dl_catch_exception (exception=0x7fffffffba60, operate=0x7ffff7de9dc0 <dl_open_worker>, args=0x7fffffffba80) at dl-error-skeleton.c:196
#5 0x00007ffff7de97ca in _dl_open (file=0x7ffff68d4cb0 "/home/leo/miniconda3/envs/torch-dbg/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so", mode=-2147483391,
caller_dlopen=0x55555574052a <_PyImport_FindSharedFuncptr+138>, nsid=, argc=1, argv=, env=0x7fffffffe208) at dl-open.c:605
#6 0x00007ffff75c1f96 in dlopen_doit (a=a@entry=0x7fffffffbcb0) at dlopen.c:66
#7 0x00007ffff792c2df in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffbc50, operate=0x7ffff75c1f40 <dlopen_doit>, args=0x7fffffffbcb0) at dl-error-skeleton.c:196
#8 0x00007ffff792c36f in __GI__dl_catch_error (objname=0x55555592eac0, errstring=0x55555592eac8, mallocedp=0x55555592eab8, operate=, args=) at dl-error-skeleton.c:215
#9 0x00007ffff75c2735 in _dlerror_run (operate=operate@entry=0x7ffff75c1f40 <dlopen_doit>, args=args@entry=0x7fffffffbcb0) at dlerror.c:162
#10 0x00007ffff75c2051 in __dlopen (file=, mode=) at dlopen.c:87
#11 0x000055555574052a in _PyImport_FindSharedFuncptr () at /tmp/build/80754af9/python_1546130271559/work/Python/dynload_shlib.c:95
#12 0x000055555576b2f0 in _PyImport_LoadDynamicModuleWithSpec () at /tmp/build/80754af9/python_1546130271559/work/Python/importdl.c:129
#13 0x000055555576b540 in _imp_create_dynamic_impl.isra.12 (file=0x0, spec=0x7ffff682d0b8) at /tmp/build/80754af9/python_1546130271559/work/Python/import.c:1994
#14 _imp_create_dynamic () at /tmp/build/80754af9/python_1546130271559/work/Python/clinic/import.c.h:289
#15 0x0000555555668711 in PyCFunction_Call () at /tmp/build/80754af9/python_1546130271559/work/Objects/methodobject.c:114
#16 0x00005555557164ad in do_call_core (kwdict=0x7ffff682eee8, callargs=0x7ffff6827e48, func=0x7ffff6bd0ea0) at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:5116
#17 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3404
#18 0x00005555556e58e4 in _PyEval_EvalCodeWithName () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4166
#19 0x00005555556e6771 in fast_function () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4992
#20 0x00005555556ec505 in call_function () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4872
#21 0x000055555571138a in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3335
#22 0x00005555556e653b in _PyFunction_FastCall (globals=, nargs=2, args=, co=) at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4933
#23 fast_function () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4968
#24 0x00005555556ec505 in call_function () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4872
#25 0x000055555571138a in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3335
#26 0x00005555556e653b in _PyFunction_FastCall (globals=, nargs=1, args=, co=) at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4933
#27 fast_function () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4968
#28 0x00005555556ec505 in call_function () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4872
#29 0x000055555571138a in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3335
#30 0x00005555556e653b in _PyFunction_FastCall (globals=, nargs=1, args=, co=) at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4933
#31 fast_function () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4968
#32 0x00005555556ec505 in call_function () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4872
#33 0x000055555571138a in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3335
#34 0x00005555556e653b in _PyFunction_FastCall (globals=, nargs=2, args=, co=) at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4933
#35 fast_function () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4968
#36 0x00005555556ec505 in call_function () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4872
#37 0x000055555571138a in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3335
#38 0x00005555556e6bab in _PyFunction_FastCall (globals=, nargs=2, args=, co=) at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4933
#39 _PyFunction_FastCallDict () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:5035
#40 0x0000555555665b0f in _PyObject_FastCallDict () at /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2310
#41 0x00005555556a7810 in _PyObject_CallMethodIdObjArgs () at /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2796
#42 0x000055555565cb10 in PyImport_ImportModuleLevelObject () at /tmp/build/80754af9/python_1546130271559/work/Python/import.c:1578
#43 0x0000555555713a8b in import_name (level=0x555555892aa0 <small_ints+160>, fromlist=0x7ffff69f1198, name=0x7ffff69edf30, f=0x555555984fa8) at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:5245
#44 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:2899
#45 0x00005555556e7289 in _PyEval_EvalCodeWithName (qualname=0x0, name=, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwstep=2, kwcount=, kwargs=0x0, kwnames=0x0, argcount=0,
args=0x0, locals=0x7ffff6ad7480, globals=0x7ffff6ad7480, _co=0x7ffff6ad8db0) at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4166
#46 PyEval_EvalCodeEx () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4187
#47 0x00005555556e801c in PyEval_EvalCode (co=co@entry=0x7ffff6ad8db0, globals=globals@entry=0x7ffff6ad7480, locals=locals@entry=0x7ffff6ad7480)
at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:731
#48 0x000055555570ec8b in builtin_exec_impl.isra.11 (locals=0x7ffff6ad7480, globals=0x7ffff6ad7480, source=0x7ffff6ad8db0) at /tmp/build/80754af9/python_1546130271559/work/Python/bltinmodule.c:983
#49 builtin_exec () at /tmp/build/80754af9/python_1546130271559/work/Python/clinic/bltinmodule.c.h:283
#50 0x0000555555668711 in PyCFunction_Call () at /tmp/build/80754af9/python_1546130271559/work/Objects/methodobject.c:114
#51 0x00005555557164ad in do_call_core (kwdict=0x7ffff6ad7558, callargs=0x7ffff6ad9208, func=0x7ffff7e63990) at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:5116
#52 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3404
#53 0x00005555556e58e4 in _PyEval_EvalCodeWithName () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4166
#54 0x00005555556e6771 in fast_function () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4992
#55 0x00005555556ec505 in call_function () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4872
#56 0x000055555571138a in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3335
#57 0x00005555556e653b in _PyFunction_FastCall (globals=, nargs=2, args=, co=) at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4933
#58 fast_function () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4968
#59 0x00005555556ec505 in call_function () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4872
#60 0x000055555571138a in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3335
#61 0x00005555556e653b in _PyFunction_FastCall (globals=, nargs=1, args=, co=) at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4933
#62 fast_function () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4968
#63 0x00005555556ec505 in call_function () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4872
#64 0x000055555571138a in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3335
#65 0x00005555556e653b in _PyFunction_FastCall (globals=, nargs=2, args=, co=) at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4933
#66 fast_function () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4968
#67 0x00005555556ec505 in call_function () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4872
#68 0x000055555571138a in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3335
#69 0x00005555556e6bab in _PyFunction_FastCall (globals=, nargs=2, args=, co=) at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4933
#70 _PyFunction_FastCallDict () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:5035
#71 0x0000555555665b0f in _PyObject_FastCallDict () at /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2310
#72 0x00005555556a7810 in _PyObject_CallMethodIdObjArgs () at /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2796
#73 0x000055555565cb10 in PyImport_ImportModuleLevelObject () at /tmp/build/80754af9/python_1546130271559/work/Python/import.c:1578
#74 0x0000555555713a8b in import_name (level=0x555555892aa0 <small_ints+160>, fromlist=0x55555584bb30 <_Py_NoneStruct>, name=0x7ffff6acfed8, f=0x7ffff7e45a38)
at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:5245
#75 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:2899
#76 0x00005555556e7289 in _PyEval_EvalCodeWithName (qualname=0x0, name=, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwstep=2, kwcount=, kwargs=0x0, kwnames=0x0, argcount=0,
args=0x0, locals=0x7ffff6b94120, globals=0x7ffff6b94120, _co=0x7ffff6b0eed0) at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4166
#77 PyEval_EvalCodeEx () at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4187
#78 0x00005555556e801c in PyEval_EvalCode (co=co@entry=0x7ffff6b0eed0, globals=globals@entry=0x7ffff6b94120, locals=locals@entry=0x7ffff6b94120)
at /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:731
#79 0x000055555576a3c4 in run_mod () at /tmp/build/80754af9/python_1546130271559/work/Python/pythonrun.c:1025
#80 0x00005555556321e6 in PyRun_InteractiveOneObjectEx (fp=fp@entry=0x7ffff7bb0a00 <IO_2_1_stdin>, filename=filename@entry=0x7ffff6b50998, flags=flags@entry=0x7fffffffdfec)
at /tmp/build/80754af9/python_1546130271559/work/Python/pythonrun.c:246
#81 0x000055555563239c in PyRun_InteractiveLoopFlags (fp=fp@entry=0x7ffff7bb0a00 <IO_2_1_stdin>, filename_str=filename_str@entry=0x5555557a224e "", flags=flags@entry=0x7fffffffdfec)
at /tmp/build/80754af9/python_1546130271559/work/Python/pythonrun.c:114
#82 0x000055555563243c in PyRun_AnyFileExFlags (fp=fp@entry=0x7ffff7bb0a00 <IO_2_1_stdin>, filename=0x5555557a224e "", closeit=closeit@entry=0, flags=flags@entry=0x7fffffffdfec)
at /tmp/build/80754af9/python_1546130271559/work/Python/pythonrun.c:75
#83 0x0000555555634237 in run_file (p_cf=0x7fffffffdfec, filename=, fp=0x7ffff7bb0a00 <IO_2_1_stdin>) at /tmp/build/80754af9/python_1546130271559/work/Modules/main.c:340
#84 Py_Main (argc=1, argv=0x5555558a8260) at /tmp/build/80754af9/python_1546130271559/work/Modules/main.c:811
#85 0x000055555563702e in main () at /tmp/build/80754af9/python_1546130271559/work/Programs/python.c:69
#86 0x00007ffff77e6b97 in __libc_start_main (main=0x555555636f40, argc=1, argv=0x7fffffffe1f8, init=, fini=, rtld_fini=, stack_end=0x7fffffffe1e8)
at ../csu/libc-start.c:310
#87 0x0000555555717e0e in _start () at ../sysdeps/x86_64/elf/start.S:103
(gdb)

vvishal · 2019-01-26T05:39:24Z

The problem is with mkl-dnn version 0.14.0, which is bundled with the PyPI package. The issue was resolved in later versions; see uxlfoundation/oneDNN#215 and uxlfoundation/oneDNN@a5f6077. Could the maintainers please update the PyPI package to include a more recent build of mkl-dnn? Thank you!

yf225 · 2019-01-27T21:25:37Z

cc @soumith @pjh5

yf225 · 2019-01-28T18:39:40Z

@vvishal this is fixed in nightly builds and also v1.0.1

vvishal · 2019-01-28T23:04:27Z

Will, thank you very much for your prompt attention. Installing via pip install -U still reports 1.0.0 as the latest version. Do we need to do anything different to get v1.0.1? Best, Vishal

yf225 · 2019-01-28T23:34:22Z

@vvishal v1.0.1 is not released yet. You can try out the nightly version:

pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu90/torch_nightly.html

ehtom · 2019-01-30T14:34:11Z

@yf225 -- Thank you so much for all the comments so far, I have been having the same problem and this thread helped a lot.

Not sure if this is the right place to report this, but I still have the same kind of MKL-DNN related segmentation fault even in the nightly build for CUDA 10.

Setup: Ubuntu 18.04, CUDA 10, nightly build, conda install.

The gdb stack trace points to a crash in libcaffe2.so's MKL-DNN functions targeting AVX512 Skylake-Server instructions. Before that, I was getting exactly the same trace as @strobelTha.

Going to try recompiling everything from source now to see if that helps matters.
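
Since both traces so far involve MKL-DNN's AVX512 code paths, it can help to record which AVX-512 features the CPU advertises when reporting a crash like this. The following is a minimal sketch, assuming an x86 Linux machine where /proc/cpuinfo contains a "flags" line; the function name avx512_flags is made up for illustration. Whether AVX-512 support is actually a precondition for the crash is not established by this check.

```python
from pathlib import Path


def avx512_flags(cpuinfo_text: str) -> list:
    """Return the sorted AVX-512 feature flags found in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # The first "flags" line is enough for a quick report.
            return sorted(f for f in value.split() if f.startswith("avx512"))
    return []


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux-only location
    text = path.read_text() if path.exists() else ""
    print(avx512_flags(text) or "no AVX-512 flags reported")
```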

vvishal · 2019-01-30T16:09:51Z

Yes, it turns out you need HEAD of mkldnn: none of the releases, including 0.17.2, has the full fix. The requisite patch went in around Dec 2018. As a temporary workaround, you can clone and build mkldnn and simply replace the bundled shared libraries with the ones you build. That has been working for me so far, caveat emptor. :-) Vishal
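
To find the bundled shared libraries that would need replacing, one option is to locate the installed torch package without importing it (the import is what crashes) and list anything named libmkldnn* in its lib directory, which is where the stack trace earlier in this thread shows libmkldnn.so.0. This is a sketch under that assumption; the helper name bundled_mkldnn_libs is hypothetical, and builds that link MKL-DNN into another library (the nightly trace above points into libcaffe2.so) may list nothing.

```python
import importlib.util
from pathlib import Path


def bundled_mkldnn_libs(package_dir):
    """List libmkldnn* files under <package_dir>/lib (empty if none)."""
    lib_dir = Path(package_dir) / "lib"
    if not lib_dir.is_dir():
        return []
    return sorted(str(p) for p in lib_dir.glob("libmkldnn*"))


if __name__ == "__main__":
    # find_spec locates a top-level package without executing it, so this
    # works even when `import torch` itself crashes.
    spec = importlib.util.find_spec("torch")
    if spec is None or not spec.submodule_search_locations:
        print("torch is not installed in this environment")
    else:
        for location in spec.submodule_search_locations:
            for lib in bundled_mkldnn_libs(location):
                print(lib)
```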

vpirogov · 2019-01-30T16:38:03Z

@vvishal, could you please point to the patch you are referring to? If you are right and the issue is reproducible in PyTorch v1.0.1 we might want to backport that patch and release MKL-DNN v.0.17.3.

ehtom · 2019-01-30T19:50:28Z

@vvishal, thanks!

Your solution seems to work for me as well. I replaced the current branches of ideep and mkl-dnn in third_party/ with their current master branches and compiled from source.

@vpirogov, I am not sure exactly which update fixed it in mkl-dnn, but judging from its history there have been quite a number of recent AVX512 updates (even since December).

vvishal · 2019-01-31T00:59:31Z

It's on line 216 of src/cpu/xbyak_util.h in the mkl-dnn sources. The correct line should read:

cores_sharing_data_cache[data_cache_levels] = (std::max)(actual_logical_cores / smt_width, 1u);

If the max() is not applied, you get zero under some circumstances. getCacheSize() then performs a divide by zero. Because it can be called from multiple places, the stack trace differs slightly from case to case, but the problem is essentially the same. I think this line was added in commit 67393d999591c88f03d5b09d545b1bf19c46f836. Thanks! Vishal
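
The arithmetic behind that one-line fix can be illustrated in a few lines of Python. This is only a sketch of the integer math, not the mkl-dnn code: the input values are made up, and per_core_share is a hypothetical stand-in for whatever later divides by the sharing count. Note that Python raises ZeroDivisionError, whereas an integer division by zero in compiled C++ on x86 Linux typically ends with SIGFPE, i.e. the "floating point exception" reported at the top of this issue.

```python
def cores_sharing_cache(actual_logical_cores: int, smt_width: int, clamp: bool) -> int:
    """Mirror the integer arithmetic of the quoted line.

    Unsigned integer division truncates, so the quotient is 0 whenever
    actual_logical_cores < smt_width. The fix clamps it to at least 1.
    """
    quotient = actual_logical_cores // smt_width
    return max(quotient, 1) if clamp else quotient


def per_core_share(cache_bytes: int, sharing_cores: int) -> int:
    """Hypothetical stand-in for a later division by the sharing count."""
    return cache_bytes // sharing_cores


if __name__ == "__main__":
    # Made-up inputs: 1 logical core reported at this cache level, SMT width 2.
    unclamped = cores_sharing_cache(1, 2, clamp=False)
    clamped = cores_sharing_cache(1, 2, clamp=True)
    print("unclamped:", unclamped, "clamped:", clamped)
    print("share with clamp:", per_core_share(32768, clamped))
    try:
        per_core_share(32768, unclamped)
    except ZeroDivisionError:
        # Python raises an exception here; the C++ equivalent would
        # usually be terminated by SIGFPE instead.
        print("share without clamp: division by zero")
```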

Summary: Upgrade mkl-dnn to 0.17.3 to fix core dump issue in #16183
Pull Request resolved: #16653
Differential Revision: D13918278
Pulled By: soumith
fbshipit-source-id: b9c09c50ef188b4099966216e155c9f3f2542276

zou3519 · 2019-02-04T17:01:17Z

Should be fixed by #16653

mitar · 2019-02-12T20:36:37Z

I think that #16653 was reverted by #16660. So is this fixed or not?

strobelTha · 2019-02-13T08:29:56Z

Hello,

Big thanks to @vvishal: building with the latest mkl-dnn worked for me. This is easy to do: clone the pytorch repo, navigate to the mkl-dnn subfolder, and check out the latest version. After that, build pytorch as usual.

The commands needed (run from the main folder of the cloned pytorch repo):

cd third_party/ideep/mkl-dnn
git checkout origin/master
cd ../../..
python setup.py install

Summary: Upgrade mkl-dnn to 0.17.3 to fix core dump issue in #16183
Pull Request resolved: #17107
Differential Revision: D14097600
Pulled By: yinghai
fbshipit-source-id: 2baa44e211ce37fbdf01585344c98745f5ba008c

zhanwenchen · 2019-03-26T02:19:18Z

Regarding the commands @strobelTha posted above: the mkl-dnn submodule is already at HEAD=0.17.3, so git checkout origin/master changes nothing. The only solution that worked for me was to download mkl-dnn 0.18.1 and paste that folder in place of the submodule. It has to be 0.18.1: 0.17.4 causes the same error.

zou3519 added the "module: crash" label (Problem manifests as a hard crash, as opposed to a RuntimeError) Jan 22, 2019

yf225 closed this as completed Jan 28, 2019

soumith reopened this Jan 31, 2019

gujinghui mentioned this issue Feb 1, 2019

Upgrade mkl-dnn to v0.17.3 to fix core dump issue (github#16183) #16653

Closed

zou3519 closed this as completed Feb 4, 2019

ngimel mentioned this issue Feb 13, 2019

Floating Point Exception on PyTorch nightlies #17029

Closed

gujinghui mentioned this issue Feb 14, 2019

Upgrade mkl-dnn to v0.17.3 to fix core dump issue #17107

Closed

kaixih pushed a commit to kaixih/tensorflow that referenced this issue Jun 5, 2019

Patch MKL-DNN to fix FP exception bug

b83fdbd

- See http://nvbugs/2470530 and http://nvbugs/2506132 and pytorch/pytorch#16183
