segfault in python multithreaded setting #1868
Comments
also, he mentions:
How do you know it's because of the engine? The stack trace points to Python interpreter shutdown.
see his comment:
Of course, I'll try to repro the same.
Yeah, I know, but the script doesn't really show what was guarded by the `net_lock`.
What I locked were the forward and backward calls of the module.
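A minimal sketch of the kind of guard that comment describes, i.e. one lock shared by all worker threads around the forward and backward calls; the names (`net_lock`, `locked_step`, `model`, `criterion`) are illustrative, not from the issue:

```python
import threading

# One lock shared by every worker thread.
net_lock = threading.Lock()

def locked_step(model, criterion, x, y):
    # Serialize the forward and backward passes across threads.
    with net_lock:
        out = model(x)
        loss = criterion(out, y)
        loss.backward()
    return loss
```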
Not sure it's the same issue, but I am experiencing a segfault with multithreading as well:

```python
import torch
import torch.functional as f
from concurrent.futures import ThreadPoolExecutor as ThreadPool

def build(cuda=False):
    # Two-layer model; the Linear layers are randomly initialized here.
    nn = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024),
        torch.nn.Linear(1024, 1)
    )
    return nn.cuda() if cuda else nn

def train(nn, X, y, epoch=100):
    X = torch.autograd.Variable(X)
    y = torch.autograd.Variable(y)
    optim = torch.optim.SGD(nn.parameters(), lr=0.1)
    for i in range(epoch):
        yhat = nn(X)
        loss = ((yhat - y) ** 2).mean()
        loss.backward()
        optim.step()

def data(cuda=False):
    X = torch.rand(10, 1024)
    y = torch.rand((10, 1))
    return (X.cuda(), y.cuda()) if cuda else (X, y)

def cpu_run(i=None):
    # Each thread builds and trains its own small model on random data.
    nn = build(cuda=False)
    d = data(cuda=False)
    train(nn, *d)

def thread_cpu_run():
    pool = ThreadPool()
    threads = pool.map(cpu_run, list(range(5)))
    return list(threads)

thread_cpu_run()
```
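For reference, two places in this repro draw from the shared CPU random-number generator: the `torch.nn.Linear` layers randomly initialize their weights inside `build()`, and `data()` calls `torch.rand`. The comments below point at exactly this kind of unsynchronized generator use as the trigger.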
I got a similar segmentation fault to @Louis-Tian's, due to random number generation. If I add a line of code setting a manual seed just before the random function that I am using, the segfault goes away.
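A minimal sketch of that workaround, assuming the re-seed happens inside each thread immediately before its random call; `sample_noise` and the fixed seed are illustrative, not from the comment:

```python
import torch

def sample_noise(shape, seed=0):
    # Workaround sketch: re-seed the global generator right before the random
    # call made from this thread. (Wrapping this pair in a lock, as other
    # commenters do, would be the more robust guard.)
    torch.manual_seed(seed)
    return torch.rand(shape)
```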
Same here. I also find that torch.random generates huge numbers in a multi-GPU setting, so I have to replace it with numpy.random. I also hit the segfault when using multiple GPUs. Here is the log:
I have the exact same issue. In my case, the calls to `torch.Tensor(var.size()).normal_()` and `torch.Tensor(var.size()).bernoulli_()` in the thread function were causing the problem. Once I took them out, the segfaults stopped appearing.
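An alternative to deleting those calls, sketched under the assumption (in line with the other lock-based workarounds in this thread) that serializing the in-place RNG ops is enough; `rng_lock` and `noisy_mask` are illustrative names:

```python
import threading

import torch

rng_lock = threading.Lock()

def noisy_mask(var, p=0.5):
    # Serialize the in-place random fills so concurrent threads never touch
    # the shared CPU generator at the same time.
    with rng_lock:
        noise = torch.Tensor(var.size()).normal_()
        mask = torch.Tensor(var.size()).bernoulli_(p)
    return noise, mask
```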
My code had almost the same structure as @Louis-Tian's example, and I was able to get around it by putting a lock where I instantiate my module in each thread. Working code below (pay attention to the lock):

```python
import torch
import threading
import torch.functional as f
from concurrent.futures import ThreadPoolExecutor as ThreadPool

def build(cuda=False):
    nn = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024),
        torch.nn.Linear(1024, 1)
    )
    return nn.cuda() if cuda else nn

def train(nn, X, y, epoch=100):
    X = torch.autograd.Variable(X)
    y = torch.autograd.Variable(y)
    optim = torch.optim.SGD(nn.parameters(), lr=0.1)
    for i in range(epoch):
        yhat = nn(X)
        loss = ((yhat - y) ** 2).mean()
        loss.backward()
        optim.step()

def data(cuda=False):
    X = torch.zeros(10, 1024)
    y = torch.zeros((10, 1))
    return (X.cuda(), y.cuda()) if cuda else (X, y)

def cpu_run(lock):
    with lock:  # serialize module construction across threads
        nn = build(cuda=False)
    d = data(cuda=False)
    train(nn, *d)

def thread_cpu_run():
    pool = ThreadPool()
    lock = threading.Lock()
    threads = pool.map(cpu_run, [lock for _ in range(5)])
    return list(threads)

thread_cpu_run()
```
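Presumably this placement works because `build()` is where the `torch.nn.Linear` layers draw their random initial weights from the shared CPU generator, so the lock serializes the only random calls left; this version also swaps `torch.rand` for `torch.zeros` in `data()`, so nothing else touches the generator concurrently.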
Summary: When we added `randperm_cpu` and `THTensor_(randperm)` we forgot to lock the `THGenerator` mutex before calling `THRandom_random`, which causes the segfault mentioned in facebookresearch/maskrcnn-benchmark#93 (comment). This PR fixes the bug. Closes pytorch/pytorch#1868.
Pull Request resolved: pytorch/pytorch#13832
Differential Revision: D13025453
Pulled By: yf225
fbshipit-source-id: 6e363a35c72b4862412eaea6516a154126634c9d
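For context, a minimal sketch (not from the PR) of the kind of concurrent `randperm` use that could hit the unlocked generator before this fix:

```python
from concurrent.futures import ThreadPoolExecutor

import torch

def worker(_):
    # Each call samples from the shared CPU generator; without the mutex,
    # concurrent calls could race inside THRandom_random.
    return torch.randperm(1000).sum().item()

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(worker, range(32)))
```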
Zihang Dai reports (and I've reproduced) that the autograd engine is not thread-safe.
Here's a repro script: https://gist.github.com/zihangdai/fc8f76fbb8a0f6323a6b31e6d98ceb50
Run it a few times; occasionally it segfaults.
The segfault is from a quite different location, when cleaning up imports: