segfault in python multithreaded setting · Issue #1868 · pytorch/pytorch · GitHub

segfault in python multithreaded setting #1868

Closed

soumith opened this issue Jun 21, 2017 · 10 comments

@soumith (Member) commented Jun 21, 2017

Zihang Dai reports (and I've reproduced) that the autograd engine is not thread-safe.
Here's a repro script: https://gist.github.com/zihangdai/fc8f76fbb8a0f6323a6b31e6d98ceb50
Run it a few times; it occasionally segfaults.
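
For reference, a minimal sketch of the shape of such a workload (the gist above is the authoritative repro; the module, sizes, and iteration counts here are made-up placeholders):

import threading
import torch
from torch.autograd import Variable

model = torch.nn.Linear(128, 128)  # hypothetical shared model

def worker():
    # Every thread drives forward/backward on the same module with no
    # synchronization around the autograd engine.
    for _ in range(100):
        x = Variable(torch.randn(32, 128))
        loss = model(x).sum()
        model.zero_grad()
        loss.backward()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()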

The segfault shows up in a quite different location, while imports are being cleaned up at interpreter shutdown:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
malloc_consolidate (av=av@entry=0x7ffff70a8b20 <main_arena>) at malloc.c:4167
4167    malloc.c: No such file or directory.
(gdb) where
#0  malloc_consolidate (av=av@entry=0x7ffff70a8b20 <main_arena>) at malloc.c:4167
#1  0x00007ffff6d64678 in _int_free (av=0x7ffff70a8b20 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:4075
#2  0x00007ffff6d6853c in __GI___libc_free (mem=<optimized out>) at malloc.c:2968
#3  0x00007ffff7a6a427 in dict_dealloc (mp=0x7ffff7ed0c58) at Objects/dictobject.c:1044
#4  0x00007ffff7a682c7 in insertdict_by_entry (mp=0x7ffff7ed0d70, key='build_time_vars', hash=<optimized out>, ep=<optimized out>, value=<optimized out>) at Objects/dictobject.c:519
#5  0x00007ffff7a6b79c in insertdict (value=None, hash=-295987683324531010, key='build_time_vars', mp=0x7ffff7ed0d70) at Objects/dictobject.c:556
#6  dict_set_item_by_hash_or_entry (value=None, ep=0x0, hash=-295987683324531010, key='build_time_vars',
    op={'__builtins__': {'bytearray': <type at remote 0x7ffff7d7c300>, 'IndexError': <type at remote 0x7ffff7d82bc0>, 'all': <built-in function all>, 'help': <_Helper at remote 0x7ffff7ed8c50>, 'vars': <built-in function vars>, 'SyntaxError': <type at remote 0x7ffff7d82540>, 'unicode': <type at remote 0x7ffff7d99040>, 'UnicodeDecodeError': <type at remote 0x7ffff7d833e0>, 'memoryview': <type at remote 0x7ffff7d8d900>, 'isinstance': <built-in function isinstance>, 'copyright': <_Printer(_Printer__data='Copyright (c) 2001-2016 Python Software Foundation.\nAll Rights Reserved.\n\nCopyright (c) 2000 BeOpen.com.\nAll Rights Reserved.\n\nCopyright (c) 1995-2001 Corporation for National Research Initiatives.\nAll Rights Reserved.\n\nCopyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.\nAll Rights Reserved.', _Printer__lines=None, _Printer__name='copyright', _Printer__dirs=(), _Printer__files=(...)) at remote 0x7ffff7ed8a10>, 'NameError': <type at remote 0x7ffff7d82060>, 'BytesWarning': <type at remote 0x7ffff...(truncated)) at Objects/dictobject.c:795
#7  PyDict_SetItem (
    op={'__builtins__': {'bytearray': <type at remote 0x7ffff7d7c300>, 'IndexError': <type at remote 0x7ffff7d82bc0>, 'all': <built-in function all>, 'help': <_Helper at remote 0x7ffff7ed8c50>, 'vars': <built-in function vars>, 'SyntaxError': <type at remote 0x7ffff7d82540>, 'unicode': <type at remote 0x7ffff7d99040>, 'UnicodeDecodeError': <type at remote 0x7ffff7d833e0>, 'memoryview': <type at remote 0x7ffff7d8d900>, 'isinstance': <built-in function isinstance>, 'copyright': <_Printer(_Printer__data='Copyright (c) 2001-2016 Python Software Foundation.\nAll Rights Reserved.\n\nCopyright (c) 2000 BeOpen.com.\nAll Rights Reserved.\n\nCopyright (c) 1995-2001 Corporation for National Research Initiatives.\nAll Rights Reserved.\n\nCopyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.\nAll Rights Reserved.', _Printer__lines=None, _Printer__name='copyright', _Printer__dirs=(), _Printer__files=(...)) at remote 0x7ffff7ed8a10>, 'NameError': <type at remote 0x7ffff7d82060>, 'BytesWarning': <type at remote 0x7ffff...(truncated), key='build_time_vars', value=None)
    at Objects/dictobject.c:848
#8  0x00007ffff7a6ea8d in _PyModule_Clear (m=<optimized out>) at Objects/moduleobject.c:139
#9  0x00007ffff7aec4a1 in PyImport_Cleanup () at Python/import.c:512
#10 0x00007ffff7af957b in Py_Finalize () at Python/pythonrun.c:458
#11 0x00007ffff7b0f8e5 in Py_Main (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:670
#12 0x00007ffff6d04830 in __libc_start_main (main=0x4007f0 <main>, argc=2, argv=0x7fffffffe008, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdff8) at ../csu/libc-start.c:291
#13 0x0000000000400729 in _start ()
@soumith (Member, Author) commented Jun 21, 2017

He also mentions:

I ran two versions of the actual training code, one with a thread lock on the model and the other without the lock. It turned out the one with the lock is still running now (>1 hour), and the one without the lock gave a segmentation fault sooner or later (I tried more than once).

@apaszke (Contributor) commented Jun 21, 2017

How do you know it's because of the engine? The stack trace points to Python interpreter shutdown.

@soumith (Member, Author) commented Jun 21, 2017

See his comment:

I ran two versions of the actual training code, one with a thread lock on the model and the other without the lock. It turned out the one with the lock is still running now (>1 hour), and the one without the lock gave a segmentation fault sooner or later (I tried more than once).

Of course, I'll try to repro the same.

@apaszke (Contributor) commented Jun 21, 2017

Yeah, I know, but the script doesn't really show what was guarded by the net_lock.

@zihangdai commented Jun 21, 2017

What I locked were the forward and backward calls of the module.
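
As a sketch of that locking scheme (hypothetical training-loop code, not the actual script):

import threading

net_lock = threading.Lock()

def train_step(model, x, y):
    # Serialize the forward and backward passes on the shared model;
    # everything outside this block still runs concurrently.
    with net_lock:
        loss = ((model(x) - y) ** 2).mean()
        loss.backward()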

@Louis-Tian commented

Not sure if it's the same issue, but I am experiencing a segfault with multithreading as well.

import torch
import torch.functional as f
from concurrent.futures import ThreadPoolExecutor as ThreadPool


def build(cuda=False):
    nn = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024),
        torch.nn.Linear(1024, 1)
    )

    return nn.cuda() if cuda else nn

def train(nn, X, y, epoch=100):
    X = torch.autograd.Variable(X)
    y = torch.autograd.Variable(y)
    optim = torch.optim.SGD(nn.parameters(), lr=0.1)
    for i in range(epoch):
        yhat = nn(X)
        loss = ((yhat - y) ** 2).mean()
        loss.backward()
        optim.step()

def data(cuda=False):
    X = torch.rand(10, 1024)
    y = torch.rand((10, 1))
    return (X.cuda(), y.cuda()) if cuda else (X, y)

def cpu_run(i=None):
    nn = build(cuda=False)
    d = data(cuda=False)
    train(nn, *d)

def thread_cpu_run():
    pool = ThreadPool()
    threads = pool.map(cpu_run, list(range(5)))
    return list(threads)

thread_cpu_run()
Starting program: /home/tianchuanting/miniconda3/bin/python speedtest.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
warning: File "/home/tianchuanting/miniconda3/lib/libstdc++.so.6.0.19-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /home/tianchuanting/miniconda3/lib/libstdc++.so.6.0.19-gdb.py
line to your configuration file "/home/tianchuanting/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/tianchuanting/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
[New Thread 0x7fffea7e0700 (LWP 5883)]
[New Thread 0x7fffe9bde700 (LWP 5884)]
[New Thread 0x7fffe8fdc700 (LWP 5885)]
[New Thread 0x7fffdbfff700 (LWP 5886)]
[New Thread 0x7fffdb3fd700 (LWP 5887)]

Thread 6 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffdb3fd700 (LWP 5887)]
0x00007fffed4c1a97 in THRandom_random () from /home/tianchuanting/miniconda3/lib/python3.6/site-packages/torch/lib/libTH.so.1
(gdb) where
#0  0x00007fffed4c1a97 in THRandom_random () from /home/tianchuanting/miniconda3/lib/python3.6/site-packages/torch/lib/libTH.so.1
#1  0x00007fffed4c1af4 in THRandom_uniform () from /home/tianchuanting/miniconda3/lib/python3.6/site-packages/torch/lib/libTH.so.1
#2  0x00007fffed1f3910 in THFloatTensor_uniform () from /home/tianchuanting/miniconda3/lib/python3.6/site-packages/torch/lib/libTH.so.1
#3  0x00007fffedb8f11b in THPFloatTensor_uniform_ (self=0x7fffea9fc1c8, args=<optimized out>, kwargs=<optimized out>)
    at /home/tianchuanting/pytorch/torch/csrc/generic/TensorMethods.cpp:50822
#4  0x00007ffff7992df2 in _PyCFunction_FastCallDict (func_obj=0x7fffea9f98b8, args=0x7fffea9fb3a0, nargs=<optimized out>, kwargs=0x0)
    at Objects/methodobject.c:231
#5  0x00007ffff7a184ec in call_function (pp_stack=0x7fffdb3fb588, oparg=<optimized out>, kwnames=0x0) at Python/ceval.c:4798
#6  0x00007ffff7a1b15d in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3284
#7  0x00007ffff7a15e74 in _PyFunction_FastCall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>)
    at Python/ceval.c:4880
#8  0x00007ffff7a185e8 in fast_function (kwnames=0x0, nargs=1, stack=<optimized out>, func=0x7fffeaaad0d0) at Python/ceval.c:4915
#9  call_function (pp_stack=0x7fffdb3fb7b8, oparg=<optimized out>, kwnames=0x0) at Python/ceval.c:4819
#10 0x00007ffff7a1b15d in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3284
#11 0x00007ffff7a16a60 in _PyEval_EvalCodeWithName (_co=0x7fffeabf00c0, globals=<optimized out>, locals=<optimized out>, args=<optimized out>,
    argcount=3, kwnames=0x0, kwargs=0x8, kwcount=0, kwstep=2, defs=0x7fffeaaa66f0, defcount=1, kwdefs=0x0, closure=0x7fffeaaa6908,
    name=0x7ffff7e15270, qualname=0x7fffeab20a30) at Python/ceval.c:4128
#12 0x00007ffff7a16cfc in _PyFunction_FastCallDict (func=0x7fffeaaad048, args=0x7fffdb3fb9f0, nargs=3, kwargs=0x0) at Python/ceval.c:5031
#13 0x00007ffff793bba6 in _PyObject_FastCallDict (func=0x7fffeaaad048, args=0x7fffdb3fb9f0, nargs=<optimized out>, kwargs=0x0)
    at Objects/abstract.c:2295
#14 0x00007ffff793bdfc in _PyObject_Call_Prepend (func=0x7fffeaaad048, obj=0x7fffea9e6fd0, args=0x7fffeacf6e48, kwargs=0x0)
    at Objects/abstract.c:2358
#15 0x00007ffff793be96 in PyObject_Call (func=0x7fffead7a408, args=<optimized out>, kwargs=<optimized out>) at Objects/abstract.c:2246
#16 0x00007ffff79b4233 in slot_tp_init (self=0x7fffea9e6fd0, args=0x7fffeacf6e48, kwds=0x0) at Objects/typeobject.c:6380
#17 0x00007ffff79a9d4c in type_call (type=<optimized out>, args=0x7fffeacf6e48, kwds=0x0) at Objects/typeobject.c:915
#18 0x00007ffff793bade in _PyObject_FastCallDict (func=0xb66868, args=<optimized out>, nargs=<optimized out>, kwargs=0x0)
    at Objects/abstract.c:2316
#19 0x00007ffff7a182bb in call_function (pp_stack=0x7fffdb3fbd08, oparg=<optimized out>, kwnames=0x0) at Python/ceval.c:4822
#20 0x00007ffff7a1b15d in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3284
#21 0x00007ffff7a16a60 in _PyEval_EvalCodeWithName (_co=0x7ffff6941a50, globals=<optimized out>, locals=<optimized out>, args=<optimized out>,
    argcount=0, kwnames=0x7ffff68c5370, kwargs=0x7fffea9fb1e0, kwcount=1, kwstep=1, defs=0x7ffff68cf140, defcount=1, kwdefs=0x0, closure=0x0,
    name=0x7ffff68c5068, qualname=0x7ffff68c5068) at Python/ceval.c:4128
#22 0x00007ffff7a1848a in fast_function (kwnames=<optimized out>, nargs=0, stack=<optimized out>, func=0x7ffff6974e18) at Python/ceval.c:4939
#23 call_function (pp_stack=0x7fffdb3fbfa8, oparg=<optimized out>, kwnames=<optimized out>) at Python/ceval.c:4819
#24 0x00007ffff7a1a8dd in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3300
#25 0x00007ffff7a16a60 in _PyEval_EvalCodeWithName (_co=0x7ffff6905e40, globals=<optimized out>, locals=<optimized out>, args=<optimized out>,
    argcount=1, kwnames=0x7ffff7e12060, kwargs=0x7ffff7e12068, kwcount=0, kwstep=2, defs=0x7ffff682d098, defcount=1, kwdefs=0x0, closure=0x0,
    name=0x0, qualname=0x0) at Python/ceval.c:4128
#26 0x00007ffff7a16ee3 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>,
    argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x7ffff682d098, defcount=1, kwdefs=0x0, closure=0x0) at Python/ceval.c:4149
#27 0x00007ffff796eee1 in function_call (func=0x7fffea9efc80, arg=0x7fffea9e6208, kw=0x7fffea9f9438) at Objects/funcobject.c:604
#28 0x00007ffff793be96 in PyObject_Call (func=0x7fffea9efc80, args=<optimized out>, kwargs=<optimized out>) at Objects/abstract.c:2246
#29 0x00007ffff7a1c236 in do_call_core (kwdict=0x7fffea9f9438, callargs=<optimized out>, func=0x7fffea9efc80) at Python/ceval.c:5067
#30 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3366
#31 0x00007ffff7a15e74 in _PyFunction_FastCall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>)
    at Python/ceval.c:4880
#32 0x00007ffff7a185e8 in fast_function (kwnames=0x0, nargs=1, stack=<optimized out>, func=0x7fffea9ef950) at Python/ceval.c:4915
#33 call_function (pp_stack=0x7fffdb3fc508, oparg=<optimized out>, kwnames=0x0) at Python/ceval.c:4819
#34 0x00007ffff7a1b15d in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3284
#35 0x00007ffff7a16a60 in _PyEval_EvalCodeWithName (_co=0x7fffeaa518a0, globals=<optimized out>, locals=<optimized out>, args=<optimized out>,
    argcount=2, kwnames=0x7ffff7e12060, kwargs=0x7ffff7e12068, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0,
    qualname=0x0) at Python/ceval.c:4128
#36 0x00007ffff7a16ee3 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>,
    argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:4149
#37 0x00007ffff796eee1 in function_call (func=0x7fffea9ef840, arg=0x7fffeacf6d48, kw=0x7fffea9f96c0) at Objects/funcobject.c:604
#38 0x00007ffff793be96 in PyObject_Call (func=0x7fffea9ef840, args=<optimized out>, kwargs=<optimized out>) at Objects/abstract.c:2246
#39 0x00007ffff7a1c236 in do_call_core (kwdict=0x7fffea9f96c0, callargs=<optimized out>, func=0x7fffea9ef840) at Python/ceval.c:5067
#40 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3366
#41 0x00007ffff7a15e74 in _PyFunction_FastCall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>)
---Type <return> to continue, or q <return> to quit---
    at Python/ceval.c:4880
#42 0x00007ffff7a185e8 in fast_function (kwnames=0x0, nargs=1, stack=<optimized out>, func=0x7ffff0c31a60) at Python/ceval.c:4915
#43 call_function (pp_stack=0x7fffdb3fca68, oparg=<optimized out>, kwnames=0x0) at Python/ceval.c:4819
#44 0x00007ffff7a1b15d in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3284
#45 0x00007ffff7a15e74 in _PyFunction_FastCall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>)
    at Python/ceval.c:4880
#46 0x00007ffff7a185e8 in fast_function (kwnames=0x0, nargs=1, stack=<optimized out>, func=0x7ffff0c31c80) at Python/ceval.c:4915
#47 call_function (pp_stack=0x7fffdb3fcc98, oparg=<optimized out>, kwnames=0x0) at Python/ceval.c:4819
#48 0x00007ffff7a1b15d in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3284
#49 0x00007ffff7a15e74 in _PyFunction_FastCall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>)
    at Python/ceval.c:4880
#50 0x00007ffff7a16e75 in _PyFunction_FastCallDict (func=0x7ffff0c31ae8, args=0x7fffdb3fce60, nargs=1, kwargs=0x0) at Python/ceval.c:4982
#51 0x00007ffff793bba6 in _PyObject_FastCallDict (func=0x7ffff0c31ae8, args=0x7fffdb3fce60, nargs=<optimized out>, kwargs=0x0)
    at Objects/abstract.c:2295
#52 0x00007ffff793bdfc in _PyObject_Call_Prepend (func=0x7ffff0c31ae8, obj=0x7fffea9e6f28, args=0x7ffff7e12048, kwargs=0x0)
    at Objects/abstract.c:2358
#53 0x00007ffff793be96 in PyObject_Call (func=0x7fffeaddca08, args=<optimized out>, kwargs=<optimized out>) at Objects/abstract.c:2246
#54 0x00007ffff7a68ae2 in t_bootstrap (boot_raw=0x7fffeaa5d7d8) at ./Modules/_threadmodule.c:998
#55 0x00007ffff76ba6ba in start_thread (arg=0x7fffdb3fd700) at pthread_create.c:333
#56 0x00007ffff6ad83dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Environment:

Python 3.6.1 |Continuum Analytics, Inc.| (default, May 11 2017, 13:09:58)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'0.2.0+8262920'

@hahey commented Aug 22, 2017

I got a segmentation fault similar to Louis-Tian's, due to random number generation.
My full source code is hard to attach here because it depends on data and is quite long, but the following is the gdb backtrace.

Thread 141 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff82ffd700 (LWP 15532)]
0x00007fffc8a82ba7 in THRandom_random () from /home/heuna/Documents/DAPmodel/.venv3/lib/python3.5/site-packages/torch/lib/libTH.so.1
(gdb) bt
#0  0x00007fffc8a82ba7 in THRandom_random () from /home/heuna/Documents/DAPmodel/.venv3/lib/python3.5/site-packages/torch/lib/libTH.so.1
#1  0x00007fffc8a82c04 in THRandom_uniform () from /home/heuna/Documents/DAPmodel/.venv3/lib/python3.5/site-packages/torch/lib/libTH.so.1
#2  0x00007fffc87bedd7 in THFloatTensor_uniform () from /home/heuna/Documents/DAPmodel/.venv3/lib/python3.5/site-packages/torch/lib/libTH.so.1
#3  0x00007fffd4df95fb in THPFloatTensor_uniform_ (self=<torch.FloatTensor at remote 0x7fffe91a9148>, args=(<float at remote 0x7fffedd09b10>, <float at remote 0x7ffff2de1f90>), kwargs=<optimized out>) at /pytorch/torch/csrc/generic/TensorMethods.cpp:50822
#4  0x00000000004e9bc7 in PyCFunction_Call () at ../Objects/methodobject.c:98
#5  0x0000000000524414 in call_function (oparg=<optimized out>, pp_stack=0x7fff82ffb5a0) at ../Python/ceval.c:4705
#6  PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#7  0x0000000000528814 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fff82ffb6d0, func=<optimized out>) at ../Python/ceval.c:4803
#8  call_function (oparg=<optimized out>, pp_stack=0x7fff82ffb6d0) at ../Python/ceval.c:4730
#9  PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#10 0x000000000052e87a in _PyEval_EvalCodeWithName (qualname=0x0, name=0x0, closure=(<cell at remote 0x7fffa3eb89a8>,), kwdefs=0x0, defcount=5, defs=0x7fffa3eadf28, kwcount=5, kws=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, globals=<optimized out>,
    _co=<code at remote 0x7fffa3e45b70>) at ../Python/ceval.c:4018
#11 PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#12 0x00000000004ebdd7 in function_call.lto_priv () at ../Objects/funcobject.c:627
#13 0x00000000005b7167 in PyObject_Call () at ../Objects/abstract.c:2165
#14 0x00000000005262af in ext_do_call (nk=<optimized out>, na=2, flags=<optimized out>, pp_stack=0x7fff82ffb988, func=<function at remote 0x7fffa3e49378>) at ../Python/ceval.c:5034
#15 PyEval_EvalFrameEx () at ../Python/ceval.c:3275
#16 0x000000000052e87a in _PyEval_EvalCodeWithName (qualname=0x0, name=0x0, closure=(<cell at remote 0x7fffa3eb8a08>,), kwdefs=0x0, defcount=0, defs=0x0, kwcount=5, kws=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, globals=<optimized out>,
    _co=<code at remote 0x7fffa3e505d0>) at ../Python/ceval.c:4018
#17 PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#18 0x00000000004ebdd7 in function_call.lto_priv () at ../Objects/funcobject.c:627
#19 0x00000000005b7167 in PyObject_Call () at ../Objects/abstract.c:2165
#20 0x00000000004f413e in method_call.lto_priv () at ../Objects/classobject.c:330
#21 0x00000000005b7167 in PyObject_Call () at ../Objects/abstract.c:2165
#22 0x000000000054d359 in slot_tp_init () at ../Objects/typeobject.c:6268
#23 0x000000000055d17c in type_call.lto_priv () at ../Objects/typeobject.c:905
#24 0x00000000005b7167 in PyObject_Call () at ../Objects/abstract.c:2165
#25 0x00000000005262af in ext_do_call (nk=<optimized out>, na=0, flags=<optimized out>, pp_stack=0x7fff82ffbd58, func=<type at remote 0x17cacb8>) at ../Python/ceval.c:5034
#26 PyEval_EvalFrameEx () at ../Python/ceval.c:3275
#27 0x000000000052e87a in _PyEval_EvalCodeWithName (qualname=0x0, name=0x0, closure=(<cell at remote 0x7ffff68d2a38>,), kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, globals=<optimized out>,
    _co=<code at remote 0x7ffff5f42d20>) at ../Python/ceval.c:4018
#28 PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#29 0x00000000004ebcc3 in function_call.lto_priv () at ../Objects/funcobject.c:627
#30 0x00000000005b7167 in PyObject_Call () at ../Objects/abstract.c:2165
#31 0x00000000004f413e in method_call.lto_priv () at ../Objects/classobject.c:330
#32 0x00000000005b7167 in PyObject_Call () at ../Objects/abstract.c:2165
#33 0x000000000054d359 in slot_tp_init () at ../Objects/typeobject.c:6268
#34 0x000000000055d17c in type_call.lto_priv () at ../Objects/typeobject.c:905
#35 0x00000000005b7167 in PyObject_Call () at ../Objects/abstract.c:2165
#36 0x0000000000528d06 in do_call (nk=<optimized out>, na=<optimized out>, pp_stack=0x7fff82ffc120, func=<optimized out>) at ../Python/ceval.c:4936
#37 call_function (oparg=<optimized out>, pp_stack=0x7fff82ffc120) at ../Python/ceval.c:4732
#38 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#39 0x000000000052e12b in _PyEval_EvalCodeWithName (qualname=0x0, name=0x0, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, globals=<optimized out>, _co=<code at remote 0x7ffff5f53270>)
    at ../Python/ceval.c:4018
#40 PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#41 0x00000000004ebdd7 in function_call.lto_priv () at ../Objects/funcobject.c:627
#42 0x00000000005b7167 in PyObject_Call () at ../Objects/abstract.c:2165
#43 0x00000000005262af in ext_do_call (nk=<optimized out>, na=1, flags=<optimized out>, pp_stack=0x7fff82ffc3d8, func=<function at remote 0x7fffa3e2eae8>) at ../Python/ceval.c:5034
#44 PyEval_EvalFrameEx () at ../Python/ceval.c:3275
#45 0x000000000052e12b in _PyEval_EvalCodeWithName (qualname=0x0, name=0x0, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, globals=<optimized out>, _co=<code at remote 0x7fffa3f926f0>)
    at ../Python/ceval.c:4018
#46 PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#47 0x00000000004ebdd7 in function_call.lto_priv () at ../Objects/funcobject.c:627
#48 0x00000000005b7167 in PyObject_Call () at ../Objects/abstract.c:2165
#49 0x00000000004f413e in method_call.lto_priv () at ../Objects/classobject.c:330
#50 0x00000000005b7167 in PyObject_Call () at ../Objects/abstract.c:2165
#51 0x000000000054d4f6 in slot_tp_call () at ../Objects/typeobject.c:6053
#52 0x00000000005b7167 in PyObject_Call () at ../Objects/abstract.c:2165
#53 0x00000000005262af in ext_do_call (nk=<optimized out>, na=0, flags=<optimized out>, pp_stack=0x7fff82ffc778,
    func=<DAPmodel(_backward_hooks={}, _forward_hooks={}, _parameters={}, init_state=(<Variable at remote 0x7fffeb8259e8>, <Variable at remote 0x7fffeb825570>), _forward_pre_hooks={}, rnn_config={'rnn_type': <type at remote 0x17cacb8>, 'rnn_args': {'bidirectional': True, 'num_layers': 2, 'hidden_size': 256, 'dropout': <float at remote 0x7fffa3e0a240>}}, _backend=<THNNFunctionBackend(function_classes={'SpatialUpSamplingBilinear': <FunctionMeta(__module__='torch.nn._functions.thnn.auto', _backward_cls=<type at remote 0x16c7db8>, forward=<staticmethod at remote 0x7fffa3fa2278>, backward=<staticmethod at remote 0x7fffa3fa2208>, _is_legacy=False, __doc__=None) at remote 0x16c78c8>, 'NLLLoss2dBackward': <FunctionMeta(__module__='torch.nn._functions.thnn.auto', _backward_cls=<type at remote 0x172c238>, forward=<staticmethod at remote 0x7fffa3f64da0>, backward=<staticmethod at remote 0x7fffa3f64e10>, _is_legacy=False, __doc__=None) at remote 0x172bbc8>, 'SpatialSubSampling': <FunctionMeta(__module__='torch.nn._functions.thnn....(truncated)) at ../Python/ceval.c:5034

If I add a line of code setting a manual seed just before the random function I am using,
the segmentation fault disappears. Also, torch.random generates huge numbers outside of (0, 1] for a while before the segmentation fault occurs.
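
A sketch of that workaround, with a hypothetical wrapper name (only the torch.manual_seed call reflects what is described above):

import torch

def reseeded_uniform_(t):
    # Re-seeding immediately before the RNG call resets the generator
    # state that concurrent callers had corrupted. Note it also pins the
    # state, so every call draws the same values -- this is a diagnostic,
    # not a real fix.
    torch.manual_seed(0)
    return t.uniform_(0, 1)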

@jiasenlu commented Sep 6, 2017

Same here. I also find that torch.random generates huge numbers in a multi-GPU setting, so I had to replace it with numpy.random. I also hit the segfault when using multiple GPUs. Here is the log:

Thread 52613 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff4724e700 (LWP 28927)]
0x00007fffa0cb4577 in THRandom_random ()
    from /nethome/jlu347/anaconda2/lib/python2.7/site-packages/torch/lib/libTH.so.1
(gdb) where
#0  0x00007fffa0cb4577 in THRandom_random ()
    from /nethome/jlu347/anaconda2/lib/python2.7/site-packages/torch/lib/libTH.so.1
#1  0x00007fffa0c54078 in THLongTensor_randperm ()
    from /nethome/jlu347/anaconda2/lib/python2.7/site-packages/torch/lib/libTH.so.1
#2  0x00007fffab08924e in THPLongTensor_stateless_randperm (self=<optimized out>, args=<optimized out>,
    kwargs=<optimized out>) from /nethome/jlu347/anaconda2/lib/python2.7/site-packages/torch/_C.so
#3  0x00007ffff7a24e93 in PyObject_Call (func=0x7fff5c259098, arg=<optimized out>, kw=<optimized out>)
    at Objects/abstract.c:2547
#4  0x00007fffaaddff42 in THPUtils_dispatchStateless (tensor=0x11b97a0, name=0x7fffabb55784 "randperm",
    args=0x7fff12ec6210, kwargs=0x0) from /nethome/jlu347/anaconda2/lib/python2.7/site-packages/torch/_C.so
#5  0x00007ffff7ad81e5 in call_function (oparg=<optimized out>, pp_stack=0x7fff4724b0e8) at Python/ceval.c:4352
#6  PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2989
#7  0x00007ffff7ad9c3e in PyEval_EvalCodeEx (co=0x7fffeb9b2230, globals=<optimized out>, locals=<optimized out>,
    args=<optimized out>, argcount=2, kws=0x7ffff7f91068, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3584
#8  0x00007ffff7a54b68 in function_call (func=0x7fffe9234410, arg=0x7fff4a5e05f0, kw=0x7fff5f445910)
    at Objects/funcobject.c:523
#9  0x00007ffff7a24e93 in PyObject_Call (func=0x7fffe9234410, arg=<optimized out>, kw=<optimized out>)
    at Objects/abstract.c:2547
#10 0x00007ffff7ad6886 in ext_do_call (nk=1247675888, na=<optimized out>, flags=<optimized out>, pp_stack=0x7fff4724b3d8,
    func=0x7fffe9234410) at Python/ceval.c:4666
#11 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3028
#12 0x00007ffff7ad9c3e in PyEval_EvalCodeEx (co=0x7fff793b3430, globals=<optimized out>, locals=<optimized out>,
    args=<optimized out>, argcount=2, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#13 0x00007ffff7a54a61 in function_call (func=0x7fff7928d410, arg=0x7fff4a5e0830, kw=0x0) at Objects/funcobject.c:523
#14 0x00007ffff7a24e93 in PyObject_Call (func=0x7fff7928d410, arg=<optimized out>, kw=<optimized out>)
    at Objects/abstract.c:2547
#15 0x00007ffff7a3764f in instancemethod_call (func=0x7fff7928d410, arg=0x7fff4a5e0830, kw=0x0)
    at Objects/classobject.c:2602
#16 0x00007ffff7a24e93 in PyObject_Call (func=0x7fff4ad01370, arg=<optimized out>, kw=<optimized out>)
    at Objects/abstract.c:2547
#17 0x00007ffff7a952ac in slot_tp_call (self=0x7fff561db390, args=0x7fff567d8550, kwds=0x0) at Objects/typeobject.c:5546
#18 0x00007ffff7a24e93 in PyObject_Call (func=0x7fff561db390, arg=<optimized out>, kw=<optimized out>)
#19 0x00007ffff7ad780d in do_call (nk=<optimized out>, na=<optimized out>, pp_stack=0x7fff4724b9d8, func=0x7fff561db390)
    at Python/ceval.c:4569
#20 call_function (oparg=<optimized out>, pp_stack=0x7fff4724b9d8) at Python/ceval.c:4374
#21 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2989
#22 0x00007ffff7ad9c3e in PyEval_EvalCodeEx (co=0x7fffeb9acdb0, globals=<optimized out>, locals=<optimized out>,
    args=<optimized out>, argcount=5, kws=0x7ffff7f91068, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3584
#23 0x00007ffff7a54b68 in function_call (func=0x7fffe9234668, arg=0x7fff6182f3b0, kw=0x7fff54f204b0)
    at Objects/funcobject.c:523
#24 0x00007ffff7a24e93 in PyObject_Call (func=0x7fffe9234668, arg=<optimized out>, kw=<optimized out>)
    at Objects/abstract.c:2547
---Type <return> to continue, or q <return> to quit---
#25 0x00007ffff7ad6886 in ext_do_call (nk=1635972016, na=<optimized out>, flags=<optimized out>, pp_stack=0x7fff4724bcc8, func=0x7fffe9234668)
    at Python/ceval.c:4666
#26 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3028
#27 0x00007ffff7ad9c3e in PyEval_EvalCodeEx (co=0x7fff793b3430, globals=<optimized out>, locals=<optimized out>, args=<optimized out>,
    argcount=5, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#28 0x00007ffff7a54a61 in function_call (func=0x7fff7928d410, arg=0x7fff1c2d9b90, kw=0x0) at Objects/funcobject.c:523
#29 0x00007ffff7a24e93 in PyObject_Call (func=0x7fff7928d410, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#30 0x00007ffff7a3764f in instancemethod_call (func=0x7fff7928d410, arg=0x7fff1c2d9b90, kw=0x0) at Objects/classobject.c:2602
#31 0x00007ffff7a24e93 in PyObject_Call (func=0x7fff44309cd0, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#32 0x00007ffff7a952ac in slot_tp_call (self=0x7fff5b154290, args=0x7fff481d2100, kwds=0x0) at Objects/typeobject.c:5546
#33 0x00007ffff7a24e93 in PyObject_Call (func=0x7fff5b154290, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#34 0x00007ffff7ad780d in do_call (nk=<optimized out>, na=<optimized out>, pp_stack=0x7fff4724c2c8, func=0x7fff5b154290) at Python/ceval.c:4569
#35 call_function (oparg=<optimized out>, pp_stack=0x7fff4724c2c8) at Python/ceval.c:4374
#36 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2989
#37 0x00007ffff7ad9c3e in PyEval_EvalCodeEx (co=0x7fffeb9ac2b0, globals=<optimized out>, locals=<optimized out>, args=<optimized out>,
    argcount=5, kws=0x7ffff7f91068, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#38 0x00007ffff7a54b68 in function_call (func=0x7fffe9238230, arg=0x7fff57f8ab90, kw=0x7fff54f20168) at Objects/funcobject.c:523
#39 0x00007ffff7a24e93 in PyObject_Call (func=0x7fffe9238230, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#40 0x00007ffff7ad6886 in ext_do_call (nk=1475914640, na=<optimized out>, flags=<optimized out>, pp_stack=0x7fff4724c5b8, func=0x7fffe9238230)
    at Python/ceval.c:4666
#41 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3028
#42 0x00007ffff7ad9c3e in PyEval_EvalCodeEx (co=0x7fff793b3430, globals=<optimized out>, locals=<optimized out>, args=<optimized out>,
    argcount=5, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#43 0x00007ffff7a54a61 in function_call (func=0x7fff7928d410, arg=0x7fff59431ad0, kw=0x0) at Objects/funcobject.c:523
#44 0x00007ffff7a24e93 in PyObject_Call (func=0x7fff7928d410, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#45 0x00007ffff7a3764f in instancemethod_call (func=0x7fff7928d410, arg=0x7fff59431ad0, kw=0x0) at Objects/classobject.c:2602
#46 0x00007ffff7a24e93 in PyObject_Call (func=0x7fff4a516190, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#47 0x00007ffff7a952ac in slot_tp_call (self=0x7fff529eb4d0, args=0x7fff401a7a48, kwds=0x0) at Objects/typeobject.c:5546
#48 0x00007ffff7a24e93 in PyObject_Call (func=0x7fff529eb4d0, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#49 0x00007ffff7ad780d in do_call (nk=<optimized out>, na=<optimized out>, pp_stack=0x7fff4724cbb8, func=0x7fff529eb4d0) at Python/ceval.c:4569
#50 call_function (oparg=<optimized out>, pp_stack=0x7fff4724cbb8) at Python/ceval.c:4374
#51 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2989
#52 0x00007ffff7ad9c3e in PyEval_EvalCodeEx (co=0x7fffeb9ac8b0, globals=<optimized out>, locals=<optimized out>, args=<optimized out>,
    argcount=5, kws=0x7ffff7f91068, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#53 0x00007ffff7a54b68 in function_call (func=0x7fffe9238410, arg=0x7fff1307e230, kw=0x7fff54f20c58) at Objects/funcobject.c:523
#54 0x00007ffff7a24e93 in PyObject_Call (func=0x7fffe9238410, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#55 0x00007ffff7ad6886 in ext_do_call (nk=319283760, na=<optimized out>, flags=<optimized out>, pp_stack=0x7fff4724cea8, func=0x7fffe9238410)
    at Python/ceval.c:4666
#56 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3028
#57 0x00007ffff7ad9c3e in PyEval_EvalCodeEx (co=0x7fff793b3430, globals=<optimized out>, locals=<optimized out>, args=<optimized out>,
    argcount=5, kws=0x7ffff7f91068, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#58 0x00007ffff7a54b68 in function_call (func=0x7fff7928d410, arg=0x7fff1d8c7f50, kw=0x7fff40d27b40) at Objects/funcobject.c:523
#59 0x00007ffff7a24e93 in PyObject_Call (func=0x7fff7928d410, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#60 0x00007ffff7a3764f in instancemethod_call (func=0x7fff7928d410, arg=0x7fff1d8c7f50, kw=0x7fff40d27b40) at Objects/classobject.c:2602
#61 0x00007ffff7a24e93 in PyObject_Call (func=0x7fff42f6c280, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#62 0x00007ffff7a952ac in slot_tp_call (self=0x7fff529eb110, args=0x7fff594a2940, kwds=0x7fff40d27b40) at Objects/typeobject.c:5546
#63 0x00007ffff7a24e93 in PyObject_Call (func=0x7fff529eb110, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#64 0x00007ffff7ad6886 in ext_do_call (nk=1498032448, na=<optimized out>, flags=<optimized out>, pp_stack=0x7fff4724d4a8, func=0x7fff529eb110)
    at Python/ceval.c:4666
#65 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3028
#66 0x00007ffff7ad9c3e in PyEval_EvalCodeEx (co=0x7fff79247a30, globals=<optimized out>, locals=<optimized out>, args=<optimized out>,
    argcount=7, kws=0x7ffff7f91068, kwcount=0, defs=0x7fff5a564168, defcount=1, closure=0x0) at Python/ceval.c:3584
#67 0x00007ffff7a54b68 in function_call (func=0x7fffe9234848, arg=0x7fff4725a670, kw=0x7fff54f06910) at Objects/funcobject.c:523
#68 0x00007ffff7a24e93 in PyObject_Call (func=0x7fffe9234848, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#69 0x00007ffff7ad6886 in ext_do_call (nk=1193649776, na=<optimized out>, flags=<optimized out>, pp_stack=0x7fff4724d798, func=0x7fffe9234848)
    at Python/ceval.c:4666
#70 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3028
#71 0x00007ffff7ad9345 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fff4724d908, func=0x7fffaca44320)
    at Python/ceval.c:4437
#72 call_function (oparg=<optimized out>, pp_stack=0x7fff4724d908) at Python/ceval.c:4372
---Type <return> to continue, or q <return> to quit---
#73 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2989
#74 0x00007ffff7ad9345 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fff4724da78, func=0x7fffaca44488)
    at Python/ceval.c:4437
#75 call_function (oparg=<optimized out>, pp_stack=0x7fff4724da78) at Python/ceval.c:4372
#76 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2989
#77 0x00007ffff7ad9c3e in PyEval_EvalCodeEx (co=0x7fffaca3a130, globals=<optimized out>, locals=<optimized out>, args=<optimized out>,
    argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#78 0x00007ffff7a54a61 in function_call (func=0x7fffaca44398, arg=0x7fff5b2c5d90, kw=0x0) at Objects/funcobject.c:523
#79 0x00007ffff7a24e93 in PyObject_Call (func=0x7fffaca44398, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#80 0x00007ffff7a3764f in instancemethod_call (func=0x7fffaca44398, arg=0x7fff5b2c5d90, kw=0x0) at Objects/classobject.c:2602
#81 0x00007ffff7a24e93 in PyObject_Call (func=0x7fff480a77d0, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#82 0x00007ffff7acf7b3 in PyEval_CallObjectWithKeywords (func=0x7fff480a77d0, arg=0x7ffff7f91050, kw=<optimized out>) at Python/ceval.c:4221
#83 0x00007ffff7b121a2 in t_bootstrap (boot_raw=<optimized out>) at ./Modules/threadmodule.c:620
#84 0x00007ffff77c56ba in start_thread (arg=0x7fff4724e700) at pthread_create.c:333
#85 0x00007ffff6deb3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
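
A sketch of the numpy substitution mentioned above, assuming the call being replaced is torch.randperm (frame #1 shows THLongTensor_randperm); randperm_np is a hypothetical name:

import numpy as np
import torch

def randperm_np(n):
    # numpy's legacy global RandomState does its work while holding the
    # GIL, so calls from Python threads are effectively serialized,
    # unlike the unlocked THGenerator that crashes here.
    return torch.from_numpy(np.random.permutation(n))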

@spurra commented Oct 30, 2017

I have the exact same issue. In my case, the calls to torch.Tensor(var.size()).normal_() and torch.Tensor(var.size()).bernoulli_() in the thread function were causing the problem. Once I took them out, the segfaults stopped appearing.
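
One way to take such calls out of the threads, as a sketch (names and sizes are hypothetical): draw all the noise up front on the main thread and hand each worker its own tensors.

import threading
import torch

num_workers, size = 4, (10, 1024)

# The global generator is only ever touched serially, on the main thread.
noise = [torch.zeros(*size).normal_() for _ in range(num_workers)]
masks = [torch.zeros(*size).bernoulli_(0.5) for _ in range(num_workers)]

def worker(i):
    n, m = noise[i], masks[i]  # threads only read pre-drawn tensors
    # ... use n and m in the forward pass ...

threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
for t in threads:
    t.start()
for t in threads:
    t.join()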

@yukw777 commented Jan 19, 2018

My code had almost the same structure as @Louis-Tian's example, and I was able to get around the crash by putting a lock around the point where each thread instantiates its module. Working code below (note the lock):

import torch
import threading
import torch.functional as f
from concurrent.futures import ThreadPoolExecutor as ThreadPool


def build(cuda=False):
    nn = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024),
        torch.nn.Linear(1024, 1)
    )

    return nn.cuda() if cuda else nn

def train(nn, X, y, epoch=100):
    X = torch.autograd.Variable(X)
    y = torch.autograd.Variable(y)
    optim = torch.optim.SGD(nn.parameters(), lr=0.1)
    for i in range(epoch):
        yhat = nn(X)
        loss = ((yhat - y) ** 2).mean()
        loss.backward()
        optim.step()

def data(cuda=False):
    X = torch.zeros(10, 1024)
    y = torch.zeros((10, 1))
    return (X.cuda(), y.cuda()) if cuda else (X, y)

def cpu_run(lock):
    with lock:
        nn = build(cuda=False)
    d = data(cuda=False)
    train(nn, *d)

def thread_cpu_run():
    pool = ThreadPool()
    lock = threading.Lock()
    threads = pool.map(cpu_run, [lock for _ in range(5)])

    return list(threads)

thread_cpu_run()
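
This squares with the traces above: constructing torch.nn.Linear initializes its weights with uniform_, which is exactly where THRandom_random crashes, so serializing build() keeps the threads from hitting the shared generator concurrently. (Note this version also uses torch.zeros for the data where @Louis-Tian's used torch.rand, removing the other unguarded RNG call from the threads.)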

@yf225 yf225 self-assigned this Nov 7, 2018
zdevito pushed a commit to zdevito/ATen that referenced this issue Nov 12, 2018
Summary:
When we added `randperm_cpu` and `THTensor_(randperm)` we forgot to lock the `THGenerator` mutex before calling `THRandom_random`, which causes the segfault mentioned in facebookresearch/maskrcnn-benchmark#93 (comment). This PR fixes the bug.

Closes pytorch/pytorch#1868.
Pull Request resolved: pytorch/pytorch#13832

Differential Revision: D13025453

Pulled By: yf225

fbshipit-source-id: 6e363a35c72b4862412eaea6516a154126634c9d