Process finished with exit code 139 (interrupted by signal 11: SIGSEGV) in forward · Issue #13123 · pytorch/pytorch · GitHub

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV) in forward #13123

Closed
remi-r-mp opened this issue Oct 25, 2018 · 5 comments
Labels
high priority · module: crash (Problem manifests as a hard crash, as opposed to a RuntimeError) · triage review

Comments

@remi-r-mp
remi-r-mp commented Oct 25, 2018

Hi,
I am using torch version 0.4.1, without a GPU. I have a very simple network with 2 Linear layers and only a few neurons per layer (~10), and I do not use a large batch size.
Yet when I run my code, I get the following error: "Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)"
The line where the crash happens is:
outputs = network(batch['sample'])
I have seen related topics about this error, but none of them provided a working solution. Moreover, I've run the exact same code in another project and it does work...
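
For reference, a minimal sketch of the kind of setup described above (the exact layer sizes, the activation between the layers, and the batch['sample'] key are assumptions; the real project code is not shown here):

import torch
import torch.nn as nn
import torch.nn.functional as F


class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        # two small Linear layers, roughly matching the description above
        self.fc1 = nn.Linear(10, 10)
        self.fc2 = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))


network = Network()
batch = {'sample': torch.rand(8, 10)}
outputs = network(batch['sample'])  # this is the call that segfaults in my project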

  • PyTorch Version (e.g., 1.0): 0.4.1
  • OS (e.g., Linux): Ubuntu 16.04.5 LTS (xenial)
  • How you installed PyTorch (conda, pip, source): pip install torch
  • Build command you used (if compiling from source):
  • Python version: 3.5.2
  • CUDA/cuDNN version: None
  • GPU models and configuration: None
  • Any other relevant information: None

Thanks for helping me!

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411

@zou3519
Contributor
zou3519 commented Oct 29, 2018

Could you try one of the following? It would help us pinpoint the problem:

  • Try a nightly build and see if the problem persists (https://pytorch.org/)
  • Give us a stack trace via gdb for the crash (a sketch of another way to capture a trace follows below)
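
As another way to capture where the crash happens, here is a minimal sketch using Python's built-in faulthandler module (a suggestion sketched on the assumption that the crashing script can be edited; gdb remains the more detailed option):

import faulthandler

# Dump the Python-level traceback for every thread when the process receives
# SIGSEGV (or another fatal signal); this complements the native gdb trace.
faulthandler.enable()

# ... then run the code that crashes, e.g.
# outputs = network(batch['sample'])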

@remi-r-mp
Author

Hi, thanks for the response. Here is the gdb backtrace:

(gdb) backtrace
#0 0x00007fff78d7e260 in ?? ()
#1 0x00007fffec721103 in void std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) ()
   from /home/user/venv/research/lib/python3.5/site-packages/ortools/constraint_solver/../.libs/libprotobuf.so.3.6.1
#2 0x00007fffec72109d in std::_Bind_simple<void (*())()>::operator()() ()
   from /home/user/venv/research/lib/python3.5/site-packages/ortools/constraint_solver/../.libs/libprotobuf.so.3.6.1
#3 0x00007fffec720fcd in void std::__once_call_impl<std::_Bind_simple<void (*())()> >() ()
   from /home/user/venv/research/lib/python3.5/site-packages/ortools/constraint_solver/../.libs/libprotobuf.so.3.6.1
#4 0x00007ffff7bc8a99 in __pthread_once_slow (once_control=0x7fffa8c0eb80 <torch::utils::type_from_string(std::string const&)::cpu_once>, init_routine=0x7fffe9af7ac0 <__once_proxy>)
at pthread_once.c:116
#5 0x00007fffa4e6a138 in __gthread_once (__func=, __once=0x7fffa8c0eb80 <torch::utils::type_from_string(std::string const&)::cpu_once>)
at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/x86_64-redhat-linux/bits/gthr-default.h:699
#6 std::call_once<torch::utils::type_from_string(const string&)::<lambda()> > (__once=..., __f=) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/mutex:746
#7 torch::utils::type_from_string (str="torch.FloatTensor") at torch/csrc/utils/tensor_types.cpp:59
#8 0x00007fffa552f462 in torch::autograd::THPVariable_type (self=0x7fff77f11b88, args=0x7ffff2ceb9b0, kwargs=0x0) at torch/csrc/autograd/generated/python_variable_methods.cpp:631
#9 0x00000000004e9ba7 in PyCFunction_Call () at ../Objects/methodobject.c:98
#10 0x00000000005372f4 in call_function (oparg=, pp_stack=0x7fffffffc490) at ../Python/ceval.c:4705
#11 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#12 0x0000000000540f9b in _PyEval_EvalCodeWithName (qualname=0x0, name=0x0, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=, argcount=,
args=, locals=, globals=, _co=0x7fffe66aa300) at ../Python/ceval.c:4018
#13 PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#14 0x00000000004ebd23 in function_call.lto_priv () at ../Objects/funcobject.c:627
#15 0x00000000005c1797 in PyObject_Call () at ../Objects/abstract.c:2165
#16 0x00000000004fb9ce in method_call.lto_priv () at ../Objects/classobject.c:330
#17 0x00000000005c1797 in PyObject_Call () at ../Objects/abstract.c:2165
#18 0x0000000000584716 in call_method.lto_priv () at ../Objects/typeobject.c:1439
#19 0x0000000000537916 in PyEval_EvalFrameEx () at ../Python/ceval.c:1594
#20 0x00000000005406df in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#21 0x000000000053c1d0 in fast_function (nk=, na=, n=, pp_stack=0x7fffffffcae0, func=) at ../Python/ceval.c:4813
#22 call_function (oparg=, pp_stack=0x7fffffffcae0) at ../Python/ceval.c:4730
#23 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#24 0x00000000005416ea in _PyEval_EvalCodeWithName (qualname=0x0, name=0x0, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=, argcount=,
args=, locals=, globals=, _co=0x7fff78c90390) at ../Python/ceval.c:4018
#25 PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#26 0x00000000004ebd23 in function_call.lto_priv () at ../Objects/funcobject.c:627
#27 0x00000000005c1797 in PyObject_Call () at ../Objects/abstract.c:2165
#28 0x00000000004fb9ce in method_call.lto_priv () at ../Objects/classobject.c:330
#29 0x00000000005c1797 in PyObject_Call () at ../Objects/abstract.c:2165
#30 0x0000000000584716 in call_method.lto_priv () at ../Objects/typeobject.c:1439
#31 0x00000000004ede4f in enum_next.lto_priv () at ../Objects/enumobject.c:130
#32 0x0000000000537791 in PyEval_EvalFrameEx () at ../Python/ceval.c:3013
#33 0x0000000000540199 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#34 0x000000000053bd92 in fast_function (nk=, na=, n=, pp_stack=0x7fffffffd160, func=) at ../Python/ceval.c:4813
#35 call_function (oparg=, pp_stack=0x7fffffffd160) at ../Python/ceval.c:4730
#36 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#37 0x0000000000540199 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#38 0x000000000053bd92 in fast_function (nk=, na=, n=, pp_stack=0x7fffffffd370, func=) at ../Python/ceval.c:4813
#39 call_function (oparg=, pp_stack=0x7fffffffd370) at ../Python/ceval.c:4730
#40 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#41 0x000000000053b7e4 in fast_function (nk=, na=, n=, pp_stack=0x7fffffffd4a0, func=) at ../Python/ceval.c:4803
#42 call_function (oparg=, pp_stack=0x7fffffffd4a0) at ../Python/ceval.c:4730
#43 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#44 0x0000000000540199 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#45 0x0000000000540e4f in PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#46 PyEval_EvalCode (co=, globals=, locals=) at ../Python/ceval.c:777
#47 0x000000000060c272 in run_mod () at ../Python/pythonrun.c:976
#48 0x000000000060e71a in PyRun_FileExFlags () at ../Python/pythonrun.c:929
#49 0x000000000060ef0c in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:396
#50 0x000000000063fb26 in run_file (p_cf=0x7fffffffd710, filename=0xa75240 L"/home/user/research/portfolios/expected_return/main.py", fp=0xae10e0) at ../Modules/main.c:318
#51 Py_Main () at ../Modules/main.c:768
#52 0x00000000004cfeb1 in main () at ../Programs/python.c:65
#53 0x00007ffff7810830 in __libc_start_main (main=0x4cfdd0, argc=2, argv=0x7fffffffd928, init=, fini=, rtld_fini=, stack_end=0x7fffffffd918)
    at ../csu/libc-start.c:291
#54 0x00000000005d6049 in _start ()

@remi-r-mp
Author

I've made some code to reproduce the error:

import dotenv
import django

# Set up the Django environment before importing torch; commenting out the
# django-related lines makes the crash disappear (see the note below).
dotenv.load_dotenv()
django.setup()

import torch
import torch.nn as nn


class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.linear = nn.Linear(4, 1)

    def forward(self, x):
        x = self.linear(x)
        return x


inputs_tensor = torch.rand(10, 4)
targets_tensor = torch.rand(10, 1)

net = Network()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.0001)

optimizer.zero_grad()
outputs = net(inputs_tensor)  # the forward pass that dies with SIGSEGV
loss = criterion(outputs, targets_tensor)

loss.backward()
optimizer.step()

It seems that django is somehow interfering and causing the bug: when I comment out the django-related lines, the code works. However, in my real project I need the django connection to fetch data from the database (not used in this example).
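
Given that the backtrace above runs through the libprotobuf.so bundled with ortools while torch resolves a std::call_once, one diagnostic worth trying is to import torch before the django/dotenv setup and re-run the same snippet (a sketch of an experiment, not a confirmed fix):

import torch  # imported before anything django-related this time
import torch.nn as nn

import dotenv
import django

dotenv.load_dotenv()
django.setup()

net = nn.Linear(4, 1)
outputs = net(torch.rand(10, 4))  # the forward pass that previously segfaulted
loss = nn.MSELoss()(outputs, torch.rand(10, 1))
loss.backward()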

@zou3519 zou3519 added the todo label (Not as important as medium or high priority tasks, but we will work on these.) on Nov 5, 2018
@zou3519 zou3519 added the high priority and module: crash (Problem manifests as a hard crash, as opposed to a RuntimeError) labels and removed the todo label on Apr 28, 2021
@zou3519
Contributor
zou3519 commented Apr 28, 2021

Hi-pri for crash, but this needs reproduction because it's an older issue.

@gchanan gchanan closed this as completed Apr 29, 2021
@gchanan
Contributor
gchanan commented Apr 29, 2021

closing because too old.
