Process finished with exit code 139 (interrupted by signal 11: SIGSEGV) in forward · Issue #13123 · pytorch/pytorch · GitHub

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV) in forward #13123

Closed
remi-r-mp opened this issue Oct 25, 2018 · 5 comments
Labels
high priority · module: crash (Problem manifests as a hard crash, as opposed to a RuntimeError) · triage review

Comments

@remi-r-mp
remi-r-mp commented Oct 25, 2018

Hi,
I am using torch version 0.4.1, without a GPU. I have a very simple network with 2 Linear layers and only a few neurons per layer (~10), and I do not use a large batch size.
Yet when I run my code, I get the following error: "Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)"
The line where the crash happens is:
outputs = network(batch['sample'])
I have seen related topics about this error, but none of them provided a working solution. Moreover, I've run the exact same code in another project and it does work...
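
For reference, a minimal sketch of the kind of setup described above (the exact layer sizes, the activation between the layers, and the batch['sample'] key are assumptions; the real project code is not shown here):

import torch
import torch.nn as nn
import torch.nn.functional as F


class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        # two small Linear layers, roughly matching the description above
        self.fc1 = nn.Linear(10, 10)
        self.fc2 = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))


network = Network()
batch = {'sample': torch.rand(8, 10)}
outputs = network(batch['sample'])  # this is the call that segfaults in my project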

  • PyTorch Version (e.g., 1.0): 0.4.1
  • OS (e.g., Linux): Ubuntu 16.04.5 LTS (xenial)
  • How you installed PyTorch (conda, pip, source): pip install torch
  • Build command you used (if compiling from source):
  • Python version: 3.5.2
  • CUDA/cuDNN version: None
  • GPU models and configuration: None
  • Any other relevant information: None

Thanks for helping me!

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411

@zou3519
Contributor
zou3519 commented Oct 29, 2018

Could you try one of the following? It would help us pinpoint the problem:

  • Try a nightly build and see if the problem persists (https://pytorch.org/)
  • Give us a stack trace via gdb for the crash (a sketch of another way to capture a trace follows below)
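
As another way to capture where the crash happens, here is a minimal sketch using Python's built-in faulthandler module (a suggestion sketched on the assumption that the crashing script can be edited; gdb remains the more detailed option):

import faulthandler

# Dump the Python-level traceback for every thread when the process receives
# SIGSEGV (or another fatal signal); this complements the native gdb trace.
faulthandler.enable()

# ... then run the code that crashes, e.g.
# outputs = network(batch['sample'])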

@remi-r-mp
Author

Hi, thanks for the response. Here is the gdb backtrace:

(gdb) backtrace
#0 0x00007fff78d7e260 in ?? ()
#1 0x00007fffec721103 in void std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) ()
   from /home/user/venv/research/lib/python3.5/site-packages/ortools/constraint_solver/../.libs/libprotobuf.so.3.6.1
#2 0x00007fffec72109d in std::_Bind_simple<void (*())()>::operator()() ()
   from /home/user/venv/research/lib/python3.5/site-packages/ortools/constraint_solver/../.libs/libprotobuf.so.3.6.1
#3 0x00007fffec720fcd in void std::__once_call_impl<std::_Bind_simple<void (*())()> >() ()
   from /home/user/venv/research/lib/python3.5/site-packages/ortools/constraint_solver/../.libs/libprotobuf.so.3.6.1
#4 0x00007ffff7bc8a99 in __pthread_once_slow (once_control=0x7fffa8c0eb80 <torch::utils::type_from_string(std::string const&)::cpu_once>, init_routine=0x7fffe9af7ac0 <__once_proxy>)
at pthread_once.c:116
#5 0x00007fffa4e6a138 in __gthread_once (__func=, __once=0x7fffa8c0eb80 <torch::utils::type_from_string(std::string const&)::cpu_once>)
at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/x86_64-redhat-linux/bits/gthr-default.h:699
#6 std::call_once<torch::utils::type_from_string(const string&)::<lambda()> > (__once=..., __f=) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/mutex:746
#7 torch::utils::type_from_string (str="torch.FloatTensor") at torch/csrc/utils/tensor_types.cpp:59
#8 0x00007fffa552f462 in torch::autograd::THPVariable_type (self=0x7fff77f11b88, args=0x7ffff2ceb9b0, kwargs=0x0) at torch/csrc/autograd/generated/python_variable_methods.cpp:631
#9 0x00000000004e9ba7 in PyCFunction_Call () at ../Objects/methodobject.c:98
#10 0x00000000005372f4 in call_function (oparg=, pp_stack=0x7fffffffc490) at ../Python/ceval.c:4705
#11 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#12 0x0000000000540f9b in _PyEval_EvalCodeWithName (qualname=0x0, name=0x0, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=, argcount=,
args=, locals=, globals=, _co=0x7fffe66aa300) at ../Python/ceval.c:4018
#13 PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#14 0x00000000004ebd23 in function_call.lto_priv () at ../Objects/funcobject.c:627
#15 0x00000000005c1797 in PyObject_Call () at ../Objects/abstract.c:2165
#16 0x00000000004fb9ce in method_call.lto_priv () at ../Objects/classobject.c:330
#17 0x00000000005c1797 in PyObject_Call () at ../Objects/abstract.c:2165
#18 0x0000000000584716 in call_method.lto_priv () at ../Objects/typeobject.c:1439
#19 0x0000000000537916 in PyEval_EvalFrameEx () at ../Python/ceval.c:1594
#20 0x00000000005406df in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#21 0x000000000053c1d0 in fast_function (nk=, na=, n=, pp_stack=0x7fffffffcae0, func=) at ../Python/ceval.c:4813
#22 call_function (oparg=, pp_stack=0x7fffffffcae0) at ../Python/ceval.c:4730
#23 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#24 0x00000000005416ea in _PyEval_EvalCodeWithName (qualname=0x0, name=0x0, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=, argcount=,
args=, locals=, globals=, _co=0x7fff78c90390) at ../Python/ceval.c:4018
#25 PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#26 0x00000000004ebd23 in function_call.lto_priv () at ../Objects/funcobject.c:627
#27 0x00000000005c1797 in PyObject_Call () at ../Objects/abstract.c:2165
#28 0x00000000004fb9ce in method_call.lto_priv () at ../Objects/classobject.c:330
#29 0x00000000005c1797 in PyObject_Call () at ../Objects/abstract.c:2165
#30 0x0000000000584716 in call_method.lto_priv () at ../Objects/typeobject.c:1439
#31 0x00000000004ede4f in enum_next.lto_priv () at ../Objects/enumobject.c:130
#32 0x0000000000537791 in PyEval_EvalFrameEx () at ../Python/ceval.c:3013
#33 0x0000000000540199 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#34 0x000000000053bd92 in fast_function (nk=, na=, n=, pp_stack=0x7fffffffd160, func=) at ../Python/ceval.c:4813
#35 call_function (oparg=, pp_stack=0x7fffffffd160) at ../Python/ceval.c:4730
#36 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#37 0x0000000000540199 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#38 0x000000000053bd92 in fast_function (nk=, na=, n=, pp_stack=0x7fffffffd370, func=) at ../Python/ceval.c:4813
#39 call_function (oparg=, pp_stack=0x7fffffffd370) at ../Python/ceval.c:4730
#40 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#41 0x000000000053b7e4 in fast_function (nk=, na=, n=, pp_stack=0x7fffffffd4a0, func=) at ../Python/ceval.c:4803
#42 call_function (oparg=, pp_stack=0x7fffffffd4a0) at ../Python/ceval.c:4730
#43 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#44 0x0000000000540199 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#45 0x0000000000540e4f in PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#46 PyEval_EvalCode (co=, globals=, locals=) at ../Python/ceval.c:777
#47 0x000000000060c272 in run_mod () at ../Python/pythonrun.c:976
#48 0x000000000060e71a in PyRun_FileExFlags () at ../Python/pythonrun.c:929
#49 0x000000000060ef0c in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:396
#50 0x000000000063fb26 in run_file (p_cf=0x7fffffffd710, filename=0xa75240 L"/home/user/research/portfolios/expected_return/main.py", fp=0xae10e0) at ../Modules/main.c:318
#51 Py_Main () at ../Modules/main.c:768
#52 0x00000000004cfeb1 in main () at ../Programs/python.c:65
#53 0x00007ffff7810830 in __libc_start_main (main=0x4cfdd0, argc=2, argv=0x7fffffffd928, init=, fini=, rtld_fini=, stack_end=0x7fffffffd918)
    at ../csu/libc-start.c:291
#54 0x00000000005d6049 in _start ()

@remi-r-mp
Author

I've made some code to reproduce the error:

import dotenv
import django

# Set up the Django environment before importing torch; commenting out the
# django-related lines makes the crash disappear (see the note below).
dotenv.load_dotenv()
django.setup()

import torch
import torch.nn as nn


class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.linear = nn.Linear(4, 1)

    def forward(self, x):
        x = self.linear(x)
        return x


inputs_tensor = torch.rand(10, 4)
targets_tensor = torch.rand(10, 1)

net = Network()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.0001)

optimizer.zero_grad()
outputs = net(inputs_tensor)  # the forward pass that dies with SIGSEGV
loss = criterion(outputs, targets_tensor)

loss.backward()
optimizer.step()

It seems that django is somehow interfering and causing the bug: when I comment out the django-related lines, the code works. However, in my real project I need the django connection to fetch data from the database (not used in this example).
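
Given that the backtrace above runs through the libprotobuf.so bundled with ortools while torch resolves a std::call_once, one diagnostic worth trying is to import torch before the django/dotenv setup and re-run the same snippet (a sketch of an experiment, not a confirmed fix):

import torch  # imported before anything django-related this time
import torch.nn as nn

import dotenv
import django

dotenv.load_dotenv()
django.setup()

net = nn.Linear(4, 1)
outputs = net(torch.rand(10, 4))  # the forward pass that previously segfaulted
loss = nn.MSELoss()(outputs, torch.rand(10, 1))
loss.backward()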

@zou3519 zou3519 added the todo label (Not as important as medium or high priority tasks, but we will work on these.) on Nov 5, 2018
@zou3519 zou3519 added the high priority and module: crash (Problem manifests as a hard crash, as opposed to a RuntimeError) labels and removed the todo label on Apr 28, 2021
@zou3519
Contributor
zou3519 commented Apr 28, 2021

Hi-pri for crash, but this needs reproduction because it's an older issue.

@gchanan gchanan closed this as completed Apr 29, 2021
@gchanan
Contributor
gchanan commented Apr 29, 2021

closing because too old.
