device-side assert triggered - when probability > 1.0 #42452
Open
@avloss

🐛 Bug

When using CUDA, an out-of-range probability produces a non-descriptive runtime error that makes every subsequent CUDA call fail and requires a runtime restart. Without CUDA, the error is descriptive and doesn't break the runtime.

To Reproduce

Steps to reproduce the behavior:

import torch
prob = torch.tensor([1.01]).to('cuda:0')
torch.distributions.bernoulli.Bernoulli(probs=prob).sample([1])

This results in:

RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1591914743399/work/aten/src/THC/THCReduceAll.cuh:327

Any subsequent calls to CUDA (even with valid instructions), result in:

RuntimeError: CUDA error: device-side assert triggered

Expected behavior

A descriptive error, like the one raised when not using CUDA:

RuntimeError: Expected p_in >= 0 && p_in <= 1 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)
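Until the CUDA path reports something like this, one possible workaround is to ask the distribution to validate its parameters at construction time. The sketch below assumes the `validate_args` argument of `torch.distributions` behaves the same on both devices; `sample_bernoulli_checked` is a hypothetical helper name, not part of PyTorch, and the exact error text varies between releases.

```python
import torch
from torch.distributions import Bernoulli


def sample_bernoulli_checked(probs, sample_shape=(1,)):
    """Hypothetical helper: validate probs before any sampling kernel runs."""
    # validate_args=True checks the constraint 0 <= probs <= 1 when the
    # distribution is constructed and raises ValueError if it is violated.
    dist = Bernoulli(probs=probs, validate_args=True)
    return dist.sample(sample_shape)


# Fall back to CPU so the sketch also runs on machines without a GPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

try:
    sample_bernoulli_checked(torch.tensor([1.01], device=device))
    error_message = None
except ValueError as exc:
    # The message wording differs between PyTorch releases.
    error_message = str(exc)

# A valid probability still samples normally afterwards.
valid_sample = sample_bernoulli_checked(torch.tensor([0.5], device=device))
```

Because the check raises in Python before the sampling kernel is launched, the CUDA context should stay usable after the failure.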

Environment

Collecting environment information...
PyTorch version: 1.5.1
Is debug build: No
CUDA used to build PyTorch: 9.2

OS: Debian GNU/Linux 10 (buster)
GCC version: (Debian 8.3.0-6) 8.3.0
CMake version: Could not collect

Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.19.1
[pip3] torch==1.5.1
[pip3] torchvision==0.6.0a0+35d732a
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               9.2                           0  
[conda] mkl                       2020.1                      217  
[conda] mkl-service               2.3.0            py38he904b0f_0  
[conda] mkl_fft                   1.1.0            py38h23d657b_0  
[conda] mkl_random                1.1.1            py38h0573a6f_0  
[conda] numpy                     1.19.1                   pypi_0    pypi
[conda] pytorch                   1.5.1           py3.8_cuda9.2.148_cudnn7.6.3_0    pytorch
[conda] torchvision               0.6.1                 py38_cu92    pytorch

Additional context

Not sure how much of an actual issue this is: any additional parameter checks would slow execution, and it takes a second to rerun the same code without CUDA to identify the problem. This has also been mentioned several times in other tickets; CUDA errors in general just don't seem to be as descriptive.
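If the cost of the extra checks is the concern, they could be enabled only while debugging. The following is a minimal sketch, assuming `Distribution.set_default_validate_args` is available in the installed release; `configure_validation` is a hypothetical helper, and the default setting differs between PyTorch releases.

```python
import torch
from torch.distributions import Bernoulli, Distribution


def configure_validation(debug):
    """Hypothetical helper: turn parameter validation on or off globally."""
    # Affects distributions constructed after this call that do not pass
    # validate_args explicitly.
    Distribution.set_default_validate_args(debug)


# While debugging: pay for the checks and get a descriptive ValueError.
configure_validation(True)
try:
    Bernoulli(probs=torch.tensor([1.01]))
    caught = False
except ValueError:
    caught = True

# For performance-sensitive runs the checks can be switched off again:
# configure_validation(False)
```

With validation switched off, an invalid probability on a CUDA tensor can still hit the device-side assert described above, so this only trades speed against diagnosability.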

#1204
#9585

cc @ngimel

Metadata

Assignees

No one assigned

    Labels

    better-engineering
        Relatively self-contained tasks for better engineering contributors
    module: crash
        Problem manifests as a hard crash, as opposed to a RuntimeError
    module: cuda
        Related to torch.cuda, and CUDA support in general
    triaged
        This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
