Description
🐛 Bug
When using CUDA, a non-descriptive runtime error is raised; it makes any future calls to CUDA fail and requires a runtime restart. When not using CUDA, the error is descriptive and does not break the runtime.
To Reproduce
Steps to reproduce the behavior:
import torch
prob = torch.tensor([1.01]).to('cuda:0')  # probability outside [0, 1]
torch.distributions.bernoulli.Bernoulli(probs=prob).sample([1])
This results in:
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1591914743399/work/aten/src/THC/THCReduceAll.cuh:327
Any subsequent call to CUDA (even with valid instructions) results in:
RuntimeError: CUDA error: device-side assert triggered
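For illustration, a minimal sketch of what this looks like in the same process (the tensor operation below is just a hypothetical stand-in for any otherwise valid CUDA call):

import torch
torch.ones(3, device='cuda:0') + 1  # a perfectly valid operation on its own
# RuntimeError: CUDA error: device-side assert triggered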
Expected behavior
A nice, descriptive error such as the one below, which is what one gets when not using CUDA:
RuntimeError: Expected p_in >= 0 && p_in <= 1 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
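For reference, a CPU-only version of the repro (a sketch, simply dropping the .to('cuda:0') from the snippet above) raises this error at the offending call and leaves the interpreter usable:

import torch
prob = torch.tensor([1.01])  # same out-of-range probability, kept on CPU
torch.distributions.bernoulli.Bernoulli(probs=prob).sample([1])
# RuntimeError: Expected p_in >= 0 && p_in <= 1 to be true, but got false.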
Environment
Collecting environment information...
PyTorch version: 1.5.1
Is debug build: No
CUDA used to build PyTorch: 9.2
OS: Debian GNU/Linux 10 (buster)
GCC version: (Debian 8.3.0-6) 8.3.0
CMake version: Could not collect
Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
Versions of relevant libraries:
[pip3] numpy==1.19.1
[pip3] torch==1.5.1
[pip3] torchvision==0.6.0a0+35d732a
[conda] blas 1.0 mkl
[conda] cudatoolkit 9.2 0
[conda] mkl 2020.1 217
[conda] mkl-service 2.3.0 py38he904b0f_0
[conda] mkl_fft 1.1.0 py38h23d657b_0
[conda] mkl_random 1.1.1 py38h0573a6f_0
[conda] numpy 1.19.1 pypi_0 pypi
[conda] pytorch 1.5.1 py3.8_cuda9.2.148_cudnn7.6.3_0 pytorch
[conda] torchvision 0.6.1 py38_cu92 pytorch
Additional context
Not sure how much of an actual issue this is, since any additional checks on parameters would result in slower execution, and it takes a second to try the same code without CUDA to identify the issue. This has also been mentioned several times in other tickets; it just seems that CUDA errors aren't as descriptive. A possible workaround is sketched below.
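One possible workaround (a sketch, not necessarily the intended fix, and whether the extra check is cheap enough in practice is an assumption): pass validate_args=True so the constructor checks probs against the unit interval up front and raises a ValueError before any CUDA kernel is launched.

import torch
prob = torch.tensor([1.01]).to('cuda:0')
# With validation enabled, the constructor rejects probs outside [0, 1]
# with a ValueError instead of triggering a device-side assert later.
torch.distributions.bernoulli.Bernoulli(probs=prob, validate_args=True)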
cc @ngimel