There is a typo in the add_kernel routine in the CUDA file muladd.cu. I assume this kernel is meant to compute the elementwise sum of two tensors, but it actually computes their product:
__global__ void add_kernel(int numel, const float* a, const float* b, float* result) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < numel) result[idx] = a[idx] * b[idx];
}
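Changing the operator from * to + should restore the intended behaviour. A minimal sketch of the fix, assuming the kernel is indeed meant to perform elementwise addition:
// Suggested fix: add the two inputs instead of multiplying them
__global__ void add_kernel(int numel, const float* a, const float* b, float* result) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < numel) result[idx] = a[idx] + b[idx];
}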
This bug can be reproduced by running the following Python script:
import extension_cpp as ext
import torch

device = 'cuda'
n = 100
a = torch.rand(n).to(device)
b = torch.rand(n).to(device)
add2 = torch.zeros(n).to(device)

add1 = a + b                    # reference result computed by PyTorch
ext.ops.myadd_out(a, b, add2)   # result written by the custom kernel
print(torch.equal(add1, add2))  # compare reference and extension results
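With the multiplication bug present, this prints False on CUDA; once the kernel is changed to add, it should print True.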
The CPU implementation gives the correct result (with device="cpu" in the code above).