CUDA support #8

kadir014 · 2025-04-12T18:24:59Z

Thank you for the nice wrapper. I managed to use the CUDA support with these steps if anyone else wants to do it as well:

The latest version on PyPI (0.2) doesn't support the new Pythonic API, so clone the repo and install an editable build (pip install -e .).
Download the latest OIDN binaries from here: https://github.com/RenderKit/oidn/releases
Replace all the dll files and copy OpenImageDenoise_device_cuda.dll as well.

In __init__.py, load the CUDA dll as well:

ctypes.CDLL(os.path.join(cur_path, "lib.win.x64/OpenImageDenoise_device_cuda.dll"))

Disable the GetDeviceError function (it was raising a struct-related error on my end).
In the Buffer.create function, change the tensor creation call to use the current arguments:
```
bf.buffer_delegate = torch.zeros(*storage_shape, dtype=torch.float32)
```
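The DLL-loading step above can be made tolerant of installs that ship without the CUDA device library. This is only a minimal sketch, assuming the binaries sit in a lib.win.x64 folder next to __init__.py as in the line above. The helper name `load_device_libs` and its injectable `loader` parameter are hypothetical and not part of the wrapper:

```python
import ctypes
import os


def load_device_libs(lib_dir, names, loader=ctypes.CDLL):
    """Load each library in `names` that exists in `lib_dir`.

    Returns the file names that were loaded. Missing files are skipped so a
    CPU-only install keeps working. `loader` is injectable for testing.
    """
    loaded = []
    for name in names:
        path = os.path.join(lib_dir, name)
        if not os.path.isfile(path):
            continue  # optional device module not shipped with this build
        loader(path)  # ctypes.CDLL raises OSError if the library cannot be loaded
        loaded.append(name)
    return loaded


# Hypothetical usage inside __init__.py:
# cur_path = os.path.dirname(os.path.abspath(__file__))
# load_device_libs(os.path.join(cur_path, "lib.win.x64"),
#                  ["OpenImageDenoise_device_cuda.dll"])
```

Skipping a missing file instead of raising keeps the CPU path usable when only the core binaries were copied.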
And done! Thanks to this I'm using denoising close to realtime in my toy pathtracer project.

If I can find time I can just open a PR as well.


sxysxy · 2025-04-12T18:37:25Z

That looks nice. I'd also like to improve this project after I finish my master's degree. The current project mainly provides bindings to the raw C API. Those 'pythonic APIs' still expose too many underlying concepts, such as buffers and devices, and are still not elegant. We just want to denoise images from path tracers. I think it's a good idea to provide a very simple interface:

def denoise(image: Union[np.ndarray, PIL.Image.Image, torch.Tensor], **maybe_some_options): ...

This could handle 99% of situations...
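To make the proposed interface concrete, here is a rough sketch of how the input handling could look. It is not an implementation from this repository. `to_float_rgb` and the `backend` callable are hypothetical names, and the actual OIDN call is left to whatever `backend` wraps. Torch tensors are detected by duck typing so that torch and PIL stay optional dependencies:

```python
import numpy as np


def to_float_rgb(image):
    """Convert an ndarray, PIL image, or torch tensor to a float32 HxWx3 array."""
    if hasattr(image, "detach"):  # duck-typed torch.Tensor
        image = image.detach().cpu().numpy()
    arr = np.asarray(image)  # PIL images also convert through np.asarray
    if arr.ndim != 3 or arr.shape[2] != 3:
        raise ValueError(f"expected an HxWx3 image, got shape {arr.shape}")
    if arr.dtype == np.uint8:
        return arr.astype(np.float32) / 255.0  # 8-bit input scaled to [0, 1]
    return np.ascontiguousarray(arr, dtype=np.float32)


def denoise(image, backend, **options):
    """One-call wrapper. `backend` is a hypothetical callable that takes a
    float32 HxWx3 array plus options and returns the denoised array."""
    return backend(to_float_rgb(image), **options)
```

With this shape, callers never touch buffers or devices. Those would live entirely inside the backend.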

sxysxy · 2025-04-12T18:50:57Z

There are some annoying issues with CUDA support. CUDA kernels cannot run on GPUs with an incompatible compute capability (e.g. kernels compiled for compute capability 7.5 cannot run on 8.0 devices such as the Nvidia A100). Older precompiled binaries from https://github.com/RenderKit/oidn/releases may not be compatible with the newest GPUs.
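The compatibility rule behind this can be sketched in a few lines. This follows the general CUDA rule that a compiled binary (cubin) built for capability X.y only runs on devices X.z with z >= y, while embedded PTX can be JIT-compiled by the driver for equal or newer capabilities. Whether a given OIDN release embeds PTX is an assumption to check, and architecture-specific targets are not modeled here. All function names are hypothetical:

```python
def cubin_runs_on(built_for, device):
    """Binary compatibility: same major version, device minor >= build minor."""
    return built_for[0] == device[0] and device[1] >= built_for[1]


def ptx_runs_on(built_for, device):
    """PTX can be JIT-compiled for any equal or newer compute capability."""
    return device >= built_for


def usable(binary_targets, ptx_targets, device):
    """True if any embedded cubin or PTX target can serve `device`.

    Capabilities are (major, minor) tuples, e.g. (7, 5) for 7.5.
    """
    return (any(cubin_runs_on(t, device) for t in binary_targets)
            or any(ptx_runs_on(t, device) for t in ptx_targets))
```

For example, `usable([(7, 5)], [], (8, 6))` is false, but adding `(7, 5)` to the PTX targets makes it true, at the cost of a JIT compile on first use.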

Possible Solutions:

Compile the RenderKit/oidn source code when installing with pip. But this can cause problems for users who don't know much about building Python native extensions. That code depends on CUDA and is prone to many environmental issues.
Maintain precompiled versions. The maintainers of this repository build the oidn binaries from source and test their usability...

kadir014 · 2025-04-12T19:03:17Z

I used the edited CUDA version to try realtime-ish denoising in my toy pathtracer project.

In my tests, oidn.Filter.execute took ~28% of the time (around 9ms), whereas managing oidn.Buffers, converting them to numpy arrays, reading the data into moderngl.Textures, etc. was the main bottleneck: it took ~72% of the time (around 20ms).
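For anyone who wants to reproduce a breakdown like this, a per-stage timer along these lines is one way to do it. It is only a sketch: `StageTimer` and the stage names are hypothetical, and if any stage runs asynchronously, wall-clock timing will attribute its cost to whichever later stage waits on it:

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock  # injectable so the timer can be tested
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = self.clock()
        try:
            yield
        finally:
            self.totals[name] += self.clock() - start

    def shares(self):
        """Fraction of the total measured time spent in each stage."""
        total = sum(self.totals.values())
        return {k: v / total for k, v in self.totals.items()} if total else {}


# Hypothetical usage in a render loop:
# timer = StageTimer()
# with timer.stage("execute"):
#     ...  # run the filter
# with timer.stage("transfer"):
#     ...  # buffer -> numpy -> texture upload
# print(timer.shares())
```

Accumulating over many frames before reading `shares()` smooths out per-frame noise.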

I wonder if the oidn.Buffer implementation could be improved for this case.

But for non-real-time scenarios, even a more straightforward approach like the def denoise(...) idea above would be great.

kadir014 changed the title ~~Running OIDN on CUDA~~ CUDA support Apr 12, 2025
