Releases: Enigmatisms/cuda-pt
Parallel SBVH acceleration structure building
Credit to @AdamYuan, who coded PSBVH in his own repo (VkAdypt) and took time to migrate some of his effort to this repo.
PRs #18 to #20 implement a better parallel construction procedure for SBVH. Originally, for the kitchen scene, the OpenMP-accelerated within-node parallelism needed 15s to build the SBVH, while the current implementation takes less than 6s.
v1.10.0: SBVH
This release implements the ideas from the paper 'Spatial Splits in Bounding Volume Hierarchies' by Martin Stich et al.:
- Primitive-reference-based BVH building (my old implementation is reference-free). Also, the code is completely compatible with my GPU stackless traversal (but the design is indeed painful).
- Sutherland–Hodgman algorithm + line-drawing-based chopped binning.
- SAH-based spatial split and object split decisions.
- Reference unsplitting as described in the paper.
- A simple attempt at within-node parallel chopped binning.
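The clipping step behind chopped binning can be sketched as follows. This is a minimal illustration, not the code in this repo: it clips a triangle against an axis-aligned slab with two Sutherland–Hodgman passes and returns the bounds of the remaining piece, which is what a spatial bin would accumulate. The names `Vec3`, `AABB`, `clip_half_space`, `chop_to_slab` and `bounds_of` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Vec3 { float v[3]; };
struct AABB { float lo[3], hi[3]; };

// One Sutherland-Hodgman pass: clip a convex polygon against the half-space
// sign * (p[axis] - pos) >= 0 (sign = +1 keeps p[axis] >= pos, -1 keeps <= pos).
std::vector<Vec3> clip_half_space(const std::vector<Vec3>& poly, int axis,
                                  float pos, float sign) {
    assert(axis >= 0 && axis < 3);
    std::vector<Vec3> out;
    const std::size_t n = poly.size();
    for (std::size_t i = 0; i < n; ++i) {
        const Vec3& a = poly[i];
        const Vec3& b = poly[(i + 1) % n];
        const float da = sign * (a.v[axis] - pos);
        const float db = sign * (b.v[axis] - pos);
        if (da >= 0.f) out.push_back(a);  // a is inside: keep it
        if ((da > 0.f && db < 0.f) || (da < 0.f && db > 0.f)) {
            // Edge a->b crosses the plane: emit the intersection point.
            const float t = da / (da - db);
            Vec3 p;
            for (int k = 0; k < 3; ++k) p.v[k] = a.v[k] + t * (b.v[k] - a.v[k]);
            p.v[axis] = pos;  // snap onto the plane to avoid rounding drift
            out.push_back(p);
        }
    }
    return out;
}

// Keep only the part of the polygon with lo <= p[axis] <= hi.
std::vector<Vec3> chop_to_slab(const std::vector<Vec3>& tri, int axis,
                               float lo, float hi) {
    return clip_half_space(clip_half_space(tri, axis, lo, +1.f), axis, hi, -1.f);
}

// Axis-aligned bounds of a polygon (inverted box if the polygon is empty).
AABB bounds_of(const std::vector<Vec3>& poly) {
    AABB b;
    for (int k = 0; k < 3; ++k) {
        b.lo[k] = std::numeric_limits<float>::infinity();
        b.hi[k] = -std::numeric_limits<float>::infinity();
    }
    for (const Vec3& p : poly)
        for (int k = 0; k < 3; ++k) {
            b.lo[k] = std::min(b.lo[k], p.v[k]);
            b.hi[k] = std::max(b.hi[k], p.v[k]);
        }
    return b;
}
```

Clipping per bin gives tighter child bounds than binning whole-triangle boxes, at the price of more work per reference, which is one reason SBVH builds tend to be slower than plain BVH builds.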
The average performance is increased by 30%, with the SAH cost significantly reduced (for example, from 15.3 to 11.14 in the Kitchen scene). Yet the current SBVH construction is still pretty slow, and I have no intention to further accelerate it. Here are some results (BVH cost visualization):
setting\scene | Kitchen | Rich Cars |
---|---|---|
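BVH | (cost visualization image) | (cost visualization image) |
SBVH | (cost visualization image) | (cost visualization image) |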
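For readers unfamiliar with the SAH cost quoted in these notes, here is a minimal sketch of the usual surface area heuristic and of a split decision based on it. It is an illustration under assumptions, not the repo's implementation: the names `SplitCandidate`, `sah_cost` and `choose_split` are hypothetical, and the traversal and intersection constants `c_trav` and `c_isect` are placeholder values.

```cpp
#include <cassert>

struct AABB { float lo[3], hi[3]; };

float surface_area(const AABB& b) {
    const float dx = b.hi[0] - b.lo[0];
    const float dy = b.hi[1] - b.lo[1];
    const float dz = b.hi[2] - b.lo[2];
    return 2.f * (dx * dy + dy * dz + dz * dx);
}

// A candidate partition: child bounds and the number of references in each.
struct SplitCandidate {
    AABB left, right;
    int n_left, n_right;
};

// SAH cost of splitting `parent` into the two children of `s`:
// c_trav + c_isect * (SA(L) * N_L + SA(R) * N_R) / SA(parent).
float sah_cost(const AABB& parent, const SplitCandidate& s,
               float c_trav = 1.f, float c_isect = 1.f) {
    const float sa_parent = surface_area(parent);
    assert(sa_parent > 0.f);
    return c_trav + c_isect *
           (surface_area(s.left) * s.n_left +
            surface_area(s.right) * s.n_right) / sa_parent;
}

enum class SplitKind { Leaf, Object, Spatial };

// Pick the cheapest of: making a leaf, the best object split,
// or the best spatial split (which may duplicate references).
SplitKind choose_split(const AABB& parent, int n_prims,
                       const SplitCandidate& object,
                       const SplitCandidate& spatial,
                       float c_trav = 1.f, float c_isect = 1.f) {
    const float leaf = c_isect * n_prims;
    const float obj = sah_cost(parent, object, c_trav, c_isect);
    const float spa = sah_cost(parent, spatial, c_trav, c_isect);
    if (leaf <= obj && leaf <= spa) return SplitKind::Leaf;
    return spa < obj ? SplitKind::Spatial : SplitKind::Object;
}
```

A spatial split can win even though it duplicates a reference, because its children do not overlap and therefore have smaller surface areas.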
Also, several other features / modifications have been made:
- Implements a simple `accel-only` mode, which only constructs the acceleration structure and does no rendering. This is for future (and current) acceleration structure development.
- Improved the BVH cost visualization rendering mode. Now we can view the `traverse cost`, `intersection cost` and `full cost` in three different channels and select one of them through ImGui.
- Adds acceleration structure evaluation metrics (see `bvh_opt.cu` for more details).
- Restructures BVH and SBVH by employing a more object-oriented code style instead of the old function-oriented way.
v1.9.1
What's Changed
- [Code Style] Better code style via pre-commit by @Enigmatisms in #15
Full Changelog: v1.9.0...v1.9.1
Volumetric Path Tracer
What's Changed
- Volumetric Path Tracer (megakernel) and Infrastructure for Online Neural Path Guiding (wavefront). See the pull request #13 by @Enigmatisms for more details. (Video: video-med-quality.mp4)
Full Changelog: v1.8.1...v1.9.0
v1.8.1
What's Changed
- Accelerated WFPT implementation by @Enigmatisms in #11
Full Changelog: v1.8...v1.8.1
v1.8: Python API and Distributed Parallel Rendering
What's Changed (Pull #10)
Exported Python API via nanobind:
- The `PythonRenderer` class is defined and can be imported in Python when compiled.
- `PythonRenderer` supports exporting CUDA torch.Tensor (output buffer and variance buffer), as well as the running mean of frame time.
- The structure of the CMake project is completely changed. The core functions and classes are now compiled to a static lib (RenderCore).
Added scripts that support distributed parallel rendering:
- Based on the PyTorch DDP module.
- `pyrender/ddp_render.py` supports rendering with multiple GPUs (on the same device). Multiple devices can be trivially supported, but this is not implemented yet.
- The script writes the output buffer (RGB image), variance buffer (single-channel image), variance curves (Welford's algorithm), frame time (different processes) and average frame time curves to a TensorBoard log. The variance estimation buffer can be used for adaptive sampling in later updates.
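As a reference for the variance estimation mentioned above, here is a minimal sketch of Welford's algorithm applied per pixel to a stream of frames. It is an illustration only, assuming one scalar value per pixel per frame. The `VarianceBuffer` type and its members are hypothetical and do not reflect how `PythonRenderer` or `pyrender/ddp_render.py` store their buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Running per-pixel mean and variance over frames (Welford's algorithm).
// Only two accumulators per pixel are stored; past frames are not kept.
struct VarianceBuffer {
    std::vector<float> mean;  // running mean per pixel
    std::vector<float> m2;    // running sum of squared deviations per pixel
    int count = 0;            // number of frames accumulated so far

    explicit VarianceBuffer(std::size_t n_pixels)
        : mean(n_pixels, 0.f), m2(n_pixels, 0.f) {}

    void add_frame(const std::vector<float>& frame) {
        assert(frame.size() == mean.size());
        ++count;
        for (std::size_t i = 0; i < frame.size(); ++i) {
            const float d = frame[i] - mean[i];   // deviation from old mean
            mean[i] += d / count;                 // update the mean
            m2[i] += d * (frame[i] - mean[i]);    // uses the updated mean
        }
    }

    // Unbiased sample variance of pixel i (0 until two frames are seen).
    float variance(std::size_t i) const {
        return count > 1 ? m2[i] / (count - 1) : 0.f;
    }
};
```

The single-pass update is numerically more stable than accumulating the sum and the sum of squares separately, which matters when many frames are accumulated.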
Tested the compilation on Ubuntu 22.04 and fixed several bugs that are exclusive to Linux.
Variance Image | Variance Curves | Frame Time (ms) Curves |
---|---|---|
Full Changelog: v1.7...v1.8
v1.7
What's Changed
- Major updates with more renderers, better visualization and higher performance by @Enigmatisms in #9
Vader Scene | Whiskey Scene |
---|---|
Full Changelog: v1.6...v1.7
v1.6
What's Changed
- Major update: Pitched array GPU textures by @Enigmatisms in #8
Full Changelog: v1.5...v1.6
v1.5
What's Changed
- Plastic Transmission (thin) implemented. The new sports-car scene with around 3M triangles is rendered. Check the rendered image in the updated README.
- Upgraded the GUI. Now you can change the emitter / material on-the-fly. This is laborious on the GPU (why would I write GPU inheritance in the first place). Check the updated README for a video.
- Fixed several bugs.
v1.4
Full Changelog: v1.3.3...v1.4