Chong Zeng · Yue Dong · Pieter Peers · Hongzhi Wu · Xin Tong
Project Page | arXiv | Paper | Model | Official Code
RenderFormer is a neural rendering pipeline that directly renders an image from a triangle-based representation of a scene with full global illumination effects, without per-scene training or fine-tuning. Instead of taking a physics-centric approach to rendering, we formulate rendering as a sequence-to-sequence transformation in which a sequence of tokens representing triangles with reflectance properties is converted to a sequence of output tokens representing small patches of pixels. RenderFormer follows a two-stage pipeline: a view-independent stage that models triangle-to-triangle light transport, and a view-dependent stage that transforms a token representing a bundle of rays into the corresponding pixel values, guided by the triangle sequence from the view-independent stage. Both stages are based on the transformer architecture and are learned with minimal prior constraints. We demonstrate and evaluate RenderFormer on scenes with varying complexity in shape and light transport.
- System: The code is tested on Linux, macOS, and Windows.
- Hardware: The code has been tested on both NVIDIA CUDA GPUs and Apple Metal GPUs. The minimum GPU memory requirement is 8 GB.
First, set up an environment with PyTorch 2.0+. CUDA users can optionally install Flash Attention from https://github.com/Dao-AILab/flash-attention.
The remaining dependencies can be installed with:
git clone https://github.com/microsoft/renderformer
cd renderformer
pip install -r requirements.txt
python3 -c "import imageio; imageio.plugins.freeimage.download()" # Needed for HDR image IO
| Model | Params | Link | Model ID |
|---|---|---|---|
| RenderFormer-V1-Base | 205M | Hugging Face | microsoft/renderformer-v1-base |
| RenderFormer-V1.1-Large | 483M | Hugging Face | microsoft/renderformer-v1.1-swin-large |
Note on the released models
We found a shader bug in the training data used for the original submission. We re-trained the models with the corrected shader and released the new models, so model performance and output may differ from the results reported in the paper.
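The Model ID column is what you pass to RenderFormerRenderingPipeline.from_pretrained (used in the batch rendering example later in this README) or to the --model_id flag of the inference scripts. For example:

```python
from renderformer import RenderFormerRenderingPipeline

# Load either released checkpoint by its Hugging Face Model ID (or a local path)
pipeline = RenderFormerRenderingPipeline.from_pretrained("microsoft/renderformer-v1-base")
```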
We provide example scene config JSON files in examples. To render a scene, first convert a scene config JSON file into our HDF5 scene format:
python3 scene_processor/convert_scene.py examples/cbox.json --output_h5_path tmp/cbox/cbox.h5
Then render the converted scene:

python3 infer.py --h5_file tmp/cbox/cbox.h5 --output_dir output/cbox/
You should now see output/cbox/cbox_view_0.exr and output/cbox/cbox_view_0.png in your output folder. The .exr file is the linear HDR output from RenderFormer, and the .png file is the LDR version of the rendered image. You can enable different tone mappers through --tone_mapper to achieve better visual results.
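For example, to re-render the Cornell box with the AgX tone mapper (one of the choices listed in the usage below):

```bash
python3 infer.py --h5_file tmp/cbox/cbox.h5 --output_dir output/cbox/ --tone_mapper agx
```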
The script will automatically fall back to PyTorch scaled dot product attention (SDPA) if Flash Attention is not found on the system. We also provide an environment variable ATTN_IMPL for you to choose which attention implementation to use:
# Use SDPA intentionally
ATTN_IMPL=sdpa python3 infer.py --h5_file tmp/cbox/cbox.h5 --output_dir output/cbox/
Please check the image render shell script for more examples.
--h5_file H5_FILE Path to the input H5 file
--model_id MODEL_ID Model ID on Hugging Face or local path
--precision {bf16,fp16,fp32}
Precision for inference (Default: fp16)
--resolution RESOLUTION
Resolution for inference (Default: 512)
--output_dir OUTPUT_DIR
Output directory (Default: same as input H5 file)
--tone_mapper {none,agx,filmic,pbr_neutral}
Tone mapper for inference (Default: none)
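A fuller invocation combining several of these flags (the specific values here are illustrative choices; the model ID comes from the table above):

```bash
python3 infer.py \
    --h5_file tmp/cbox/cbox.h5 \
    --model_id microsoft/renderformer-v1.1-swin-large \
    --precision bf16 \
    --resolution 512 \
    --output_dir output/cbox/ \
    --tone_mapper filmic
```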
You can perform batch rendering with RenderFormerRenderingPipeline by providing a batch of input scenes and rendering camera parameters.
Minimal example (without meaningful inputs, just for testing):
import torch
from renderformer import RenderFormerRenderingPipeline
pipeline = RenderFormerRenderingPipeline.from_pretrained("microsoft/renderformer-v1.1-swin-large")
device = torch.device('cuda')
pipeline.to(device)
BATCH_SIZE = 2
NUM_TRIANGLES = 1024
TEX_PATCH_SIZE = 32
NUM_VIEWS = 4
triangles = torch.randn((BATCH_SIZE, NUM_TRIANGLES, 3, 3), device=device)  # triangle vertex positions
texture = torch.randn((BATCH_SIZE, NUM_TRIANGLES, 13, TEX_PATCH_SIZE, TEX_PATCH_SIZE), device=device)  # per-triangle texture patches (reflectance attributes)
mask = torch.ones((BATCH_SIZE, NUM_TRIANGLES), dtype=torch.bool, device=device)  # valid-triangle mask
vn = torch.randn((BATCH_SIZE, NUM_TRIANGLES, 3, 3), device=device)  # vertex normals
c2w = torch.randn((BATCH_SIZE, NUM_VIEWS, 4, 4), device=device)  # camera-to-world matrices, one per view
fov = torch.randn((BATCH_SIZE, NUM_VIEWS, 1), device=device)  # field of view, one per view
rendered_imgs = pipeline(
triangles=triangles,
texture=texture,
mask=mask,
vn=vn,
c2w=c2w,
fov=fov,
resolution=512,
torch_dtype=torch.float16,
)
print("Inference completed. Rendered Linear HDR images shape:", rendered_imgs.shape)
# Inference completed. Rendered Linear HDR images shape: torch.Size([2, 4, 512, 512, 3])
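rendered_imgs is returned as a tensor of linear HDR values. If you want to save these outputs yourself rather than going through infer.py, one option is imageio with the freeimage plugin downloaded during setup; the snippet below is a sketch, not part of the official pipeline:

```python
import imageio
import numpy as np

# Save the first view of the first batch element as a linear HDR .exr file.
# (Assumes the freeimage plugin from the setup step is available for HDR IO.)
img = rendered_imgs[0, 0].float().cpu().numpy().astype(np.float32)
imageio.imwrite("batch_view_0.exr", img)
```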
Please check infer.py and rendering_pipeline.py for detailed usage.
We put example video input data on Hugging Face. You can download and unzip them with this script.
python3 batch_infer.py --h5_folder renderformer-video-data/submission-videos/cbox-roughness/ --output_dir output/videos/cbox-roughness
Please check the video render shell script for more examples.
--h5_folder H5_FOLDER
Path to the folder containing input H5 files
--model_id MODEL_ID Model ID on Hugging Face or local path
--precision {bf16,fp16,fp32}
Precision for inference
--resolution RESOLUTION
Resolution for inference
--batch_size BATCH_SIZE
Batch size for inference
--padding_length PADDING_LENGTH
Padding length for inference
--num_workers NUM_WORKERS
Number of workers for data loading
--output_dir OUTPUT_DIR
Output directory for rendered images (default: same as input folder)
--save_video Merge rendered images into a video at video.mp4.
--tone_mapper {none,agx,filmic,pbr_neutral}
Tone mapper for inference
RenderFormer uses a JSON-based scene description format that defines the geometry, materials, lighting, and camera setup for your scene. The scene configuration is defined using a hierarchical structure with the following key components:
- scene_name: A descriptive name for your scene
- version: The version of the scene description format (currently "1.0")
- objects: A dictionary of objects in the scene, including both geometry and lighting
- cameras: A list of camera configurations for rendering
Each object in the scene requires:
- mesh_path: Path to the .obj mesh file
- material: Material properties including:
  - diffuse: RGB diffuse color [r, g, b]
  - specular: RGB specular color [r, g, b] (we currently only support white specular, and diffuse + specular should be no larger than 1.0)
  - roughness: Surface roughness (0.01 to 1.0)
  - emissive: RGB emission color [r, g, b] (we currently only support white emission, and only on light source triangles)
  - smooth_shading: Whether to use smooth shading on this object
  - rand_tri_diffuse_seed: Optional seed for random triangle coloring; if not set, the diffuse color is used directly
  - random_diffuse_max: Maximum value for random diffuse color assignment (max diffuse color + specular color should be no larger than 1.0)
  - random_diffuse_type: Type of random diffuse color assignment, either per triangle or per shading group
- transform: Object transformation including:
  - translation: [x, y, z] position
  - rotation: [x, y, z] rotation in degrees
  - scale: [x, y, z] scale factors
  - normalize: Whether to normalize the object to the unit sphere
- remesh: Whether to remesh the object
- remesh_target_face_num: Target face count of the remeshed object
Each camera requires:
- position: [x, y, z] camera position
- look_at: [x, y, z] target point
- up: [x, y, z] up vector
- fov: Field of view in degrees
We recommend starting from examples/init-template.json and modifying it to your needs. For more complex examples, refer to the scene configurations in the examples directory. A minimal sketch of the format is also shown below.
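As an illustration only, here is a sketch of a scene config assembled from the fields described above; the field names come from this README, but verify the exact structure against examples/init-template.json:

```python
import json

# Hypothetical scene config sketch -- check the nesting against
# examples/init-template.json before use.
scene = {
    "scene_name": "my_scene",
    "version": "1.0",
    "objects": {
        "bunny": {
            "mesh_path": "path/to/bunny.obj",
            "material": {
                "diffuse": [0.8, 0.2, 0.2],
                "specular": [0.1, 0.1, 0.1],
                "roughness": 0.5,
                "smooth_shading": True,
            },
            "transform": {
                "translation": [0.0, 0.0, 0.0],
                "rotation": [0.0, 0.0, 0.0],
                "scale": [1.0, 1.0, 1.0],
                "normalize": True,
            },
        }
    },
    # Light-source objects (emissive triangles) are omitted for brevity;
    # see the examples directory for complete scenes.
    "cameras": [
        {
            "position": [0.0, -1.8, 0.0],  # distance to scene center within the training range [1.5, 2.0]
            "look_at": [0.0, 0.0, 0.0],
            "up": [0.0, 0.0, 1.0],
            "fov": 45.0,                   # degrees, training range [30, 60]
        }
    ],
}

with open("my_scene.json", "w") as f:
    json.dump(scene, f, indent=2)
```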
The HDF5 file contains the following fields:
- triangles: [N, 3, 3] array of triangle vertices
- texture: [N, 13, 32, 32] array of texture patches
- vn: [N, 3, 3] array of vertex normals
- c2w: [N, 4, 4] array of camera-to-world matrices
- fov: [N] array of field-of-view values
We use the same camera coordinate system as Blender (-Z = view direction, +Y = up, +X = right); be mindful of this when implementing your own HDF5 converter.
Please refer to scene_processor/to_h5.py for more details.
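If you do write your own converter, a minimal sketch might look like the following. The dataset names follow the field list above, but the dtypes, units, and the per-view layout of c2w/fov are assumptions to double-check against scene_processor/to_h5.py:

```python
import h5py
import numpy as np

num_tris, num_views = 1024, 2

with h5py.File("my_scene.h5", "w") as f:
    # Per-triangle data
    f.create_dataset("triangles", data=np.zeros((num_tris, 3, 3), dtype=np.float32))      # vertex positions
    f.create_dataset("texture", data=np.zeros((num_tris, 13, 32, 32), dtype=np.float32))  # texture patches
    f.create_dataset("vn", data=np.zeros((num_tris, 3, 3), dtype=np.float32))             # vertex normals
    # Per-view data (Blender convention: -Z view direction, +Y up, +X right)
    f.create_dataset("c2w", data=np.tile(np.eye(4, dtype=np.float32), (num_views, 1, 1)))  # camera-to-world
    f.create_dataset("fov", data=np.full((num_views,), 45.0, dtype=np.float32))            # field of view (assumed degrees)
```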
We provide a simple remeshing tool in scene_processor/remesh.py. You can use it to remesh your objects before putting them into the scene. We also provide fields in the scene config JSON file (remesh and remesh_target_face_num) that let you remesh an object during the scene conversion process.
python3 scene_processor/remesh.py --input path/to/your/high_res_mesh.obj --output remeshed_object.obj --target_face_num 1024
We provide a Blender Extension to simplify the process of setting up a scene for RenderFormer. Please refer to the Blender Extension for more details.
- Always start from examples/init-template.json.
- Please keep the scene within the training data range; extrapolation can work, but is not guaranteed:
  - Camera distance to scene center in [1.5, 2.0], fov in [30, 60] degrees
  - Scene bounding box in [-0.5, 0.5] in x, y, z
  - Light sources: up to 8 triangles (please use the triangle mesh at examples/templates/lighting/tri.obj), each with scale in [2.0, 2.5], distance to scene center in [2.1, 2.7], and summed emission values in [2500, 5000]
  - Total number of triangles: training data covers up to 4096 triangles, but extending to 8192 triangles during inference usually still works.
- All training objects are watertight and simplified with QSlim. Uniform triangle sizes are preferred. If your object does not work, try remeshing it with our provided script or other remeshing tools.
We borrowed some code from the following repositories. We thank the authors for their contributions.
In addition to the 3D model from Objaverse, we express our profound appreciation to the contributors of the 3D models that we used in the examples.
- Shader Ball: by Wenzel Jakob from Mitsuba Gallery
- Stanford Bunny & Lucy: from The Stanford 3D Scanning Repository
- Cornell Box: from Cornell Box Data, Cornell University Program of Computer Graphics
- Utah Teapot: from Utah Model Repository
- Veach MIS: From Eric Veach and Leonidas J. Guibas. 1995. Optimally combining sampling techniques for Monte Carlo rendering
- Spot: By Keenan Crane from Keenan's 3D Model Repository
- Klein Bottle: By Fausto Javier Da Rosa
- Constant Width: Original mesh from Small volume bodies of constant width. Derived mesh from Keenan's 3D Model Repository
- Jewelry: By elbenZ
- Banana, Easter Basket, Water Bottle, Bronco, Heart: By Microsoft
- Lowpoly Fox: By Vlad Zaichyk
- Lowpoly Crystals: By Mongze
- Bowling Pin: By SINOFWRATH
- Cube Cascade, Marching Cubes: By Tycho Magnetic Anomaly
- Dancing Crab: By Bohdan Lvov
- Magical Gyroscope: By reddification
- Capoeira Cube: By mortaleiros
- P.U.C. Security Bot: By Gouhadouken
The RenderFormer model and the majority of the code are licensed under the MIT License. The following submodules may have different licenses:
- renderformer-liger-kernel: Redistributed Liger Kernel for RenderFormer integration. It's derived from original Liger Kernel and licensed under the BSD 2-Clause "Simplified" License.
- simple-ocio: We use this tool to simplify OpenColorIO usage for tone-mapping. This package redistributes the complete Blender Color Management directory. The full license text is available at ocio-license.txt and the headers of each configuration file. The package itself is still licensed under the MIT License.
If you find this work helpful, please cite our paper:
@inproceedings{zeng2025renderformer,
title = {RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination},
author = {Chong Zeng and Yue Dong and Pieter Peers and Hongzhi Wu and Xin Tong},
booktitle = {ACM SIGGRAPH 2025 Conference Papers},
year = {2025}
}