
RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination

Chong Zeng · Yue Dong · Pieter Peers · Hongzhi Wu · Xin Tong

SIGGRAPH 2025 Conference Papers

Examples of triangle-mesh-based scenes rendered with RenderFormer, without per-scene training or fine-tuning, that include (multiple) specular reflections, complex shadows, diffuse indirect lighting, glossy reflections, soft and hard shadows, and multiple light sources.


Project Page | arXiv | Paper | Model | Official Code

RenderFormer is a neural rendering pipeline that directly renders an image from a triangle-based representation of a scene with full global illumination effects and that does not require per-scene training or fine-tuning. Instead of taking a physics-centric approach to rendering, we formulate rendering as a sequence-to-sequence transformation where a sequence of tokens representing triangles with reflectance properties is converted to a sequence of output tokens representing small patches of pixels. RenderFormer follows a two-stage pipeline: a view-independent stage that models triangle-to-triangle light transport, and a view-dependent stage that transforms a token representing a bundle of rays to the corresponding pixel values, guided by the triangle sequence from the view-independent stage. Both stages are based on the transformer architecture and are learned with minimal prior constraints. We demonstrate and evaluate RenderFormer on scenes with varying complexity in shape and light transport.


Installation

Prerequisites

  • System: The code has been tested on Linux, macOS, and Windows.
  • Hardware: The code has been tested on both NVIDIA CUDA GPUs and Apple Metal GPUs. At least 8 GB of GPU memory is required.

Environment Setup

First, set up an environment with PyTorch 2.0+. CUDA users can optionally install Flash Attention from https://github.com/Dao-AILab/flash-attention.

The rest of the dependencies can be installed through:

git clone https://github.com/microsoft/renderformer
cd renderformer
pip install -r requirements.txt
python3 -c "import imageio; imageio.plugins.freeimage.download()"  # Needed for HDR image IO
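
Optionally, you can verify the setup with a short check like the following (a minimal sketch; the flash_attn import is only relevant for CUDA setups):

import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("MPS available:", torch.backends.mps.is_available())

try:
    import flash_attn  # optional, CUDA-only
    print("Flash Attention is installed.")
except ImportError:
    print("Flash Attention not found; the inference script will fall back to SDPA.")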

Pretrained Models

Model                    Params  Link          Model ID
RenderFormer-V1-Base     205M    Hugging Face  microsoft/renderformer-v1-base
RenderFormer-V1.1-Large  483M    Hugging Face  microsoft/renderformer-v1.1-swin-large
Note on the released models
We found a shader bug in the training data used for the submission. We re-trained the models with the corrected shader and released the new models, so their performance and output may differ from the results reported in the paper.

Usage

Image Rendering

Scene Conversion

We provide example scene config JSON files in the examples directory. To render a scene, first convert a scene config JSON file into our HDF5 scene format:

python3 scene_processor/convert_scene.py examples/cbox.json --output_h5_path tmp/cbox/cbox.h5

Rendering a Single Image Using Inference Script

python3 infer.py --h5_file tmp/cbox/cbox.h5 --output_dir output/cbox/

You should now see output/cbox/cbox_view_0.exr and output/cbox/cbox_view_0.png in your output folder. The .exr file is the linear HDR output from RenderFormer, and the .png file is the LDR version of the rendered image. You can enable different tone mappers through --tone_mapper for better visual results.
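
If you prefer to post-process the linear HDR output yourself, a minimal sketch along these lines (using imageio with the freeimage plugin downloaded during installation, and a plain gamma curve rather than one of the built-in tone mappers) could be:

import imageio
import numpy as np

hdr = imageio.imread("output/cbox/cbox_view_0.exr")  # linear HDR, float32, [H, W, 3]
ldr = np.clip(hdr, 0.0, 1.0) ** (1.0 / 2.2)          # simple gamma tone mapping
imageio.imwrite("output/cbox/cbox_view_0_gamma.png", (ldr * 255).astype(np.uint8))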

The script automatically falls back to PyTorch scaled dot-product attention if Flash Attention is not found on the system. We also provide an environment variable, ATTN_IMPL, that lets you choose which attention implementation to use:

# Use SDPA intentionally
ATTN_IMPL=sdpa python3 infer.py --h5_file tmp/cbox/cbox.h5 --output_dir output/cbox/

Please check the image render shell script for more examples.

Available Arguments of the Inference Script

--h5_file H5_FILE     Path to the input H5 file
--model_id MODEL_ID   Model ID on Hugging Face or local path
--precision {bf16,fp16,fp32}
                      Precision for inference (Default: fp16)
--resolution RESOLUTION
                      Resolution for inference (Default: 512)
--output_dir OUTPUT_DIR
                      Output directory (Default: same as input H5 file)
--tone_mapper {none,agx,filmic,pbr_neutral}
                      Tone mapper for inference (Default: none)
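
For example, to render at fp32 precision with the AGX tone mapper:

python3 infer.py --h5_file tmp/cbox/cbox.h5 --output_dir output/cbox/ --precision fp32 --tone_mapper agx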

Inference with RenderFormerRenderingPipeline

You can perform batch rendering with RenderFormerRenderingPipeline by providing a batch of input scenes and rendering camera parameters.

Minimal example (without meaningful inputs, just for testing):

import torch
from renderformer import RenderFormerRenderingPipeline

pipeline = RenderFormerRenderingPipeline.from_pretrained("microsoft/renderformer-v1.1-swin-large")
device = torch.device('cuda')
pipeline.to(device)

BATCH_SIZE = 2
NUM_TRIANGLES = 1024
TEX_PATCH_SIZE = 32
NUM_VIEWS = 4

triangles = torch.randn((BATCH_SIZE, NUM_TRIANGLES, 3, 3), device=device)  # triangle vertex positions
texture = torch.randn((BATCH_SIZE, NUM_TRIANGLES, 13, TEX_PATCH_SIZE, TEX_PATCH_SIZE), device=device)  # per-triangle texture patches (13 channels)
mask = torch.ones((BATCH_SIZE, NUM_TRIANGLES), dtype=torch.bool, device=device)  # valid-triangle mask
vn = torch.randn((BATCH_SIZE, NUM_TRIANGLES, 3, 3), device=device)  # per-vertex normals
c2w = torch.randn((BATCH_SIZE, NUM_VIEWS, 4, 4), device=device)  # per-view camera-to-world matrices
fov = torch.randn((BATCH_SIZE, NUM_VIEWS, 1), device=device)  # per-view field of view

rendered_imgs = pipeline(
    triangles=triangles,
    texture=texture,
    mask=mask,
    vn=vn,
    c2w=c2w,
    fov=fov,
    resolution=512,
    torch_dtype=torch.float16,
)
print("Inference completed. Rendered Linear HDR images shape:", rendered_imgs.shape)
# Inference completed. Rendered Linear HDR images shape: torch.Size([2, 4, 512, 512, 3])
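
The returned tensor holds linear HDR images with shape [batch, view, height, width, 3]. To save a single view you can, for example (a sketch using imageio, assuming the freeimage plugin for EXR output):

import imageio

img = rendered_imgs[0, 0].float().cpu().numpy()  # first scene, first view: [512, 512, 3] linear HDR
imageio.imwrite("view_0_0.exr", img)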

Please check infer.py and rendering_pipeline.py for detailed usages.

Video Rendering

Download Example Data

We provide example video input data on Hugging Face. You can download and unzip it with this script.

Rendering a Video Using Inference Script

python3 batch_infer.py --h5_folder renderformer-video-data/submission-videos/cbox-roughness/ --output_dir output/videos/cbox-roughness

Please check the video render shell script for more examples.

Available Arguments of the Inference Script

--h5_folder H5_FOLDER
                      Path to the folder containing input H5 files
--model_id MODEL_ID   Model ID on Hugging Face or local path
--precision {bf16,fp16,fp32}
                      Precision for inference
--resolution RESOLUTION
                      Resolution for inference
--batch_size BATCH_SIZE
                      Batch size for inference
--padding_length PADDING_LENGTH
                      Padding length for inference
--num_workers NUM_WORKERS
                      Number of workers for data loading
--output_dir OUTPUT_DIR
                      Output directory for rendered images (default: same as input folder)
--save_video          Merge rendered images into a video at video.mp4.
--tone_mapper {none,agx,filmic,pbr_neutral}
                      Tone mapper for inference
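
For example, to render with a larger batch size and merge the frames into video.mp4:

python3 batch_infer.py --h5_folder renderformer-video-data/submission-videos/cbox-roughness/ --output_dir output/videos/cbox-roughness --batch_size 4 --save_video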

Bring Your Own Scene!

Scene Definition JSON

RenderFormer uses a JSON-based scene description format that defines the geometry, materials, lighting, and camera setup for your scene. A scene configuration is a hierarchical structure with the following key components:

Scene Structure

  • scene_name: A descriptive name for your scene
  • version: The version of the scene description format (currently "1.0")
  • objects: A dictionary of objects in the scene, including both geometry and lighting
  • cameras: A list of camera configurations for rendering

Object Configuration

Each object in the scene requires:

  • mesh_path: Path to the .obj mesh file
  • material: Material properties including:
    • diffuse: RGB diffuse color [r, g, b]
    • specular: RGB specular color [r, g, b] (We currently only support white specular, and diffuse + specular should be no larger than 1.0)
    • roughness: Surface roughness (0.01 to 1.0)
    • emissive: RGB emission color [r, g, b] (We currently only support white emission, and only on light source triangles)
    • smooth_shading: Whether to use smooth shading on this object
    • rand_tri_diffuse_seed: Optional seed for random triangle coloring; if not set, the diffuse color is used directly
    • random_diffuse_max: Maximum value for random diffuse color assignment (max diffuse color + specular color should be no larger than 1.0)
    • random_diffuse_type: Type of random diffuse color assignment, either per triangle or per shading group
  • transform: Object transformation including:
    • translation: [x, y, z] position
    • rotation: [x, y, z] rotation in degrees
    • scale: [x, y, z] scale factors
    • normalize: Whether to normalize the object to the unit sphere
  • remesh: Whether to remesh the object
  • remesh_target_face_num: Target face count for the remeshed object

Camera Configuration

Each camera requires:

  • position: [x, y, z] camera position
  • look_at: [x, y, z] target point
  • up: [x, y, z] up vector
  • fov: Field of view in degrees

Example Scene

We recommend starting from examples/init-template.json and modifying it to your needs. For more complex examples, refer to the scene configurations in the examples directory.
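
As an illustration of the structure only (field names and values here mirror the description above but should be verified against examples/init-template.json; the mesh path is a placeholder), a config could also be assembled programmatically:

import json

# Hypothetical minimal scene following the structure described above.
scene = {
    "scene_name": "my_scene",
    "version": "1.0",
    "objects": {
        "main_object": {
            "mesh_path": "path/to/object.obj",  # placeholder path
            "material": {
                "diffuse": [0.7, 0.7, 0.7],
                "specular": [0.1, 0.1, 0.1],    # white specular; diffuse + specular should not exceed 1.0
                "roughness": 0.5,
                "emissive": [0.0, 0.0, 0.0],
                "smooth_shading": True,
            },
            "transform": {
                "translation": [0.0, 0.0, 0.0],
                "rotation": [0.0, 0.0, 0.0],
                "scale": [1.0, 1.0, 1.0],
                "normalize": True,
            },
        },
        # A light source would be another object (e.g. the examples/templates/lighting/tri.obj mesh)
        # with a non-zero white emissive value.
    },
    "cameras": [
        {
            "position": [0.0, -1.8, 0.0],  # roughly 1.8 units from the scene center
            "look_at": [0.0, 0.0, 0.0],
            "up": [0.0, 0.0, 1.0],
            "fov": 45.0,
        }
    ],
}

with open("my_scene.json", "w") as f:
    json.dump(scene, f, indent=2)

The resulting JSON can then be converted to HDF5 with scene_processor/convert_scene.py as shown above.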

HDF5 Data Fields

The HDF5 file contains the following fields:

  • triangles: [N, 3, 3] array of triangle vertices
  • texture: [N, 13, 32, 32] array of texture patches
  • vn: [N, 3, 3] array of vertex normals
  • c2w: [N, 4, 4] array of camera-to-world matrices
  • fov: [N] array of field-of-view values
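
To sanity-check a converted file, the fields can be inspected with h5py (a minimal sketch, assuming h5py is installed):

import h5py

with h5py.File("tmp/cbox/cbox.h5", "r") as f:
    for name, dataset in f.items():
        print(name, dataset.shape, dataset.dtype)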

We use the same camera coordinate system as Blender (-Z = view direction, +Y = up, +X = right); be mindful of this when implementing your own HDF5 converter.
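
For reference, a camera-to-world matrix in this convention can be built from a position, look-at target, and up vector as in the following sketch (the standard look-at construction, not necessarily the exact code in scene_processor/to_h5.py; the default world up of +Z is an assumption):

import numpy as np

def look_at_c2w(position, target, world_up=(0.0, 0.0, 1.0)):
    """Camera-to-world matrix with -Z = view direction, +Y = up, +X = right."""
    position = np.asarray(position, dtype=np.float64)
    forward = np.asarray(target, dtype=np.float64) - position
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(world_up, dtype=np.float64))
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)

    c2w = np.eye(4)
    c2w[:3, 0] = right       # camera +X
    c2w[:3, 1] = true_up     # camera +Y
    c2w[:3, 2] = -forward    # camera +Z (opposite of the view direction)
    c2w[:3, 3] = position
    return c2w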

Please refer to scene_processor/to_h5.py for more details.

Remesh Objects

We provide a simple remeshing tool in scene_processor/remesh.py. You can use it to remesh your objects before putting them into the scene.

We also provide fields in the scene config JSON file (remesh and remesh_target_face_num) that allow you to remesh the object during the scene conversion process.

python3 scene_processor/remesh.py --input path/to/your/high_res_mesh.obj --output remeshed_object.obj --target_face_num 1024

Blender Extension

We provide a Blender Extension to simplify the process of setting up a scene for RenderFormer. Please refer to the Blender Extension for more details.

Scene Setting Tips

  1. Always start from examples/init-template.json.
  2. Please keep the scene within our training data ranges; extrapolation can work but is not guaranteed (see the sanity-check sketch after this list).
    • Camera distance to scene center in [1.5, 2.0], fov in [30, 60] degrees
    • Scene bounding box in [-0.5, 0.5] in x, y, z
    • Light sources: up to 8 triangles (please use the triangle mesh at examples/templates/lighting/tri.obj), each scale in [2.0, 2.5], distance to scene center in [2.1, 2.7], emission values summed in [2500, 5000]
    • Total number of triangles: the training data covers up to 4096 triangles, but extending to 8192 triangles at inference usually still works.
    • All training objects are watertight and simplified with QSlim; uniform triangle sizes are preferred. If your object does not render well, try remeshing it with our provided script or other remeshing tools.
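
A rough sanity check of the ranges above could look like this sketch (a hypothetical helper, not part of the official tooling; triangles here are the scene geometry, excluding light-source triangles, which sit outside the [-0.5, 0.5] box):

import numpy as np

def check_scene_ranges(triangles, cam_positions, fovs_deg, scene_center=(0.0, 0.0, 0.0)):
    """Warn when the scene falls outside the training-data ranges listed above."""
    triangles = np.asarray(triangles)  # [N, 3, 3] vertex positions of scene triangles
    center = np.asarray(scene_center)

    if triangles.shape[0] > 4096:
        print(f"Warning: {triangles.shape[0]} triangles exceeds the 4096 covered by training data "
              "(up to ~8192 may still work at inference).")

    verts = triangles.reshape(-1, 3)
    if (verts < -0.5).any() or (verts > 0.5).any():
        print("Warning: scene bounding box extends beyond [-0.5, 0.5].")

    for i, (pos, fov) in enumerate(zip(np.asarray(cam_positions), fovs_deg)):
        dist = np.linalg.norm(pos - center)
        if not 1.5 <= dist <= 2.0:
            print(f"Warning: camera {i} distance {dist:.2f} is outside [1.5, 2.0].")
        if not 30.0 <= fov <= 60.0:
            print(f"Warning: camera {i} fov {fov:.1f} is outside [30, 60] degrees.")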

Acknowledgements

We borrowed some code from the following repositories. We thank the authors for their contributions.

In addition to the 3D models from Objaverse, we express our appreciation to the contributors of the 3D models used in the examples.

License

RenderFormer model and the majority of the code are licensed under the MIT License. The following submodules may have different licenses:

Citation

If you find this work helpful, please cite our paper:

@inproceedings{zeng2025renderformer,
    title      = {RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination},
    author     = {Chong Zeng and Yue Dong and Pieter Peers and Hongzhi Wu and Xin Tong},
    booktitle  = {ACM SIGGRAPH 2025 Conference Papers},
    year       = {2025}
}
