research-article

Open access

A Statistical Approach to Monte Carlo Denoising

Authors: Hiroyuki Sakai, Christian Freude, Thomas Auzinger, David Hahn, Michael WimmerAuthors Info & Claims

SA '24: SIGGRAPH Asia 2024 Conference Papers

Article No.: 68, Pages 1 - 11

https://doi.org/10.1145/3680528.3687591

Published: 03 December 2024 Publication History

All formats PDF

Abstract

The stochastic nature of modern Monte Carlo (MC) rendering methods inevitably produces noise in rendered images for a practical number of samples per pixel. The problem of denoising these images has been widely studied, with most recent methods relying on data-driven, pretrained neural networks. In contrast, in this paper we propose a statistical approach to the denoising problem, treating each pixel as a random variable and reasoning about its distribution. Considering a pixel of the noisy rendered image, we formulate fast pair-wise statistical tests—based on online estimators—to decide which of the nearby pixels to exclude from the denoising filter. We show that for symmetric pixel weights and normally distributed samples, the classical Welch t-test is optimal in terms of mean squared error. We then show how to extend this result to handle non-normal distributions, using more recent confidence-interval formulations in combination with the Box-Cox transformation. Our results show that our statistical denoising approach matches the performance of state-of-the-art neural image denoising without having to resort to any computation-intensive pretraining. Furthermore, our approach easily generalizes to other quantities besides pixel intensity, which we demonstrate by showing additional applications to Russian roulette path termination and multiple importance sampling.

1 Introduction

Monte Carlo (MC) integration is ubiquitous in modern photorealistic rendering. The basic idea is to estimate an integral (pixel intensity) by a (weighted) average of random samples. However, compared to deterministic quadrature rules, MC integration is stochastic in nature and must therefore contend with uncertainty. This stochastic uncertainty manifests as variance—i.e., noise—affecting every sample taken during rendering.

Fig. 1:

The field of MC denoising originated from the idea of applying traditional image filters from the image-processing community to reduce noise in rendering. These filters essentially average multiple estimates (i.e., pixels) in image space, thereby decreasing variance. This approach can be highly effective in situations where the estimates are the same (or similar) in the limit, i.e., in regions of an image that show smooth color gradients. However, in cases of significant difference, such as along feature edges, filtering potentially introduces unwanted bias, often apparent as excessive blurring or color bleeding.

Denoising therefore must achieve a good bias–variance trade-off. In contrast to traditional image filtering, recent denoising approaches rely on neural networks to learn a mapping from a noisy input image, along with auxiliary information (in the form of so-called G-buffers containing low-noise ground-truth scene information), to a low-error approximation of the desired image.

Departing from these pretrained machine learning approaches, in this paper, we propose a general statistical framework for denoising in MC rendering. Specifically, we establish a theoretical connection between minimizing mean squared error (MSE) for pair-wise symmetric weights and Welch’s t-test for normally distributed samples. Using more recent results from the statistics literature, we then generalize this approach to reduce the assumptions on the distributions. As a practical implementation of this framework, we demonstrate an image-space denoising scheme, building upon the well-known joint bilateral filter. Our approach tracks online statistics of the per-pixel samples generated by a state-of-the-art MC renderer. These statistics describe the distribution of samples and allow us to draw statistically valid conclusions about which estimates among neighboring pixels should be combined in order to achieve a near-optimal trade-off between noise and bias. Our statistics-based filter effectively avoids blurring over legitimate features, including lighting effects (such as shadows or caustics) that are not present in the G-buffers (and would therefore be over-blurred by existing denoisers). All these statistics can be estimated online during rendering. Our filter runs on the GPU, taking around 30 ms for 1280 × 720 resolution images on commodity hardware.

Apart from image denoising, we demonstrate that our approach can generalize to other applications, such as Russian roulette (RR) path termination and multiple importance sampling (MIS). Our general approach has an important advantage relating to these generalizations: We do not rely on preexisting training data, therefore we can produce low-error estimates of any quantity of interest, not only the radiance produced by the renderer. For instance, we denoise the per-bounce incident radiance in our RR example or estimates for which sampling strategy outperforms the others (formalized as win rates) during MIS. For both quantities, obtaining sufficient training data for neural denoising approaches would be quite challenging.

In summary, we present the following main contributions:

•

a general statistical framework that puts MC rendering into a statistical context, alongside a theoretical analysis on minimizing mean squared error in the setting of pair-wise testing,

•

an image-space implementation of this framework that produces low-error estimates for any quantity of interest in the context of MC rendering,

•

and applications of this denoising approach to standard image denoising, RR path termination, and MIS.

We provide additional materials (e.g., source code) for our paper at https://www.cg.tuwien.ac.at/StatMC.

2 Related Work

The stochastic nature of Monte Carlo (MC) rendering introduces error in the form of variance. To reduce this error, one can either compute more samples or use some form of noise reduction, e.g., adaptive sampling or filtering [Huo and Yoon 2021; Zwicker et al. 2015]. A common approach is a posteriori noise reduction or denoising, which operates on the samples generated by MC rendering. Many methods work with (image-space) pixel estimates and range from classical denoising approaches [Overbeck et al. 2009; Xu and Pattanaik 2005] to more recent neural-network-based methods [Back et al. 2022; Firmino et al. 2022].

Classical approaches usually adapt and apply some form of filter while trying to find a balance between noise reduction and introduced bias (usually apparent in the form of excessive blurring). Many use image-filtering kernels, e.g., Gaussian [Rousselle et al. 2011], (joint) bilateral [Li et al. 2012; Liu et al. 2018b; Mara et al. 2017; Park et al. 2013; Rousselle et al. 2013; Sen and Darabi 2012; Xu and Pattanaik 2005; Zheng and Liu 2018], non-local means [Delbracio et al. 2014; Moon et al. 2013; Rousselle et al. 2012; Vicini et al. 2019], wavelet [Dammertz et al. 2010; Kalantari and Sen 2013; Overbeck et al. 2009; Schied et al. 2017; 2018], or non-local Bayes [Boughida and Boubekeur 2017] filters. Other methods build on diffusion [McCool 1999] or higher-order regression [Bauszat et al. 2011; Bitterli et al. 2016; Liu et al. 2018a; Moon et al. 2014; 2015; 2016; Yuan and Zheng 2017; 2018]. In addition to the noisy input image, many of the mentioned approaches incorporate additional auxiliary features like albedos and normals, as well as additional statistics like the corresponding variances. Among these, joint bilateral filtering with auxiliary features is most closely related to our denoising approach.

In a similar direction to our statistics-based approach, Sen and Darabi [2012] presented a method called random parameter filtering (RPF), which uses histograms of sample vectors to calculate the mutual information between scene features and the random parameters of the MC process in order to adjust the weights of a joint bilateral filter and reduce the noise stemming from these random parameters. Even though their approach achieves good results at low sample counts, its computation becomes prohibitively expensive with an increasing number of samples. This downside was addressed by Park et al. [2013] by interpolating sparsely computed mutual information, which reduces but not fully eliminates the scaling issues for high sample counts and requires several seconds or multiple minutes to denoise a low-SPP image. In contrast, our approach is completely independent of the number of samples and requires only a few milliseconds to compute. A similar idea is to represent sample distributions as histograms [Boughida and Boubekeur 2017; Delbracio et al. 2014]. However, histograms require additional memory and their accuracy depends on the number and sizing of bins, which must be chosen a priori. In contrast, we only use a few statistical moments to compactly store information about sample distributions, which can also be efficiently updated with each new sample.

Li et al. [2012] surpass the denoising quality of RPF by using Stein’s unbiased risk estimator (SURE) to select which one out of a discrete set of joint bilateral filters (of different spatial scales) minimizes the error. Similarly, Rousselle et al. [2013] use SURE to select among filters that differ in robustness to noise and sensitivity to image details. Following this general idea of “meta”-approaches, Zheng et al. [2021] combine multiple denoisers, whereas Firmino et al. [2022] apply denoising only when beneficial for convergence. In contrast, our approach does not have the overhead of computing multiple filters, or corresponding derivatives for error estimation, before the final reconstruction. We instead derive filter kernels directly from the pixel statistics and use statistical testing to ensure that only estimates with similar distributions are combined, leading to a fast one-pass approach, which could also improve the results of meta-denoisers by acting as an additional base input.

Mara et al. [2017] presented a real-time approach using a pipeline consisting of several joint bilateral filter stages, and showed results comparable to non-local means (NLM) [Rousselle et al. 2012] and regression-based approaches [Bitterli et al. 2016; Moon et al. 2014]. Their method is specifically tailored for real time and builds on a custom renderer with special handling of direct and indirect illumination. Our approach uses a more straightforward pipeline, which can be easily integrated into any conventional MC renderer with little overhead while offering similar performance during filtering.

Similar to ours, some methods use confidence intervals [Moon et al. 2013; 2016; Sen and Darabi 2012] to exclude pixels from combination during filtering if their corresponding statistics differ significantly. These approaches use confidence intervals in specific subparts of their fairly complex pipelines, which are tailored specifically to MC image denoising. Similar statistical concepts were used by Back et al. [2023], who first applied the concept of uncorrelated statistics to denoising MC renderings. In contrast, we use statistical tests at the core of our efficient denoising pipeline, including higher-order central moments to relax the requirement of normally distributed samples implied by commonly used confidence intervals.

Neural-network-based approaches represent the current state of the art in MC denoising, e.g., NVIDIA OptiX AI-Accelerated Denoiser (OAADN), which is based on Chaitanya et al.’s work [2017], or Intel Open Image Denoise (OIDN) [Áfra 2024]. Both approaches take a noisy image and so-called G-buffers (such as normals or albedos) as inputs. The neural networks then map those inputs to a low-error approximation of the ground truth. For this purpose, the networks have to be trained on numerous input–reference image pairs, so that an effective mapping can be established. Reconstructing image features that suffer from high variance and are not present in the G-buffers, such as lighting effects, is a difficult problem. In these cases, the networks must rely solely on the noisy image to differentiate between legitimate features and noise. Furthermore, the performance of such approaches hinges on the training and available data. Generally, there are no guarantees concerning their convergence behavior. For instance, OIDN [Áfra 2024] includes images at different degrees of convergence into the training to incorporate a notion of convergence. This stands in contrast to several non-neural methods that provide consistency [Back et al. 2023; Bitterli et al. 2016; Moon et al. 2013; Rousselle et al. 2013]. To alleviate this fundamental problem, Firmino et al. [2022] train an additional neural network that predicts per-pixel mixing factors to reduce the weight of the biased reconstruction in favor of the converging input. They use confidence intervals to ensure convergence by limiting the neural mixing weights such that the biased contribution vanishes as the variance decreases with increased sample counts.

Our approach, in contrast, does not require any training, as it directly uses the statistics gathered during rendering to infer relationships between pixels for filtering. Furthermore, we provide a theoretical guarantee concerning the consistency of our filter. We also show in our results that we can denoise quantities other than the radiance reaching the camera. In this paper, we always keep all filtering parameters constant as the number of samples per pixel (SPP) increases to demonstrate the convergence behavior of our method. In the future, automatic parameter tuning as done in neural denoising or integration of our method into meta-denoising strategies could further improve denoising performance.

3 Background and Notation

In this section, we summarize the statistical concepts and methods on which we build our general statistical framework in §4 and establish the connection to the common denoising problem in Monte Carlo rendering. As a starting point, we consider the well-known rendering integral [Kajiya 1986], where each pixel I_i in an image is formed as

\begin{equation} {I_i} = \int _{{\Omega _i}} {{W_i}(\omega)L^{\prime }(\mathbf {x},\omega)\;d\omega }, \end{equation}

(1)

where W_i describes the sensor response and L′ is the radiance incident on the i-th pixel, which covers the solid angle Ω_i from the camera location x. This radiance must satisfy Kajiya’s rendering equation throughout a virtual scene. Each pixel intensity is commonly approximated by Monte Carlo (MC) integration:

\begin{equation} {I_i} \approx \frac{1}{{{n_i}}}\sum \nolimits _{k=1}^{n_i} {\frac{{{f_k}}}{{{p_k}}}}, \end{equation}

(2)

where f_k refers to evaluations of the integrand in Eq. (1) for n_i randomly drawn samples from Ω_i with probability density p_k.

We now move on to a more abstract statistical interpretation of the rendering process outlined above. In particular, we interpret each pixel as a statistical estimator \({\hat{\theta }_i(X_1,\dots ,X_{n_i})}\) for the (unknown) ground-truth value θ_i (in this case I_i, although the same idea applies to other estimators), where we view each sample X_k as a random variable (i.e., a distribution from which the rendered sample f_k/p_k is drawn). We generally assume unbiased estimators, i.e., \(\mathop{\mathbb {E}}[\hat{\theta }_i] = \theta _i\).

Using results from descriptive statistics, our core idea is to formulate statistical tests to decide which estimators are sufficiently similar so that they can be combined during denoising. In general, we employ the following descriptive statistics of the sample distribution: mean (μ), and the central moments of orders 2–4 (M_l, l ∈ 2, 3, 4), or their standardized variants, variance (σ² = M₂), skewness (\(M_3 / M_2^{3/2}\)), and kurtosis (\(M_4 / M_2^2\)) [Kenney and Keeping 1951]. The central moments are defined as

\begin{equation} {M_l} = \frac{1}{{{n_i}}}\sum \nolimits _k {{{({X_k} - \bar{X})}^l}}, \end{equation}

(3)

where \(\bar{X}\) is the sample mean. Crucially, all these statistics can be computed online [Meng 2015], building on the classical result by Welford [1962], i.e., by updating them one sample at a time during rendering without having to store all samples in memory. In our implementation, we use central moments up to the third order.

3.1 Denoising Estimators

In this work, we consider general denoising filters that construct the denoised estimator \(\tilde{\theta }_j\) as a convex combination of (noisy) input estimators \(\lbrace \hat{\theta }_i\rbrace\) :

\begin{equation} \tilde{\theta }_j = \sum \nolimits _{i} w_{ij} \hat{\theta }_i, \end{equation}

(4)

where w_ij is the weight assigned to estimator \(\hat{\theta }_i\); all weights must be non-negative, and \(\sum \nolimits _i w_{ij} = 1\). The key question now is: how to find appropriate (sparse) weights w_ij? In our approach, we split these weights into two parts, Eq. (10): a base filter, such as the well-known joint bilateral filter, and a novel pair-wise statistical membership function m_ij, which decides which combinations of estimators are admissible.

Formally, the mean squared error (MSE) of the combined estimator \(\tilde{\theta }_j\) can be decomposed into variance and bias

\begin{equation} \text{MSE}(\tilde{\theta }_j, \theta _j) = \mathop{\mathbb {E}}[(\tilde{\theta }_j - \theta _j)^2] = \text{Var}(\tilde{\theta }_j) + \text{Bias}(\tilde{\theta }_j, \theta _j)^2, \end{equation}

(5)

where θ_j refers to the estimand, i.e., the unknown ground-truth value for the j-th estimator. Note that even though we assume unbiased estimators \(\hat{\theta }_j\), their filtered counterparts \(\tilde{\theta }_j\) generally contain some bias. The variance and bias are given by

\begin{align} \text{Var}(\tilde{\theta }_j) & = \sum \nolimits _i w_{ij}^2 \; \text{Var}(\hat{\theta }_i(n_i)), \ \text{and} \end{align}

(6)

\begin{align} \text{Bias}(\tilde{\theta }_j, \theta _j) & = \sum \nolimits _i w_{ij} \; \text{Bias}(\hat{\theta }_i(n_i), \theta _j), \end{align}

(7)

where \(\text{Bias}(\hat{\theta }_i, \theta _j) = \mathop{\mathbb {E}}[\hat{\theta }_i]-\theta _j\).

The overall goal of MC denoising in this context is to find weights that minimize the total MSE, or a similar error metric, i.e., achieve an optimal trade-off between variance (noise) and bias, across all denoised estimators (i.e., the whole image):

\begin{equation} \lbrace w_{ij}^*\rbrace = \mathop{arg\,min}_{\lbrace w_{ij}\rbrace } \sum \nolimits _j \text{MSE}(\tilde{\theta }_j, \theta _j). \end{equation}

(8)

However, this optimization problem cannot be addressed directly, as the ground-truth values are of course unknown, and we must therefore work with noisy estimates of variance and bias.

4 Statistical Filtering Framework

In this section, we develop our statistical filtering framework. Considering an abstract set of estimators \(\lbrace \hat{\theta }_i\rbrace\), we ask under which conditions combining a subset of these estimators improves mean squared error. Here, we approach the inherent noise-to-bias trade-off from a statistical perspective, formulating a membership function (m), fulfilling the requirements summarized in §4.1. We show how this approach relates to hypothesis testing in statistics, before describing the specifics of our image-space implementation in §5.

4.1 Problem Statement

Considering the general denoising problem introduced in §3, we formulate membership functions that decide which combinations of estimators are admissible during denoising, i.e., under which conditions the combination is likely to improve image quality. In particular, we require the following important properties:

(a) Pair-wise evaluation: for any pair of estimators \((\hat{\theta }_i, \hat{\theta }_j)\), m must be defined as a function of these estimators’ statistics only, i.e., \({m_{ij} = m(\mathcal {S}({\hat{\theta }_i}), \mathcal {S}({\hat{\theta }_j}))}\), where \(\mathcal {S}({\hat{\theta }_i})\) denotes descriptive sample statistics, such as \(\mu _i, \sigma ^2_i, M_{3,i}, M_{4,i}, \ldots\), of estimator \(\hat{\theta }_i\).

(b) Online statistics: all these statistics \(\mathcal {S}\) must be computable by an online algorithm, i.e., by updating their value one sample at a time during Monte Carlo (MC) rendering. Together with the pair-wise property, this requirement ensures that all components of our filtering pipeline can be implemented efficiently in terms of both parallel execution and memory usage.

(c) Symmetry: we require that m_ij = m_ji. In practice, symmetry of weights enforces energy preservation during filtering. Relaxing this requirement sometimes produces visually more pleasing results around bright outliers (“fireflies”) at the cost of losing some overall brightness and possibly slightly higher mean squared error (MSE).

(d) Convergence: In order to guarantee convergence of the denoised result \(\tilde{\theta }_i \rightarrow \theta _i \; \forall i\) with increasing sample size, we require that the membership function satisfies

\begin{equation} \text{Var}(\hat{\theta }_i) \rightarrow 0 \Rightarrow m_{ij} = 0 \text{ if } \Vert \theta _i - \theta _j\Vert \gt 0. \end{equation}

(9)

In other words, as the variance approaches zero, the membership function must exclude estimator j from the combination with estimator i if there is any difference in their estimands, which would introduce bias.

(e) Identity : An estimator does not introduce additional bias to itself; we therefore set m_ii = 1 by definition.

Finally, we can state our main problem as finding membership functions m_ij that satisfy properties (a–e) and deliver the best possible variance–bias trade-off within these constraints.

4.2 Our Approach

Here, we describe our general framework for formulating statistics-based membership functions respecting the aforementioned requirements. We then use these functions to address the denoising problem, Eq. (8), by defining the filter weights in Eq. (4), as follows:

\begin{equation} \boxed{ w_{ij} = \frac{\rho _{ij} \; m_{ij}}{\sum \nolimits _i\rho _{ij} \; m_{ij}}. } \end{equation}

(10)

The first component of the weights, ρ_ij, provides the option to integrate existing filters based on a priori available information into our system. The main reasons for doing so are: (1) to limit the size of the filter kernel, thereby improving the runtime performance by limiting the amount of required membership-function evaluations; (2) to be able to build upon well-known existing approaches; and (3) to include additional low-noise a priori information available from the renderer (G-buffers, e.g., normals or albedos in image space). Note that while a priori information is often useful to preserve some features due to geometry or textures, it is inherently useless for other features that only manifest themselves through sampling (e.g., shadows or caustics). In this regard, our membership function and this a priori weight complement each other, incorporating both empirical and a priori information. In our implementation (§5), we set ρ_ij to represent a joint bilateral filter.

Existing filters are highly effective at reducing variance. The key task of the membership function m_ij is therefore to limit bias by excluding estimators from the filter whose estimands differ significantly. Following the requirements stated in §4.1, we now consider the MSE minimization problem, Eq. (8), for a pair of input estimators \(\hat{\theta }_i, \hat{\theta }_j\). These estimators are combined with weight w into denoised estimators \(\tilde{\theta }_i = w\hat{\theta }_i + (1-w) \hat{\theta }_j\) and \(\tilde{\theta }_j = w\hat{\theta }_j + (1-w) \hat{\theta }_i\). Note that we enforce symmetry of the weighting here, w_ij = w_ji = (1 − w). Assuming unbiased input estimators, the bias of their combination is \(\text{Bias}(\tilde{\theta }_i,\theta _i) = {\mathop{\mathbb {E}}[\tilde{\theta }_i-\theta _i]} = {(w-1) \theta _i + (1-w) \theta _j}\), and analogously for \(\text{Bias}(\tilde{\theta }_j,\theta _j)\). Minimizing the sum of the error terms, Eq. (5), for both denoised estimators then reads:

\begin{equation} \begin{aligned} w^* = {\mathop{arg\,min}}_{w} \; (\;& w^2 \text{Var}(\hat{\theta }_i) &&+ (1-w)^2 \text{Var}(\hat{\theta }_j) &+ \\ & (\; (w-1) \theta _i &&+ (1-w) \theta _j\;)^2 &+ \\ & w^2 \text{Var}(\hat{\theta }_j) &&+ (1-w)^2 \text{Var}(\hat{\theta }_i) &+ \\ & (\; (w-1) \theta _j &&+ (1-w) \theta _i\;)^2 &). \end{aligned} \end{equation}

(11)

For this objective, we can find the optimum by setting the derivative with respect to the weight to zero and solving for w^* (assuming at least some uncertainty, i.e., \(\text{Var}(\hat{\theta }_i) + \text{Var}(\hat{\theta }_j) \gt 0\)):

\begin{equation} w^* = \frac{2(\theta _i - \theta _j)^2 + \text{Var}(\hat{\theta }_i) + \text{Var}(\hat{\theta }_j) }{2\left((\theta _i - \theta _j)^2 + \text{Var}(\hat{\theta }_i) + \text{Var}(\hat{\theta }_j)\right)}. \end{equation}

(12)

Note, however, that we do not know ground-truth values, or estimator variances, and therefore work with (noisy) estimates of these quantities instead, producing noisy results for w^*. Consequently, the trivial choice of setting m_ij = 1 − w^* and ρ_ij = 1 does not yield satisfactory results. Our approach is therefore to use an existing smoothing filter for ρ_ij and enforce a binary membership function such that

\begin{equation} m_{ij} = 1 \text{ if } (1- w^*) \gt \gamma , \; 0 \text{ otherwise}. \end{equation}

(13)

Here, γ is a threshold that determines how discriminative the membership function is. In this way, we effectively use the pair-wise optimal weight as a test statistic (similar to those used in statistical hypothesis testing) and prevent the estimators from being filtered together if the introduced bias (i.e., the difference of their means) is too large relative to the sum of their variances. In our supplementary document, we show that this test is equivalent to Welch’s t-test, t < γ_w with \(\gamma _w = \sqrt {1/(2\gamma) - 1}\). Our results use a critical value from Student’s t-distribution of γ_w = t_{1 − α/2, ν}, with the significance level α = 0.005 and ν = n_i + n_j − 2, i.e., the upper bound for the degrees of freedom approximated by the Welch–Satterthwaite equation [1946].

Welch’s t-test is related to the confidence interval for the difference of two normally distributed means, see, for example, Eq. (18) in the paper by Curto [2023]. Consequently, we can relax the normality assumption by choosing a different confidence-interval formulation from the literature. Various formulations are available that consider not only the sample variance but also higher-order statistics, such as skewness, of the sample distribution. In our results, we find that the correction of the means proposed by Curto [2023], Eq. (20) there, yields good denoising behavior when combined with a Box-Cox transformation of the samples as detailed in §5.

As an orthogonal extension, we note that allowing asymmetric membership functions can produce visually more appealing results; while fewer constraints can also lead to improved MSE, the potential energy loss due to asymmetric weights may cause overall darker images and therefore slightly worse quantitative errors (Fig. 6 g). Relaxing the symmetry assumption can be easily done by removing the second error term (last two lines) from Eq. (11). In this case, the optimal weight simplifies to

\begin{equation} w^*_\text{asym} = \frac{(\theta _i-\theta _j)^2 + \text{Var}(\hat{\theta }_j)}{(\theta _i-\theta _j)^2 + \text{Var}(\hat{\theta }_i) + \text{Var}(\hat{\theta }_j)}. \end{equation}

(14)

As before, we then replace the unknown ground-truth values with estimates based on the observed sample statistics, using either normality assumption or Curto’s correction, to evaluate the membership function according to Eq. (13).

5 Application to Image-Space Denoising

This section describes how to apply our general statistical framework to image-space denoising. In image space, the indices i and j each denote an individual pixel (not image-space coordinates). In particular, our implementation contains three major components. First, we track online statistics of the samples from the MC renderer, Alg. 1 ; this part is implemented as part of the renderer itself (we use pbrt-v3 [Pharr et al. 2016] for our results). Second, we select a joint bilateral filter to define the a priori weight ρ_ij in Eq. (10). Finally, we implement our statistical denoising pipeline, which takes the noisy image, sample statistics, and a priori weights as input to compute the filter weights, Alg. 2, and produce the final output image (we implement this part on the GPU using CUDA).

Fig. 2:

Fig. 3:

Fig. 4:

Fig. 5:

In MC rendering, we work with (weighted) radiance samples that cannot be negative, while a few paths may result in large contributions to a pixel’s intensity. In statistical terms, the distribution of samples tends to be right-skewed. Given these observations, we employ the widely used Box-Cox transformation [Box and Cox 2018] to “normalize” the rendered samples:

\begin{equation} x^{\prime }_k(\lambda) = \left\lbrace \begin{array}{ll}\text{log}(x_k) & \text{if}\, \lambda = 0,\\ (x_k^\lambda - 1)/{\lambda } & \text{otherwise}.\\ \end{array}\right. \end{equation}

(15)

We note that for samples generated by MC rendering, choosing λ = 0 is impractical, as many samples may be zero (e.g., paths that do not reach a light source before termination), where the log function is undefined. In our experiments, we find that λ = 1/2 yields good results in practice, as it effectively “compresses” high-valued outliers while avoiding excessive “stretching” of small values toward − ∞. For each transformed sample \(x^{\prime }_k\) arriving from the renderer, we then update the online statistics following Alg. 1 .

As a base filter, we choose the joint bilateral filter [Eisemann and Durand 2004; Petschnigg et al. 2004] to define the a priori weights ρ_ij for any pair of pixels (i, j). These weights follow a Gaussian falloff with distance in combined image space and G-buffers:

\begin{equation} \rho _{ij} = \operatorname{exp}(-\frac{1}{2} (\mathbf {p}_j - \mathbf {p}_i)^\textsf {T} \Sigma ^{-1} (\mathbf {p}_j - \mathbf {p}_i)), \end{equation}

(16)

where p_i denotes the a priori information for pixel i. We use image-space position, RGB albedo color, and surface normal for each pixel, i.e., \(\mathbf {p}_i = (x_i,y_i, r_i,g_i,b_i, n_{ix},n_{iy},n_{iz})^\textsf {T}\), in our results. Moreover, Σ denotes a covariance matrix that controls the rate of falloff for each dimension of p_i. To enforce the sparsity of these weights, we limit the filter to a small neighborhood within a constant radius around each pixel in image space; our default covariance matrix is then Σ = diag(10, 10, 0.02, 0.02, 0.02, 0.1, 0.1, 0.1).

In practice, our statistical filter operates with three color channels per pixel in RGB color space and evaluates the membership function for each channel according to Alg. 2 . We set the final weight w_ij to zero unless all three channels pass the statistical test before applying the filter. In this way, we avoid color shifts that could occur if a channel is evaluated differently from the others.

In summary, our approach builds on three key insights: First, minimizing pair-wise MSE is closely related to Welch’s t-test; this theoretical contribution forms the basis of our method. Second, all collected statistics are noisy estimates; enforcing binary membership functions prevents most of this noise from propagating downstream. Finally, Box-Cox transformation, as well as correcting the mean [Curto 2023; Johnson 1978], makes our method more robust to non-normality, effectively mitigating the influence of outliers (“fireflies”).

The parameters of our method affect the results as follows: The threshold γ, Eq. (13), adjusts denoising strength. For symmetric weights, γ = 0 reverts to the base filter, γ = 1/2 and non-zero variance yields m_ij = 0 (when i ≠ j), effectively disabling filtering. The covariance matrix in Eq. (16) specifies the extent of the filtering window, depending on distances in image space and differences in G-buffer values; higher values generally increase the number of membership evaluations and lead to smoother results. The Box-Cox parameter λ determines how samples are transformed: λ = 1 keeps their distribution shape unchanged, smaller values correct right-skewed distributions (reducing positive outliers), larger values correct left skewness.

6 Results I — Image Denoising

In this section, we compare our statistical denoiser to state-of-the-art machine-learning-based denoisers (NVIDIA OptiX AI-Accelerated Denoiser and Intel Open Image Denoise), as well as a meta-denoiser [Firmino et al. 2022], on multiple well-known test scenes [Bitterli 2016] in Figs. 1–4. Figure 5 shows a quantitative convergence analysis for these examples. Note that without useful information from the G-buffers (e.g., objects seen in the mirror in Fig. 2), neural methods often fail to reconstruct sharp features. We also compare our membership formulation to the one-sided confidence-interval test proposed by Moon et al. [2013], using the \(99.8\%\) confidence level as proposed in their paper. Here, we compare the membership functions directly without applying their two-step denoising process.

Fig. 6:

We also analyze the effect of various components and parameters of our method in Fig. 6. Our statistical membership function greatly contributes to preserving the hard edges of shadows, which would be over-blurred by the base filter, as such lighting effects are not present in the G-buffers. Note how the G-buffers and our membership functions complement each other in reconstructing image features (Figs. 6 b and c). The Box-Cox transformation further increases performance by “normalizing” the sample distributions.

All results were rendered and denoised on a desktop PC with an AMD Ryzen 9 5950X CPU and an NVIDIA RTX 3080 Ti GPU. Our filtering runtimes are generally comparable to the duration of the inference step of Intel Open Image Denoise (OIDN) and substantially faster than progressive denoising [Firmino et al. 2022]. We set the base-filter radius to a relatively large size of 20 pixels (mostly for 1280 × 720 resolution images). Reducing the radius would further speed up our method, as the number of membership tests scales quadratically with the radius: while for 20 pixels, we require around 28 ms, for 6 pixels, we only require around 11 ms (Fig. 6 d). Note that using the base filter alone would lead to noticeable over-blurring (Fig. 6 b).

7 Results II — Further Applications

The generality of our denoising framework allows us to go beyond simply denoising an image produced by MC rendering. In particular, we see great potential in leveraging the ability to quickly denoise arbitrary statistics of arbitrary quantities tracked during path tracing. This section presents two such applications: including per-bounce approximate path contributions in Russian roulette (RR), as well as an extension to multiple importance sampling (MIS) using MIS strategy win rates.

7.1 Approximate-Contribution Russian Roulette

Here, we extend classic throughput-based RR for path tracing. Instead of determining the path termination probability based on just the camera-centric throughput, we track and denoise the average incoming radiance for a particular bounce index per pixel. The termination probability at each bounce then considers the product of the path throughput and this average incoming radiance. Our goal is to improve termination probabilities for each path individually; thus, we divide the product by the average incoming radiance at the corresponding pixel of the camera.

Formally, the per-bounce incident radiance is defined as

\begin{equation} L^{\prime }_{j,k} = \int _{\mathbf {x} \in \Xi } p_{j,k}(\mathbf {x}) \int _{\omega _i \in \Omega } L_i(\mathbf {x}, \omega _i) \mathop {}\!\mathrm{d}\omega _i \mathop {}\!\mathrm{d}\mathbf {x}, \end{equation}

(17)

where Ξ is world space and p_{j, k}(x) the probability of sampling a position x within that space for a given pixel j and bounce index k. In practice, we do not need to explicitly evaluate p_{j, k}(x) as this probability is inherent to the path-tracing process. In fact, to estimate \(L^{\prime }_{j,k}\), we just have to average all incoming-radiance samples (over all paths) for a given pixel and bounce index, which can be easily integrated into common path-tracing code.

We then apply an iterative process during rendering: In each iteration, we track new incoming-radiance samples generated during path tracing, progressively refining our estimate of \(L^{\prime }_{j,k}\), Eq. (17), and the corresponding statistics required for denoising. At the end of each iteration, we denoise these estimates as described in §5. From the second iteration onward, where denoised estimates are available, we use modified path termination probabilities according to these radiance estimates.

Fig. 7:

Fig. 8:

Figure 7 shows improved image errors for our approximate-contribution version (ACRR) in an equal-time comparison over two variants of classic RR taking only path throughput into account. Classic RR often terminates only from a certain (e.g., fifth) bounce onward to avoid prematurely terminating high-contribution paths. Because of our improved path-contribution approximation, ACRR does not rely on this work-around. Rath et al. [2022] show that optimal weights for RR can be approximated using a spatio-directional data structure. While our method cannot achieve their sampling efficiency, we illustrate how our denoising framework, based on regular image buffers, can be used as an alternative. We provide further details and results in the supplementary document.

7.2 Selective Multiple Importance Sampling

As our second example application, we present an extension to (multi-sample) multiple importance sampling (MIS) [Veach 1997; Veach and Guibas 1995]. The basic idea of MIS is to combine multiple sampling strategies by a weighted average, depending on the probability density function (PDF) of each strategy. The sampling strategies typically cover different features of an integrand, for instance, via bidirectional-scattering-distribution-function (BSDF) or direct-light sampling. However, in situations where one sampling strategy is clearly better than the other(s), MIS can introduce additional variance (noise), especially when using the balance heuristic. Veach [1997] also developed the cut-off, power, and maximum heuristics to mitigate this issue, generally moving large weights closer to one and small weights closer to zero. Choosing MIS weights and sample allocations is non-trivial and an active area of research [Grittmann et al. 2019; Kondapaneni et al. 2019; Szirmay-Kalos and Sbert 2022], and the effectiveness of a strategy depends on the given situation. We illustrate how our framework can be used to selectively perform MIS only in beneficial cases, independently of the chosen strategy.

For this, we extend MIS by identifying strategies that should be disabled in order to save computational effort and reduce noise. The primary metric for this decision is the win rate, \(\eta _m = n_m^* / n_i\), of sampling strategy m, where a “win” (counted in \(n_m^*\)) is a non-zero sample whose PDF value of strategy m exceeds the PDF values of all other strategies. When determining this win rate—as in other MIS approaches—we evaluate all sampling strategies for each sample (not just the strategy that produced the sample), even if a strategy has already been disabled earlier. Similarly to the average radiance in the RR example described above, we estimate and denoise the win rates per pixel and per bounce. In contrast to radiance samples, we do not Box-Cox transform win rates. We again proceed iteratively, starting with all MIS strategies enabled, tracking and denoising win rates, and—from the second iteration onward—disabling inferior (η_m < 10^{− 3}) sampling strategies (per pixel and per bounce). Finally, note that disabling a sampling strategy generally does not introduce bias, because two unbiased estimates (a MIS estimate and an estimate from a single sampling strategy) can be naturally combined without introducing bias. Figure 8 shows example results and equal-time convergence behavior for combining bidirectional-scattering-distribution-function and direct-light sampling.

8 Discussion and Conclusion

We have presented a simple yet effective denoising method for Monte Carlo rendering. Using well-known image filters for variance reduction and statistical tests (membership functions) to prevent bias, we achieve state-of-the-art image quality. Our approach is entirely free of pretrained components, using descriptive statistics of the sample distributions instead that can be estimated online during rendering. The properties of our membership function guarantee convergence with increasing sample counts. At low sample counts, high variance dominates the image error. In this case, the membership function becomes less discriminating, thereby reducing that variance and accepting some bias. As rendering progresses and more samples are added, variance decreases and the bias becomes more relevant to the overall error. If bias is significant (relative to variance), we set the corresponding filter weight to zero, based on statistical tests, which essentially eliminates that source of bias. In the limit, as variance approaches zero, any bias is unacceptable and thus prevented by the membership function.

We have focused on binary membership functions in this work. While we derive these functions from optimal pair-wise weights, recall that these weights are noisy estimates in practice. Introducing a threshold and restricting to binary memberships effectively prevents this noise from propagating through the pipeline. We leave investigating continuous membership functions for future work.

In contrast to neural-network-based approaches, our method does not require any computationally expensive training and does not risk adding “wrong” details that were present in the training data but should not be present in the output image.

Including our method as an additional input to existing meta-denoisers, such as progressive denoising [Firmino et al. 2022] or ensemble denoising [Zheng et al. 2021] could in turn deliver another step up in image quality for these methods. Furthermore, we also see great potential for the applicability of our statistics-based approach to other types of estimators during MC rendering. In particular, variance estimates have been used for adaptive sampling [Rousselle et al. 2012], MIS [Grittmann et al. 2019], or path guiding [Rath et al. 2020] in the past. Efficiently denoising variance estimates with our framework could yield improved performance of these methods in the future.

Another promising avenue for future work is the extension of our method to the temporal domain: Conceptually, estimates at different points in time of an animation can be treated equivalently to estimates at different image-space positions. Including such estimates could improve denoising performance and temporal coherence for animation denoising.

Acknowledgments

We would like to thank Lukas Lipp for fruitful discussions, Károly Zsolnai-Fehér and Jaroslav Křivánek for valuable contributions to early versions of this work, and Bernhard Kerbl for help with our CUDA implementation. Moreover, we would like to thank the creators of the scenes we have used: Wig42 for “Wooden Staircase” (Fig. 1), “Grey and White Room” (Fig. S6), and “Modern Living Room” (Fig. S8); nacimus for “Bathroom” (Fig. 3, S5); NovaZeeke for “Japanese Classroom” (Fig. 4, 6); Beeple for “Zero-Day” (Fig. 8); Jay-Artist for “White Room” (Fig. S7); Mareck for “Contemporary Bathroom” (Fig. 2); Christian Freude for “Glass Caustics” (Fig. S10); and Benedikt Bitterli for “Veach Ajar” (Fig. 7, S2), “Veach MIS” (Fig. S4), and “Fur Ball” (Fig. S11). This work has received funding from the Vienna Science and Technology Fund (WWTF) project ICT22-028 (“Toward Optimal Path Guiding for Photorealistic Rendering”) and the Austrian Science Fund (FWF) project F 77 (SFB “Advanced Computational Design”).

Supplemental Material

PDF File

Supplementary document

Download
10.24 MB

ZIP File

Result images in PNG format

Download
115.57 MB

References

[1]

Attila T. Áfra. 2024. Intel^® Open Image Denoise. https://www.openimagedenoise.org.

Abstract

1 Introduction

2 Related Work

3 Background and Notation

3.1 Denoising Estimators

4 Statistical Filtering Framework

4.1 Problem Statement

4.2 Our Approach

5 Application to Image-Space Denoising

6 Results I — Image Denoising

7 Results II — Further Applications

7.1 Approximate-Contribution Russian Roulette

7.2 Selective Multiple Importance Sampling

8 Discussion and Conclusion

Acknowledgments

Supplemental Material

References

Index Terms

Recommendations

Neural Kernel Regression for Consistent Monte Carlo Denoising

Robust deep residual denoising for Monte Carlo rendering

Adversarial Monte Carlo denoising with conditioned auxiliary feature modulation

Comments

Information

Published In

Sponsors

Publisher

Publication History

Check for updates

Author Tags

Qualifiers

Funding Sources

Conference

Acceptance Rates

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

View options

PDF

eReader

HTML Format

Login options

Full Access

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations