1 Introduction
Monte Carlo (MC) integration is ubiquitous in modern photorealistic rendering. The basic idea is to estimate an integral (pixel intensity) by a (weighted) average of random samples. However, compared to deterministic quadrature rules, MC integration is stochastic in nature and must therefore contend with uncertainty. This stochastic uncertainty manifests as variance—i.e., noise—affecting every sample taken during rendering.
The field of MC denoising originated from the idea of applying traditional image filters from the image-processing community to reduce noise in rendering. These filters essentially average multiple estimates (i.e., pixels) in image space, thereby decreasing variance. This approach can be highly effective in situations where the estimates are the same (or similar) in the limit, i.e., in regions of an image that show smooth color gradients. However, in cases of significant difference, such as along feature edges, filtering potentially introduces unwanted bias, often apparent as excessive blurring or color bleeding.
Denoising therefore must achieve a good bias–variance trade-off. In contrast to traditional image filtering, recent denoising approaches rely on neural networks to learn a mapping from a noisy input image, along with auxiliary information (in the form of so-called G-buffers containing low-noise ground-truth scene information), to a low-error approximation of the desired image.
Departing from these pretrained machine learning approaches, in this paper, we propose a general statistical framework for denoising in MC rendering. Specifically, we establish a theoretical connection between minimizing mean squared error (MSE) for pair-wise symmetric weights and Welch’s t-test for normally distributed samples. Using more recent results from the statistics literature, we then generalize this approach to reduce the assumptions on the distributions. As a practical implementation of this framework, we demonstrate an image-space denoising scheme, building upon the well-known joint bilateral filter. Our approach tracks online statistics of the per-pixel samples generated by a state-of-the-art MC renderer. These statistics describe the distribution of samples and allow us to draw statistically valid conclusions about which estimates among neighboring pixels should be combined in order to achieve a near-optimal trade-off between noise and bias. Our statistics-based filter effectively avoids blurring over legitimate features, including lighting effects (such as shadows or caustics) that are not present in the G-buffers (and would therefore be over-blurred by existing denoisers). All these statistics can be estimated online during rendering. Our filter runs on the GPU, taking around 30 ms for 1280 × 720 resolution images on commodity hardware.
Apart from image denoising, we demonstrate that our approach can generalize to other applications, such as Russian roulette (RR) path termination and multiple importance sampling (MIS). Our general approach has an important advantage relating to these generalizations: We do not rely on preexisting training data, therefore we can produce low-error estimates of any quantity of interest, not only the radiance produced by the renderer. For instance, we denoise the per-bounce incident radiance in our RR example or estimates for which sampling strategy outperforms the others (formalized as win rates) during MIS. For both quantities, obtaining sufficient training data for neural denoising approaches would be quite challenging.
In summary, we present the following main contributions:
• a general statistical framework that puts MC rendering into a statistical context, alongside a theoretical analysis on minimizing mean squared error in the setting of pair-wise testing,
• an image-space implementation of this framework that produces low-error estimates for any quantity of interest in the context of MC rendering,
• and applications of this denoising approach to standard image denoising, RR path termination, and MIS.
2 Related Work
The stochastic nature of Monte Carlo (MC) rendering introduces error in the form of variance. To reduce this error, one can either compute more samples or use some form of noise reduction, e.g., adaptive sampling or filtering [Huo and Yoon 2021; Zwicker et al. 2015]. A common approach is a posteriori noise reduction or denoising, which operates on the samples generated by MC rendering. Many methods work with (image-space) pixel estimates and range from classical denoising approaches [Overbeck et al. 2009; Xu and Pattanaik 2005] to more recent neural-network-based methods [Back et al. 2022; Firmino et al. 2022].
Classical approaches usually adapt and apply some form of filter while trying to find a balance between noise reduction and introduced bias (usually apparent in the form of excessive blurring). Many use image-filtering kernels, e.g., Gaussian [Rousselle et al. 2011], (joint) bilateral [Li et al. 2012; Liu et al. 2018b; Mara et al. 2017; Park et al. 2013; Rousselle et al. 2013; Sen and Darabi 2012; Xu and Pattanaik 2005; Zheng and Liu 2018], non-local means [Delbracio et al. 2014; Moon et al. 2013; Rousselle et al. 2012; Vicini et al. 2019], wavelet [Dammertz et al. 2010; Kalantari and Sen 2013; Overbeck et al. 2009; Schied et al. 2017; 2018], or non-local Bayes [Boughida and Boubekeur 2017] filters. Other methods build on diffusion [McCool 1999] or higher-order regression [Bauszat et al. 2011; Bitterli et al. 2016; Liu et al. 2018a; Moon et al. 2014; 2015; 2016; Yuan and Zheng 2017; 2018]. In addition to the noisy input image, many of the mentioned approaches incorporate additional auxiliary features like albedos and normals, as well as additional statistics like the corresponding variances. Among these, joint bilateral filtering with auxiliary features is most closely related to our denoising approach.
In a similar direction to our statistics-based approach, Sen and Darabi [2012] presented a method called random parameter filtering (RPF), which uses histograms of sample vectors to calculate the mutual information between scene features and the random parameters of the MC process in order to adjust the weights of a joint bilateral filter and reduce the noise stemming from these random parameters. Even though their approach achieves good results at low sample counts, its computation becomes prohibitively expensive with an increasing number of samples. This downside was addressed by Park et al. [2013] by interpolating sparsely computed mutual information, which reduces but does not fully eliminate the scaling issues for high sample counts and still requires several seconds to multiple minutes to denoise a low-SPP image. In contrast, our approach is completely independent of the number of samples and requires only a few milliseconds to compute. A similar idea is to represent sample distributions as histograms [Boughida and Boubekeur 2017; Delbracio et al. 2014]. However, histograms require additional memory, and their accuracy depends on the number and sizing of bins, which must be chosen a priori. In contrast, we only use a few statistical moments to compactly store information about sample distributions, which can also be efficiently updated with each new sample.
Li et al. [2012] surpass the denoising quality of RPF by using Stein’s unbiased risk estimator (SURE) to select which one out of a discrete set of joint bilateral filters (of different spatial scales) minimizes the error. Similarly, Rousselle et al. [2013] use SURE to select among filters that differ in robustness to noise and sensitivity to image details. Following this general idea of “meta”-approaches, Zheng et al. [2021] combine multiple denoisers, whereas Firmino et al. [2022] apply denoising only when beneficial for convergence. In contrast, our approach does not have the overhead of computing multiple filters, or corresponding derivatives for error estimation, before the final reconstruction. We instead derive filter kernels directly from the pixel statistics and use statistical testing to ensure that only estimates with similar distributions are combined, leading to a fast one-pass approach, which could also improve the results of meta-denoisers by acting as an additional base input.
Mara et al. [2017] presented a real-time approach using a pipeline consisting of several joint bilateral filter stages, and showed results comparable to non-local means (NLM) [Rousselle et al. 2012] and regression-based approaches [Bitterli et al. 2016; Moon et al. 2014]. Their method is specifically tailored for real time and builds on a custom renderer with special handling of direct and indirect illumination. Our approach uses a more straightforward pipeline, which can be easily integrated into any conventional MC renderer with little overhead while offering similar performance during filtering.
Similar to our approach, some methods use confidence intervals [Moon et al. 2013; 2016; Sen and Darabi 2012] to exclude pixels from combination during filtering if their corresponding statistics differ significantly. These approaches use confidence intervals in specific subparts of their fairly complex pipelines, which are tailored specifically to MC image denoising. Similar statistical concepts were used by Back et al. [2023], who first applied the concept of uncorrelated statistics to denoising MC renderings. In contrast, we use statistical tests at the core of our efficient denoising pipeline, including higher-order central moments to relax the requirement of normally distributed samples implied by commonly used confidence intervals.
Neural-network-based approaches represent the current state of the art in MC denoising, e.g., the NVIDIA OptiX AI-Accelerated Denoiser (OAADN), which is based on Chaitanya et al.’s work [2017], or Intel Open Image Denoise (OIDN) [Áfra 2024]. Both approaches take a noisy image and so-called G-buffers (such as normals or albedos) as inputs. The neural networks then map those inputs to a low-error approximation of the ground truth. For this purpose, the networks have to be trained on numerous input–reference image pairs, so that an effective mapping can be established. Reconstructing image features that suffer from high variance and are not present in the G-buffers, such as lighting effects, is a difficult problem. In these cases, the networks must rely solely on the noisy image to differentiate between legitimate features and noise. Furthermore, the performance of such approaches hinges on the training procedure and the available data. Generally, there are no guarantees concerning their convergence behavior. For instance, OIDN [Áfra 2024] includes images at different degrees of convergence in the training set to incorporate a notion of convergence. This stands in contrast to several non-neural methods that provide consistency [Back et al. 2023; Bitterli et al. 2016; Moon et al. 2013; Rousselle et al. 2013]. To alleviate this fundamental problem, Firmino et al. [2022] train an additional neural network that predicts per-pixel mixing factors to reduce the weight of the biased reconstruction in favor of the converging input. They use confidence intervals to ensure convergence by limiting the neural mixing weights such that the biased contribution vanishes as the variance decreases with increased sample counts.
3 Background and Notation
In this section, we summarize the statistical concepts and methods on which we build our general statistical framework in § 4 and establish the connection to the common denoising problem in Monte Carlo rendering. As a starting point, we consider the well-known rendering integral [Kajiya 1986], where each pixel Ii in an image is formed as
\[ I_i = \int_{\Omega_i} W_i(\omega)\, L'(\mathbf{x}, \omega)\, \mathrm{d}\omega, \quad (1) \]
where Wi describes the sensor response and L′ is the radiance incident on the i-th pixel, which covers the solid angle Ωi from the camera location x. This radiance must satisfy Kajiya’s rendering equation throughout a virtual scene. Each pixel intensity is commonly approximated by Monte Carlo (MC) integration:
\[ I_i \approx \frac{1}{n_i} \sum_{k=1}^{n_i} \frac{f_k}{p_k}, \quad (2) \]
where fk refers to evaluations of the integrand in Eq. (1) for ni randomly drawn samples from Ωi with probability density pk.
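As a concrete illustration of the estimator in Eq. (2), the following minimal sketch (with hypothetical helper names, not part of our renderer) averages integrand evaluations divided by their sampling densities; any sampling routine with a known density can be plugged in.

```python
import random

# Minimal sketch of the MC pixel estimate in Eq. (2); `sample_direction` and `integrand`
# are hypothetical stand-ins for the renderer's sampling and shading routines.
def estimate_pixel(sample_direction, integrand, n_i):
    total = 0.0
    for _ in range(n_i):
        omega, p_k = sample_direction()   # direction in Omega_i and its density p_k
        total += integrand(omega) / p_k   # f_k / p_k
    return total / n_i                    # unbiased estimate of I_i

# Toy usage with a 1D stand-in domain: integrate omega^2 over [0, 1) by uniform sampling.
if __name__ == "__main__":
    uniform = lambda: (random.random(), 1.0)   # uniform density on [0, 1)
    f = lambda omega: omega * omega            # exact integral is 1/3
    print(estimate_pixel(uniform, f, 100000))  # approx. 0.333
```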
We now move on to a more abstract statistical interpretation of the rendering process outlined above. In particular, we interpret each pixel as a statistical estimator \({\hat{\theta }_i(X_1,\dots ,X_{n_i})}\) for the (unknown) ground-truth value θi (in this case Ii, although the same idea applies to other estimators), where we view each sample Xk as a random variable (i.e., a distribution from which the rendered sample fk/pk is drawn). We generally assume unbiased estimators, i.e., \(\mathop{\mathbb {E}}[\hat{\theta }_i] = \theta _i\).
Using results from descriptive statistics, our core idea is to formulate statistical tests to decide which estimators are sufficiently similar so that they can be combined during denoising. In general, we employ the following descriptive statistics of the sample distribution: the mean (μ) and the central moments of orders 2–4 (\(M_l\), \(l \in \lbrace 2, 3, 4\rbrace\)), or their standardized variants, variance (\(\sigma^2 = M_2\)), skewness (\(M_3 / M_2^{3/2}\)), and kurtosis (\(M_4 / M_2^2\)) [Kenney and Keeping 1951].
The central moments are defined as
\[ M_l = \frac{1}{n} \sum_{k=1}^{n} \left(X_k - \bar{X}\right)^l, \quad (3) \]
where \(\bar{X}\) is the sample mean. Crucially, all these statistics can be computed online [Meng 2015], building on the classical result by Welford [1962], i.e., by updating them one sample at a time during rendering without having to store all samples in memory. In our implementation, we use central moments up to the third order.
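To make the online-update idea concrete, the following sketch (our variable names, not the paper's pseudocode) accumulates the running mean and the second and third central moments one sample at a time, in the spirit of Welford [1962] and the one-pass update rules surveyed by Meng [2015].

```python
# Minimal sketch (assumed names): one-pass accumulation of the sample mean and the
# 2nd/3rd central moments, updated with each new sample during rendering.
class OnlineMoments:
    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running sample mean
        self.m2 = 0.0     # sum of squared deviations (n * M2)
        self.m3 = 0.0     # sum of cubed deviations  (n * M3)

    def update(self, x):
        """Incorporate one sample value."""
        self.n += 1
        delta = x - self.mean
        delta_n = delta / self.n
        term1 = delta * delta_n * (self.n - 1)
        self.mean += delta_n
        # the 3rd-moment update must use the *old* m2 (standard one-pass recurrence)
        self.m3 += term1 * delta_n * (self.n - 2) - 3.0 * delta_n * self.m2
        self.m2 += term1

    def central_moment(self, order):
        """Return M2 or M3 as defined in Eq. (3)."""
        assert order in (2, 3)
        sums = {2: self.m2, 3: self.m3}
        return sums[order] / self.n if self.n > 0 else 0.0

    def variance_of_mean(self):
        # variance of the pixel estimator, i.e., the sample variance divided by n,
        # which is the quantity entering the pair-wise tests of Sec. 4
        return self.m2 / (self.n * (self.n - 1)) if self.n > 1 else float("inf")
```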
3.1 Denoising Estimators
In this work, we consider general denoising filters that construct the denoised estimator \(\tilde{\theta }_j\) as a convex combination of (noisy) input estimators \(\lbrace \hat{\theta }_i\rbrace\):
\[ \tilde{\theta}_j = \sum_i w_{ij}\, \hat{\theta}_i, \quad (4) \]
where wij is the weight assigned to estimator \(\hat{\theta }_i\); all weights must be non-negative, and \(\sum \nolimits _i w_{ij} = 1\). The key question now is: how to find appropriate (sparse) weights wij? In our approach, we split these weights into two parts, Eq. (10): a base filter, such as the well-known joint bilateral filter, and a novel pair-wise statistical membership function mij, which decides which combinations of estimators are admissible.
Formally, the mean squared error (MSE) of the combined estimator \(\tilde{\theta }_j\) can be decomposed into variance and bias,
\[ \mathrm{MSE}(\tilde{\theta}_j) = \mathbb{E}\big[(\tilde{\theta}_j - \theta_j)^2\big] = \mathrm{Var}(\tilde{\theta}_j) + \mathrm{Bias}(\tilde{\theta}_j, \theta_j)^2, \quad (5) \]
where θj refers to the estimand, i.e., the unknown ground-truth value for the j-th estimator. Note that even though we assume unbiased estimators \(\hat{\theta }_j\), their filtered counterparts \(\tilde{\theta }_j\) generally contain some bias. For independent input estimators, the variance and bias are given by
\[ \mathrm{Var}(\tilde{\theta}_j) = \sum_i w_{ij}^2\, \mathrm{Var}(\hat{\theta}_i) \quad (6) \]
and
\[ \mathrm{Bias}(\tilde{\theta}_j, \theta_j) = \sum_i w_{ij}\, \mathrm{Bias}(\hat{\theta}_i, \theta_j), \quad (7) \]
where \(\text{Bias}(\hat{\theta }_i, \theta _j) = \mathop{\mathbb {E}}[\hat{\theta }_i]-\theta _j\).
The overall goal of MC denoising in this context is to find weights that minimize the total MSE, or a similar error metric, i.e., achieve an optimal trade-off between variance (noise) and bias across all denoised estimators (i.e., the whole image):
\[ \min_{\lbrace w_{ij}\rbrace} \; \sum_j \mathrm{MSE}(\tilde{\theta}_j) \quad \text{subject to} \quad w_{ij} \ge 0, \;\; \textstyle\sum_i w_{ij} = 1. \quad (8) \]
However, this optimization problem cannot be addressed directly, as the ground-truth values are of course unknown, and we must therefore work with noisy estimates of variance and bias.
4 Statistical Filtering Framework
In this section, we develop our statistical filtering framework. Considering an abstract set of estimators
\(\lbrace \hat{\theta }_i\rbrace\), we ask under which conditions combining a subset of these estimators improves mean squared error. Here, we approach the inherent noise-to-bias trade-off from a statistical perspective, formulating a
membership function (
m), fulfilling the requirements summarized in §
4.1. We show how this approach relates to hypothesis testing in statistics, before describing the specifics of our image-space implementation in §
5.
4.1 Problem Statement
Considering the general denoising problem introduced in § 3, we formulate membership functions that decide which combinations of estimators are admissible during denoising, i.e., under which conditions the combination is likely to improve image quality. In particular, we require the following important properties:
(a) Pair-wise evaluation: for any pair of estimators \((\hat{\theta }_i, \hat{\theta }_j)\), m must be defined as a function of these estimators’ statistics only, i.e., \({m_{ij} = m(\mathcal {S}({\hat{\theta }_i}), \mathcal {S}({\hat{\theta }_j}))}\), where \(\mathcal {S}({\hat{\theta }_i})\) denotes descriptive sample statistics, such as \(\mu _i, \sigma ^2_i, M_{3,i}, M_{4,i}, \ldots\), of estimator \(\hat{\theta }_i\).
(b) Online statistics: all these statistics \(\mathcal {S}\) must be computable by an online algorithm, i.e., by updating their value one sample at a time during Monte Carlo (MC) rendering. Together with the pair-wise property, this requirement ensures that all components of our filtering pipeline can be implemented efficiently in terms of both parallel execution and memory usage.
(c) Symmetry: we require that mij = mji. In practice, symmetry of weights enforces energy preservation during filtering. Relaxing this requirement sometimes produces visually more pleasing results around bright outliers (“fireflies”) at the cost of losing some overall brightness and possibly slightly higher mean squared error (MSE).
(d) Convergence: in order to guarantee convergence of the denoised result \(\tilde{\theta }_i \rightarrow \theta _i \; \forall i\) with increasing sample size, we require that the membership function satisfies
\[ m_{ij} \rightarrow 0 \quad \text{as} \quad \mathrm{Var}(\hat{\theta}_i) + \mathrm{Var}(\hat{\theta}_j) \rightarrow 0 \quad \text{whenever} \quad \theta_i \ne \theta_j. \quad (9) \]
In other words, as the variance approaches zero, the membership function must exclude estimator j from the combination with estimator i if there is any difference in their estimands, which would introduce bias.
(e) Identity: an estimator does not introduce additional bias to itself; we therefore set mii = 1 by definition.
Finally, we can state our main problem as finding membership functions mij that satisfy properties (a–e) and deliver the best possible variance–bias trade-off within these constraints.
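As an illustration of requirements (a), (b), (c), and (e), a membership function only ever sees the two estimators' statistics and returns a symmetric, binary decision. A minimal interface sketch (hypothetical names, with the concrete test deferred to § 4.2) could look as follows.

```python
from dataclasses import dataclass

# Hypothetical container for the per-estimator statistics of property (a); every field
# can be maintained online during rendering, satisfying property (b).
@dataclass
class EstimatorStats:
    n: int        # sample count
    mean: float   # sample mean
    m2: float     # 2nd central moment
    m3: float     # 3rd central moment (for skewness-based corrections)

def membership(stats_i: EstimatorStats, stats_j: EstimatorStats, test) -> int:
    """Pair-wise, binary membership. `test` is any symmetric predicate on two
    EstimatorStats objects (e.g., the Welch-type test of Sec. 4.2); symmetry of
    `test` gives property (c), and the explicit identity case gives property (e)."""
    if stats_i is stats_j:
        return 1                      # identity, property (e): m_ii = 1
    return 1 if test(stats_i, stats_j) else 0
```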
4.2 Our Approach
Here, we describe our general framework for formulating statistics-based membership functions respecting the aforementioned requirements. We then use these functions to address the denoising problem, Eq. (
8), by defining the filter weights in Eq. (
4), as follows:
The first component of the weights,
ρij, provides the option to integrate existing filters based on a priori available information into our system. The main reasons for doing so are: (1) to limit the size of the filter kernel, thereby improving the runtime performance by limiting the amount of required membership-function evaluations; (2) to be able to build upon well-known existing approaches; and (3) to include additional low-noise a priori information available from the renderer (
G-buffers, e.g., normals or albedos in image space). Note that while a priori information is often useful to preserve some features due to geometry or textures, it is inherently useless for other features that only manifest themselves through sampling (e.g., shadows or caustics). In this regard, our membership function and this a priori weight complement each other, incorporating both empirical and a priori information. In our implementation (§
5), we set
ρij to represent a joint bilateral filter.
Existing filters are highly effective at reducing variance. The key task of the membership function mij is therefore to limit bias by excluding estimators from the filter whose estimands differ significantly. Following the requirements stated in § 4.1, we now consider the MSE minimization problem, Eq. (8), for a pair of input estimators \(\hat{\theta }_i, \hat{\theta }_j\). These estimators are combined with weight w into denoised estimators \(\tilde{\theta }_i = w\hat{\theta }_i + (1-w) \hat{\theta }_j\) and \(\tilde{\theta }_j = w\hat{\theta }_j + (1-w) \hat{\theta }_i\). Note that we enforce symmetry of the weighting here, wij = wji = (1 − w). Assuming unbiased input estimators, the bias of their combination is \(\text{Bias}(\tilde{\theta }_i,\theta _i) = {\mathop{\mathbb {E}}[\tilde{\theta }_i-\theta _i]} = {(w-1) \theta _i + (1-w) \theta _j}\), and analogously for \(\text{Bias}(\tilde{\theta }_j,\theta _j)\). Minimizing the sum of the error terms, Eq. (5), for both denoised estimators then reads:
\[ \min_{w} \;\; w^2\, \mathrm{Var}(\hat{\theta}_i) + (1-w)^2\, \mathrm{Var}(\hat{\theta}_j) + (1-w)^2 (\theta_j - \theta_i)^2 \]
\[ \qquad\;\; +\, w^2\, \mathrm{Var}(\hat{\theta}_j) + (1-w)^2\, \mathrm{Var}(\hat{\theta}_i) + (1-w)^2 (\theta_i - \theta_j)^2. \quad (11) \]
For this objective, we can find the optimum by setting the derivative with respect to the weight to zero and solving for w* (assuming at least some uncertainty, i.e., \(\text{Var}(\hat{\theta }_i) + \text{Var}(\hat{\theta }_j) \gt 0\)):
\[ w^* = \frac{\mathrm{Var}(\hat{\theta}_i) + \mathrm{Var}(\hat{\theta}_j) + 2\,(\theta_i - \theta_j)^2}{2\left(\mathrm{Var}(\hat{\theta}_i) + \mathrm{Var}(\hat{\theta}_j) + (\theta_i - \theta_j)^2\right)}. \quad (12) \]
Note, however, that we do not know the ground-truth values or the estimator variances, and we therefore work with (noisy) estimates of these quantities instead, producing noisy results for w*. Consequently, the trivial choice of setting mij = 1 − w* and ρij = 1 does not yield satisfactory results. Our approach is therefore to use an existing smoothing filter for ρij and to enforce a binary membership function such that
\[ m_{ij} = \begin{cases} 1 & \text{if } 1 - w^* > \gamma,\\ 0 & \text{otherwise.} \end{cases} \quad (13) \]
Here, γ is a threshold that determines how discriminative the membership function is. In this way, we effectively use the pair-wise optimal weight as a test statistic (similar to those used in statistical hypothesis testing) and prevent the estimators from being filtered together if the introduced bias (i.e., the difference of their means) is too large relative to the sum of their variances. In our supplementary document, we show that this test is equivalent to Welch’s t-test, t < γw, with \(\gamma _w = \sqrt {1/(2\gamma) - 1}\). Our results use a critical value from Student’s t-distribution, \(\gamma_w = t_{1-\alpha/2,\,\nu}\), with the significance level α = 0.005 and ν = ni + nj − 2, i.e., the upper bound for the degrees of freedom approximated by the Welch–Satterthwaite equation [Satterthwaite 1946].
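For reference, a minimal sketch of this membership test in its Welch form (assumed variable names; the per-pixel means and estimator variances come from the online statistics, and the critical value is taken from Student's t-distribution as described above):

```python
import math
from scipy import stats  # used only to look up the critical value t_{1-alpha/2, nu}

# Minimal sketch (assumed names): binary membership via the Welch-type test equivalent
# to Eq. (13). var_mean_* denotes Var(theta_hat), i.e., sample variance divided by n.
def membership_welch(mean_i, var_mean_i, n_i, mean_j, var_mean_j, n_j, alpha=0.005):
    pooled = var_mean_i + var_mean_j
    if pooled <= 0.0:
        # no remaining uncertainty: combine only if the estimands agree exactly
        return 1 if mean_i == mean_j else 0
    t_stat = abs(mean_i - mean_j) / math.sqrt(pooled)
    nu = max(1, n_i + n_j - 2)                 # upper bound of the Welch-Satterthwaite dof
    gamma_w = stats.t.ppf(1.0 - alpha / 2.0, nu)
    return 1 if t_stat < gamma_w else 0
```

In the full pipeline, this test is evaluated per color channel and pixel pair; since the critical value only depends on the sample counts and α, it can be precomputed.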
Welch’s t-test is related to the confidence interval for the difference of two normally distributed means; see, for example, Eq. (18) in the paper by Curto [2023]. Consequently, we can relax the normality assumption by choosing a different confidence-interval formulation from the literature. Various formulations are available that consider not only the sample variance but also higher-order statistics, such as skewness, of the sample distribution. In our results, we find that the correction of the means proposed by Curto [2023], Eq. (20) there, yields good denoising behavior when combined with a Box-Cox transformation of the samples, as detailed in § 5.
As an orthogonal extension, we note that allowing asymmetric membership functions can produce visually more appealing results; while fewer constraints can also lead to improved MSE, the potential energy loss due to asymmetric weights may cause overall darker images and therefore slightly worse quantitative errors (Fig. 6 g). Relaxing the symmetry assumption can be easily done by removing the second error term, i.e., the MSE of \(\tilde{\theta }_j\), from Eq. (11). In this case, the optimal weight simplifies to
\[ w^* = \frac{\mathrm{Var}(\hat{\theta}_j) + (\theta_i - \theta_j)^2}{\mathrm{Var}(\hat{\theta}_i) + \mathrm{Var}(\hat{\theta}_j) + (\theta_i - \theta_j)^2}. \quad (14) \]
As before, we then replace the unknown ground-truth values with estimates based on the observed sample statistics, using either the normality assumption or Curto’s correction, to evaluate the membership function according to Eq. (13).
5 Application to Image-Space Denoising
This section describes how to apply our general statistical framework to image-space denoising. In image space, the indices i and j each denote an individual pixel (not image-space coordinates). In particular, our implementation contains three major components. First, we track online statistics of the samples from the MC renderer, Alg. 1; this part is implemented as part of the renderer itself (we use pbrt-v3 [Pharr et al. 2016] for our results). Second, we select a joint bilateral filter to define the a priori weight ρij in Eq. (10). Finally, we implement our statistical denoising pipeline, which takes the noisy image, sample statistics, and a priori weights as input to compute the filter weights, Alg. 2, and produce the final output image (we implement this part on the GPU using CUDA).
In MC rendering, we work with (weighted) radiance samples that cannot be negative, while a few paths may result in large contributions to a pixel’s intensity. In statistical terms, the distribution of samples tends to be right-skewed. Given these observations, we employ the widely used Box-Cox transformation [Box and Cox 2018] to “normalize” the rendered samples:
\[ x^{\prime}_k = \begin{cases} \left(x_k^{\lambda} - 1\right)/\lambda & \text{if } \lambda \ne 0,\\ \ln x_k & \text{if } \lambda = 0. \end{cases} \quad (15) \]
We note that for samples generated by MC rendering, choosing λ = 0 is impractical, as many samples may be zero (e.g., paths that do not reach a light source before termination), where the log function is undefined. In our experiments, we find that λ = 1/2 yields good results in practice, as it effectively “compresses” high-valued outliers while avoiding excessive “stretching” of small values toward −∞. For each transformed sample \(x^{\prime }_k\) arriving from the renderer, we then update the online statistics following Alg. 1.
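A minimal sketch of this renderer-side accumulation (assumed function names, not pbrt-v3's API), assuming one OnlineMoments-style accumulator per pixel and color channel:

```python
# Minimal sketch (assumed names): Box-Cox transform with lambda = 1/2 followed by the
# per-pixel, per-channel online statistics update, cf. Alg. 1 and Eq. (15).
def box_cox(x, lam=0.5):
    # radiance samples are non-negative, so x**lam is well defined; lam = 0 is avoided
    return (x ** lam - 1.0) / lam

def accumulate_sample(pixel_stats, rgb_sample, lam=0.5):
    """pixel_stats: a sequence of three per-channel accumulators (e.g., OnlineMoments);
       rgb_sample:  one weighted radiance sample (r, g, b) from the renderer."""
    for accumulator, value in zip(pixel_stats, rgb_sample):
        accumulator.update(box_cox(value, lam))
```

The transform is applied per sample before the update, so the stored moments describe the transformed sample distribution on which the statistical tests operate.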
As a base filter, we choose the joint bilateral filter [Eisemann and Durand 2004; Petschnigg et al. 2004] to define the a priori weights ρij for any pair of pixels (i, j). These weights follow a Gaussian falloff with distance in the combined space of image coordinates and G-buffer values:
\[ \rho_{ij} = \exp\left(-\tfrac{1}{2}\, (\mathbf{p}_i - \mathbf{p}_j)^{\mathsf{T}}\, \Sigma^{-1}\, (\mathbf{p}_i - \mathbf{p}_j)\right), \quad (16) \]
where pi denotes the a priori information for pixel i. We use image-space position, RGB albedo color, and surface normal for each pixel, i.e., \(\mathbf {p}_i = (x_i,y_i, r_i,g_i,b_i, n_{ix},n_{iy},n_{iz})^\textsf {T}\), in our results. Moreover, Σ denotes a covariance matrix that controls the rate of falloff for each dimension of pi. To enforce sparsity of these weights, we limit the filter to a small neighborhood within a constant radius around each pixel in image space; our default covariance matrix is Σ = diag(10, 10, 0.02, 0.02, 0.02, 0.1, 0.1, 0.1).
In practice, our statistical filter operates with three color channels per pixel in RGB color space and evaluates the membership function for each channel according to Alg. 2. We set the final weight wij to zero unless all three channels pass the statistical test before applying the filter. In this way, we avoid color shifts that could occur if a channel is evaluated differently from the others.
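Putting the pieces together, the following per-pixel sketch (assumed names and data layout; our actual implementation runs this loop on the GPU in CUDA) evaluates the a priori weight of Eq. (16), applies the per-channel membership test, and normalizes the weights as in Eq. (10):

```python
import numpy as np

# Minimal CPU sketch (assumed names): denoise one pixel j = (y0, x0) by combining the
# joint bilateral weight (Eq. 16) with the binary per-channel membership and
# normalizing the weights (Eq. 10).
def denoise_pixel(y0, x0, noisy, features, channel_stats, sigma_inv, radius, membership):
    """noisy:         H x W x 3 noisy pixel colors
       features:      H x W x 8 vectors p_i = (x, y, r, g, b, nx, ny, nz)
       channel_stats: channel_stats[y][x][c] holds the online statistics of channel c
       membership:    pair-wise test returning 0 or 1 for one channel's statistics"""
    h, w, _ = noisy.shape
    acc = np.zeros(3)
    weight_sum = 0.0
    for y in range(max(0, y0 - radius), min(h, y0 + radius + 1)):
        for x in range(max(0, x0 - radius), min(w, x0 + radius + 1)):
            d = features[y, x] - features[y0, x0]
            rho = np.exp(-0.5 * d @ sigma_inv @ d)   # a priori weight, Eq. (16)
            # all three channels must pass the test to avoid color shifts
            ok = all(membership(channel_stats[y][x][c], channel_stats[y0][x0][c])
                     for c in range(3))
            w_ij = rho if ok else 0.0
            acc += w_ij * noisy[y, x]
            weight_sum += w_ij
    # the center pixel always contributes (rho_jj = 1, m_jj = 1), so weight_sum > 0
    return acc / weight_sum
```

The constant window radius bounds the number of membership evaluations per pixel, which is what keeps the filtering cost independent of the sample count.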
In summary, our approach builds on three key insights: First, minimizing pair-wise MSE is closely related to Welch’s t-test; this theoretical contribution forms the basis of our method. Second, all collected statistics are noisy estimates; enforcing binary membership functions prevents most of this noise from propagating downstream. Finally, the Box-Cox transformation, as well as correcting the mean [Curto 2023; Johnson 1978], makes our method more robust to non-normality, effectively mitigating the influence of outliers (“fireflies”).
The parameters of our method affect the results as follows: The threshold γ, Eq. (13), adjusts the denoising strength. For symmetric weights, γ = 0 reverts to the base filter, whereas γ = 1/2 with non-zero variance yields mij = 0 (for i ≠ j), effectively disabling filtering. The covariance matrix in Eq. (16) specifies the extent of the filtering window, depending on distances in image space and differences in G-buffer values; higher values generally increase the number of membership evaluations and lead to smoother results. The Box-Cox parameter λ determines how samples are transformed: λ = 1 keeps the shape of their distribution unchanged, smaller values correct right-skewed distributions (reducing the influence of positive outliers), and larger values correct left skewness.

6 Results I — Image Denoising
In this section, we compare our statistical denoiser to state-of-the-art machine-learning-based denoisers (NVIDIA OptiX AI-Accelerated Denoiser and Intel Open Image Denoise), as well as a meta-denoiser [Firmino et al. 2022], on multiple well-known test scenes [Bitterli 2016] in Figs. 1–4. Figure 5 shows a quantitative convergence analysis for these examples. Note that without useful information from the G-buffers (e.g., objects seen in the mirror in Fig. 2), neural methods often fail to reconstruct sharp features. We also compare our membership formulation to the one-sided confidence-interval test proposed by Moon et al. [2013], using the 99.8% confidence level as proposed in their paper. Here, we compare the membership functions directly, without applying their two-step denoising process.
We also analyze the effect of various components and parameters of our method in Fig. 6. Our statistical membership function greatly contributes to preserving the hard edges of shadows, which would be over-blurred by the base filter, as such lighting effects are not present in the G-buffers. Note how the G-buffers and our membership functions complement each other in reconstructing image features (Figs. 6 b and c). The Box-Cox transformation further increases performance by “normalizing” the sample distributions.
All results were rendered and denoised on a desktop PC with an AMD Ryzen 9 5950X CPU and an NVIDIA RTX 3080 Ti GPU. Our filtering runtimes are generally comparable to the duration of the inference step of Intel Open Image Denoise (OIDN) and substantially faster than progressive denoising [Firmino et al. 2022]. We set the base-filter radius to a relatively large size of 20 pixels (mostly for 1280 × 720 resolution images). Reducing the radius would further speed up our method, as the number of membership tests scales quadratically with the radius: while for 20 pixels we require around 28 ms, for 6 pixels we only require around 11 ms (Fig. 6 d). Note that using the base filter alone would lead to noticeable over-blurring (Fig. 6 b).
8 Discussion and Conclusion
We have presented a simple yet effective denoising method for Monte Carlo rendering. Using well-known image filters for variance reduction and statistical tests (membership functions) to prevent bias, we achieve state-of-the-art image quality. Our approach is entirely free of pretrained components, using descriptive statistics of the sample distributions instead that can be estimated online during rendering. The properties of our membership function guarantee convergence with increasing sample counts. At low sample counts, high variance dominates the image error. In this case, the membership function becomes less discriminating, thereby reducing that variance and accepting some bias. As rendering progresses and more samples are added, variance decreases and the bias becomes more relevant to the overall error. If bias is significant (relative to variance), we set the corresponding filter weight to zero, based on statistical tests, which essentially eliminates that source of bias. In the limit, as variance approaches zero, any bias is unacceptable and thus prevented by the membership function.
We have focused on binary membership functions in this work. While we derive these functions from optimal pair-wise weights, recall that these weights are noisy estimates in practice. Introducing a threshold and restricting to binary memberships effectively prevents this noise from propagating through the pipeline. We leave investigating continuous membership functions for future work.
In contrast to neural-network-based approaches, our method does not require any computationally expensive training and does not risk adding “wrong” details that were present in the training data but should not be present in the output image.
Including our method as an additional input to existing meta-denoisers, such as progressive denoising [Firmino et al. 2022] or ensemble denoising [Zheng et al. 2021], could in turn deliver another step up in image quality for these methods. Furthermore, we see great potential for applying our statistics-based approach to other types of estimators during MC rendering. In particular, variance estimates have been used for adaptive sampling [Rousselle et al. 2012], MIS [Grittmann et al. 2019], and path guiding [Rath et al. 2020] in the past. Efficiently denoising variance estimates with our framework could yield improved performance of these methods in the future.
Another promising avenue for future work is the extension of our method to the temporal domain: Conceptually, estimates at different points in time of an animation can be treated equivalently to estimates at different image-space positions. Including such estimates could improve denoising performance and temporal coherence for animation denoising.