Abstract
Algorithms for automatically selecting a scalar or locally varying regularization parameter for total variation models with an \(L^{\tau }\)-data fidelity term, \(\tau \in \{1,2\}\), are presented. The automated selection of the regularization parameter is based on the discrepancy principle, whereby in each iteration a total variation model has to be minimized. In the case of a locally varying parameter, this amounts to solving a multiscale total variation minimization problem. For solving the resulting multiscale total variation model, convergent first- and second-order methods are introduced and analyzed. Numerical experiments for image denoising and image deblurring show the efficiency, the competitiveness, and the performance of the proposed fully automated scalar and locally varying parameter selection algorithms.
1 Introduction
Observed images are often contaminated by noise and may be additionally distorted by some measurement device. Then the obtained data g can be described as
where \(\hat{u}\) is the unknown original image, T is a linear bounded operator modeling the image formation device, and \(\mathcal {N}\) represents noise. In this paper, we consider images which are contaminated by either white Gaussian noise or impulse noise. While for white Gaussian noise the degraded image g is obtained as
where the noise \(\eta \) is oscillatory with zero mean and standard deviation \(\sigma \), there are two main models for impulse noise that are widely used in a variety of applications, namely salt-and-pepper noise and random-valued impulse noise. We assume that \(T\hat{u}\) is in the dynamic range [0, 1], i.e., \(0 \le T\hat{u} \le 1\). Then, in the presence of salt-and-pepper noise, the observation g is given by
with \(1-r_1-r_2>0\). If the image is contaminated by random-valued impulse noise, then g is described as
where \(\rho \) is a uniformly distributed random variable in the image intensity range [0, 1].
The recovery of \(\hat{u}\) from the given degraded image g is an ill-posed inverse problem and thus regularization techniques are required to restore the unknown image [41]. A good approximation of \(\hat{u}\) may be obtained by solving a minimization problem of the type
where \(\mathcal {H}(.;g) \) represents a data fidelity term, which enforces the consistency between the recovered and measured images, \(\mathcal {R}\) is an appropriate filter or regularization term, which prevents overfitting, and \(\alpha >0\) is a regularization parameter weighting the importance of the two terms. We aim at reconstructions in which edges and discontinuities are preserved. For this purpose, we use the total variation as a regularization term, first proposed in [81] for image denoising. Hence, here and in the remainder of the paper we choose \(\mathcal {R}(u)=\int _\varOmega |Du|\), where \(\int _\varOmega |Du|\) denotes the total variation of u in \(\varOmega \); see [3, 46] for more details. However, we note that other regularization terms, such as the total generalized variation [13], the nonlocal total variation [60], the Mumford–Shah regularizer [71], or higher order regularizers (see e.g. [77] and references therein) might be used as well.
1.1 Choice of the Fidelity Term
The choice of \(\mathcal {H}\) typically depends on the type of noise contamination. For images corrupted by Gaussian noise, a quadratic \(L^2\)-data fidelity term is typically chosen and has been successfully used; see for example [18–20, 24, 28–30, 32, 35, 47, 72, 76, 91, 93]. In this approach, which we refer to as the \(L^2\)-TV model, the image \(\hat{u}\) is recovered from the observed data g by solving
where \(BV(\varOmega )\) denotes the space of functions with bounded variation, i.e., \(u\in BV(\varOmega )\) if and only if \(u\in L^1(\varOmega )\) and \(\int _\varOmega |Du| <\infty \). In the presence of impulse noise, e.g., salt-and-pepper noise or random-valued impulse noise, the above model usually does not yield a satisfactory restoration. In this context, a more successful approach, suggested in [1, 74, 75], uses a nonsmooth \(L^1\)-data fidelity term instead of the \(L^2\)-data fidelity term in (1.4), i.e., one considers
which we call the \(L^1\)-TV model. In this paper, we are interested in both models, i.e., the \(L^2\)-TV and the \(L^1\)-TV model, and condense them into
to obtain a combined model for removing Gaussian or impulsive noise, where \( \mathcal {H}_\tau (u;g):=\frac{1}{\tau }\Vert Tu-g\Vert _{L^\tau (\varOmega )}^\tau \) for \(\tau =1,2\). Note that instead of (1.6) one can consider the equivalent problem
where \(\lambda =\frac{1}{\alpha }>0\). Other fidelity terms have been considered in connection with other types of noise models, such as Poisson noise [64], multiplicative noise [4], and Rician noise [43]. For images which are simultaneously contaminated by Gaussian and impulse noise [15], a combined \(L^1\)-\(L^2\)-data fidelity term has recently been suggested and demonstrated to work satisfactorily [53]. However, in this paper, we concentrate on images degraded by only one type of noise, i.e., either Gaussian noise or one type of impulse noise, and perhaps additionally corrupted by some measurement device.
1.2 Choice of the Scalar Regularization Parameter
For the reconstruction of such images, the proper choice of \(\alpha \) in (1.6) and \(\lambda \) in (1.7) is delicate; cf. Fig. 1. In particular, large \(\alpha \) and small \(\lambda \), which lead to an oversmoothed reconstruction, not only remove noise but also eliminate details in images. On the other hand, small \(\alpha \) and large \(\lambda \) lead to solutions which fit the given data well but retain noise in homogeneous regions. Hence, a good reconstruction can be obtained by choosing \(\alpha \) and, respectively, \(\lambda \) such that a good compromise between the aforementioned effects is made. There are several ways to select \(\alpha \) in (1.6) and equivalently \(\lambda \) in (1.7), such as manually by the trial-and-error method, the unbiased predictive risk estimator method (UPRE) [67, 69], the Stein unbiased risk estimator method (SURE) [10, 38, 82] and its generalizations [34, 40, 45], the generalized cross-validation method (GCV) [48, 66, 67, 78], the L-curve method [49, 50], the discrepancy principle [70], and the variational Bayes’ approach [6]. Further parameter selection methods for general inverse problems can be found for example in [41, 42, 88, 89].
Based on a training set of pairs \((g_k, \hat{u}_k)\), for \(k=1, 2, \ldots , N \in \mathbb N\), where \(g_k\) is the noisy observation and \(\hat{u}_k\) represents the original image, bilevel optimization approaches have been presented, for example in [16, 33, 61], to compute suitable scalar regularization parameters of the corresponding image model. Since no training set is given in our setting, these approaches are not applicable here.
Applying the discrepancy principle to estimate \(\alpha \) in (1.6) or \(\lambda \) in (1.7), the image restoration problem can be formulated as a constrained optimization problem of the form
where \(\mathcal {B}_\tau :=\frac{\nu _\tau }{\tau } |\varOmega |\) with \(\nu _\tau >0\) being here a constant depending on the underlying noise, \(\tau =1,2\), and \(|\varOmega |\) denoting the volume of \(\varOmega \); see Sect. 3 for more details. Note that here we assume the noise level to be known a priori. In real applications, this means that the noise level possibly has to be estimated in a first step before the discrepancy principle can be used. However, in general it is easier to estimate the noise level than the regularization parameter [18].
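For illustration, the quantities entering the discrepancy principle are straightforward to evaluate in a discrete setting. The following sketch (in Python; the helper names are ours and not part of any algorithm discussed here) computes \(\mathcal {B}_2\) for Gaussian noise with known standard deviation \(\sigma \) and the corresponding residual \(\mathcal {H}_2(u;g)\):

```python
import numpy as np

def discrepancy_bound_gaussian(sigma, num_pixels):
    # B_2 = (nu_2 / 2) * |Omega| with nu_2 = sigma^2 for Gaussian noise (tau = 2)
    return 0.5 * sigma**2 * num_pixels

def residual_l2(u, g, T=lambda v: v):
    # H_2(u; g) = (1/2) * ||T u - g||_2^2, summed over all pixels
    return 0.5 * np.sum((T(u) - g)**2)

# The discrepancy principle asks for H_2(u_alpha; g) = B_2: a smaller residual
# indicates an under-regularized solution (alpha too small), a larger one an
# over-regularized solution (alpha too large).
```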
The constrained minimization problem (1.8) is naturally linked to the unconstrained minimization problem (1.7) and accordingly to (1.6). In particular, there exists a constant \(\lambda \ge 0\) such that the unconstrained problem (1.7) is equivalent to the constrained problem (1.8) if T does not annihilate constant functions, i.e., \(T\in \mathcal {L}(L^2(\varOmega ))\) is such that \(T\cdot 1 =1\); see Sect. 2 for more details. Several methods based on the discrepancy principle and problem (1.8) with \(\tau =2\) have been proposed in the literature, see for example [9, 18, 51, 92] and references therein, while not so much attention has been given to the case \(\tau =1\), see for example [73, 91].
1.3 Spatially Adaptive Parameter
Note that a scalar regularization parameter might not be the best choice for every image restoration problem, since images usually have large homogeneous regions as well as parts with a lot of details. Actually, it seems obvious that \(\alpha \) should be small, or \(\lambda \) should be large, in parts with small features in order to preserve the details. Conversely, \(\alpha \) should be large, or \(\lambda \) should be small, in homogeneous parts to remove noise considerably. With such a choice of a spatially varying weight, we expect better reconstructions than with a globally constant parameter, as demonstrated for example in [37, 58]. This motivates the consideration of multiscale total variation models with spatially varying parameters, initially suggested in [80]. The multiscale version of (1.6) reads as
while for (1.7) one writes
and in the sequel we refer to (1.9) and (1.10) as the multiscale \(L^\tau \)-TV model.
In [84], the influence of the scale of an image feature on the choice of \(\alpha \) is studied and the obtained observations were later used in [83] to construct an updating scheme of \(\alpha \). Based on (1.10), in [8] a piecewise constant function \(\lambda \), where the pieces are defined by a partitioning of the image due to a presegmentation, is determined. In particular, for each segment a scalar \(\lambda _i\), \(i=1,\ldots ,\#\)pieces is computed by Uzawa’s method [27].
Later, it was noticed that stable choices of \(\lambda \), respectively, \(\alpha \) should incorporate statistical properties of the noise. In this vein, in [2, 37, 44] for the problem (1.10) automated update rules for \(\lambda \) based on statistics of local constraints were proposed. In [44], a two-level approach for variational denoising is considered, where in the first level noise and relevant texture are isolated in order to compute local constraints based on local variance estimation. In the second level, a gradient descent method and an update formula for \(\lambda (x)\) derived from the Euler–Lagrange equation are utilized. An adaptation of this approach to multiplicative noise can be found in [65]. For convolution-type problems, in [2] an automatic updating scheme for \(\lambda \) using Uzawa’s method is constructed, based on an estimate of the noise variance for each pixel. This approach is improved in [37] by determining the fidelity weights via the Gumbel statistic for the maximum of a finite number of random variables associated with localized image residuals and by incorporating hierarchical image decompositions, proposed in [86, 87], to speed up the iterative parameter adjustment process. An adaptation of this approach to a total variation model with \(L^1\) local constraints is studied in [58]. A different approach has been proposed in [85] for image denoising only, where nonlocal means [14] are used to create a nonlocal data fidelity term. While in all these approaches the adjustment of \(\lambda \) relies on the output of T being a deteriorated image again, in [54] the method of [37] is adjusted to the situation where T is an orthogonal wavelet transform or Fourier transform. Very recently, bilevel optimization approaches have also been considered for computing spatially adaptive weights [26, 56, 57].
1.4 Contribution
The first contribution of this paper is a method which automatically computes the regularization parameter \(\alpha \) in (1.6) based on (1.8) for \(\tau =1\) as well as for \(\tau =2\). Our approach is motivated by the parameter selection algorithm presented in [18], which was originally introduced for \(L^2\)-TV image denoising only, i.e., when \(T=I\), where I denotes the identity operator. In this setting, the algorithm in [18] is shown to converge to a parameter \(\alpha ^*\) such that the corresponding minimizer \(u_{\alpha ^*}\) of (1.6) is also a solution of (1.8). The proof relies on the nonincrease of the function \(\alpha \mapsto \frac{\sqrt{\mathcal {H}_2(u_\alpha ;g)}}{\alpha }\). However, this important property does not hold for operators \(T\not =I\) in general. Nevertheless, we generalize the algorithm from [18] to problems of the type (1.6) for \(\tau =1,2\) and for general linear bounded operators T, e.g., T might be a convolution-type operator. Utilizing an appropriate update of \(\alpha \), which is different from the one used in [18], we are able to show analytically and numerically that our approach indeed converges to the desired regularization parameter. Further, besides the general applicability of our proposed method, it even possesses advantages for the case \(\tau =2\) and \(T=I\) over the algorithm from [18] with respect to convergence. More precisely, in our numerics it turned out that our proposed method always needs fewer than or at most the same number of iterations as the algorithm from [18] until termination.
Motivated by multiscale total variation minimization, the second contribution of this paper is concerned with the automated selection of a suitable spatially varying \(\alpha \) for the optimization problem in (1.9) for \(\tau =1,2\). Based on our considerations for an automatic scalar regularization parameter selection, we present algorithms where the adjustment of a locally varying \(\alpha \) is fully automatic. In contrast to the scalar case, the adjustment of \(\alpha \) is now based on local constraints, as already considered, for example, in [2, 37, 58]. However, our approach differs significantly from these previous works, where problem (1.10) is considered and Uzawa’s method or an Uzawa-like method is utilized for the update of the spatially varying parameter. Note that in Uzawa’s method an additional parameter has to be introduced and chosen accordingly. We propose an update scheme of \(\alpha \) which does not need any additional parameter and hence is not similar to Uzawa’s method. Moreover, in contrast to the approaches in [37, 58], where the initial regularization parameter \(\lambda >0\) has to be set sufficiently small, in our approach any initial \(\alpha >0\) is allowed. In this sense, our algorithm is even more general than the ones presented in [37, 58].
1.5 Outline of the Paper
The remainder of the paper is organized as follows: In Sect. 2, we revisit and discuss the connection between the constrained minimization problem (1.8) and the unconstrained optimization problem (1.6). Section 3 is devoted to the automated scalar parameter selection. In particular, we present our proposed method and analyze its convergence behavior. Based on local constraints, we describe in Sect. 4 our new locally adapted total variation algorithm in detail. Algorithms for performing total variation minimization for spatially varying \(\alpha \) are presented in Sect. 5, where also their convergence properties are studied. To demonstrate the performance of the new algorithms, we present in Sect. 6 numerical experiments for image denoising and image deblurring. Finally, in Sect. 7 conclusions are drawn.
2 Constrained Versus Unconstrained Minimization Problem
In this section, we discuss the connection between the unconstrained minimization problem (1.6) and the constrained optimization problem (1.8). For this purpose, we introduce the following basic terminology. Let \({\mathcal V}\) be a locally convex space, \({\mathcal V}'\) its topological dual, and \(\langle \cdot , \cdot \rangle \) the bilinear canonical pairing over \({\mathcal V}\times {\mathcal V}'\). The domain of a functional \({\mathcal J}: {\mathcal V}\rightarrow \mathbb R\cup \{+\infty \}\) is defined as the set
A functional \({\mathcal J}\) is called lower semicontinuous (l.s.c) if for every weakly convergent subsequence \(v^{(n)}\rightharpoonup \hat{v}\) we have
For a convex functional \({\mathcal J}: {\mathcal V}\rightarrow \mathbb R\cup \{+\infty \}\), we define the subdifferential of \({\mathcal J}\) at \(v\in {\mathcal V}\), as the set-valued function \(\partial {\mathcal J}(v) = \emptyset \) if \({\mathcal J}(v)=\infty \), and otherwise as
For any operator T, we denote by \(T^*\) its adjoint and by \(\mathcal {L}(L^2(\varOmega ))\) we denote the space of linear and continuous operators from \(L^2(\varOmega )\) to \(L^2(\varOmega )\). Moreover, \(g_\varOmega \) describes the average value of the function \(g\in L^1(\varOmega )\) in \(\varOmega \) defined by \(g_\varOmega :=\frac{1}{|\varOmega |} \int _\varOmega g(x) \ d x\).
Theorem 2.1
Assume that \(T\in \mathcal {L}(L^2(\varOmega ))\) does not annihilate constant functions, i.e., \(T 1_\varOmega \not =0\), where \(1_\varOmega (x) = 1\) for \(x\in \varOmega \). Then the problem
has a solution for \(\tau =1,2\).
Proof
For a proof, we refer the reader to [20] and [58]. \(\square \)
Moreover, we have the following statement.
Proposition 2.1
Assume that \(T\in \mathcal {L}(L^2(\varOmega ))\) is such that \(T\cdot 1=1\) and \(\nu _\tau |\varOmega | \le \Vert g-g_\varOmega \Vert ^\tau _{L^\tau (\varOmega )}\). Then problem (2.1) is equivalent to the constrained minimization problem (1.8) for \(\tau =1,2\).
Proof
For \(\tau =2\), the statement is shown in [20]. We state the proof for \(\tau =1\) by noting that it follows similar arguments as for \(\tau =2\). Let \(\tilde{u}\) be a solution of (2.1). Note that there exists \(u\in BV(\varOmega )\) such that \(\tilde{u}= u+g_\varOmega \). We consider now the continuous function \(f(s)=\Vert T(s u + g_\varOmega )- g\Vert _{L^1(\varOmega )}\) for \(s\in [0,1]\). Note that \(f(1)=\Vert T\tilde{u} - g\Vert _{L^1(\varOmega )}\le \nu _1|\varOmega |\) and \(f(0)=\Vert T g_\varOmega - g\Vert _{L^1(\varOmega )} = \Vert g - g_\varOmega \Vert _{L^1(\varOmega )}\ge \nu _1 |\varOmega |\), since \(T\cdot 1=1\), and hence there exists some \(s\in [0,1]\) such that \(f(s)=\nu _1 |\varOmega |\). Set \(u'=su\) which satisfies \(\Vert Tu' - g\Vert _{L^1(\varOmega )}=\nu _1|\varOmega |\) and
where \((u_n)_n\) is a minimizing sequence of (1.8). Hence \(u'\) is a solution of (1.8). \(\square \)
Now we are able to establish the equivalence of problems (1.6) and (1.8).
Theorem 2.2
Let \(T\in \mathcal {L}(L^2(\varOmega ))\) be such that \(T\cdot 1 =1\) and \(\nu _\tau |\varOmega | \le \Vert g-g_\varOmega \Vert ^\tau _{L^\tau (\varOmega )}\). Then there exists \(\alpha \ge 0\) such that the constrained minimization problem (1.8) is equivalent to the unconstrained problem (1.6), i.e., u is a solution of (1.8) if and only if u solves (1.6).
Proof
For \(\tau =2\), the proof can be found in [20, Prop. 2.1]. By similar arguments, one can show the statement for \(\tau =1\), which we state here. Set \( \mathcal {R}(u):= \int _{\varOmega } |D u|\) and
Notice that \(\mathcal {R}\) and \({\mathcal G}\) are convex l.s.c functions and problem (2.1) is equivalent to \(\min _u \mathcal {R}(u)+{\mathcal G}(Tu)\). We have \({\text {Dom}}(\mathcal {R}) = BV(\varOmega ) \cap L^2(\varOmega )\) and \({\text {Dom}}({\mathcal G})=\{u\in L^2(\varOmega ) : {\mathcal G}(u)<+\infty \}\). Since \(g\in \overline{T{\text {Dom}}(\mathcal {R})}\), there exists \(\tilde{u}\in {\text {Dom}}(\mathcal {R})\) with \(\Vert T\tilde{u} - g\Vert _{L^1(\varOmega )} \le \nu _1|\varOmega |/2\). As \(T\in \mathcal {L}(L^2(\varOmega ))\) is continuous, \({\mathcal G}\circ T\) is continuous at \(\tilde{u}\). Hence, by [39, Prop. 5.6, p. 26] we obtain
for all u. Further, \({\mathcal G}\) is continuous at \(T\tilde{u}\), and hence by [39, Prop. 5.7,p. 27] we have for all u
where \(\partial {\mathcal G}(u)=\{0\}\) if \(\Vert u-g\Vert _{L^1(\varOmega )}<\nu _1|\varOmega |\) and \(\partial {\mathcal G}(u)=\{\alpha \partial (\Vert u-g\Vert _{L^1(\varOmega )}), \alpha \ge 0\}\) if \(\Vert u-g\Vert _{L^1(\varOmega )}=\nu _1|\varOmega |\).
If u is a solution of (2.1) and hence of (1.8), then
Since any solution of (1.8) satisfies \(\Vert Tu-g\Vert _{L^1(\varOmega )}=\nu _1 |\varOmega |\), this shows that there exists an \(\alpha \ge 0\) such that
Hence, for this \(\alpha \ge 0\), u is a minimizer of the problem in (1.6).
Conversely, a minimizer u of (1.6) with the above \(\alpha \) is obviously a solution of (1.8) with \(\Vert Tu-g\Vert _{L^1(\varOmega )}=\nu _1 |\varOmega |\). This concludes the proof.\(\square \)
Note that \(\Vert T u_\alpha -g\Vert _{L^1(\varOmega )}\) is (only) convex with respect to \(Tu_\alpha \), and hence a minimizer of (1.5) is in general not unique even in the simple case when \(T=I\), i.e., for two minimizers \(u_\alpha ^1\) and \(u_\alpha ^2\) in general we have \(T u_\alpha ^1 \not =T u_\alpha ^2\). On the contrary, \(\Vert T u_\alpha -g\Vert _{L^2(\varOmega )}^2\) is strictly convex with respect to \(Tu_\alpha \), i.e., for two minimizers \(u_\alpha ^1\) and \(u_\alpha ^2\) of (1.4) we have \(T u_\alpha ^1 =T u_\alpha ^2\). Moreover, the function \(\alpha \mapsto \mathcal {H}_1(u_\alpha ;g)\) is in general not continuous [23], while \(\alpha \mapsto \mathcal {H}_2(u_\alpha ;g)\) indeed is continuous [20], where \(u_\alpha \) is a respective minimizer of (1.6). Hence we have the following further properties:
Lemma 2.1
Let \(u_\alpha \) be a minimizer of (1.6). Then \(\alpha \mapsto \mathcal {H}_\tau (u_\alpha ;g)\) is nondecreasing for \(\tau =1,2\). Moreover, \(\alpha \mapsto \mathcal {H}_2(u_\alpha ;g)\) maps \(\mathbb R^+\) onto \([0,\Vert g- g_\varOmega \Vert _{L^2(\varOmega )}^2]\).
Proof
For a proof see [20, 23].\(\square \)
Proposition 2.2
If \(u_{\alpha _i}\) is a minimizer of
for \(i=1,2\), then we have
with \(C:=\min \left\{ 2\left| \frac{\alpha _2 - \alpha _1}{\alpha _2 + \alpha _1}\right| , \left| \frac{\alpha _2 - \alpha _1}{\alpha _2 + \alpha _1}\right| ^{\frac{1}{2}}\right\} \).
Proof
By [7, Lemma 10.2], we have
Summing up these two inequalities yields
which implies
By the nondecrease and boundedness of the function \(\alpha \mapsto \mathcal {H}_2(u_\alpha ;g)\), see Lemma 2.1, it follows
On the other hand, inequality (2.2) implies
where we used the binomial formula \(a^2 -b^2 = (a+b)(a-b)\) for \(a,b\in \mathbb R\) and the triangle inequality. Using Lemma 2.1 yields
The assertion follows then from (2.3) and (2.4). \(\square \)
Remark 2.1
Without loss of generality, let \(\alpha _2 \ge \alpha _1\) in Proposition 2.2. Then we easily check that
3 Automated Scalar Parameter Selection
In order to find a suitable regularization parameter \(\alpha >0\) of the minimization problem (1.6), we consider the corresponding constrained optimization problem (1.8). Throughout the paper, we assume that T does not annihilate constant functions, which guarantees the existence of a minimizer of the considered optimization problems; see Sect. 2. We recall that in the constraint of (1.8) the value \(\mathcal {B}_\tau \) is defined as \(\mathcal {B}_\tau =\frac{\nu _\tau }{\tau }|\varOmega |\), where \(\nu _\tau \in \mathbb R\) is a statistical value depending on the underlying noise and possibly on the original image.
3.1 Statistical Characterization of the Noise
Let us characterize the noise corrupting the image in more detail by making similar considerations as in [58, Sect. 2]. Note that at any point \(x\in \varOmega \) the contaminated image \(g(x) = \mathcal {N}(T\hat{u})(x)\) is a stochastic observation, which depends on the underlying noise. Two important measures to characterize noise are the expected absolute value and the variance, which we denote by \(\nu _1\) and \(\nu _2\), respectively. For images contaminated by Gaussian white noise with standard deviation \(\sigma \), we typically set \(\tau =2\) and \(\nu _2=\sigma ^2\). If the image is instead corrupted by impulse noise, then we set \(\tau =1\) and we have to choose \(\nu _1\) properly. In particular, for salt-and-pepper noise \(\nu _1\in [\min \{r_1,r_2\},\max \{r_1,r_2\}]\), while for random-valued impulse noise \(\nu _1\) should be a value in the interval \([\frac{r}{4},\frac{r}{2}]\), where we used that for any point \(x\in \varOmega \) we have \(T\hat{u}(x)\in [0,1]\); cf. [58]. Here \(\nu _1\) seems to be fixed, while actually \(\nu _1\) depends on the true (unknown) image \(\hat{u}\). In particular, for salt-and-pepper noise the expected absolute value is given by
and for random-valued impulse noise we have
However, instead of considering the constraint \({\mathcal H}_\tau (u;g) = \mathcal {B}_\tau (u)\) in (1.8), which results in a quite nonlinear problem, in our numerics we choose a reference image and compute an approximate value \(\mathcal {B}_\tau \). Since our proposed algorithms are iterative in nature (see APS- and pAPS-algorithm below), it makes sense to choose the current approximation as the reference image, i.e., the reference image changes during the iterations. Note that for salt-and-pepper noise with \(r_1=r_2\) the expected absolute value becomes independent of \(\hat{u}\) and hence \(\nu _1=r_1\). In case of Gaussian noise, \(\nu _\tau \) and \(\mathcal {B}_\tau \) are independent of \(\hat{u}\) too. Nevertheless, in order to keep the paper concise, in the sequel instead of \(\nu _\tau \) and \(\mathcal {B}_\tau \) we often write \(\nu _\tau (\tilde{u})\) and \(\mathcal {B}_\tau (\tilde{u})\), where \(\tilde{u}\) represents a reference image approximating \(\hat{u}\), even if the values may actually be independent of the image.
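For a reference image \(\tilde{u}\) with values in [0, 1], the noise statistics may be evaluated as in the following sketch. The expressions used for impulse noise are our reading and should be taken as an assumption; they are consistent with the intervals stated above and reduce to \(\nu _1=r_1\) for \(r_1=r_2\):

```python
import numpy as np

def nu_gaussian(sigma):
    # nu_2 = sigma^2 for Gaussian white noise (tau = 2)
    return sigma**2

def nu_salt_and_pepper(u_ref, r1, r2):
    # Assumed form: a pixel value u is set to 0 with probability r1 and to 1 with
    # probability r2, hence E|g - u| = r1*u + r2*(1 - u) pointwise. The image mean
    # lies in [min(r1, r2), max(r1, r2)] and equals r1 whenever r1 == r2.
    return np.mean(r1 * u_ref + r2 * (1.0 - u_ref))

def nu_random_valued(u_ref, r):
    # Assumed form: with probability r a pixel u is replaced by rho ~ U[0, 1], and
    # E|rho - u| = u^2 - u + 1/2 lies in [1/4, 1/2]; hence nu_1 lies in [r/4, r/2].
    return r * np.mean(u_ref**2 - u_ref + 0.5)
```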
3.2 Automated Parameter Selection Strategy
In order to determine a suitable regularization parameter \(\alpha \), in [18] an algorithm for solving the constrained minimization problem (1.8) for \(T=I\) and \(\tau =2\) is proposed, i.e., in the presence of Gaussian noise with zero mean and standard deviation \(\sigma \). This algorithm relies on the fact that \(\alpha \mapsto {\mathcal H}_2(u_\alpha ;g)\) is nondecreasing, which leads to the following iterative procedure.
For the minimization of the optimization problem in step 1), in [18] a method based on the dual formulation of the total variation is used. However, we note that any other algorithm for total variation minimization might be used for solving this minimization problem. The CPS-algorithm generates a sequence \((u_{\alpha _n})_n\) such that for \(n\rightarrow \infty \), \(\Vert u_{\alpha _n} - g\Vert _{L^2(\varOmega )} \rightarrow \sigma \sqrt{|\varOmega |}\) and \(u_{\alpha _n}\) converges to the unique solution of (1.8) with \(T=I\) and \(\tau =2\) [18]. The proof relies on the fact that the function \(\alpha \mapsto \frac{\Vert u_\alpha - g\Vert _{L^2(\varOmega )}}{\alpha }\) is nonincreasing. Note that this property does not hold in general for operators \(T\not =I\).
3.2.1 The p-Adaptive Algorithm
We generalize now the CPS-algorithm to optimization problems of the type (1.8) for \(\tau =1,2\) and for general operators T. In order to keep or obtain appropriate convergence properties, we need the following two conditions to be satisfied. Firstly, the function \(\alpha \mapsto {\mathcal H}_\tau (u_\alpha ;g)\) has to be monotonic, which is the case due to Lemma 2.1. Secondly, in each iteration n the parameter \(\alpha _n\) has to be updated such that \(({\mathcal H}_\tau (u_{\alpha _n};g))_n\) is monotonic and bounded by \((\mathcal {B}_\tau (u_{\alpha _n}))_n\). More precisely, if \({\mathcal H}_\tau (u_{\alpha _0};g)\le \mathcal {B}_\tau (u_{\alpha _0})\), then there has to exist an \(\alpha _{n+1}\ge \alpha _n\) such that \( {\mathcal H}_\tau (u_{\alpha _{n+1}};g)\le \mathcal {B}_\tau (u_{\alpha _{n+1}})\), while if \({\mathcal H}_\tau (u_{\alpha _0};g)> \mathcal {B}_\tau (u_{\alpha _0})\), then there has to exist an \(\alpha _{n+1}\le \alpha _n\) such that \({\mathcal H}_\tau (u_{\alpha _{n+1}};g)\ge \mathcal {B}_\tau (u_{\alpha _{n+1}})\). This holds true by setting in every iteration
together with an appropriate choice of \(p\ge 0\). In particular, there always exists a \(p\ge 0\) such that this condition is satisfied.
Proposition 3.1
Assume \(\Vert g-g_\varOmega \Vert _{L^\tau (\varOmega )}^\tau \ge \nu _\tau |\varOmega |\) and \(\alpha _{n+1}\) is defined as in (3.3).
-
(i)
If \(\alpha _n>0\) such that \({\mathcal H}_\tau (u_{\alpha _n};g)= \mathcal {B}_\tau (u_{\alpha _{n}})\), then for all \(p\in \mathbb R\) we have that \({\mathcal H}_\tau (u_{\alpha _{n+1}};g)\le \mathcal {B}_\tau (u_{\alpha _{n+1}})\).
-
(ii)
If \(\alpha _n>0\) such that \(0<{\mathcal H}_\tau (u_{\alpha _n};g)<\mathcal {B}_\tau (u_{\alpha _{n}})\), then there exists a \(p\ge 0\) with \({\mathcal H}_\tau (u_{\alpha _{n+1}};g)\le \mathcal {B}_\tau (u_{\alpha _{n+1}})\).
-
(iii)
If \(\alpha _n>0\) such that \({\mathcal H}_\tau (u_{\alpha _n};g)>\mathcal {B}_\tau (u_{\alpha _{n}})\), then there exists a \(p\ge 0\) with \({\mathcal H}_\tau (u_{\alpha _{n+1}};g)\ge \mathcal {B}_\tau (u_{\alpha _{n+1}})\).
Proof
The assertion immediately follows by noting that for \(p=0\) we have \(\alpha _{n+1}=\alpha _n\).\(\square \)
Taking these considerations into account, a generalization of the CPS-algorithm can be formulated as the following p-adaptive automated parameter selection algorithm:
Note that, due to the dependency of \(\alpha _{n+1}\) on p, a proper p cannot be computed explicitly, but only iteratively, as in the pAPS-algorithm.
The initial \(p_0> 0\) can be chosen arbitrarily. However, we suggest choosing it sufficiently large in order to keep the number of iterations small. In particular, in our numerical experiments in Sect. 6 we set \(p_0=32\), which seems sufficiently large.
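For illustration, one possible realization of the pAPS iteration is sketched below. This is not a verbatim transcription of the listing: the solver `solve_tv` stands for any total variation minimization method for fixed \(\alpha \), `residual` evaluates \({\mathcal H}_\tau (\cdot ;g)\), the bound B is assumed constant, and the power p is halved whenever a candidate update would violate the monotonicity required by Proposition 3.1.

```python
def paps(solve_tv, residual, B, alpha0=1.0, p0=32.0, tol=1e-6, max_iter=100):
    """Sketch of the p-adaptive automated parameter selection (pAPS) iteration.

    solve_tv(alpha) -> a minimizer u_alpha of J_tau(u, alpha),
    residual(u)     -> H_tau(u; g),
    B               -> discrepancy bound B_tau (assumed constant here).
    """
    alpha, p = alpha0, p0
    u = solve_tv(alpha)
    H = residual(u)
    increase = H <= B                      # side of the constraint we start on
    for _ in range(max_iter):
        if abs(H - B) <= tol or p < 1e-16:
            break
        alpha_new = (B / H) ** p * alpha   # candidate update (3.3)
        u_new = solve_tv(alpha_new)
        H_new = residual(u_new)
        if (H_new <= B) if increase else (H_new >= B):
            alpha, u, H = alpha_new, u_new, H_new
        else:
            p *= 0.5   # retry with a smaller p; Proposition 3.1 ensures such a p exists
    return u, alpha
```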
Proposition 3.2
The pAPS-algorithm generates monotone sequences \((\alpha _n)_n\) and \(\left( {\mathcal H}_\tau (u_{\alpha _n};g)\right) _n\) such that \(\left( {\mathcal H}_\tau (u_{\alpha _n};g)\right) _n\) is bounded. Moreover, if \( {\mathcal H}_\tau (u_{\alpha _0};g) > \mathcal {B}_\tau (u_{\alpha _0})\) or \(\mathcal {B}_\tau (u_\alpha )\le \frac{1}{\tau }\Vert g- g_\varOmega \Vert _\tau ^\tau \) for all \(\alpha >0\), then \((\alpha _n)_n\) is also bounded.
Proof
If \({\mathcal H}_\tau (u_{\alpha _0};g) > \mathcal {B}_\tau (u_{\alpha _0})\), then by induction and Lemma 2.1 one shows that \(0< \alpha _{n+1} \le \alpha _n\) and \(0 \le {\mathcal H}_\tau (u_{\alpha _{n+1}};g) \le {\mathcal H}_\tau (u_{\alpha _n};g)\) for all \(n\in \mathbb N\). Consequently, \((\alpha _n)_n\) and \(({\mathcal H}_\tau (u_{\alpha _n};g))_n\) are monotonically decreasing and bounded.
If \({\mathcal H}_\tau (u_{\alpha _0};g)\le \mathcal {B}_\tau (u_{\alpha _0})\), due to Lemma 2.1 we have that \(0<\alpha _n\le \alpha _{n+1}\) and \({\mathcal H}_\tau (u_{\alpha _n};g)\le {\mathcal H}_\tau (u_{\alpha _{n+1}};g)\) for all \(n\in \mathbb N\) and hence \((\alpha _n)_n\) and \(({\mathcal H}_\tau (u_{\alpha _n};g))_n\) are monotonically increasing. Since there exists \(\mathcal {B}_\tau ^*>0\) such that \({\mathcal H}_\tau (u_{\alpha _n};g) \le \mathcal {B}_\tau (u_{\alpha _n})\le \mathcal {B}_\tau ^*\) for all \(n\in \mathbb N\), see Sect. 3.1, \(({\mathcal H}_\tau (u_{\alpha _n};g))_n\) is also bounded. If we additionally assume that \(\mathcal {B}_\tau (u_\alpha )\le \frac{1}{\tau }\Vert g-g_\varOmega \Vert _\tau ^\tau \) for all \(\alpha >0\) and we set \(\mathcal {B}_\tau ^*:=\max _\alpha \mathcal {B}_\tau (u_\alpha )\), then Theorem 2.2 ensures the existence of an \(\alpha ^*\ge 0\) such that \({\mathcal H}_\tau (u_{\alpha ^*};g) = \mathcal {B}_\tau ^*\). By Lemma 2.1, it follows that \(\alpha _{n}\le \alpha ^*\) for all \(n\in \mathbb N\), since \({\mathcal H}_\tau (u_{\alpha _{n}};g) \le \mathcal {B}_\tau (u_{\alpha _n})\le \mathcal {B}_\tau ^*\). Hence, \((\alpha _n)_n\) is bounded, which finishes the proof.\(\square \)
Since any monotone and bounded sequence converges to a finite limit, also \((\alpha _n)_n\) converges to a finite value if one of the assumptions in Proposition 3.2 holds. For constant \(\mathcal {B}_\tau \), we are even able to argue the convergence of the pAPS-algorithm to a solution of the constrained minimization problem (1.8).
Theorem 3.1
Assume that \(\mathcal {B}_\tau (u)\equiv \mathcal {B}_\tau \) is a constant independent of u and \(\Vert g- g_\varOmega \Vert _\tau ^\tau \ge \nu _\tau |\varOmega |\). Then the pAPS-algorithm generates a sequence \((\alpha _n)_n\) such that \(\lim _{n\rightarrow \infty } \alpha _n=\bar{\alpha } >0\) with \({\mathcal H}_\tau (u_{\bar{\alpha }};g)=\lim _{n\rightarrow \infty }{\mathcal H}_\tau (u_{\alpha _n};g)\) \(=\) \(\mathcal {B}_\tau \) and \(u_{\alpha _n} \rightarrow u_{\bar{\alpha }}\in \arg \min _{u\in X} \mathcal {J}_\tau (u,\bar{\alpha })\) for \(n\rightarrow \infty \).
Proof
Let us start by assuming that \({\mathcal H}_\tau (u_{\alpha _0};g)\le B_\tau \). By induction, we show that \(\alpha _n \le \alpha _{n+1}\) and \({\mathcal H}_\tau (u_{\alpha _n};g)\) \(\le \) \( {\mathcal H}_\tau (u_{\alpha _{n+1}};g)\le \mathcal {B}_\tau \). In particular, if \({\mathcal H}_\tau (u_{\alpha _n};g) \le \mathcal {B}_\tau \), then \(\alpha _{n+1}=\left( \frac{\mathcal {B}_\tau }{{\mathcal H}_\tau (u_{\alpha _n};g)} \right) ^p\alpha _n \ge \alpha _n\), where \(p> 0\) is such that \({\mathcal H}_\tau (u_{\alpha _{n+1}};g)\le \mathcal {B}_\tau \); cf. pAPS-algorithm. Then by Lemma 2.1 it follows that
Note that there exists an \(\alpha ^*>0\) with \({\mathcal H}_\tau (u_{\alpha ^*};g)=B_\tau \), see Theorem 2.2, such that for any \(\alpha \ge \alpha ^*\), \({\mathcal H}_\tau (u_{\alpha };g) \ge \mathcal {B}_\tau \); cf. Lemma 2.1. If \(\alpha _n\ge \alpha ^*\), then \({\mathcal H}_\tau (u_{\alpha _n};g)\ge B_\tau \). Hence \({\mathcal H}_\tau (u_{\alpha _n};g) = B_\tau \) and \(\alpha _{n+1}=\alpha _n\). Thus, we deduce that the sequences \(({\mathcal H}_\tau (u_{\alpha _n};g))_n\) and \((\alpha _n)_n\) are nondecreasing and bounded. Consequently, there exists an \(\bar{\alpha }\) such that \(\lim _{n\rightarrow \infty }\alpha _n = \bar{\alpha }\) with \({\mathcal H}_\tau (u_{\bar{\alpha }};g) = B_\tau \). Let \(\bar{{\mathcal H}} = \lim _{n\rightarrow \infty } {\mathcal H}_\tau (u_{\alpha _n};g)\). Then \(\bar{{\mathcal H}}={\mathcal H}_\tau (u_{\bar{\alpha }};g)=B_\tau \). By the optimality of \(u_{\alpha _n}\), we have that \(0\in \partial \mathcal {J}_\tau (u_{\alpha _n},\alpha _n) = \partial {\mathcal H}_\tau (u_{\alpha _n};g) + \alpha _n \partial \int _\varOmega |D u_{\alpha _n}|\); see [39, Prop. 5.6 + Eq. (5.21), p. 26]. Consequently, there exists \(v_{\alpha _n}\in \partial \int _\varOmega |D u_{\alpha _n}|\) such that \(-\alpha _n v_{\alpha _n}\in \partial {\mathcal H}_\tau (u_{\alpha _n};g) \) with \(\lim _{n\rightarrow \infty } v_{\alpha _n}=v_{\bar{\alpha }}\). By [79, Theorem 24.4, p. 233], we obtain that \(-\bar{\alpha }v_{\bar{\alpha }}\in \partial {\mathcal H}_\tau (u_{\bar{\alpha }};g) \) with \(v_{\bar{\alpha }}\in \partial \int _\varOmega |D u_{\bar{\alpha }}|\) and hence \(0\in \partial \mathcal {J}_\tau (u_{\bar{\alpha }},\bar{\alpha })\) for \(n\rightarrow \infty \).
If \({\mathcal H}_\tau (u_{\alpha _0};g)>B_\tau \), then as above we can show by induction that \(\alpha _n \ge \alpha _{n+1}\) and \({\mathcal H}_\tau (u_{\alpha _n};g)\ge {\mathcal H}_\tau (u_{\alpha _{n+1}};g)\ge B_\tau \). Thus, we deduce that \(({\mathcal H}_\tau (u_{\alpha _n};g))_n\) and \((\alpha _n)_n\) are nonincreasing and bounded. Note that there exists an \(\alpha ^*>0\) with \({\mathcal H}_\tau (u_{\alpha ^*};g)=B_\tau \) such that for any \(\alpha \le \alpha ^*\), \({\mathcal H}_\tau (u_{\alpha };g)\le B_\tau \). Hence if \(\alpha _n\le \alpha ^*\), then \({\mathcal H}_\tau (u_{\alpha _n};g)\le B_\tau \). This implies that \({\mathcal H}_\tau (u_{\alpha _n};g) = B_\tau \) and \(\alpha _{n+1}=\alpha _n\). The rest of the proof is identical to the above.\(\square \)
Remark 3.1
The adaptive choice of the value p in the pAPS-algorithm is fundamental for proving convergence in Theorem 3.1. In particular, the value p is chosen in dependence on \(\alpha \), i.e., actually \(p=p(\alpha )\), such that in the case of a constant \(\mathcal {B}_\tau \) the function \(\alpha \mapsto \frac{{\mathcal H}_\tau (u_\alpha ;g)^{p(\alpha )}}{\alpha }\) is nonincreasing; cf. Fig. 5a.
3.2.2 The Non p-Adaptive Case
A special case of the pAPS-algorithm arises when the value p is not adapted in each iteration but kept fixed. For the case \(p=1\) (fixed), we obtain the following automated parameter selection algorithm.
Even in this case, although only under certain assumptions, we can immediately establish the convergence of this algorithm.
Theorem 3.2
For \(\alpha >0\), let \(u_\alpha \) be a minimizer of \(\mathcal {J}_\tau (u,\alpha )\). Assume that \(\mathcal {B}_\tau (u)\equiv \mathcal {B}_\tau \) is a constant independent of u, the function \(\alpha \mapsto \frac{{\mathcal H}_\tau (u_\alpha ;g)}{\alpha }\) is nonincreasing, and \(\Vert g- g_\varOmega \Vert _\tau ^\tau \ge \nu _\tau |\varOmega |\). Then the APS-algorithm generates a sequence \((\alpha _n)_n \subset \mathbb R^+\) such that \(\lim _{n\rightarrow \infty } \alpha _n = \bar{\alpha }>0\), \(\lim _{n\rightarrow \infty }{\mathcal H}_\tau (u_{\alpha _n};g)=B_\tau \), and \(u_{\alpha _n}\) converges to \(u_{\bar{\alpha }}\in \arg \min _{u\in BV(\varOmega )} \mathcal {J}_\tau (u,\bar{\alpha })\) for \(n\rightarrow \infty \).
Proof
We only consider the case when \({\mathcal H}_\tau (u_{\alpha _0};g)\le \mathcal {B}_\tau \) by noting that the case \({\mathcal H}_\tau (u_{\alpha _0};g)> \mathcal {B}_\tau \) can be shown analogously. By induction, we can show that \(\alpha _n\le \alpha _{n+1}\) and \({\mathcal H}_\tau (u_{\alpha _n};g) \le {\mathcal H}_\tau (u_{\alpha _{n+1}};g)\le \mathcal {B}_\tau \). More precisely, if \({\mathcal H}_\tau (u_{\alpha _n};g)\le \mathcal {B}_\tau \), then \(\alpha _{n+1}=\frac{\mathcal {B}_\tau }{{\mathcal H}_\tau (u_{\alpha _n};g)}\alpha _n \ge \alpha _n\) and by Lemma 2.1 it follows that \({\mathcal H}_\tau (u_{\alpha _n};g)\le {\mathcal H}_\tau (u_{\alpha _{n+1}};g)\). Moreover, by the assumption that \(\alpha \mapsto \frac{{\mathcal H}_\tau (u_{\alpha };g)}{\alpha }\) is nonincreasing, we obtain \({\mathcal H}_\tau (u_{\alpha _{n+1}};g) \le \frac{\alpha _{n+1}}{\alpha _n} {\mathcal H}_\tau (u_{\alpha _n};g)=\mathcal {B}_\tau .\) That is,
The rest of the proof is analogous to the one of Theorem 3.1.\(\square \)
Nothing is known about the convergence of the APS-algorithm if \(\mathcal {B}_\tau (\cdot )\) indeed depends on u and \(\mathcal {B}_\tau (u_{\alpha _n})\) is used instead of a fixed constant. In our numerics, for some examples, in particular for the application of removing random-valued impulse noise with \(r=0.05\), we even observe that starting from a certain iteration the sequence \((\alpha _n)_n\) oscillates between two states, see Fig. 10c. This behavior can be attributed to the fact that, for example, if \(H_\tau (u_{\alpha _n}) \le \mathcal {B}_\tau (u_{\alpha _n})\), then it is not guaranteed that also \(H_\tau (u_{\alpha _{n+1}}) \le \mathcal {B}_\tau (u_{\alpha _{n+1}})\), which is essential for the convergence. The second assumption in the previous theorem, i.e., the nonincrease of the function \(\alpha \mapsto \frac{{\mathcal H}_\tau (u_\alpha ;g)}{\alpha }\), can be slightly loosened, since for the convergence of the APS-algorithm it is enough to demand the nonincrease starting from a certain iteration \(\tilde{n}\ge 0\). That is, if there exists a region \(U\subset \mathbb R^+\) where \(\alpha \mapsto \frac{{\mathcal H}_\tau (u_\alpha ;g)}{\alpha }\) is nonincreasing and \((\alpha _n)_{n\ge \tilde{n}} \subset U\), then the algorithm converges; see Fig. 5. Analytically, this can be easily shown via Theorem 3.2 by just considering \(\alpha _{\tilde{n}}\) as the initial value of the algorithm. If \(\tau =2\), similar to the CPS-algorithm, we are able to show the following monotonicity property.
Proposition 3.3
If there exists a constant \(c > 0\) such that \(\Vert T^*(T u - g)\Vert _{L^2(\varOmega )} = c \Vert T u - g\Vert _{L^2(\varOmega )}\) for all \(u\in L^2(\varOmega )\), then the function \(\alpha \mapsto \frac{\sqrt{{\mathcal H}_2(u_\alpha ;g)}}{\alpha }\) is nonincreasing, where \(u_\alpha \) is a minimizer of \( \mathcal {J}_2(u,\alpha )\).
Proof
We start by replacing the functional \(\mathcal {J}_2\) by a family of surrogate functionals denoted by \(\bar{\mathcal {S}}\) and defined for \(u,a\in X\) as
where \(\delta >\Vert T\Vert ^2, z(a):=a - \frac{1}{\delta }T^*(T a - g)\), and \(\psi \) is a function independent of u. It can be shown that the iteration
generates a sequence \((u_{\alpha ,k})_k\) which converges weakly for \(k\rightarrow \infty \) to a minimizer \(u_{\alpha }\) of \(\mathcal {J}_2(u,\alpha )\); see for example [32]. The unique minimizer \(u_{\alpha ,k+1}\) is given by \(u_{\alpha ,k+1} = (I-P_{\frac{\alpha }{\delta }K_{}})(z(u_{\alpha ,k}))\), where K is the closure of the set
and \(P_K(u):=\arg \min _{v\in K} \Vert u-v\Vert _{L^2(\varOmega )}\); see [18]. Then for \(k\rightarrow \infty \), let us define
Since \(\Vert T^*(T u - g)\Vert _{L^2(\varOmega )} = c \Vert T u - g\Vert _{L^2(\varOmega )}\), it follows that \(\tilde{f}(\alpha )=\frac{c}{\delta } \sqrt{2 {\mathcal H}_2(u_\alpha ;g)}\). The assertion follows by applying [18, Lemma 4.1], which is extendable to infinite dimensions, to \(\tilde{f}\) and by noting that the nonincrease of \(\alpha \mapsto \frac{\tilde{f}(\alpha )}{\alpha }\) implies the nonincrease of \(\alpha \mapsto \frac{\sqrt{{\mathcal H}_2(u_\alpha ;g)}}{\alpha }\).\(\square \)
We remark that for convolution-type operators the assumption of Proposition 3.3 does not hold in general. However, there exist several operators T, relevant in image processing, with the property \(\Vert T^*(Tu-g)\Vert _{L^2(\varOmega )} = \Vert Tu-g\Vert _{L^2(\varOmega )}\). Such operators include \(T=I\) for image denoising, \(T=1_D\) for image inpainting, where \(1_D\) denotes the characteristic function of the domain \(D\subset \varOmega \), and \(T=S\circ A\), where S is a subsampling operator and A is an analysis operator of a Fourier or orthogonal wavelet transform. The latter type of operator is used for reconstructing signals from partial Fourier data [17] or in wavelet inpainting [25], respectively. For all such operators, the function \(\alpha \mapsto \frac{\sqrt{{\mathcal H}_2(u_\alpha ;g)}}{\alpha }\) is nonincreasing and hence by setting \(p=\frac{1}{2}\) fixed in the pAPS-algorithm or changing the update of \(\alpha \) in the APS-algorithm to
where \(\mathcal {B}_2\) is a fixed constant, chosen according to (1.6), we obtain in these situations a convergent algorithm.
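The property \(\Vert T^*(Tu-g)\Vert _{L^2(\varOmega )} = \Vert Tu-g\Vert _{L^2(\varOmega )}\) is also easy to verify numerically. The following sketch does so for the inpainting operator \(T=1_D\), under the assumption, natural in the inpainting setting, that the data g vanishes outside D:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
u = rng.random((n, n))
mask = rng.random((n, n)) < 0.5        # characteristic function 1_D of the known region D
g = mask * rng.random((n, n))          # data is only given (and hence nonzero) on D

Tu_minus_g = mask * u - g              # T u - g for T = 1_D
adjoint_residual = mask * Tu_minus_g   # T^*(T u - g); T is self-adjoint and idempotent

# Both norms coincide, so the assumption of Proposition 3.3 holds with c = 1,
# and p = 1/2 may be kept fixed in the pAPS-algorithm for this operator.
print(np.linalg.norm(Tu_minus_g), np.linalg.norm(adjoint_residual))
```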
We emphasize once more that in general the nonincrease of the function \(\alpha \mapsto \frac{{\mathcal H}_\tau (u_\alpha ;g)}{\alpha }\) is not guaranteed. Nevertheless, there always exists a constant \(p\ge 0\) such that \(\alpha \mapsto \frac{({\mathcal H}_\tau (u_\alpha ;g))^p}{\alpha }\) is indeed nonincreasing. For example, \(p=\frac{1}{2}\) for operators T with the property \(\Vert T^*(Tu-g)\Vert _{L^2(\varOmega )} = \Vert Tu-g\Vert _{L^2(\varOmega )}\); cf. Proposition 3.3. In particular, one easily checks the following result.
Proposition 3.4
Let \(0<\alpha \le \beta \), and \(u_\alpha \) and \(u_\beta \) minimizers of \(\mathcal {J}_\tau (\cdot ,\alpha )\) and \(\mathcal {J}_\tau (\cdot ,\beta )\), respectively, for \(\tau =1,2\). Then \(\frac{({\mathcal H}_\tau (u_{\beta };g))^p}{\beta } \le \frac{({\mathcal H}_\tau (u_{\alpha };g))^p}{\alpha }\) if and only if \(p \le \frac{\ln {\beta } - \ln {\alpha }}{\ln {{\mathcal H}_\tau (u_{\beta };g)} - \ln {{\mathcal H}_\tau (u_{\alpha };g)}}\).
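In other words, given two parameters \(0<\alpha \le \beta \) and the corresponding residuals, the largest admissible exponent can be computed directly; a small sketch (assuming \({\mathcal H}_\tau (u_{\beta };g)>{\mathcal H}_\tau (u_{\alpha };g)\), which holds generically by Lemma 2.1):

```python
import numpy as np

def max_admissible_p(alpha, beta, H_alpha, H_beta):
    # Largest p with H(u_beta)^p / beta <= H(u_alpha)^p / alpha for 0 < alpha <= beta
    # (Proposition 3.4); requires H_beta > H_alpha.
    return (np.log(beta) - np.log(alpha)) / (np.log(H_beta) - np.log(H_alpha))
```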
4 Locally Constrained TV Problem
In order to enhance image details, while preserving homogeneous regions, we formulate, as in [37, 58], a locally constrained optimization problem. That is, instead of considering (1.6) we formulate
for almost every \(x\in \varOmega \), where w is a normalized filter, i.e., \(w\in L^\infty (\varOmega \times \varOmega )\), and \(w\ge 0\) on \(\varOmega \times \varOmega \) with
for all \(\phi \in L^{\tau }(\varOmega )\) and for some \(\epsilon >0\) independent of \(\phi \); cf. [37, 58].
4.1 Local Filtering
In practice, for w we may use the mean filter together with a windowing technique; see for example [37, 58]. In order to explain the main idea, we continue in a discrete setting. Let \(\varOmega ^h\) be a discrete image domain containing \(N_1\times N_2\) pixels, \(N_1,N_2\in \mathbb N\), and by \(|\varOmega ^h|=N_1N_2\) we denote the size of the discrete image (number of pixels). We approximate functions u by discrete functions, denoted by \(u^h\). The considered function spaces are \(X=\mathbb R^{N_1\times N_2}\) and \(Y=X\times X\). In what follows for all \(u^h\in X\), we use the following norms:
for \(1\le \tau <+\infty \). Moreover, we denote by \(u^h_\varOmega \) the average value of \(u^h\in X\), i.e., \( u^h_\varOmega := \frac{1}{|\varOmega ^h|} \sum _{x\in \varOmega ^h} u^h(x)\). The discrete gradient \(\nabla ^h: X \rightarrow Y\) and the discrete divergence \({\text {div}}^h : Y \rightarrow X\) are defined in a standard way by forward and backward differences such that \({\text {div}}^h:= - (\nabla ^h)^* \); see for example [18, 22, 55, 63]. With the above notations and definitions, the discretization of the general functional in (1.9) is given by
where \(H_\tau (u^h)=\frac{1}{\tau } \Vert T^hu^h-g^h\Vert _{\tau }^{\tau }\), \(\tau \in \{1,2\}\), \(T^h : X \rightarrow X\) is a bounded linear operator, \(\alpha \in (\mathbb R^+)^{N_1 \times N_2}\), and
with \(|y|_{l^2}=\sqrt{y_1^2 + y_2^2}\) for every \(y=(y_1,y_2)\in \mathbb R^2\). In the sequel, if \(\alpha \) is a scalar or \(\alpha \equiv 1\) in (4.4), we write instead of \(R_\alpha \) or \(R_1\) just \(\alpha R\) or R, respectively, i.e.,
is the discrete total variation of u in \(\varOmega ^h\), and we write \(\bar{E}_\tau \) instead of \(E_\tau \) to indicate that \(\alpha \) is constant. Introducing some step-size h, for \(h\rightarrow 0\) (i.e., as the number of pixels \(N_1N_2\) goes to infinity) one can show, similarly as for the case \(\alpha \equiv 1\), that \(R_{\alpha }\) \(\Gamma \)-converges to \(\int _\varOmega \alpha |Du|\); see [12, 62]. We turn now to the locally constrained minimization problem, which is given in the discrete setting as
Here \(\nu _\tau \) is a fixed constant and
denotes the local residual at \(x_{i,j}\in \varOmega ^h\) with \(\mathcal {I}_{i,j}\) being some suitable set of pixels around \(x_{i,j}\) of size \(M_{i,j}\), i.e., \(M_{i,j}= |\mathcal {I}_{i,j}|\). For example, in [37, 54, 58] for \(\mathcal {I}_{i,j}\) the set
with a symmetric extension at the boundary and with \(\omega \) being odd is used. That is, \(\varOmega _{i,j}^\omega \) is a set of pixels in a \(\omega \)-by-\(\omega \) window centered at \(x_{i,j}\), i.e., \(M_{i,j}=\omega ^2\) for all i, j, such that \(\varOmega _{i,j}^\omega \not \subset \varOmega ^h\) for \(x_{i,j}\) sufficiently close to \(\partial \varOmega \). Additionally, we denote by \(\tilde{\varOmega }_{i,j}^{\omega }\) a set of pixels in a window centered at \(x_{i,j}\) without any extension at the boundary, i.e.,
Hence \(\tilde{\varOmega }_{i,j}^\omega \subset \varOmega ^h\) for all \(x_{i,j}\in \varOmega ^h\). Before we analyze the difference between \(\varOmega _{i,j}^{\omega }\) and \(\tilde{\varOmega }_{i,j}^{\omega }\) with respect to the constrained minimization problem (4.5), we note that since \(T^h\) does not annihilate constant functions, the existence of a solution of (4.5) is guaranteed; see [37, Theorem 2] and [58, Theorem 2].
In the following, we set \(B_\tau :=\frac{\nu _\tau }{\tau }|\varOmega ^h|\).
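To make the discrete quantities just introduced concrete, the following sketch evaluates a forward-difference gradient, the weighted total variation \(R_\alpha \), the energy \(E_\tau \), and the local residuals \(S_{i,j}^{\tau }\) over \(\omega \)-by-\(\omega \) windows, both with and without boundary extension; the helper names are ours, and the windowing is realized by a mean filter:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def grad(u):
    # Forward differences (zero at the last row/column)
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def weighted_tv(u, alpha):
    # R_alpha(u) = sum_{i,j} alpha(x_ij) |(grad u)(x_ij)|_{l2}
    gx, gy = grad(u)
    return np.sum(alpha * np.sqrt(gx**2 + gy**2))

def energy(u, g, alpha, tau=2, T=lambda v: v):
    # E_tau(u) = H_tau(u) + R_alpha(u) with H_tau(u) = (1/tau) ||T u - g||_tau^tau
    return np.sum(np.abs(T(u) - g)**tau) / tau + weighted_tv(u, alpha)

def local_residuals(u, g, omega, tau=2, T=lambda v: v, extend=True):
    # S_{i,j}^tau: windowed mean of |g - T u|^tau / tau around each pixel.
    # extend=True mimics the symmetrically extended windows Omega_{i,j}^omega,
    # extend=False the clipped windows tilde-Omega_{i,j}^omega with sizes M_{i,j}.
    r = np.abs(g - T(u))**tau / tau
    if extend:
        return uniform_filter(r, size=omega, mode='reflect')
    win_sum = uniform_filter(r, size=omega, mode='constant', cval=0.0) * omega**2
    M = uniform_filter(np.ones_like(r), size=omega, mode='constant', cval=0.0) * omega**2
    return win_sum / M
```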
Proposition 4.1
-
(i)
If \(u^h\) is a solution of (4.5) with \(\mathcal {I}_{i,j}= \varOmega _{i,j}^{\omega }\), then \(H_\tau (u^h) < B_\tau .\)
-
(ii)
If \(u^h\) is a solution of (4.5) with \(\mathcal {I}_{i,j}= \tilde{\varOmega }_{i,j}^{\omega }\), then \(H_\tau (u^h) \le B_\tau .\)
Proof
-
(i)
Since \(u^h\) is a solution of (4.5) and \(\varOmega _{i,j}^\omega \) is a set of pixels in a \(\omega \)-by-\(\omega \) window, we have
$$\begin{aligned} \begin{aligned} B_\tau&\ge \sum _{i,j}S_{i,j}^{\tau }(u^h) \\&= \sum _{i,j} \frac{1}{\tau \omega ^2}\sum _{x_{s,t}\in \varOmega _{i,j}^\omega }|g^h(x_{s,t})- T^h u^h(x_{s,t}) |^\tau \\&> \frac{1}{\tau }\sum _{i,j} |g^h(x_{i,j})- T^h u^h(x_{i,j})|^\tau = H_\tau (u^h). \end{aligned} \end{aligned}$$Here we used that due to the sum over i, j each element (pixel) in \(\varOmega _{i,j}^\omega \) appears at most \(\omega ^2\) times. More precisely, any pixel coordinate in the set \(\Lambda ^\omega := \{(i,j) : \min \{i-1,j-1,N_1-i,N_2-j\}\ge \frac{\omega -1}{2}\}\) occurs exactly \(\omega ^2\) times, while any other pixel coordinate appears strictly less than \(\omega ^2\) times. This shows the first statement.
-
(ii)
For a minimizer \(u^h\) of (4.5), we obtain
$$\begin{aligned} \begin{aligned} B_\tau&\ge \sum _{i,j}S_{i,j}^{\tau }(u^h) \\&= \sum _{i,j} \frac{1}{\tau M_{i,j}}\sum _{x_{s,t}\in \varOmega _{i,j}^\omega }|g^h(x_{s,t}) - T^h u^h(x_{s,t})|^\tau \\&= \frac{1}{\tau }\sum _{i,j} |g^h(x_{i,j})- T^h u^h(x_{i,j})|^\tau = H_\tau (u^h), \end{aligned} \end{aligned}$$which concludes the proof.\(\square \)
Note that if \(\mathcal {I}_{i,j} = \tilde{\varOmega }_{i,j}^{\omega }\), then by Proposition 4.1 a minimizer of (4.5) also satisfies the constraint of the problem
(discrete version of (2.1)) but is in general of course not a solution of (4.6).
Proposition 4.2
Let \(\mathcal {I}_{i,j} = \tilde{\varOmega }_{i,j}^{\omega }\), \(u_s^h\) be a minimizer of (4.6), and \(u_l^h\) be a minimizer of (4.5). Then \(R(u_s^h)\le R(u_l^h)\).
Proof
Assume that \(R(u_s^h)> R(u_l^h)\). Since \(u_s^h\) is a solution of (4.6), it satisfies the constraint \(H_\tau (u_s^h)\le B_\tau \). By Proposition 4.1, we also have \(H_\tau (u_l^h)\le B_\tau \), i.e., \(u_l^h\) is feasible for (4.6). Since \(R(u_s^h) > R(u_l^h)\), \(u_s^h\) is then not a minimizer of (4.6), which is a contradiction. Hence, \(R(u_s^h)\le R(u_l^h)\).\(\square \)
Remark 4.1
Proposition 4.1 and its consequence are not special properties of the discrete setting. Let the filter w in (4.1) be such that the inequality in (4.2) becomes an equality with \(\epsilon =1/|\varOmega |\), as it is the case in Proposition 4.1(ii). Then a solution \({u}_l\) of the locally constrained minimization problem (4.1) satisfies
where \({u}_s\) is a solution of (2.1).
From Proposition 4.2 and Remark 4.1, we conclude, since \(R(u_s^h) \le R(u_l^h)\) and \(\int _\varOmega |D{u}_s| \le \int _\varOmega |D{u}_l|\), that \(u_s^h\) and \({u}_s\) are smoother than \(u_l^h\) and \({u}_l\), respectively. Hence, the solution of the locally constrained minimization problem is expected to preserve details better than the minimizer of the globally constrained optimization problem. Since noise can be interpreted as fine details, which we actually want to eliminate, this could also mean that noise is possibly left in the image.
4.2 Locally Adaptive Total Variation Algorithm
Whenever \(\nu _\tau \) depends on \(\hat{u}\), problem (4.5) results in a quite nonlinear problem. Instead of considering nonlinear constraints, we choose as in Sect. 3 a reference image \(\tilde{u}\) and compute an approximate \(\nu _\tau =\nu _\tau (\tilde{u})\). Note that in our discrete setting for salt-and-pepper noise we have now
and for random-valued impulse noise we have
In our locally adaptive algorithms proposed below, we choose as a reference image the current approximation (see LATV- and pLATV-algorithm below), as also done in the pAPS- and APS-algorithm above. Then we seek a solution \(u^h\) such that \(S_{i,j}^{\tau }(u^h)\) is close to \(\frac{\nu _\tau }{\tau }\). We note that for large \(\alpha >0\) the minimization of (4.3) yields an oversmoothed restoration \(u_\alpha ^h\) and the residual contains details, i.e., we expect \(H_\tau (u_\alpha ^h)> B_\tau \). Hence, if \( S_{i,j}^{\tau }(u_\alpha ^h) > \frac{\nu _\tau }{\tau }\), we suppose that this is due to image details contained in the local residual image. In this situation, we intend to decrease \(\alpha \) in the local regions \(\mathcal {I}_{i,j}\). In particular, we define, similarly as in [37, 58], the local quantity \(f^\omega _{i,j}\) by
Note that \(\frac{\nu _\tau }{\tau f^\omega _{i,j}}\le 1\) for all i, j and hence we set
On the other hand, for small \(\alpha >0\) we get an undersmoothed image \(u_\alpha ^h\), which still contains noise, i.e., we expect \(H_\tau (u_\alpha ^h)< B_\tau \). Analogously, if \(S_{i,j}^{\tau }(u_\alpha ^h) \le \frac{\nu _\tau }{\tau }\), we suppose that there is still noise left outside the residual image in \(\mathcal {I}_{i,j}\). Hence, we intend to increase \(\alpha \) in the local regions \(\mathcal {I}_{i,j}\) by defining
and setting \(\alpha \) as in (4.7). Notice that now \(\frac{\nu _\tau }{\tau f^\omega _{i,j}}\ge 1\). These considerations lead to the following locally adapted total variation algorithm.
Here and below \(\varepsilon >0\) is a small constant (e.g., in our experiments we choose \(\varepsilon =10^{-14}\)) to ensure that \(f_{i,j}^\omega >0\), since it may happen that \(S_{i,j}^{\tau }(u_{\alpha _n}^h)=0\).
If \(H_\tau (u_{\alpha _0}^h) >B_\tau (u_{\alpha _0}^h)\), we stop the algorithm as soon as the residual \(H_\tau (u_{\alpha _n}^h)<B_\tau (u_{\alpha _n}^h)\) for the first time and set the desired locally varying \(\alpha ^*=\alpha _{n}\). If \(H_\tau (u_{\alpha _0}^h) \le B_\tau (u_{\alpha _0}^h)\), we stop the algorithm as soon as the residual \(H_\tau (u_{\alpha _n}^h)>B_\tau (u_{\alpha _n}^h)\) for the first time and set the desired locally varying \(\alpha ^*=\alpha _{n-1}\), since \(H_\tau (u_{\alpha _{n-1}}^h)\le \frac{\nu _\tau (u_{\alpha _{n-1}}^h)}{\tau }\).
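For illustration, the local update of \(\alpha \) described above may be realized as in the following sketch. It reflects our reading of (4.7), with the window average made explicit as in the proof of Proposition 4.4; the array `S` denotes the local residuals of the current minimizer, e.g., computed with `local_residuals` from the sketch in Sect. 4.1:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def latv_update(alpha, S, nu, tau, p=0.5, eps=1e-14, oversmoothed=True, omega=11):
    # One locally adaptive update of alpha (sketch of rule (4.7)):
    #   alpha_{n+1}(x_ij) = (nu/tau)^p * window mean of alpha_n / (f^omega)^p.
    # oversmoothed=True corresponds to H_tau(u_alpha) > B_tau(u_alpha), in which case
    # alpha is decreased locally; otherwise alpha is increased locally.
    target = nu / tau
    if oversmoothed:
        f = np.maximum(S, target)                    # f >= nu/tau, so nu/(tau*f) <= 1
    else:
        f = np.maximum(np.minimum(S, target), eps)   # f <= nu/tau, so nu/(tau*f) >= 1
    return target**p * uniform_filter(alpha / f**p, size=omega, mode='reflect')
```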
The LATV-algorithm has the following monotonicity properties with respect to \((\alpha _n)_n\).
Proposition 4.3
Assume \(\mathcal {I}_{i,j} = \varOmega _{i,j}^\omega \) and let \(\varepsilon >0\) be sufficiently small. If \(\alpha _0 >0\) such that \(H_\tau (u_{\alpha _0}^h) \le B_\tau (u_{\alpha _0}^h)\), then the LATV-algorithm generates a sequence \((\alpha _n)_n\) such that
Proof
By the same argument as in the proof of Proposition 4.1, we obtain
Note that \(\nu _\tau (\cdot )\) is bounded from below, see Sect. 3.1. Consequently, there exists an \(\varepsilon >0\) such that \(\frac{\nu _\tau (u^h)}{\tau } \ge \varepsilon \) for any \(u^h\). Then, since \(H_\tau (u_{\alpha _0}^h)\le B_\tau (u_{\alpha _0}^h)\) we have by the LATV-algorithm that
and hence \(\sum _{i,j} (\alpha _{n+1})(x_{i,j}) > \sum _{i,j} (\alpha _{n})(x_{i,j}).\) \(\square \)
Proposition 4.4
Let \(\mathcal {I}_{i,j} = \tilde{\varOmega }_{i,j}^\omega \) and \(\varepsilon >0\) be sufficiently small.
-
(i)
If \(\alpha _0 >0\) such that \(H_\tau (u_{\alpha _0}^h) > B_\tau (u_{\alpha _0}^h)\), then the LATV-algorithm generates a sequence \((\alpha _n)_n\) such that
$$\begin{aligned} \sum _{i,j} (\alpha _{n+1})(x_{i,j}) \le \sum _{i,j} (\alpha _{n})(x_{i,j}). \end{aligned}$$ -
(ii)
If \(\alpha _0 >0\) such that \(H_\tau (u_{\alpha _0}^h) \le B_\tau (u_{\alpha _0}^h)\), then the LATV-algorithm generates a sequence \((\alpha _n)_n\) such that
$$\begin{aligned} \sum _{i,j} (\alpha _{n+1})(x_{i,j}) \ge \sum _{i,j} (\alpha _{n})(x_{i,j}). \end{aligned}$$
Proof
-
(i)
By the same argument as in the proof of Proposition 4.1 and since \(f_{i,j}^{\omega }:= \max \left\{ S_{i,j}^{\tau }(u_{\alpha _n}^h),\frac{\nu _\tau (u_{\alpha _n}^h)}{\tau }\right\} \ge \frac{\nu _\tau (u_{\alpha _n}^h)}{\tau }\), we obtain
$$\begin{aligned} \begin{aligned} \sum _{i,j} \alpha _{n+1}(x_{i,j})&= \sum _{i,j} \left( \frac{\nu _\tau (u_{\alpha _n}^h)^p}{\tau ^p M_{i,j}}\sum _{x_{s,t}\in \tilde{\varOmega }_{i,j}^\omega } \frac{\alpha _n(x_{s,t})}{(f_{s,t}^\omega )^p}\right) \\&=\sum _{i,j} \left( \left( \frac{\nu _\tau (u_{\alpha _n}^h)}{\tau }\right) ^p \frac{\alpha _n(x_{i,j})}{(f_{i,j}^\omega )^p}\right) \\&\le \sum _{i,j} \alpha _{n}(x_{i,j}). \end{aligned} \end{aligned}$$ -
(ii)
Since \(\nu _\tau (\cdot )\) is bounded from below, see Sect. 3.1, there exists an \(\varepsilon >0\) such that \(\frac{\nu _\tau (u^h)}{\tau } \ge \varepsilon \) for any \(u^h\). Hence
$$\begin{aligned} \begin{aligned} f_{i,j}^\omega&:= \max \left\{ \min \left\{ S_{i,j}^{\tau }(u_{\alpha _n}^h),\frac{\nu _\tau (u_{\alpha _n}^h)}{\tau }\right\} ,\varepsilon \right\} \\&\le \frac{\nu _\tau (u_{\alpha _n}^h)}{\tau } \end{aligned} \end{aligned}$$and by the same arguments as above we get
$$\begin{aligned} \begin{aligned} \sum _{i,j} \alpha _{n+1}(x_{i,j})&=\sum _{i,j} \left( \left( \frac{\nu _\tau (u_{\alpha _n}^h)}{\tau }\right) ^p \frac{\alpha _n(x_{i,j})}{(f_{i,j}^\omega )^p}\right) \\&\ge \sum _{i,j} \alpha _{n}(x_{i,j}). \end{aligned} \end{aligned}$$\(\square \)
In contrast to the pAPS-algorithm, in the LATV-algorithm the power \(p>0\) is not changed during the iterations and should be chosen sufficiently small; e.g., we set \(p=\frac{1}{2}\) in our experiments. Note that a small p only allows small changes of \(\alpha \) in each iteration. In this way, the algorithm is able to generate a function \(\alpha ^*\) such that \(H_\tau (u_{\alpha ^*}^h)\) is very close to \(\frac{\nu _\tau (u_{\alpha ^*}^h)}{\tau }\). On the other hand, a small p has the drawback that many iterations are needed until termination. Since the parameter p has to be chosen manually, the LATV-algorithm, at least in spirit, resembles Uzawa's method, where a step-size parameter also has to be chosen. The proper choice of such a parameter can be delicate, and hence we would like an algorithm that does not require manual parameter tuning. Motivated by this and by the pAPS-algorithm, we propose the following p-adaptive algorithm:
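The pLATV listing itself is not reproduced in this extraction. As an illustration only, the following Python sketch shows the kind of windowed local update that both the LATV- and the pLATV-algorithm perform (cf. the sums in the proof of Proposition 4.4), with the power p passed explicitly; in the pLATV-algorithm p is additionally adapted between iterations. The names `S` (local residual estimate) and `nu_over_tau`, the realization of the window average by a mean filter, and the boundary handling are our assumptions, not taken from the paper.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def latv_update(alpha, S, nu_over_tau, p=0.5, omega=11, eps=1e-12):
    """One LATV-type local update of the regularization parameter (sketch).

    alpha       : current parameter alpha_n on the pixel grid, shape (N1, N2)
    S           : local residual estimates S_{i,j}(u_{alpha_n}), shape (N1, N2)
    nu_over_tau : scalar bound nu_tau / tau
    p           : fixed power (p = 1/2 in the paper's experiments)
    """
    # f_{i,j} = max{min{S, nu/tau}, eps}, as in case (ii) of Proposition 4.4
    f = np.maximum(np.minimum(S, nu_over_tau), eps)
    # average of alpha / f^p over the local (omega x omega) window
    window_mean = uniform_filter(alpha / f**p, size=omega, mode="nearest")
    return (nu_over_tau ** p) * window_mean

# toy usage with placeholder residuals
alpha0 = 1e-2 * np.ones((64, 64))
S = 0.01 * np.abs(np.random.randn(64, 64))
alpha1 = latv_update(alpha0, S, nu_over_tau=0.005)
```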
In our numerical experiments, the pLATV-algorithm is terminated as soon as \(|H_\tau (u_{\alpha _n}^h) - B_\tau (u_{\alpha _n}^h)| \le 10^{-6}\) and \(H_\tau (u_{\alpha _{n}}^h) \le B_\tau (u_{\alpha _{n}}^h)\). Additionally, we stop iterating when p drops below machine precision, since then no further progress is to be expected. Due to the adaptive choice of p, we obtain a monotone behavior of the sequence \((\alpha _n)_n\).
Proposition 4.5
The sequence \((\alpha _n)_n\) generated by the pLATV-algorithm is for any point \(x\in \varOmega \) monotone. In particular, it is monotonically decreasing for \(\alpha _0\) such that \(H_\tau (u_{\alpha _0}^h)>B_\tau (u_{\alpha _0}^h)\) and monotonically increasing for \(\alpha _0\) such that \(H_\tau (u_{\alpha _0}^h)\le B_\tau (u_{\alpha _0}^h)\).
Proof
For \(H_\tau (u_{\alpha _0}^h) > B_\tau (u_{\alpha _0}^h)\), one can show by induction that the pLATV-algorithm yields \(f_{i,j}^\omega \ge \frac{\nu _\tau (u_{\alpha _n}^h)}{\tau }\) and hence \(1 \ge \frac{\nu _\tau (u_{\alpha _n}^h)}{\tau f_{i,j}^\omega }\) for all n. Then the definition of \(\alpha _{n+1}\) implies
By similar arguments, we obtain for \(\alpha _0\) with \(H_\tau (u_{\alpha _0}^h) \le B_\tau (u_{\alpha _0}^h)\) that \( \alpha _{n+1}(x_{i,j}) \ge \alpha _n(x_{i,j}) \) for all \(x_{i,j}\in \varOmega \). \(\square \)
We are aware of the fact that using \(u_{\alpha _n}^h\) as a reference image in the LATV- and pLATV-algorithm to compute \(\nu _\tau \) may introduce errors. However, we recall that for Gaussian noise we set \(\nu _2=\sigma ^2\) and for salt-and-pepper noise with \(r_1=r_2\) we have \(\nu _1= r_1\). In these cases, \(\nu _\tau \) does not depend on the original image and hence no error is introduced by computing \(\nu _\tau \). For images corrupted by random-valued impulse noise the situation is different, since \(\nu _1\) indeed depends on the true image. In this situation, errors may occur when \(u_{\alpha _n}^h\) is used as a reference image for calculating \(\nu _\tau \); see Figs. 2 and 3. Hence, in order to improve the proposed algorithms in such cases, it might be of interest for future research to find a reference image that yields a good approximation of the true value \(\nu _\tau \).
In contrast to the SA-TV algorithm presented in [37, 58], where the initial regularization parameter has to be chosen sufficiently small, in the LATV-algorithm as well as in the pLATV-algorithm the initial value \(\alpha _0\) can be chosen arbitrarily positive. However, in the case \(H_\tau (u_{\alpha _0}^h) > B_\tau (u_{\alpha _0}^h)\) we cannot guarantee in general that the solution \(u_{\alpha }\) obtained by the pLATV-algorithm fulfills \(H_\tau (u_{\alpha }^h) \le B_\tau (u_{\alpha }^h)\), not even if \(B_\tau (\cdot )\) is constant, due to the stopping criterion with respect to the power p. On the contrary, if \(H_\tau (u_{\alpha _0}^h) \le B_\tau (u_{\alpha _0}^h)\), then the pLATV-algorithm generates a sequence \((u_{\alpha _n}^h)_n\) such that \(H_\tau (u_{\alpha _n}^h) \le B_\tau (u_{\alpha _n}^h)\) for all n and hence also for the output of the algorithm. Consequently, we wish to choose \(\alpha _0>0\) such that \(H_\tau (u_{\alpha _0}^h) \le B_\tau (u_{\alpha _0}^h)\), which may be realized by the following simple automated procedure:
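Algorithm 1 itself is not reproduced in this extraction. A minimal sketch of one plausible realization is given below, assuming that the initial parameter is repeatedly shrunk by the factor \(c_{\alpha _0}\) mentioned in Sect. 6.3.1 until the residual condition holds; `solve_tv`, `H_tau`, and `B_tau` are placeholder callables for the multiscale TV solver and the residual and bound functions and are not part of the paper.

```python
def find_initial_alpha(alpha0, g, solve_tv, H_tau, B_tau, c_alpha0=0.2, max_iter=50):
    """Shrink alpha0 until H_tau(u_alpha) <= B_tau(u_alpha) holds (sketch).

    Rationale: a smaller alpha means weaker regularization, so u_alpha stays
    closer to the data g and the residual H_tau(u_alpha) decreases.
    """
    alpha = alpha0
    for _ in range(max_iter):
        u = solve_tv(g, alpha)
        if H_tau(u) <= B_tau(u):
            break
        alpha = c_alpha0 * alpha
    return alpha
```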
5 Total Variation Minimization
In this section, we are concerned with developing numerical methods for computing a minimizer of the discrete multiscale \(L^{\tau }\)-TV model, i.e.,
5.1 \(L^2\)-TV Minimization
Here we consider the case \(\tau =2\), i.e., the minimization problem
and present solution methods, first for the case \(T^h=I\) and then for general linear bounded operators \(T^h\).
5.1.1 An Algorithm for Image Denoising
If \(T^h=I\), then (5.2) becomes an image denoising problem, i.e., the minimization problem
For solving this problem, we use the algorithm of Chambolle and Pock [22], which leads to the following iterative scheme:
In our numerical experiments, we choose \(\theta =1\). In particular, in [22] it is shown that the algorithm converges for \(\theta =1\) and step sizes \(\tau ,\sigma >0\) of the scheme with \(\tau \sigma \Vert \nabla ^h\Vert ^2< 1\).
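Since the iterative scheme itself is not reproduced in this extraction, the following self-contained NumPy sketch illustrates a primal–dual iteration of the type proposed in [22] applied to the denoising problem with a (possibly spatially varying) weight \(\alpha \). It uses forward differences with Neumann boundary conditions for \(\nabla ^h\), step sizes chosen such that their product times the bound \(\Vert \nabla ^h\Vert ^2\le 8\) stays below one, and variable names of our own (`tau_step`, `sigma_step` denote the step sizes, not the fidelity exponent); it is a sketch under these assumptions, not the paper's exact discretization.

```python
import numpy as np

def grad(u):
    """Forward differences with Neumann boundary conditions."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[0, :] = px[0, :]; dx[1:-1, :] = px[1:-1, :] - px[:-2, :]; dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]; dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]; dy[:, -1] = -py[:, -2]
    return dx + dy

def cp_tv_denoise(g, alpha, n_iter=300, theta=1.0):
    """Primal-dual scheme for min_u sum alpha|grad u| + 1/2||u - g||^2 (sketch)."""
    L2 = 8.0                                    # bound on ||grad||^2
    tau_step = sigma_step = 0.99 / np.sqrt(L2)  # so that sigma*tau*||grad||^2 < 1
    u = g.copy(); u_bar = u.copy()
    px = np.zeros_like(g); py = np.zeros_like(g)
    for _ in range(n_iter):
        # dual step: ascent + pointwise projection onto {|p|_2 <= alpha}
        gx, gy = grad(u_bar)
        px, py = px + sigma_step * gx, py + sigma_step * gy
        scale = np.maximum(1.0, np.sqrt(px**2 + py**2) / alpha)
        px, py = px / scale, py / scale
        # primal step: descent + proximal map of 1/2||u - g||^2
        u_old = u
        u = (u + tau_step * (div(px, py) + g)) / (1.0 + tau_step)
        # over-relaxation
        u_bar = u + theta * (u - u_old)
    return u
```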
5.1.2 An Algorithm for Linear Bounded Operators
Assume that \(T^h\) is a linear bounded operator from X to X, different from the identity I. Then instead of minimizing (5.2) directly, we introduce the surrogate functional
with \(a^h,u^h\in X\), \(z(a^h)= a^h - \frac{1}{\delta }(T^h)^*(T^h a^h -g^h)\), and \(\psi \) a function independent of \(u^h\), where we assume \(\delta > \Vert T^h\Vert ^2\); see [31, 32]. Note that
and hence obtaining a minimizer amounts to solving a minimization problem of the type (5.3), which can be done as described in Sect. 5.1.1. Then an approximate solution of (5.2) can be computed by the following iterative algorithm: choose \(u_{0}^h \in X\) and iterate for \(n\ge 0\)
For scalar \(\alpha \), it is shown in [28, 31, 32] that this iterative procedure generates a sequence \((u_n^h)_n\) which converges to a minimizer of (5.2). This convergence property can easily be extended to our nonscalar case, yielding the following result.
Theorem 5.1
For \(\alpha : \varOmega \rightarrow \mathbb R^+\) the scheme in (5.5) generates a sequence \((u_n^h)_n\), which converges to a solution of (5.2) for any initial choice of \(u_0^h\in X\).
Proof
A proof can be carried out analogously to [32]. \(\square \)
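As a concrete illustration of the scheme (5.5), the following sketch performs the outer surrogate iteration: each step forms \(z(u_n)= u_n - \frac{1}{\delta }(T^h)^*(T^h u_n -g^h)\) and then solves a denoising problem of type (5.3) with weight \(\alpha /\delta \). It reuses `cp_tv_denoise` from the sketch in Sect. 5.1.1; the callables `T` and `Tt` (operator and adjoint) and all parameter values are assumptions for illustration.

```python
def tv_deblur_surrogate(g, alpha, T, Tt, delta, n_outer=50, n_inner=200):
    """Surrogate-functional iteration for min_u sum alpha|grad u| + 1/2||Tu - g||^2 (sketch).

    T, Tt : callables applying the linear operator and its adjoint
    delta : constant with delta > ||T||^2
    """
    u = g.copy()
    for _ in range(n_outer):
        z = u - Tt(T(u) - g) / delta
        # minimizing the surrogate is a TV denoising problem with data z and weight alpha/delta
        u = cp_tv_denoise(z, alpha / delta, n_iter=n_inner)
    return u
```

For a symmetric Gaussian blur, `T` and `Tt` may, for instance, both be realized by `scipy.ndimage.gaussian_filter`, since the operator is then (approximately, up to boundary handling) self-adjoint.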
5.2 An Algorithm for \(L^1\)-TV Minimization
The computation of a minimizer of
is, due to the nonsmooth \(\ell ^1\)-term, in general more involved than obtaining a solution of the \(L^2\)-TV model. Here we employ a technique proposed in [5] for \(L^1\)-TV minimization problems with a scalar regularization parameter and solve (5.6) in two steps. In particular, we substitute the argument of the \(\ell ^1\)-norm by a new variable v, penalize the functional by an \(L^2\)-term, which keeps the difference between v and \(Tu-g\) small, and minimize with respect to v and u. That is, we replace the original minimization (5.6) by
where \(\gamma >0\) is small, so that \(g^h\approx T^h u^h - v^h\). It can be shown that (5.7) converges to (5.6) as \(\gamma \rightarrow 0\). In our experiments, we choose \(\gamma =10^{-2}\). This leads to the following alternating algorithm.
The minimizer \(v_{n+1}^h\) in step 1) of the \(L^1\)-TV\(_\alpha \) algorithm can be computed easily via soft-thresholding, i.e., \(v_{n+1}^h={\text {ST}}(T^h u_n^h - g^h,\gamma )\), where
for all \(x\in \varOmega ^h\). The minimization problem in step 2) is equivalent to
and hence is of the type (5.2). Thus, an approximate solution of (5.8) can be computed as described above; see Sect. 5.1.
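For illustration, here is a sketch of the two alternating steps under this splitting: step 1 is the componentwise soft-thresholding \({\text {ST}}(x,\gamma )={\text {sign}}(x)\max \{|x|-\gamma ,0\}\), and step 2 is an \(L^2\)-TV problem with data \(g^h+v^h\) and weight \(\gamma \alpha \), here solved with the surrogate scheme sketched in Sect. 5.1.2. The reduction of step 2 to this form and all parameter values reflect our reading of (5.7) and should be checked against the paper's equations.

```python
import numpy as np

def soft_threshold(x, gamma):
    """Componentwise soft-thresholding ST(x, gamma) = sign(x) * max(|x| - gamma, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def l1_tv_alternating(g, alpha, T, Tt, delta, gamma=1e-2, n_outer=30):
    """Alternating minimization for the penalized L1-TV problem (5.7) (sketch)."""
    u = g.copy()
    for _ in range(n_outer):
        v = soft_threshold(T(u) - g, gamma)                           # step 1)
        u = tv_deblur_surrogate(g + v, gamma * alpha, T, Tt, delta)   # step 2)
    return u
```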
Theorem 5.2
The sequence \((u_n^h,v_n^h)_n\) generated by the \(L^1\)-TV\(_\alpha \) algorithm converges to a minimizer of (5.7).
Proof
The statement can be shown analogously to [5]. \(\square \)
5.3 A Primal–Dual Method for \(L^1\)-TV Minimization
For solving (1.6) with \(\tau =1\), we suggest, as an alternative to the above method, using the primal–dual method of [58] adapted to our setting, in which a Huber regularization of the gradient of u is considered; see [58] for more details. Denoting by \(\bar{u}\) a corresponding solution of the primal problem and by \(\bar{\mathbf {p}}\) the solution of the associated dual problem, the optimality conditions due to the Fenchel theorem [39] are given by
for all \(x\in \varOmega \), where \(\kappa , \beta , \mu \), and \(\gamma \) are fixed positive constants. The latter two conditions can be combined into \(-\bar{\mathbf {p}}(x) = \frac{\alpha (x) \nabla \bar{u}(x)}{\max \{\gamma \alpha (x), |\nabla \bar{u}(x)|_{l^2} \}}\). Then setting \(\bar{\mathbf {q}} = - \bar{\mathbf {p}}\) and \(\bar{v} = \frac{T\bar{u} - g}{\max \{\beta , |T \bar{u} - g |\}}\) leads to the following system of equations:
for all \(x\in \varOmega \). This system can be solved efficiently by a semismooth Newton algorithm; see the Appendix for a description of the method and for the choice of the parameters \(\kappa , \beta , \mu \), and \(\gamma \).
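The two pointwise relations defining \(\bar{\mathbf {q}}\) and \(\bar{v}\) can be evaluated directly from a candidate \(\bar{u}\), e.g., as a residual check; the following small sketch does exactly that, reusing the `grad` helper from the denoising sketch above. The parameter names `beta` and `gamma_h` (for \(\beta \) and \(\gamma \)) are ours, and this is not the Newton solver itself, which is described in the Appendix.

```python
import numpy as np

def dual_variables(u, g, alpha, T, beta=1e-3, gamma_h=1e-2):
    """Evaluate q = alpha*grad(u)/max{gamma*alpha, |grad u|} and v = (Tu-g)/max{beta, |Tu-g|} (sketch)."""
    gx, gy = grad(u)
    denom = np.maximum(gamma_h * alpha, np.sqrt(gx**2 + gy**2))
    qx, qy = alpha * gx / denom, alpha * gy / denom
    r = T(u) - g
    v = r / np.maximum(beta, np.abs(r))
    return qx, qy, v
```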
Note that other algorithms presented in the literature, such as those in [18, 21, 68], can also be adjusted to the case of a locally varying regularization parameter. However, it is beyond the scope of this paper to compare different algorithms in order to identify the most efficient one, although this is an interesting research topic in its own right.
6 Numerical Experiments
In the following, we present numerical experiments studying the behavior of the proposed algorithms (i.e., the APS-, pAPS-, LATV-, and pLATV-algorithm) with respect to their image restoration capabilities and their stability concerning the choice of the initial value \(\alpha _0\). The performance of these methods is compared quantitatively by means of the peak signal-to-noise ratio (PSNR) [11], which is widely used as an image quality assessment measure, and the structural similarity measure (MSSIM) [90], which relates to perceived visual quality better than PSNR. When an approximate solution of the \(L^1\)-TV model is calculated, we also compare the restorations by the mean absolute error (MAE), an \(L^1\)-based measure defined as
where \(\hat{u}\) denotes the true image and u the obtained restoration. For PSNR and MSSIM, larger values indicate better reconstructions, while for MAE smaller values indicate better reconstructions.
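For reference, minimal implementations of two of these measures are given below; the MAE formula is not reproduced in this extraction, so the normalization used here (the average over all pixels) is an assumption, and the PSNR assumes intensities in the dynamic range [0, 1].

```python
import numpy as np

def psnr(u, u_true, peak=1.0):
    """Peak signal-to-noise ratio in dB (assumes intensities in [0, peak])."""
    mse = np.mean((u - u_true) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def mae(u, u_true):
    """Mean absolute error (assumed normalization: average over all pixels)."""
    return np.mean(np.abs(u - u_true))
```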
Whenever an image is corrupted by Gaussian noise, we compute a solution by means of the (multiscale) \(L^2\)-TV model, while for images containing impulse noise the (multiscale) \(L^1\)-TV model is always considered.
In our numerical experiments, the CPS-, APS-, and pAPS-algorithm are terminated as soon as
or the norm of the difference of two successive iterates \(\alpha _{n}\) and \(\alpha _{n+1}\) drops below the threshold \(\epsilon _\alpha =10^{-10}\), i.e., \(\Vert \alpha _{n} - \alpha _{n+1}\Vert < \epsilon _\alpha \). The latter stopping criterion is used to terminate the algorithms if \((\alpha _n)_n\) stagnates and only very little progress is to be expected. In fact, if our algorithm converges at least linearly, i.e., if there exists an \(\varepsilon _\alpha \in (0,1)\) and an \(m>0\) such that for all \(n\ge m\) we have \(\Vert \alpha _{n+1} - \alpha _{\infty }\Vert < \varepsilon _\alpha \Vert \alpha _{n} - \alpha _{\infty }\Vert \), the second stopping criterion at least ensures that the distance between our obtained result \(\alpha \) and \(\alpha _{\infty }\) is \(\Vert \alpha - \alpha _{\infty }\Vert \le \frac{\epsilon _\alpha \varepsilon _\alpha }{1-\varepsilon _\alpha }\).
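The stated bound follows from the triangle inequality; a short derivation, assuming the returned parameter is \(\alpha =\alpha _{n+1}\):
$$\begin{aligned} \Vert \alpha _{n+1} - \alpha _{\infty }\Vert&\le \varepsilon _\alpha \Vert \alpha _{n} - \alpha _{\infty }\Vert \le \varepsilon _\alpha \left( \Vert \alpha _{n} - \alpha _{n+1}\Vert + \Vert \alpha _{n+1} - \alpha _{\infty }\Vert \right) \\&< \varepsilon _\alpha \epsilon _\alpha + \varepsilon _\alpha \Vert \alpha _{n+1} - \alpha _{\infty }\Vert , \end{aligned}$$
and rearranging yields \(\Vert \alpha _{n+1} - \alpha _{\infty }\Vert < \frac{\epsilon _\alpha \varepsilon _\alpha }{1-\varepsilon _\alpha }\).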
6.1 Automatic Scalar Parameter Selection
For automatically selecting the scalar parameter \(\alpha \) in (1.6), we presented in Sect. 3 the APS- and pAPS-algorithm. Here we compare their performance for image denoising and image deblurring (Fig. 4).
6.1.1 Gaussian Noise Removal
For recovering images corrupted by Gaussian noise with mean zero and standard deviation \(\sigma \), we minimize the functional in (1.6) with \(\tau =2\) and \(T=I\). Then \(\mathcal {B}_2=\frac{\sigma ^2}{2}|\varOmega |\) is a constant independent of u. The automatic selection of a suitable regularization parameter \(\alpha \) is performed here by the CPS-, APS-, and pAPS-algorithm, where the contained minimization problem is solved by the method presented in Sect. 5.1.1. We recall that by [18, Theorem 4] and Theorem 3.1 it is ensured that the CPS- and the pAPS-algorithm generate sequences \((\alpha _n)_n\) which converge to \(\bar{\alpha }\) such that \(u_{\bar{\alpha }}\) solves (1.8). In particular, in the pAPS-algorithm the value p is chosen in dependence on \(\alpha \), i.e., \(p=p(\alpha )\), such that \(\alpha \rightarrow \frac{({\mathcal H}_\tau (u_\alpha ;g))^{p(\alpha )}}{\alpha }\) is nonincreasing; see Fig. 5a. This property is fundamental for the convergence of this algorithm; see Theorem 3.1. For the APS-algorithm, such a monotone behavior is not guaranteed and hence we cannot ensure its convergence. Nevertheless, if the APS-algorithm generates \(\alpha \)'s such that the function \(\alpha \rightarrow \frac{{\mathcal H}_\tau (u_\alpha ;g)}{\alpha }\) is nonincreasing, then it indeed converges to the desired solution; see Theorem 3.2. Unfortunately, the nonincrease of the function \(\alpha \rightarrow \frac{{\mathcal H}_\tau (u_\alpha ;g)}{\alpha }\) does not always hold; see Fig. 5b.
For the performance comparison, we consider the phantom-image of size \(256 \times 256\) pixels, see Fig. 4a, corrupted only by Gaussian white noise with \(\sigma =0.3\). In the pAPS-algorithm we set \(p_0=32\). The behavior of the sequence \((\alpha _n)_n\) for different initial values \(\alpha _0 \in \{1, 10^{-1}, 10^{-2}\}\) is depicted in Fig. 6. We observe that all three methods converge, for any of these \(\alpha _0\), to the same regularization parameter \(\alpha \) and hence generate results with the same PSNR and MSSIM (i.e., PSNR\(=19.84\) and MSSIM\(=0.7989\)). Hence, in these experiments, despite the lack of a theoretical convergence guarantee, the APS-algorithm also seems to converge to the desired solution. We observe the same behavior for other values of \(\sigma \) as well.
Looking at the number of iterations needed until termination, we observe from Table 1 that the APS-algorithm always needs significantly fewer iterations than the CPS-algorithm. This is attributed to the different updates of \(\alpha \). Recall that for a fixed \(\alpha _n\) the CPS-algorithm sets \(\alpha ^{CPS}_{n+1}:=\sqrt{\frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)}}\alpha _n\), while the APS-algorithm performs the update \(\alpha ^{APS}_{n+1}:=\frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)}\alpha _n\). Note that \( \sqrt{\frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)}} \le \frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)} \) if \(\frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)}\ge 1\), and \( \sqrt{\frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)}} \ge \frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)} \) if \(\frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)}\le 1\). Hence, we obtain \(\alpha _n \le \alpha ^{CPS}_{n+1} \le \alpha ^{APS}_{n+1}\) if \(\frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)}\ge 1\) and \(\alpha _n\ge \alpha ^{CPS}_{n+1} \ge \alpha ^{APS}_{n+1}\) if \(\frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)}\le 1\). That is, in the APS-algorithm \(\alpha \) changes more significantly in each iteration than in the CPS-algorithm, which leads to faster convergence with respect to the number of iterations. Nevertheless, exactly this behavior allows the function \(\alpha \rightarrow \frac{{\mathcal H}_2(u_\alpha ;g)}{\alpha }\) to increase, which is the reason why the convergence of the APS-algorithm cannot be guaranteed in general. However, in our experiments we observed that the function \(\alpha \rightarrow \frac{{\mathcal H}_2(u_\alpha ;g)}{\alpha }\) increases only in the first iterations, but does not increase (actually even decreases) afterwards; see Fig. 5b. This is enough to guarantee convergence, as discussed in Sect. 3, since we can consider the iterate of the last step in which the desired monotone behavior is violated as a “new” initial value. Since from this point on the nonincrease holds, we obtain convergence of the algorithm.
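The two scalar updates compared above can be written compactly as follows (a sketch; the residual value \({\mathcal H}_2(u_{\alpha _n};g)\) is passed in as `H2`, and all names are ours):

```python
import numpy as np

def scalar_alpha_update(alpha, H2, nu2, n_pixels, tau=2.0, rule="APS"):
    """One fixed-point update of the scalar parameter based on the residual H2 (sketch).

    CPS: alpha * sqrt(nu2*|Omega| / (tau*H2))  -- damped update, convergent
    APS: alpha * (nu2*|Omega| / (tau*H2))      -- larger steps, faster in practice
    """
    factor = nu2 * n_pixels / (tau * H2)
    return alpha * (np.sqrt(factor) if rule == "CPS" else factor)
```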
The pAPS-algorithm is designed to ensure the nonincrease of the function \(\alpha \rightarrow \frac{({\mathcal H}_\tau (u_\alpha ;g))^{p(\alpha )}}{\alpha }\) by choosing \(p(\alpha )\) in each iteration accordingly, which is done by the algorithm automatically. If \(p(\alpha )=p=1/2\) in each iteration, then the pAPS-algorithm reduces to the CPS-algorithm, as sometimes happens in practice (indicated by the same number of iterations in Table 1). Since the CPS-algorithm converges [18], the pAPS-algorithm always yields \(p\ge 1/2\). In particular, we observe that if the starting value \(\alpha _0\) is larger than the requested regularization parameter \(\alpha \), fewer iterations are needed until termination than with the CPS-algorithm. On the contrary, if \(\alpha _0\) is smaller than the desired \(\alpha \), the algorithm chooses \(p=1/2\) to ensure the monotonicity. The result of the pAPS-algorithm is independent of the choice of \(p_0\), as visible from Fig. 7. In this plot, we also specify the number of iterations needed until termination. Concerning the choice of \(p_0\) that is optimal with respect to the number of iterations, we conclude from Fig. 7 that \(p_0=32\) seems to be a good choice, although the optimal value may depend on the noise level.
Similar behavior as described above is also observed when denoising other images, including real ones.
6.1.2 Image Deblurring
Now we consider the situation in which an image is corrupted by additive Gaussian noise and additionally blurred. Then the operator T is chosen according to the blurring kernel, which we assume here to be known. For testing the APS- and pAPS-algorithm in this case, we take the cameraman-image of Fig. 4b, which is of size \(256\times 256\) pixels, blur it by a Gaussian blurring kernel of size \(5\times 5\) pixels and standard deviation 10, and additionally add Gaussian white noise with variance \(\sigma ^2\). The minimization problem in the APS- and pAPS-algorithm is solved approximately by the algorithm in (5.5). In Fig. 8, the progress of \(\alpha _n\) for different \(\sigma \in \{0.3, 0.1, 0.05\}\) and different \(\alpha _0 \in \{1, 10^{-1}, 10^{-2}\}\) is presented. In these tests, both algorithms converge to the same regularization parameter and minimizer. From the figure, we observe that the pAPS-algorithm needs far fewer iterations than the APS-algorithm until termination. This behavior might be attributed to the choice of the power p in the pAPS-algorithm, since in all our experiments we observe that \(p>1\) until termination.
6.1.3 Impulsive Noise Removal
It has been demonstrated that for removing impulse noise in images one should minimize the \(L^1\)-TV model rather than the \(L^2\)-TV model. For calculating a suitable regularization parameter \(\alpha \) in the \(L^1\)-TV model we use the APS- and pAPS-algorithm, in which the minimization problems are solved approximately by the \(L^1\)-TV\(_\alpha \)-algorithm. Here, we consider the cameraman-image corrupted by salt-and-pepper noise or random-valued impulse noise with different noise levels, i.e., \(r_1=r_2 \in \{0.3, 0.1, 0.05\}\) and \(r \in \{0.3, 0.1, 0.05\}\), respectively. The obtained results for different \(\alpha _0\)'s are depicted in Figs. 9 and 10. For the removal of salt-and-pepper noise, we observe from Fig. 9 similar behavior of the APS- and pAPS-algorithm as above for removing Gaussian noise. In particular, both algorithms converge to the same regularization parameter. However, in many cases the APS-algorithm needs significantly fewer iterations than the pAPS-algorithm. This behavior is also observed in Fig. 10 for removing random-valued impulse noise, as long as the APS-algorithm finds a solution. In fact, for \(r=0.05\) it does not converge but oscillates, as depicted in Fig. 10c.
6.2 Locally Adaptive Total Variation Minimization
In this section, various experiments are presented to evaluate the performance of the LATV- and pLATV-algorithm presented in Sect. 4. Their performance is compared with the proposed pAPS-algorithm as well as with the SA-TV-algorithm introduced in [37] for \(L^2\)-TV minimization and in [58] for \(L^1\)-TV minimization. We recall that the SA-TV methods compute an approximate solution of the respective optimization problem in (1.10) and automatically determine a spatially varying \(\lambda \) based on a local variance estimation. However, as pointed out in [37, 58], they perform efficiently only when the initial \(\lambda \) is chosen sufficiently small, as we do in our numerical experiments. On the contrary, for the LATV- and pLATV-algorithm any positive initial \(\alpha _0\) suffices. For the comparison, we consider four different images, shown in Fig. 4, which are all of size \(256 \times 256\) pixels. In all our experiments, for the SA-TV-algorithm we use \(\mathcal {I}_{i,j}= \varOmega _{i,j}^\omega \), see [37], and we set the window-size to \(11 \times 11\) pixels in the case of Gaussian noise and to \(21 \times 21\) pixels in the case of impulse noise. For the LATV- and pLATV-algorithm, we use the window-size \(\omega =11\), if not otherwise specified, and choose \(p_0=\frac{1}{2}\).
6.3 Gaussian Noise Removal
6.3.1 Dependency on the Initial Regularization Parameter
We start this section by investigating the stability of the SA-TV-, LATV-, and pLATV-algorithm with respect to the initial regularization parameter, i.e., \(\lambda _0\) for the SA-TV-algorithm and \(\alpha _0\) for the other algorithms, by denoising the cameraman-image corrupted by Gaussian white noise with standard deviation \(\sigma =0.1\). In this context, we also compare the pLATV-algorithm with and without Algorithm 1 for automatically computing an initial parameter, where we set \(c_{\alpha _0}=\frac{1}{5}\). The minimization problems contained in the LATV- and pLATV-algorithm are solved as described in Sect. 5.1.1. For comparison purposes, we define the values PSNR\(_{\text {diff}}:= \max _{\alpha _0}\) PSNR\((\alpha _0) - \min _{\alpha _0}\) PSNR\((\alpha _0)\) and MSSIM\(_{\text {diff}}:= \max _{\alpha _0}\) MSSIM\((\alpha _0) - \min _{\alpha _0}\) MSSIM\((\alpha _0)\) to measure the variation of the considered quality measures. Here PSNR\((\alpha _0)\) and MSSIM\((\alpha _0)\) are the PSNR- and MSSIM-values of the reconstructions obtained from the considered algorithms when the initial regularization parameter is set to \(\alpha _0\). From Table 2, we observe that the pLATV-algorithm, with and without Algorithm 1, is more stable with respect to the initial regularization parameter than the LATV-algorithm and the SA-TV-algorithm. This stable performance of the pLATV-algorithm is due to the adaptivity of the value p, which allows the algorithm to reach the desired residual (at least very closely) for any \(\alpha _0\). As expected, the pLATV-algorithm with Algorithm 1 is even more stable with respect to \(\alpha _0\) than the pLATV-algorithm alone, since, due to Algorithm 1, the differences between the actually used initial parameters in the pLATV-algorithm are rather small, leading to very similar results. Note that if \(\alpha _0\) is sufficiently small, then the pLATV-algorithm with and without Algorithm 1 coincide; see Table 2 for \(\alpha _0\in \{10^{-2},10^{-3},10^{-4}\}\). In the rest of our experiments we always choose \(\alpha _0\) so small that Algorithm 1 returns the inputted \(\alpha _0\).
6.3.2 Dependency on the Local Window
In Table 3, we report on performance tests of the pLATV-algorithm with respect to the chosen type of window, i.e., \(\mathcal {I}_{i,j}=\tilde{\varOmega }_{i,j}^\omega \) and \(\mathcal {I}_{i,j}={\varOmega }_{i,j}^\omega \). We observe that, independently of which type of window is used, the algorithm finds nearly the same reconstruction. This may be attributed to the fact that the windows in the interior are the same for both types. The boundaries, however, are treated differently, which leads to different theoretical results but does not seem to have a significant influence on the practical behavior. A similar behavior is observed for the LATV-algorithm, as the LATV- and pLATV-algorithm return nearly the same reconstructions, as observed below in Table 4. Since nearly the same results are obtained for both types of windows, in the rest of our experiments we always set \(\mathcal {I}_{i,j}=\tilde{\varOmega }_{i,j}^\omega \) in the LATV- and pLATV-algorithm.
Next, we test the pLATV-algorithm for window-sizes varying from 3 to 15. Figure 11 shows the PSNR and MSSIM of the restoration of the cameraman-image degraded by different types of noise (i.e., Gaussian noise with \(\sigma =0.3\) or \(\sigma =0.1\), salt-and-pepper noise with \(r_1=r_2=0.3\) or \(r_1=r_2=0.1\), or random-valued impulse noise with \(r=0.3\) or \(r=0.1\)), where the pLATV-algorithm is used with \(\alpha _0=10^{-2}\) and \(p_0=1/2\). We observe that the PSNR and MSSIM vary only slightly with the window-size. However, in the case of Gaussian noise removal the PSNR and MSSIM increase very slightly with increasing window-size, while in the case of impulse noise contamination such a behavior cannot be observed. In Fig. 11, we also specify the number of iterations needed until termination of the algorithm. From this, we observe that in most experiments a larger window-size results in more iterations.
6.3.3 Homogeneous Noise
Now, we test the algorithms for different images corrupted by Gaussian noise with zero mean and different standard deviations \(\sigma \in \{0.3, 0.1, 0.05, 0.01\}\). The initial regularization parameter \(\alpha _0\) is set to \(10^{-4}\) in the pAPS-, LATV-, and pLATV-algorithm. In the SA-TV-algorithm, we choose \(\lambda _0=10^{-4}\), which seems sufficiently small. From Table 4, we observe that all considered algorithms behave very similarly. However, for \(\sigma \in \{ 0.1, 0.05, 0.01\}\) the SA-TV-algorithm performs best most of the time with respect to PSNR and MSSIM, while sometimes the LATV- and pLATV-algorithm have larger PSNR and MSSIM. That is, with respect to these quality measures a locally varying regularization weight is preferable to a scalar one, as long as \(\sigma \) is sufficiently small. In Fig. 12, we present the reconstructions obtained via the considered algorithms and observe that the LATV- and pLATV-algorithm generate visually the best results, while the result of the SA-TV-algorithm seems oversmoothed in some parts. For example, the leftmost tower has completely vanished in the SA-TV-reconstruction, while it is still visible in the other restorations. For large standard deviations, i.e., \(\sigma =0.3\), we observe from Table 4 that the SA-TV method performs clearly worse than the other methods, while the pAPS-algorithm usually has larger PSNR and the LATV- and pLATV-algorithm have larger MSSIM. Hence, whenever the noise level is too large and details are considerably lost due to noise, the locally adaptive methods are not able to improve the restoration quality.
6.3.4 Nonhomogeneous Noise
For this experiment, we consider the cameraman-image degraded by Gaussian white noise with variance \(\sigma ^2 = 0.0025\) in the whole domain \(\varOmega \) except for a rather small area (highlighted in red in Fig. 13a), denoted by \(\tilde{\varOmega }\), where the variance is 6 times larger, i.e., \(\sigma ^2=0.015\). Since the noise level is not homogeneous in this application, the pLATV-algorithm presented in Sect. 4 has to be adjusted accordingly. This can be done by making \(\nu _\tau \) (here \(\tau =2\)) locally dependent, and we write \(\nu _\tau =\nu _\tau (\hat{u})(x)\) to stress the dependence on the true image \(\hat{u}\) and on the location \(x\in \varOmega \). In particular, for our experiment we set \(\nu _2=0.015\) in \(\tilde{\varOmega }\) and \(\nu _2=0.0025\) in \(\varOmega \setminus \tilde{\varOmega }\). Since \(\nu _\tau \) is now spatially varying, we also have to adjust the definitions of \(\mathcal {B}_\tau \) and \(B_\tau \) to
respectively, for the continuous and discrete setting. Making these adaptations allows us to apply the pLATV-algorithm as well as the pAPS-algorithm to the application of removing nonuniform noise.
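As an illustration of this adaptation, the following sketch builds a spatially varying \(\nu _2\) map for the experiment and one plausible discrete counterpart of the adjusted bound; the location of \(\tilde{\varOmega }\) and the exact discrete form of \(B_2\) are assumptions, since the corresponding equations are not reproduced in this extraction.

```python
import numpy as np

# spatially varying noise level nu_2 for the nonhomogeneous-noise experiment (sketch)
nu2 = np.full((256, 256), 0.0025)        # variance sigma^2 = 0.0025 in Omega \ Omega~
region = np.zeros((256, 256), dtype=bool)
region[100:140, 100:140] = True          # placeholder for the region Omega~
nu2[region] = 0.015                      # six times larger variance in Omega~

# one plausible discrete counterpart of the adjusted bound (tau = 2)
B2 = 0.5 * nu2.sum()
```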
The reconstructions obtained by the pAPS-algorithm (with \(p_0=32\) and \(\alpha _0=10^{-2}\)) and by the pLATV-algorithm (with \(p_0=1/2\) and \(\alpha _0=10^{-2}\)) are shown in Fig. 13b, c, respectively. Due to the adaptive choice of \(\alpha \), see Fig. 13d, where light colors indicate large values, the pLATV-algorithm is able to remove the noise considerably in the whole domain, while the pAPS-algorithm returns a restoration that still contains noise in \(\tilde{\varOmega }\).
6.4 Deblurring and Gaussian Noise Removal
The performance of the algorithms for restoring images corrupted by Gaussian blur with a blurring kernel of size \(5\times 5\) pixels and standard deviation 10 and by additive Gaussian noise with standard deviation \(\sigma \) is reported in Table 5. Here we observe that the LATV- as well as the pLATV-algorithm outperform the SA-TV-algorithm in nearly every example. This observation is also clearly visible in Fig. 14, where the SA-TV-algorithm produces an output that is still blurred. The pAPS-algorithm generates reconstructions very similar to those of the LATV- and pLATV-algorithm, which is also reflected by similar PSNR and MSSIM. As before, the pAPS-algorithm performs best when \(\sigma =0.3\), while for smaller \(\sigma \) the LATV-algorithm always has the best PSNR.
6.5 Impulse Noise Removal
Since it turns out that the LATV- and pLATV-algorithm produce nearly the same output, here we compare only our pAPS- and pLATV-algorithm for \(L^1\)-TV minimization with the SA-TV method introduced in [58], where a semismooth Newton method is used to generate an estimate of the minimizer of (1.10). For the sake of a fair comparison, the minimization problem in the pLATV-algorithm is solved approximately by the semismooth Newton method described in the Appendix. For the SA-TV method, we use the parameters suggested in [58] and hence \(\mathcal {I}_{i,j}=\varOmega _{i,j}^\omega \). Moreover, we set \(\lambda _0=0.2\) in our experiments, which seems sufficiently small. In Tables 6 and 7, we report on the results obtained by the pAPS-, SA-TV-, and pLATV-algorithm for restoring images corrupted by salt-and-pepper noise or random-valued impulse noise, respectively. While the pAPS- and pLATV-algorithm produce quite similar restorations for both types of noise, the SA-TV algorithm is outperformed in most examples. For example, in Fig. 15 we observe that the pAPS- and pLATV-algorithm remove the noise considerably, while the solution of the SA-TV method still contains noise. On the contrary, for the removal of random-valued impulse noise in Fig. 16 we see that all three methods produce similar restorations.
7 Conclusion and Extensions
For \(L^1\)-TV and \(L^2\)-TV minimization, including convolution-type problems, automatic parameter selection algorithms for scalar and locally dependent weights \(\alpha \) are presented. In particular, we introduce the APS- and pAPS-algorithm for automatically determining a suitable scalar regularization parameter. While convergence of the APS-algorithm is shown only under certain assumptions, the pAPS-algorithm is guaranteed to converge. Besides their general applicability, both algorithms also exhibit fast numerical convergence in practice.
In order to treat homogeneous regions differently from fine features in images, which promises a better reconstruction, cf. Proposition 4.2 and Remark 4.1, algorithms for automatically computing locally adapted weights \(\alpha \) are proposed. These methods are much more stable with respect to the initial \(\alpha _0\) than the state-of-the-art SA-TV method. Moreover, while in the SA-TV-algorithm the initial \(\lambda _0>0\) has to be chosen sufficiently small, our proposed methods allow any \(\alpha _0>0\). Hence, the LATV- and pLATV-algorithm are much more flexible with respect to the initialization. Numerical experiments show that the reconstructions obtained by the newly introduced algorithms are similar, with respect to the considered image quality measures, to the restorations obtained by the SA-TV algorithm. In the case of Gaussian noise removal (including deblurring) with sufficiently small noise levels, reconstructions obtained with locally varying weights seem to be qualitatively better than the results with scalar parameters. On the contrary, for removing impulse noise a spatially varying \(\alpha \) or \(\lambda \) does not always improve the restoration quality.
For computing a minimizer of the respective multiscale total variation model, we present first- and second-order methods and show their convergence to a minimizer.
Although the proposed parameter selection algorithms are constructed to estimate the parameter \(\alpha \) in (1.6) and (1.9), they can easily be adjusted to find a good candidate \(\lambda \) in (1.7) and (1.10), respectively.
Note that the proposed parameter selection algorithms are not restricted to total variation minimization, but may be extended to other types of regularizers as well, under assumptions that guarantee the existence of a minimizer of the considered optimization problem. In order to obtain similar (convergence) results as presented in Sects. 3 and 4, the considered regularizer should be convex, lower semicontinuous, and one-homogeneous. In particular, for proving convergence results as in Sect. 3 (cf. Theorems 3.1 and 3.2), an equivalence of the penalized and corresponding constrained minimization problems, as in Theorem 2.2, is needed. An example of such a regularizer to which the presented algorithms can be extended is the total generalized variation [13].
References
Alliney, S.: A property of the minimum vectors of a regularizing functional defined by means of the absolute norm. Signal Processing, IEEE Transactions on 45(4), 913–917 (1997)
Almansa, A., Ballester, C., Caselles, V., Haro, G.: A TV based restoration model with local constraints. J. Sci. Comput. 34(3), 209–236 (2008)
Ambrosio, L., Fusco, N., Pallara, D.: Functions of Bounded Variation and Free Discontinuity Problems. Oxford Mathematical Monographs. The Clarendon Press, Oxford University Press, New York (2000)
Aubert, G., Aujol, J.F.: A variational approach to removing multiplicative noise. SIAM Journal on Applied Mathematics 68(4), 925–946 (2008)
Aujol, J.F., Gilboa, G., Chan, T., Osher, S.: Structure-texture image decomposition - modeling, algorithms, and parameter selection. International Journal of Computer Vision 67(1), 111–136 (2006). doi:10.1007/s11263-006-4331-z
Babacan, S.D., Molina, R., Katsaggelos, A.K.: Parameter estimation in TV image restoration using variational distribution approximation. Image Processing, IEEE Transactions on 17(3), 326–339 (2008)
Bartels, S.: Numerical Methods for Nonlinear Partial Differential Equations, vol. 14. Springer, (2015)
Bertalmío, M., Caselles, V., Rougé, B., Solé, A.: TV based image restoration with local constraints. Journal of scientific computing 19(1–3), 95–122 (2003)
Blomgren, P., Chan, T.F.: Modular solvers for image restoration problems using the discrepancy principle. Numerical linear algebra with applications 9(5), 347–358 (2002)
Blu, T., Luisier, F.: The SURE-LET approach to image denoising. Image Processing, IEEE Transactions on 16(11), 2778–2786 (2007)
Bovik, A.C.: Handbook of Image and Video Processing. Academic press, (2010)
Braides, A.: \(\Gamma \)-Convergence for Beginners, Oxford Lecture Series in Mathematics and its Applications, vol. 22. Oxford University Press, Oxford (2002)
Bredies, K., Kunisch, K., Pock, T.: Total generalized variation. SIAM J. Imaging Sci. 3(3), 492–526 (2010)
Buades, A., Coll, B., Morel, J.M.: A review of image denoising algorithms, with a new one. Multiscale Model. Simul. 4(2), 490–530 (2005)
Cai, J.F., Chan, R.H., Nikolova, M.: Two-phase approach for deblurring images corrupted by impulse plus Gaussian noise. Inverse Problems and Imaging 2(2), 187–204 (2008)
Calatroni, L., Chung, C., De Los Reyes, J.C., Schönlieb, C.B., Valkonen, T.: Bilevel approaches for learning of variational imaging models. arXiv preprint arXiv:1505.02120 (2015)
Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. Information Theory, IEEE Transactions on 52(2), 489–509 (2006)
Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vision 20(1–2), 89–97 (2004). Special issue on mathematics and image analysis
Chambolle, A., Darbon, J.: On total variation minimization and surface evolution using parametric maximum flows. International journal of computer vision 84(3), 288–307 (2009)
Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems. Numer. Math. 76(2), 167–188 (1997)
Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal–dual algorithm. Mathematical Programming pp. 1–35
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision 40(1), 120–145 (2011)
Chan, T.F., Esedoḡlu, S.: Aspects of total variation regularized \(L^1\) function approximation. SIAM J. Appl. Math. 65(5), 1817–1837 (2005)
Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput. 20(6), 1964–1977 (1999)
Chan, T.F., Shen, J., Zhou, H.M.: Total variation wavelet inpainting. Journal of Mathematical imaging and Vision 25(1), 107–125 (2006)
Chung, V.C., De Los Reyes, J.C., Schönlieb, C.B.: Learning optimal spatially-dependent regularization parameters in total variation image restoration. arXiv preprint arXiv:1603.09155 (2016)
Ciarlet, P.G.: Introduction to Numerical Linear Algebra and Optimisation. Cambridge Texts in Applied Mathematics. Cambridge University Press, Cambridge (1989). With the assistance of Bernadette Miara and Jean-Marie Thomas. Translated from the French by A. Buttigieg
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005). (electronic)
Darbon, J., Sigelle, M.: A fast and exact algorithm for total variation minimization. In: Pattern recognition and image analysis, pp. 351–359. Springer (2005)
Darbon, J., Sigelle, M.: Image restoration with discrete constrained total variation. I. Fast and exact optimization. J. Math. Imaging Vision 26(3), 261–276 (2006)
Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math. 57(11), 1413–1457 (2004)
Daubechies, I., Teschke, G., Vese, L.: Iteratively solving linear inverse problems under general convex constraints. Inverse Probl. Imaging 1(1), 29–46 (2007)
De Los Reyes, J.C., Schönlieb, C.B.: Image denoising: Learning the noise model via nonsmooth PDE-constrained optimization. Inverse Probl. Imaging 7(4) (2013)
Deledalle, C.A., Vaiter, S., Fadili, J., Peyré, G.: Stein unbiased gradient estimator of the risk (sugar) for multiple parameter selection. SIAM Journal on Imaging Sciences 7(4), 2448–2487 (2014)
Dobson, D.C., Vogel, C.R.: Convergence of an iterative method for total variation denoising. SIAM J. Numer. Anal. 34(5), 1779–1791 (1997)
Dong, Y., Hintermüller, M., Neri, M.: An efficient primal-dual method for \(L^{1}\) TV image restoration. SIAM Journal on Imaging Sciences 2(4), 1168–1189 (2009)
Dong, Y., Hintermüller, M., Rincon-Camacho, M.M.: Automated regularization parameter selection in multi-scale total variation models for image restoration. J. Math. Imaging Vision 40(1), 82–104 (2011)
Donoho, D.L., Johnstone, I.M.: Adapting to unknown smoothness via wavelet shrinkage. Journal of the american statistical association 90(432), 1200–1224 (1995)
Ekeland, I., Témam, R.: Convex Analysis and Variational Problems, Classics in Applied Mathematics, vol. 28, english edn. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (1999). Translated from the French
Eldar, Y.C.: Generalized sure for exponential families: Applications to regularization. Signal Processing, IEEE Transactions on 57(2), 471–481 (2009)
Engl, H.W., Hanke, M., Neubauer, A.: Regularization of Inverse Problems, Mathematics and its Applications, vol. 375. Kluwer Academic Publishers Group, Dordrecht (1996)
Fornasier, M., Naumova, V., Pereverzyev, S.V.: Parameter choice strategies for multipenalty regularization. SIAM Journal on Numerical Analysis 52(4), 1770–1794 (2014)
Getreuer, P., Tong, M., Vese, L.A.: A variational model for the restoration of MR images corrupted by blur and Rician noise. In: Advances in Visual Computing, pp. 686–698. Springer (2011)
Gilboa, G., Sochen, N., Zeevi, Y.Y.: Texture preserving variational denoising using an adaptive fidelity term. In: Proc. VLsM, vol. 3 (2003)
Giryes, R., Elad, M., Eldar, Y.C.: The projected GSURE for automatic parameter tuning in iterative shrinkage methods. Appl. Comput. Harmon. Anal. 30(3), 407–422 (2011)
Giusti, E.: Minimal Surfaces and Functions of Bounded Variation, Monographs in Mathematics, vol. 80. Birkhäuser Verlag, Basel (1984)
Goldstein, T., Osher, S.: The split Bregman method for \(L1\)-regularized problems. SIAM J. Imaging Sci. 2(2), 323–343 (2009)
Golub, G.H., Heath, M., Wahba, G.: Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21(2), 215–223 (1979)
Hansen, P.C.: Analysis of discrete ill-posed problems by means of the \({L}\)-curve. SIAM Rev. 34(4), 561–580 (1992)
Hansen, P.C., O’Leary, D.P.: The use of the \(L\)-curve in the regularization of discrete ill-posed problems. SIAM J. Sci. Comput. 14(6), 1487–1503 (1993)
He, C., Hu, C., Zhang, W., Shi, B.: A fast adaptive parameter estimation for total variation image restoration. Image Processing, IEEE Transactions on 23(12), 4954–4967 (2014)
Hintermüller, M., Kunisch, K.: Total bounded variation regularization as a bilaterally constrained optimization problem. SIAM Journal on Applied Mathematics 64(4), 1311–1333 (2004)
Hintermüller, M., Langer, A.: Subspace correction methods for a class of nonsmooth and nonadditive convex variational problems with mixed \(L^1/L^2\) data-fidelity in image processing. SIAM J. Imaging Sci. 6(4), 2134–2173 (2013)
Hintermüller, M., Langer, A.: Adaptive regularization for Parseval frames in image processing. SFB-Report No. 2014–014, 12 (2014)
Hintermüller, M., Langer, A.: Non-overlapping domain decomposition methods for dual total variation based image denoising. Journal of Scientific Computing 62(2), 456–481 (2015)
Hintermüller, M., Rautenberg, C.: Optimal selection of the regularization function in a generalized total variation model. Part I: Modelling and theory. WIAS preprint 2235 (2016)
Hintermüller, M., Rautenberg, C., Wu, T., Langer, A.: Optimal selection of the regularization function in a generalized total variation model. Part II: Algorithm, its analysis and numerical tests. WIAS preprint 2236 (2016)
Hintermüller, M., Rincon-Camacho, M.M.: Expected absolute value estimators for a spatially adapted regularization parameter choice rule in \(L^1\)-TV-based image restoration. Inverse Problems 26(8), 085,005, 30 (2010)
Hintermüller, M., Stadler, G.: An infeasible primal-dual algorithm for total bounded variation-based inf-convolution-type image restoration. SIAM J. Sci. Comput. 28(1), 1–23 (2006)
Kindermann, S., Osher, S., Jones, P.W.: Deblurring and denoising of images by nonlocal functionals. Multiscale Model. Simul. 4(4), 1091–1115 (2005). (electronic)
Kunisch, K., Pock, T.: A bilevel optimization approach for parameter learning in variational models. SIAM Journal on Imaging Sciences 6(2), 938–983 (2013)
Langer, A.: Subspace correction and domain decomposition methods for total variation minimization. Ph.D. thesis, Johannes Kepler Universtät Linz (2011)
Langer, A., Osher, S., Schönlieb, C.B.: Bregmanized domain decomposition for image restoration. Journal of Scientific Computing 54(2–3), 549–576 (2013)
Le, T., Chartrand, R., Asaki, T.J.: A variational approach to reconstructing images corrupted by Poisson noise. Journal of mathematical imaging and vision 27(3), 257–263 (2007)
Li, F., Ng, M.K., Shen, C.: Multiplicative noise removal with spatially varying regularization parameters. SIAM J. Imaging Sci. 3(1), 1–20 (2010)
Liao, H., Li, F., Ng, M.K.: Selection of regularization parameter in total variation image restoration. J. Opt. Soc. Amer. A 26(11), 2311–2320 (2009)
Lin, Y., Wohlberg, B., Guo, H.: UPRE method for total variation parameter selection. Signal Processing 90(8), 2546–2551 (2010)
Lorenz, D.A., Pock, T.: An inertial forward-backward algorithm for monotone inclusions. Journal of Mathematical Imaging and Vision 51(2), 311–325 (2015)
Mallows, C.L.: Some comments on \(C_P\). Technometrics 15(4), 661–675 (1973)
Morozov, V.A.: Methods for Solving Incorrectly Posed Problems. Springer-Verlag, New York (1984). Translated from the Russian by A. B. Aries, Translation edited by Z. Nashed
Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated variational problems. Comm. Pure Appl. Math. 42(5), 577–685 (1989)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1, Ser. A), 127–152 (2005)
Ng, M.K., Weiss, P., Yuan, X.: Solving constrained total-variation image restoration and reconstruction problems via alternating direction methods. SIAM journal on Scientific Computing 32(5), 2710–2736 (2010)
Nikolova, M.: Minimizers of cost-functions involving nonsmooth data-fidelity terms. Application to the processing of outliers. SIAM J. Numer. Anal. 40(3), 965–994 (2002). (electronic)
Nikolova, M.: A variational approach to remove outliers and impulse noise. Journal of Mathematical Imaging and Vision 20(1–2), 99–120 (2004)
Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total variation-based image restoration. Multiscale Model. Simul. 4(2), 460–489 (2005). (electronic)
Papafitsoros, K., Schönlieb, C.B.: A combined first and second order variational approach for image reconstruction. Journal of mathematical imaging and vision 48(2), 308–338 (2014)
Ramani, S., Liu, Z., Rosen, J., Nielsen, J., Fessler, J.A.: Regularization parameter selection for nonlinear iterative image restoration and MRI reconstruction using GCV and SURE-based methods. IEEE Transactions on Image Processing 21(8), 3659–3672 (2012)
Rockafellar, R.T.: Convex Analysis. Princeton Mathematical Series, No. 28. Princeton University Press, Princeton, N.J (1970)
Rudin, L.I., Osher, S.: Total variation based image restoration with free local constraints. In: Image Processing, 1994. Proceedings. ICIP-94., IEEE International Conference, vol. 1, pp. 31–35. IEEE (1994)
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena 60(1), 259–268 (1992)
Stein, C.M.: Estimation of the mean of a multivariate normal distribution. The annals of Statistics pp. 1135–1151 (1981)
Strong, D.M., Blomgren, P., Chan, T.F.: Spatially adaptive local-feature-driven total variation minimizing image restoration. In: Optical Science, Engineering and Instrumentation’97, pp. 222–233. International Society for Optics and Photonics (1997)
Strong, D.M., Chan, T.F.: Spatially and scale adaptive total variation based regularization and anisotropic diffusion in image processing. In: Diffusion in Image Processing, UCLA Math Department CAM Report. Citeseer (1996)
Sutour, C., Deledalle, C.A., Aujol, J.F.: Adaptive regularization of the NL-means: Application to image and video denoising. Image Processing, IEEE Transactions on 23(8), 3506–3521 (2014)
Tadmor, E., Nezzar, S., Vese, L.: A multiscale image representation using hierarchical \((BV, L^2)\) decompositions. Multiscale Model. Simul. 2(4), 554–579 (2004). (electronic)
Tadmor, E., Nezzar, S., Vese, L.: Multiscale hierarchical decomposition of images with applications to deblurring, denoising and segmentation. Commun. Math. Sci. 6(2), 281–307 (2008)
Tikhonov, A.N., Arsenin, V.Y.: Solutions of Ill-Posed Problems. Vh Winston, (1977)
Vogel, C.R.: Non-convergence of the \(L\)-curve regularization parameter selection method. Inverse Problems 12(4), 535–547 (1996)
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. Image Processing, IEEE Transactions on 13(4), 600–612 (2004)
Weiss, P., Blanc-Féraud, L., Aubert, G.: Efficient schemes for total variation minimization under constraints in image processing. SIAM J. Sci. Comput. 31(3), 2047–2080 (2009)
Wen, Y.W., Chan, R.H.: Parameter selection for total-variation-based image restoration using discrepancy principle. IEEE Transactions on Image Processing 21(4), 1770–1781 (2012)
Zhu, M., Chan, T.: An efficient primal-dual hybrid gradient algorithm for total variation image restoration. UCLA CAM Report pp. 08–34 (2008)
Acknowledgments
The author would like to thank M. Monserrat Rincon-Camacho for providing the spatially adaptive parameter selection code for the \(L^1\)-TV model of [58].
Appendix: Semismooth Newton Method for Solving (5.9)
A semismooth Newton algorithm for solving (5.9) can be derived similarly to [58] by means of vector-valued variables. To this end, let \(u^h\in \mathbb R^N\), \(q^h\in \mathbb R^{2N}\), \(\alpha ^h\in \mathbb R^N\), \(g^h\in \mathbb R^N\), where \(N=N_1 N_2\), denote the discrete image intensity, the dual variable, the spatially dependent regularization parameter, and the observed data vector, respectively. Correspondingly, we define \(\nabla ^h \in \mathbb R^{2N \times N}\) as the discrete gradient operator, \(\Delta ^h \in \mathbb R^{N\times N}\) as the discrete Laplace operator, \(T^h\in \mathbb R^{N\times N}\) as the discretization of the operator T, and \((T^h)^t\) as the transpose of \(T^h\). Here \(|\cdot |\), \(\max \{\cdot , \cdot \}\), and \({\text {sign}}(\cdot )\) are understood componentwise for vectors. Moreover, we use the function \([|\cdot |] : \mathbb R^{2N} \rightarrow \mathbb R^{2N}\) with \([|v^h|]_i =[|v^h|]_{i+N} = \sqrt{(v^h_i)^2 + (v^h_{i+N})^2}\) for \(1\le i \le N\).
For solving (5.9) in every step of our Newton method we need to solve
where
\(e_N \in \mathbb R^{N}\) is the vector of all ones (so that \(D^h(e_N)\) is the identity matrix), \(D^h(v)\) is a diagonal matrix with the vector v on its diagonal, \(m_{\beta _k} = \max \{\beta , |T^h u_k^h - g^h| \}\), \(m_{\gamma _k} = \max \{\gamma \alpha ^h,[ |\nabla ^h u_k^h|]\}\),
Since the diagonal matrices \(D^h(m_{\beta _k}) \) and \(D^h(m_{\gamma _k}) \) are invertible, we may eliminate \(\delta _v\) and \(\delta _q\) from (7.1), which leads to the reduced system
where
In general, \(B_k^h\), and hence \(H_k^h\), is not symmetric. In [59], it is shown that the matrix \(H_k^h\) at the solution \((u_k^h,v_k^h,q_k^h) = (\bar{u},\bar{v},\bar{q})\) is positive definite whenever
for \(i=1,\ldots , N\).
In case these two inequalities are not satisfied, we project \(q_k^h\) and \(v_k^h\) onto their feasible sets, i.e., \(((q_k^h)_i,(q_k^h)_{i+N})\) is set to \((\alpha ^h)_i \max \{(\alpha ^h)_i, [|q_k^h|]_i\}^{-1} ( (q_k^h)_i, (q_k^h)_{i+N})\) and \((v_k^h)_i\) is replaced by \(\max \{1, (|v_k^h|)_i\}^{-1} (v_k^h)_i\). Then the modified system matrix, denoted by \(H_k^+\), is positive definite; see [36]. As pointed out in [58], we may use \(H_k^+ + \varepsilon _k D^h(e_N)\) with \(\kappa =0\), \(\varepsilon _k>0\), and \(\varepsilon _k \downarrow 0\) as \(k\rightarrow \infty \), instead of \(\kappa >0\), to obtain a positive definite matrix. Then our semismooth Newton solver may be written as in [58]:
This algorithm converges at a superlinear rate, which follows from standard theory; see [52, 59].
In our experiments, we always choose \(\kappa =0\), \(\beta =10^{-3}\), \(\gamma =10^{-2}\), and \(\mu =10^{6}\).
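The Newton listing from [58] is not reproduced here; as a small illustration, the feasibility projection described above may be realized as follows (a sketch with the two gradient components stored as separate arrays rather than in the stacked \(\mathbb R^{2N}\) layout; the names are ours):

```python
import numpy as np

def project_feasible(qx, qy, v, alpha):
    """Project the dual variables onto their feasible sets (sketch).

    q is projected pointwise onto the ball of radius alpha(x),
    v is projected pointwise onto the interval [-1, 1].
    """
    scale = alpha / np.maximum(alpha, np.sqrt(qx**2 + qy**2))
    v_proj = v / np.maximum(1.0, np.abs(v))
    return scale * qx, scale * qy, v_proj
```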