Batch-Less Stochastic Gradient Descent for Compressive Learning of Deep Regularization for Image Denoising

Abstract

We consider the problem of denoising with the help of prior information taken from a database of clean signals or images. Denoising with variational methods is very efficient if a regularizer well adapted to the nature of the data is available. Thanks to the maximum a posteriori Bayesian framework, such a regularizer can be systematically linked with the distribution of the data. With deep neural networks (DNN), complex distributions can be recovered from a large training database. To reduce the computational burden of this task, we adapt the compressive learning framework to the learning of regularizers parametrized by DNN. We propose two variants of stochastic gradient descent (SGD) for the recovery of deep regularization parameters from a heavily compressed database. These algorithms outperform the initially proposed method, which was limited to low-dimensional signals, each of its iterations using information from the whole database. They also benefit from classical SGD convergence guarantees. Thanks to these improvements, we show that this method can be applied to patch-based image denoising.

Annex: Consistency of CL-SGD with the Sketch Matching Problem

In this section, we give the necessary lemmas to link our stochastic descent directions with the original sketch matching problem (summarized by the Lemmas of Sect. 3.2). The central idea of our method is that \({\mathcal {S}}\mu _\theta \) can be approximated, on average, by \(B_{\textbf{p}}\mu _\theta ({\textbf{p}})\), which carries over to the chosen stochastic gradients.

Lemma 6.1

Consider \({\mathcal {S}}\) constructed with frequencies \((\omega _l)_{l=1}^m\). Let \(B_{\textbf{p}} \in {\mathbb {C}}^{m\times P}\) with general term \(B_{{\textbf{p}},l,i} = \frac{e^{-j \langle \omega _l,p_i \rangle }}{P}\) and \(\mu \in {\mathcal {M}}({\mathcal {D}})\). Then,

$$\begin{aligned} {\mathbb {E}}_{\textbf{p}} \left( B_{\textbf{p}}\mu ({\textbf{p}})\right) ={\mathcal {S}}\mu . \end{aligned}$$
(34)

Proof

The expectation yields for \(l \in \{1,\ldots ,m \}\)

$$\begin{aligned} \begin{aligned} [{\mathbb {E}}_{\textbf{p}}(B_{\textbf{p}}\mu ({\textbf{p}}))]_l&= {\mathbb {E}}_{\textbf{p}} \left( \sum _{r=1}^P \frac{e^{-j \langle \omega _l,p_r\rangle }\mu (p_r) }{P}\right) . \end{aligned} \end{aligned}$$
(35)

As the \(p_r\) are i.i.d. and \({\mathcal {D}} = [0,1]^d\), we have \(\int _{p_1 \in {\mathcal {D}}}d p_1= 1\) and

$$\begin{aligned} \begin{aligned} [{\mathbb {E}}_{\textbf{p}}(B_{\textbf{p}}\mu ({\textbf{p}}))]_l&= P {\mathbb {E}}_{\textbf{p}} \left( \frac{e^{-j \langle \omega _l,p_1\rangle } \mu (p_1) }{P} \right) \\&= \frac{\int _{p_1 \in {\mathcal {D}}} e^{-j \langle \omega _l,p_1\rangle } \mu (p_1) d p_1}{{\int _{p_1 \in {\mathcal {D}}}d p_1}}= [{\mathcal {S}}\mu ]_l. \end{aligned} \end{aligned}$$
(36)

\(\square \)

This shows that on average, random discretization of the data domain for the forward sketching operator is consistent with the original sketch.
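
As an illustration of Lemma 6.1, the identity \({\mathbb {E}}_{\textbf{p}}(B_{\textbf{p}}\mu ({\textbf{p}}))={\mathcal {S}}\mu \) can be checked numerically. The Python snippet below is a minimal sketch, assuming a toy separable density on \([0,1]^2\) and Gaussian random frequencies (illustrative choices, not taken from the paper); it compares a dense Monte Carlo estimate of \({\mathcal {S}}\mu \) with the average of \(B_{\textbf{p}}\mu ({\textbf{p}})\) over many random discretizations \({\textbf{p}}\).

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, P, n_trials = 2, 8, 64, 2000            # dimension, frequencies, points per draw, MC trials

omegas = rng.normal(scale=5.0, size=(m, d))   # hypothetical random frequencies defining the sketch
mu = lambda x: np.prod(1.0 + 0.5 * np.cos(2 * np.pi * x), axis=-1)  # toy density on [0,1]^d

def weighted_sketch(points, weights):
    """sum_i w_i * exp(-j <omega_l, x_i>), for l = 1..m."""
    return (np.exp(-1j * points @ omegas.T) * weights[:, None]).sum(axis=0)

# Reference sketch S mu, estimated by dense Monte Carlo quadrature on [0,1]^d.
x_ref = rng.random((200_000, d))
S_mu = weighted_sketch(x_ref, mu(x_ref) / len(x_ref))

# Average of B_p mu(p) over n_trials random discretizations p of the domain.
avg = np.zeros(m, dtype=complex)
for _ in range(n_trials):
    p = rng.random((P, d))                    # P i.i.d. uniform points of D = [0,1]^d
    avg += weighted_sketch(p, mu(p) / P)      # B_p mu(p)
avg /= n_trials

print(np.max(np.abs(avg - S_mu)))             # small: E_p[B_p mu(p)] ~ S mu, as in (34)
```

The reported maximum deviation shrinks as n_trials grows, in line with (34).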

To calculate the expectation of our stochastic gradients, we provide the following Lemma, which gives the expectation of the discretized cross-product between two measures. We denote by \(\langle \mu _1, \mu _2 \rangle _{L^2({\mathcal {D}})}:= \int _{{\mathcal {D}}}\mu _1(x)\mu _2(x)dx \) the cross-product between two densities \(\mu _1\) and \(\mu _2\).

Lemma 6.2

Consider \({\mathcal {S}}\) constructed with frequencies \((\omega _l)_{l=1}^m\). Let \(B_{\textbf{p}} \in {\mathbb {C}}^{m\times P}\) with general term \(B_{{\textbf{p}},l,i} = \frac{e^{-j \langle \omega _l,p_i \rangle }}{P}\) and \(\mu _1,\mu _2 \in {\mathcal {M}}({\mathcal {D}})\). We have

$$\begin{aligned} \begin{aligned} {\mathbb {E}}_{\textbf{p}} \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{p}}\mu _2({\textbf{p}}) \rangle&= \frac{m}{P} \langle \mu _1, \mu _2 \rangle _{L^2({\mathcal {D}})} \\&\quad +\frac{P-1}{P}\langle {\mathcal {S}}\mu _1, {\mathcal {S}}\mu _2 \rangle .\\ \end{aligned} \end{aligned}$$
(37)

Proof

We have

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{\textbf{p}} \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{p}}\mu _2({\textbf{p}}) \rangle \\&\quad = {\mathbb {E}}_{\textbf{p}} (\mu _2({\textbf{p}})^T B_{\textbf{p}} ^* B_{\textbf{p}}\mu _1({\textbf{p}}) )\\&\quad =\frac{1}{P^2} {\mathbb {E}}_{\textbf{p}} \Bigl ( \sum _{t=1}^P \mu _2 (p_t)\sum _{g=1}^m e^{j\langle \omega _g,p_t\rangle } \sum _{r=1}^P e^{-j \langle \omega _g,p_r\rangle } \mu _1(p_r)\Bigr )\\&\quad =\frac{1}{P^2} \sum _{t=1}^P \sum _{g=1}^m\sum _{r=1}^P {\mathbb {E}}_{\textbf{p}} \left( e^{j\langle \omega _g,p_t-p_r\rangle }\mu _2 (p_t)\mu _1(p_r)\right) . \end{aligned}\nonumber \\ \end{aligned}$$
(38)

The diagonal terms in the double sum over t and r, i.e., where \(t = r\), are

$$\begin{aligned} \begin{aligned} D&= \frac{1}{P^2} \sum _{t=1}^P \sum _{g=1}^m {\mathbb {E}}_{\textbf{p}}\left( \mu _2 (p_t)\mu _1(p_t)\right) = \frac{m}{P} \langle \mu _1, \mu _2 \rangle _{L^2({\mathcal {D}})}. \end{aligned}\nonumber \\ \end{aligned}$$
(39)

The off-diagonal terms \(t \ne r\) give (using the fact that the \(p_i\) are i.i.d.):

$$\begin{aligned} \begin{aligned} N&=\frac{1}{P^2} \sum _{t=1}^P \sum _{g=1}^m\sum _{r=1, r\ne t}^P {\mathbb {E}}_{\textbf{p}}\left( e^{j\langle \omega _g,p_t-p_r\rangle }\mu _2 (p_t)\mu _1(p_r)\right) \\&=\frac{P-1}{P} \sum _{g=1}^m \left( {\mathbb {E}}_{\textbf{p}} e^{j\langle \omega _g,p_1\rangle }\mu _2 (p_1)\right) \left( {\mathbb {E}}_{\textbf{p}}e^{-j \langle \omega _g,p_1\rangle } \mu _1(p_1)\right) \\&=\frac{P-1}{P} \sum _{g=1}^m ({\mathcal {S}}\mu _2)_g^*({\mathcal {S}}\mu _1)_g \\&= \frac{P-1}{P}\langle {\mathcal {S}}\mu _1, {\mathcal {S}}\mu _2 \rangle . \end{aligned}\nonumber \\ \end{aligned}$$
(40)

\(\square \)
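
The identity (37) lends itself to a similar numerical check. The sketch below assumes the same kind of toy densities and Gaussian frequencies as above (illustrative choices only, not taken from the paper); the inner product \(\langle a,b\rangle =\sum _g b_g^* a_g\) used throughout the proof corresponds to numpy.vdot(b, a).

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, P, n_trials = 2, 8, 32, 5000
omegas = rng.normal(scale=5.0, size=(m, d))                     # hypothetical frequencies
mu1 = lambda x: np.prod(1.0 + 0.5 * np.cos(2 * np.pi * x), axis=-1)  # toy densities on [0,1]^d
mu2 = lambda x: np.prod(1.0 + 0.5 * np.sin(2 * np.pi * x), axis=-1)

def B(points):
    """Matrix B_p of the Lemma, shape (m, P), entries exp(-j <omega_l, p_i>) / P."""
    return np.exp(-1j * points @ omegas.T).T / len(points)

# Reference quantities estimated by dense Monte Carlo on [0,1]^d.
x = rng.random((400_000, d))
S_mu1 = np.exp(-1j * x @ omegas.T).T @ mu1(x) / len(x)          # S mu1
S_mu2 = np.exp(-1j * x @ omegas.T).T @ mu2(x) / len(x)          # S mu2
cross_L2 = np.mean(mu1(x) * mu2(x))                             # <mu1, mu2>_{L^2(D)}

# Left-hand side of (37): expectation over random discretizations p.
lhs = np.mean([np.vdot(B(p) @ mu2(p), B(p) @ mu1(p))            # <B_p mu1(p), B_p mu2(p)>
               for p in rng.random((n_trials, P, d))])
rhs = m / P * cross_L2 + (P - 1) / P * np.vdot(S_mu2, S_mu1)    # right-hand side of (37)
print(lhs, rhs)                                                 # agree up to Monte Carlo error
```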

We also calculate the variance of the unbiased estimator of the gradient of \(G\) with the following Lemma.

Lemma 6.3

Consider \({\mathcal {S}}\) constructed with frequencies \((\omega _l)_{l=1}^m\). Let \(B_{\textbf{p}} \in {\mathbb {C}}^{m\times P}\) with general term \(B_{{\textbf{p}},l,i} = \frac{e^{-j \langle \omega _l,p_i \rangle }}{P}\) and \(\mu _1,\mu _2 \in {\mathcal {M}}({\mathcal {D}})\). We have

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \vert ^2 \\&\qquad - \vert {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \vert ^2 \\&\quad = \frac{1}{P^2} \langle \vert \mu _1\vert ^2,\vert \mu _2 \vert ^2\rangle _{L^2,\vert {\mathcal {S}}^* {\textbf{1}} \vert ^2} + \frac{1}{P} C(\mu _1,\mu _2,z) \end{aligned} \end{aligned}$$
(41)

where we define \({\mathcal {S}}^*z: p \rightarrow \sum _g z_ge^{j \langle \omega _g,p \rangle }\), and for a kernel h (a function from \({\mathcal {D}}\) to \({\mathbb {R}}\)) and two measures \(\nu _1, \nu _2\), \(\langle \nu _1, \nu _2 \rangle _{L^2({\mathcal {D}}),h}:= \int _{x,y} \nu _1(x)\nu _2(y)h(x-y) dxdy\), where

$$\begin{aligned} \begin{aligned}&C(\mu _1,\mu _2,z)\\&\quad := \frac{P-1}{P} ( \langle \vert {\mathcal {S}}^*{\mathcal {S}}\mu _1 \vert ^2, \vert \mu _2 \vert ^2\rangle _{L^2} + \langle \vert {\mathcal {S}}^*{\mathcal {S}}\mu _2 \vert ^2, \vert \mu _1 \vert ^2\rangle _{L^2}) \\&\qquad + {\mathcal {R}}e\langle \vert {\mathcal {S}}^*z\vert ^2 -2{\mathcal {S}}^*z({\mathcal {S}}^*{\mathcal {S}}\mu _2)^*, \vert \mu _1 \vert ^2\rangle _{L^2}\\&\qquad + 2{\mathcal {R}}e\left( \langle {\mathcal {S}}\mu _1,z \rangle \langle {\mathcal {S}}\mu _1,{\mathcal {S}}\mu _2 \rangle ^*\right) - \vert \langle {\mathcal {S}}\mu _1,z \rangle \vert ^2 \\&\qquad - \frac{2P-1}{P} \vert \langle {\mathcal {S}}\mu _1,{\mathcal {S}}\mu _2\rangle \vert ^2. \end{aligned} \end{aligned}$$
(42)

Proof

We need to calculate a few terms separately. For \(y\in {\mathbb {C}}^m\), using the fact that

$$\begin{aligned} \langle B_{\textbf{p}}\mu _1({\textbf{p}}), z\rangle = \frac{1}{P}\sum _{g=1}^m\sum _{t=1}^P e^{j\langle \omega _g,p_t\rangle }\mu _1 (p_t)z_g, \end{aligned}$$

we have

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}}} \langle B_{\textbf{p}}\mu _1({\textbf{p}}), z\rangle \langle B_{\textbf{p}}\mu _1({\textbf{p}}), y\rangle ^* \\&\quad =\frac{1}{P^2} \sum _{g,t} \sum _{{\tilde{g}},{\tilde{t}}} {\mathbb {E}}_{{\textbf{p}}} \Big (e^{j\langle \omega _g,p_t\rangle -j\langle \omega _{{\tilde{g}}},p_{{\tilde{t}}}\rangle } \mu _1 (p_t) \mu _1 (p_{{\tilde{t}}})z_g y_{{\tilde{g}}}^*\Big ) \\&\quad =\frac{1}{P^2} \sum _{g,{\tilde{g}}} z_g y_{{\tilde{g}}}^* \sum _{t,{\tilde{t}}} {\mathbb {E}}_{{\textbf{p}}}\Big ( e^{j\langle \omega _g,p_t\rangle -j\langle \omega _{{\tilde{g}}},p_{{\tilde{t}}}\rangle }\mu _1 (p_t) \mu _1 (p_{{\tilde{t}}})\Big ). \\ \end{aligned}\nonumber \\ \end{aligned}$$
(43)

As the \(p_t\) are i.i.d., we have

$$\begin{aligned} \begin{aligned}&\sum _{t,{\tilde{t}}} {\mathbb {E}}_{{\textbf{p}}} e^{j\langle \omega _g,p_t\rangle -j\langle \omega _{{\tilde{g}}},p_{{\tilde{t}}}\rangle }\mu _1 (p_t) \mu _1 (p_{{\tilde{t}}}) \quad \\&\quad = P {\mathbb {E}}_{{\textbf{p}}} [e^{-j\langle \omega _{{\tilde{g}}}-\omega _g,p_1\rangle } \vert \mu _1 (p_1)\vert ^2]\\&\qquad + P(P-1) ({\mathcal {S}}\mu _1)_g^*({\mathcal {S}}\mu _1)_{{\tilde{g}}}. \end{aligned} \end{aligned}$$
(44)

We obtain

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}}} \langle B_{\textbf{p}}\mu _1({\textbf{p}}), z\rangle \langle B_{\textbf{p}}\mu _1({\textbf{p}}), y\rangle ^*\\&\quad = \frac{1}{P^2} \sum _{g,{\tilde{g}}} \Big ( z_g y_{{\tilde{g}}}^* P {\mathbb {E}}_{{\textbf{p}}} [e^{-j\langle \omega _{{\tilde{g}}}-\omega _g,p_{1}\rangle } \vert \mu _1 (p_1)\vert ^2] \\&\qquad + P(P-1) ({\mathcal {S}}\mu _1)_g^*({\mathcal {S}}\mu _1)_{{\tilde{g}}}z_g y_{{\tilde{g}}}^* \Big )\\&\quad = \frac{1}{P}{\mathbb {E}}_{{\textbf{p}}} \left[ ({\mathcal {S}}^*z)(p_1)\,\left( ({\mathcal {S}}^*y)(p_1)\right) ^* \vert \mu _1 (p_1)\vert ^2\right] \\&\qquad +\frac{P-1}{P} \langle {\mathcal {S}}\mu _1,z \rangle \langle {\mathcal {S}}\mu _1,y \rangle ^*\\&\quad = \frac{1}{P}\langle {\mathcal {S}}^*z({\mathcal {S}}^*y)^*, \vert \mu _1 \vert ^2\rangle _{L^2}+\frac{P-1}{P} \langle {\mathcal {S}}\mu _1,z \rangle \langle {\mathcal {S}}\mu _1,y \rangle ^* \end{aligned} \end{aligned}$$
(45)

where \({\mathcal {S}}^*z\) is defined in the hypotheses of the Lemma. For \(y= z\), we obtain

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), z \rangle \vert ^2 \\&\quad = \frac{1}{P}\langle \vert {\mathcal {S}}^*z\vert ^2, \vert \mu _1 \vert ^2\rangle _{L^2}+\frac{P-1}{P} \vert \langle {\mathcal {S}}\mu _1,z \rangle \vert ^2. \end{aligned} \end{aligned}$$
(46)

We now calculate the following expectation:

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) \rangle \vert ^2 \\&\quad =\frac{1}{P^4} {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \sum _{g=1}^m\sum _{t=1}^P \sum _{r=1}^P e^{j\langle \omega _g,p_t-q_r\rangle }\mu _1 (p_t)\mu _2(q_r) \vert ^2\\&\quad =\frac{1}{P^4} \sum _{g,t,r} \sum _{{\tilde{g}},{\tilde{t}},{\tilde{r}}} {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}}\Big ( e^{j\langle \omega _g,p_t-q_r\rangle }e^{-j\langle \omega _{{\tilde{g}}},p_{{\tilde{t}}}-q_{{\tilde{r}}}\rangle }\\&\quad \; \mu _1 (p_t)\mu _2(q_r) \mu _1 (p_{{\tilde{t}}})\mu _2(q_{{\tilde{r}}})\Big ). \end{aligned} \end{aligned}$$
(47)

As \({\textbf{p}}\) and \({\textbf{q}}\) are i.i.d.,

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) \rangle \vert ^2 \\&=\frac{1}{P^4} \sum _{g,t,r} \sum _{{\tilde{g}},{\tilde{t}},{\tilde{r}}} \Big ( {\mathbb {E}}_{{\textbf{p}}} e^{j\langle \omega _g,p_t\rangle -j\langle \omega _{{\tilde{g}}},p_{{\tilde{t}}}\rangle } \mu _1 (p_t) \mu _1 (p_{{\tilde{t}}})\Big ) \\&\quad \;{\mathbb {E}}_{{\textbf{q}}}\Big (e^{-j\langle \omega _g,q_r\rangle +j\langle \omega _{{\tilde{g}}},q_{{\tilde{r}}}\rangle } \mu _2(q_r) \mu _2(q_{{\tilde{r}}})\Big )\\&=\frac{1}{P^4} \sum _{g,{\tilde{g}}} \sum _{t,{\tilde{t}}} {\mathbb {E}}_{{\textbf{p}}}\left( e^{j\langle \omega _g,p_t\rangle -j\langle \omega _{{\tilde{g}}},p_{{\tilde{t}}}\rangle } \mu _1 (p_t) \mu _1 (p_{{\tilde{t}}}) \right) \\&\quad \;\sum _{r,{\tilde{r}}}{\mathbb {E}}_{{\textbf{q}}}\left( e^{-j\langle \omega _g,q_r\rangle +j\langle \omega _{{\tilde{g}}},q_{{\tilde{r}}}\rangle } \mu _2(q_r) \mu _2(q_{{\tilde{r}}})\right) \\&= \frac{1}{P^4} \sum _{g,{\tilde{g}}} A_{1,g,{\tilde{g}}} A_{2,g,{\tilde{g}}}^* \\ \end{aligned} \end{aligned}$$
(48)

where, decomposing the sum into diagonal and off-diagonal terms,

$$\begin{aligned} \begin{aligned} A_{i,g,{\tilde{g}}}&= \sum _{t,{\tilde{t}}} {\mathbb {E}}_{{\textbf{p}}} \left( e^{j\langle \omega _g,p_t\rangle -j\langle \omega _{{\tilde{g}}},p_{{\tilde{t}}}\rangle } \mu _i (p_t) \mu _i (p_{{\tilde{t}}})\right) \\&= P {\mathbb {E}}_{{\textbf{p}}} \left( e^{-j\langle \omega _{{\tilde{g}}}-\omega _g,p_{t}\rangle } \vert \mu _i (p_t)\vert ^2 \right) \\&\quad \;+ P(P-1) ({\mathcal {S}}\mu _i)_g^*({\mathcal {S}}\mu _i)_{{\tilde{g}}}. \end{aligned} \end{aligned}$$
(49)

We obtain

$$\begin{aligned} \begin{aligned}&P^2{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) \rangle \vert ^2 \\&\quad = \sum _{g,{\tilde{g}}} \Big ( {\mathbb {E}}_{{\textbf{p}}} (e^{-j\langle \omega _{{\tilde{g}}}-\omega _g,p_{t}\rangle } \vert \mu _1 (p_t)\vert ^2 ) {\mathbb {E}}_{{\textbf{q}}} (e^{j\langle \omega _{{\tilde{g}}}-\omega _g,q_{r}\rangle } \vert \mu _2(q_r)\vert ^2)\\&\qquad + (P-1) ({\mathcal {S}}\mu _1)_g^*({\mathcal {S}}\mu _1)_{{\tilde{g}}} {\mathbb {E}}_{{\textbf{q}}} (e^{j\langle \omega _{{\tilde{g}}}-\omega _g,q_{r}\rangle } \vert \mu _2(q_r)\vert ^2 )\\&\qquad + (P-1) ({\mathcal {S}}\mu _2)_g({\mathcal {S}}\mu _2)_{{\tilde{g}}}^* {\mathbb {E}}_{{\textbf{p}}} (e^{-j\langle \omega _{{\tilde{g}}}-\omega _g,p_{t}\rangle } \vert \mu _1 (p_t)\vert ^2 )\\&\qquad +(P-1)^2 ({\mathcal {S}}\mu _1)_g^*({\mathcal {S}}\mu _1)_{{\tilde{g}}} ({\mathcal {S}}\mu _2)_g({\mathcal {S}}\mu _2)_{{\tilde{g}}}^* \Big ) \end{aligned} \end{aligned}$$
(50)

with

$$\begin{aligned} \begin{aligned}&\sum _{g,{\tilde{g}}} {\mathbb {E}}_{{\textbf{p}}} (e^{-j\langle \omega _{{\tilde{g}}}-\omega _g,p_{t}\rangle } \vert \mu _1 (p_t)\vert ^2 ) {\mathbb {E}}_{{\textbf{q}}} (e^{j\langle \omega _{{\tilde{g}}}-\omega _g,q_{r}\rangle } \vert \mu _2(q_r)\vert ^2)\\&\quad ={\mathbb {E}}_{{\textbf{p}}} {\mathbb {E}}_{{\textbf{q}}} \left( \sum _{g,{\tilde{g}}}e^{-j\langle \omega _{{\tilde{g}}}-\omega _g,p_{t}\rangle } e^{j\langle \omega _{{\tilde{g}}}-\omega _g,q_{r}\rangle } \right) \vert \mu _1 (p_t)\vert ^2\vert \mu _2(q_r)\vert ^2. \end{aligned} \end{aligned}$$
(51)

We have inside the expectation,

$$\begin{aligned} \begin{aligned}&\left( \sum _{g,{\tilde{g}}}e^{j\langle \omega _{{\tilde{g}}},q_r-p_t\rangle } e^{-j\langle \omega _g,q_{r}-p_t\rangle } \right) \vert \mu _1 (p_t)\vert ^2\vert \mu _2(q_r)\vert ^2\\&\quad = \left( \sum _{{\tilde{g}}}e^{j\langle \omega _{{\tilde{g}}},q_r-p_t\rangle }\right) \left( \sum _{g}e^{-j\langle \omega _g,q_{r}-p_t\rangle } \right) \vert \mu _1 (p_t)\vert ^2\vert \mu _2(q_r)\vert ^2\\&\quad = \vert [{\mathcal {S}}^* {\textbf{1}}] (q_r-p_t)\vert ^2\, \vert \mu _1 (p_t)\vert ^2\vert \mu _2(q_r)\vert ^2. \end{aligned} \end{aligned}$$
(52)

This gives, for the first term of the right-hand side of (50),

$$\begin{aligned} \begin{aligned}&\sum _{g,{\tilde{g}}} {\mathbb {E}}_{{\textbf{p}}} (e^{-j\langle \omega _{{\tilde{g}}}-\omega _g,p_{t}\rangle } \vert \mu _1 (p_t)\vert ^2 ) {\mathbb {E}}_{{\textbf{q}}} (e^{j\langle \omega _{{\tilde{g}}}-\omega _g,q_{r}\rangle } \vert \mu _2(q_r)\vert ^2)\\&\quad = {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert [{\mathcal {S}}^* {\textbf{1}}] (q_r-p_t)\vert ^2 \vert \mu _1(p_t)\vert ^2\vert \mu _2(q_r) \vert ^2\\&\quad = \langle \vert \mu _1\vert ^2,\vert \mu _2 \vert ^2\rangle _{L^2({\mathcal {D}}),\vert {\mathcal {S}}^* {\textbf{1}} \vert ^2} \end{aligned} \end{aligned}$$
(53)

where, as in the statement of the Lemma, \(\langle \nu _1, \nu _2 \rangle _{L^2({\mathcal {D}}), {h}} = \int _{x,y} \nu _1(x)\nu _2(y)h(x-y) dxdy\) for a kernel h (a function from \({\mathcal {D}}\) to \({\mathbb {R}}\)) and two measures \(\nu _1, \nu _2\).

We calculate the second term (and similarly the third term) of the right-hand side of (50).

$$\begin{aligned} \begin{aligned}&\sum _{g,{\tilde{g}}} ({\mathcal {S}}\mu _1)_g^*({\mathcal {S}}\mu _1)_{{\tilde{g}}} {\mathbb {E}}_{{\textbf{q}}} ( e^{j\langle \omega _{{\tilde{g}}}-\omega _g,q_{r}\rangle } \vert \mu _2(q_r)\vert ^2)\\&\quad = {\mathbb {E}}_{{\textbf{q}}} \sum _{g,{\tilde{g}}} e^{-j\langle \omega _g,q_{r}\rangle }({\mathcal {S}}\mu _1)_g^*e^{j\langle \omega _{{\tilde{g}}},q_{r}\rangle }({\mathcal {S}}\mu _1)_{{\tilde{g}}} \vert \mu _2(q_r)\vert ^2\\&\quad = {\mathbb {E}}_{{\textbf{q}}} \left( \vert ({\mathcal {S}}^*{\mathcal {S}}\mu _1)(q_r) \vert ^2\vert \mu _2(q_r)\vert ^2\right) = \langle \vert {\mathcal {S}}^*{\mathcal {S}}\mu _1 \vert ^2, \vert \mu _2 \vert ^2\rangle _{L^2({\mathcal {D}})}. \end{aligned} \end{aligned}$$
(54)

The fourth term of the right-hand side of (50) yields

$$\begin{aligned} \begin{aligned}&\sum _{g,{\tilde{g}}} ({\mathcal {S}}\mu _1)_g^*({\mathcal {S}}\mu _1)_{{\tilde{g}}} ({\mathcal {S}}\mu _2)_g({\mathcal {S}}\mu _2)_{{\tilde{g}}}^* \\&\quad = \sum _g ({\mathcal {S}}\mu _1)_g^* ({\mathcal {S}}\mu _2)_g\langle {\mathcal {S}}\mu _1, {\mathcal {S}}\mu _2\rangle = \vert \langle {\mathcal {S}}\mu _1, {\mathcal {S}}\mu _2\rangle \vert ^2. \end{aligned} \end{aligned}$$
(55)

Going back to (48) and using expressions (53), (54) and (55) in (50), we obtain

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) \rangle \vert ^2\\&\quad = \frac{1}{P^2} \langle \vert \mu _1\vert ^2,\vert \mu _2 \vert ^2\rangle _{L^2({\mathcal {D}}),\vert {\mathcal {S}}^* {\textbf{1}} \vert ^2}\\&\qquad + \frac{P-1}{P^2} \langle \vert {\mathcal {S}}^*{\mathcal {S}}\mu _1 \vert ^2, \vert \mu _2 \vert ^2\rangle _{L^2({\mathcal {D}})} \\&\qquad + \frac{P-1}{P^2} \langle \vert {\mathcal {S}}^*{\mathcal {S}}\mu _2 \vert ^2, \vert \mu _1 \vert ^2\rangle _{L^2({\mathcal {D}})} \\&\qquad + \frac{(P-1)^2}{P^2} \vert \langle {\mathcal {S}}\mu _1, {\mathcal {S}}\mu _2\rangle \vert ^2. \end{aligned} \end{aligned}$$
(56)

By expanding the expressions, we have

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \\&\qquad - {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \vert ^2 \\&\quad = {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \vert ^2- \vert \langle {\mathcal {S}}\mu _1, {\mathcal {S}} \mu _2 - z \rangle \vert ^2 \\&\quad ={\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}})\rangle \vert ^2 \\&\qquad - 2 {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} {\mathcal {R}}e\left( \langle B_{\textbf{p}}\mu _1({\textbf{p}}), z \rangle \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}})\rangle ^*\right) \\&\qquad +{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), z \rangle \vert ^2- \vert \langle {\mathcal {S}}\mu _1, {\mathcal {S}} \mu _2 - z \rangle \vert ^2 \\&\quad ={\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}})\rangle \vert ^2 \\&\qquad - 2 {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} {\mathcal {R}}e\left( \langle B_{\textbf{p}}\mu _1({\textbf{p}}), z \rangle \langle B_{\textbf{p}}\mu _1({\textbf{p}}), {\mathcal {S}}\mu _2\rangle ^*\right) \\&\qquad +{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), z \rangle \vert ^2- \vert \langle {\mathcal {S}}\mu _1, {\mathcal {S}} \mu _2 - z \rangle \vert ^2. \end{aligned} \end{aligned}$$
(57)

Using Eq. (56), Eq. (45), and the fact that \( {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} {\mathcal {R}}e\langle B_{\textbf{p}}\mu _1({\textbf{p}}), z \rangle = {\mathcal {R}}e\langle {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} B_{\textbf{p}}\mu _1({\textbf{p}}), z \rangle = {\mathcal {R}}e\langle {\mathcal {S}}\mu _1, z \rangle \) (Lemma 6.1), we have

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \\&\qquad - {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \vert ^2 \\&\quad = \frac{1}{P^2} \langle \vert \mu _1\vert ^2,\vert \mu _2 \vert ^2\rangle _{L^2({\mathcal {D}}),\vert {\mathcal {S}}^* {\textbf{1}} \vert ^2}\\&\qquad + \frac{P-1}{P^2} \langle \vert {\mathcal {S}}^*{\mathcal {S}}\mu _1 \vert ^2, \vert \mu _2 \vert ^2\rangle _{L^2({\mathcal {D}})} \\&\qquad + \frac{P-1}{P^2} \langle \vert {\mathcal {S}}^*{\mathcal {S}}\mu _2 \vert ^2, \vert \mu _1 \vert ^2\rangle _{L^2({\mathcal {D}})} \\&\qquad + \frac{(P-1)^2}{P^2} \vert \langle {\mathcal {S}}\mu _1, {\mathcal {S}}\mu _2\rangle \vert ^2\\&\qquad -2\frac{1}{P} {\mathcal {R}}e\langle ({\mathcal {S}}^*z)({\mathcal {S}}^*{\mathcal {S}}\mu _2)^*, \vert \mu _1 \vert ^2\rangle _{L^2}\\&\qquad -2 \frac{P-1}{P} {\mathcal {R}}e\left( \langle {\mathcal {S}}\mu _1,z \rangle \langle {\mathcal {S}}\mu _1,{\mathcal {S}}\mu _2 \rangle ^*\right) \\&\qquad + \frac{1}{P}\langle \vert {\mathcal {S}}^*z\vert ^2, \vert \mu _1 \vert ^2\rangle _{L^2({\mathcal {D}})}\\&\qquad +\frac{P-1}{P} \vert \langle {\mathcal {S}}\mu _1,z \rangle \vert ^2\\&\qquad - \vert \langle {\mathcal {S}}\mu _1,z \rangle \vert ^2 \\&\qquad + 2 {\mathcal {R}}e\left( \langle {\mathcal {S}}\mu _1,z \rangle \langle {\mathcal {S}}\mu _1,{\mathcal {S}}\mu _2 \rangle ^*\right) \\&\qquad - \vert \langle {\mathcal {S}}\mu _1,{\mathcal {S}}\mu _2\rangle \vert ^2. \end{aligned} \end{aligned}$$
(58)

Regrouping terms yields

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \\&\qquad - {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \vert ^2 \\&\quad = \frac{1}{P^2} \langle \vert \mu _1\vert ^2,\vert \mu _2 \vert ^2\rangle _{L^2,\vert {\mathcal {S}}^* {\textbf{1}} \vert ^2}\\&\qquad + \frac{P-1}{P^2} \left( \langle \vert {\mathcal {S}}^*{\mathcal {S}}\mu _1 \vert ^2, \vert \mu _2 \vert ^2\rangle _{L^2} + \langle \vert {\mathcal {S}}^*{\mathcal {S}}\mu _2 \vert ^2, \vert \mu _1 \vert ^2\rangle _{L^2} \right) \\&\qquad + \frac{1}{P} {\mathcal {R}}e\langle \vert {\mathcal {S}}^*z\vert ^2 -2{\mathcal {S}}^*z({\mathcal {S}}^*{\mathcal {S}}\mu _2)^*, \vert \mu _1 \vert ^2\rangle _{L^2}\\&\qquad + \frac{2}{P} {\mathcal {R}}e\left( \langle {\mathcal {S}}\mu _1,z \rangle \langle {\mathcal {S}}\mu _1,{\mathcal {S}}\mu _2 \rangle ^*\right) \\&\qquad - \frac{1}{P}\vert \langle {\mathcal {S}}\mu _1,z \rangle \vert ^2 \\&\qquad - \frac{2P-1}{P^2} \vert \langle {\mathcal {S}}\mu _1,{\mathcal {S}}\mu _2\rangle \vert ^2. \end{aligned} \end{aligned}$$
(59)

\(\square \)

This shows that the variance converges to 0 at the typical rate \(1/P\).
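
A quick Monte Carlo experiment illustrates this \(1/P\) behaviour. The snippet below is again a sketch with hypothetical toy densities, random frequencies and target vector z (none taken from the paper); it estimates the variance of \(\langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \) for increasing \(P\) and prints \(P\) times the empirical variance, which stays approximately constant as predicted by Lemma 6.3.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 2, 8
omegas = rng.normal(scale=5.0, size=(m, d))                     # hypothetical frequencies
mu1 = lambda x: np.prod(1.0 + 0.5 * np.cos(2 * np.pi * x), axis=-1)
mu2 = lambda x: np.prod(1.0 + 0.5 * np.sin(2 * np.pi * x), axis=-1)
z = rng.standard_normal(m) + 1j * rng.standard_normal(m)        # plays the role of the target sketch

def sample_inner(P, n_trials=4000):
    """Samples of <B_p mu1(p), B_q mu2(q) - z> for independent discretizations p and q."""
    vals = np.empty(n_trials, dtype=complex)
    for k in range(n_trials):
        p, q = rng.random((P, d)), rng.random((P, d))
        Bp_mu1 = np.exp(-1j * p @ omegas.T).T @ mu1(p) / P      # B_p mu1(p)
        Bq_mu2 = np.exp(-1j * q @ omegas.T).T @ mu2(q) / P      # B_q mu2(q)
        vals[k] = np.vdot(Bq_mu2 - z, Bp_mu1)
    return vals

for P in (16, 32, 64, 128):
    v = sample_inner(P)
    var = np.mean(np.abs(v - v.mean()) ** 2)                    # empirical variance
    print(P, P * var)                                           # roughly constant: variance ~ 1/P
```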

Cite this article

Shi, H., Traonmilin, Y. & Aujol, JF. Batch-Less Stochastic Gradient Descent for Compressive Learning of Deep Regularization for Image Denoising. J Math Imaging Vis 66, 464–477 (2024). https://doi.org/10.1007/s10851-024-01178-x
