Batch-Less Stochastic Gradient Descent for Compressive Learning of Deep Regularization for Image Denoising

Abstract

We consider the problem of denoising with the help of prior information taken from a database of clean signals or images. Denoising with variational methods is very efficient if a regularizer well adapted to the nature of the data is available. Thanks to the maximum a posteriori Bayesian framework, such a regularizer can be systematically linked with the distribution of the data. With deep neural networks (DNN), complex distributions can be recovered from a large training database. To reduce the computational burden of this task, we adapt the compressive learning framework to the learning of regularizers parametrized by DNN. We propose two variants of stochastic gradient descent (SGD) for the recovery of deep regularization parameters from a heavily compressed database. These algorithms outperform the initially proposed method, which was limited to low-dimensional signals, each of its iterations using information from the whole database. They also benefit from classical SGD convergence guarantees. Thanks to these improvements, we show that this method can be applied to patch-based image denoising.

Annex: Consistency of CL-SGD with the Sketch Matching Problem

In this section, we give the necessary lemmas to link our stochastic descent directions with the original sketch matching problem (summarized by the Lemmas of Sect. 3.2). The central idea of our method is that \({\mathcal {S}}\mu _\theta \) can be approximated, on average, by \(B_{\textbf{p}}\mu _\theta ({\textbf{p}})\), which carries over to the chosen stochastic gradients.

Lemma 6.1

Consider \({\mathcal {S}}\) constructed with frequencies \((\omega _l)_{l=1}^m\). Let \(B_{\textbf{p}} \in {\mathbb {C}}^{m\times P}\) with general term \(B_{{\textbf{p}},l,i} = \frac{e^{-j \langle \omega _l,p_i \rangle }}{P}\) and \(\mu \in {\mathcal {M}}({\mathcal {D}})\). Then,

$$\begin{aligned} {\mathbb {E}}_{\textbf{p}} \left( B_{\textbf{p}}\mu ({\textbf{p}})\right) ={\mathcal {S}}\mu . \end{aligned}$$
(34)

Proof

The expectation yields for \(l \in \{1,\ldots ,m \}\)

$$\begin{aligned} \begin{aligned} [{\mathbb {E}}_{\textbf{p}}(B_{\textbf{p}}\mu ({\textbf{p}}))]_l&= {\mathbb {E}}_{\textbf{p}} \left( \sum _{r=1}^P \frac{e^{-j \langle \omega _l,p_r\rangle }\mu (p_r) }{P}\right) . \end{aligned} \end{aligned}$$
(35)

As the \(p_r\) are i.i.d. and \({\mathcal {D}} = [0,1]^d\), we have \(\int _{p_1 \in {\mathcal {D}}}d p_1= 1\) and

$$\begin{aligned} \begin{aligned} [{\mathbb {E}}_{\textbf{p}}(B_{\textbf{p}}\mu ({\textbf{p}}))]_l&= P {\mathbb {E}}_{\textbf{p}} \left( \frac{e^{-j \langle \omega _l,p_1\rangle } \mu (p_1) }{P} \right) \\&= \frac{\int _{p_1 \in {\mathcal {D}}} e^{-j \langle \omega _l,p_1\rangle } \mu (p_1) d p_1}{{\int _{p_1 \in {\mathcal {D}}}d p_1}}= [{\mathcal {S}}\mu ]_l. \end{aligned} \end{aligned}$$
(36)

\(\square \)

This shows that on average, random discretization of the data domain for the forward sketching operator is consistent with the original sketch.
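
As an illustration of Lemma 6.1, the identity \({\mathbb {E}}_{\textbf{p}}(B_{\textbf{p}}\mu ({\textbf{p}}))={\mathcal {S}}\mu \) can be checked numerically. The Python snippet below is a minimal sketch, assuming a toy separable density on \([0,1]^2\) and Gaussian random frequencies (illustrative choices, not taken from the paper); it compares a dense Monte Carlo estimate of \({\mathcal {S}}\mu \) with the average of \(B_{\textbf{p}}\mu ({\textbf{p}})\) over many random discretizations \({\textbf{p}}\).

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, P, n_trials = 2, 8, 64, 2000            # dimension, frequencies, points per draw, MC trials

omegas = rng.normal(scale=5.0, size=(m, d))   # hypothetical random frequencies defining the sketch
mu = lambda x: np.prod(1.0 + 0.5 * np.cos(2 * np.pi * x), axis=-1)  # toy density on [0,1]^d

def weighted_sketch(points, weights):
    """sum_i w_i * exp(-j <omega_l, x_i>), for l = 1..m."""
    return (np.exp(-1j * points @ omegas.T) * weights[:, None]).sum(axis=0)

# Reference sketch S mu, estimated by dense Monte Carlo quadrature on [0,1]^d.
x_ref = rng.random((200_000, d))
S_mu = weighted_sketch(x_ref, mu(x_ref) / len(x_ref))

# Average of B_p mu(p) over n_trials random discretizations p of the domain.
avg = np.zeros(m, dtype=complex)
for _ in range(n_trials):
    p = rng.random((P, d))                    # P i.i.d. uniform points of D = [0,1]^d
    avg += weighted_sketch(p, mu(p) / P)      # B_p mu(p)
avg /= n_trials

print(np.max(np.abs(avg - S_mu)))             # small: E_p[B_p mu(p)] ~ S mu, as in (34)
```

The reported maximum deviation shrinks as n_trials grows, in line with (34).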

To calculate the expectation of our stochastic gradients, we provide the following Lemma, which gives the expectation of the discretized cross-product between two measures. We denote by \(\langle \mu _1, \mu _2 \rangle _{L^2({\mathcal {D}})}:= \int _{{\mathcal {D}}}\mu _1(x)\mu _2(x)dx \) the cross-product between two densities \(\mu _1\) and \(\mu _2\).

Lemma 6.2

Consider \({\mathcal {S}}\) constructed with frequencies \((\omega _l)_{l=1}^m\). Let \(B_{\textbf{p}} \in {\mathbb {C}}^{m\times P}\) with general term \(B_{{\textbf{p}},l,i} = \frac{e^{-j \langle \omega _l,p_i \rangle }}{P}\) and \(\mu _1,\mu _2 \in {\mathcal {M}}({\mathcal {D}})\). We have

$$\begin{aligned} \begin{aligned} {\mathbb {E}}_{\textbf{p}} \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{p}}\mu _2({\textbf{p}}) \rangle&= \frac{m}{P} \langle \mu _1, \mu _2 \rangle _{L^2({\mathcal {D}})} \\&\quad +\frac{P-1}{P}\langle {\mathcal {S}}\mu _1, {\mathcal {S}}\mu _2 \rangle .\\ \end{aligned} \end{aligned}$$
(37)

Proof

We have

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{\textbf{p}} \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{p}}\mu _2({\textbf{p}}) \rangle \\&\quad = {\mathbb {E}}_{\textbf{p}} (\mu _2({\textbf{p}})^T B_{\textbf{p}} ^* B_{\textbf{p}}\mu _1({\textbf{p}}) )\\&\quad =\frac{1}{P^2} {\mathbb {E}}_{\textbf{p}} \Bigl ( \sum _{t=1}^P \mu _2 (p_t)\sum _{g=1}^m e^{j\langle \omega _g,p_t\rangle } \sum _{r=1}^P e^{-j \langle \omega _g,p_r\rangle } \mu _1(p_r)\Bigr )\\&\quad =\frac{1}{P^2} \sum _{t=1}^P \sum _{g=1}^m\sum _{r=1}^P {\mathbb {E}}_{\textbf{p}} \left( e^{j\langle \omega _g,p_t-p_r\rangle }\mu _2 (p_t)\mu _1(p_r)\right) . \end{aligned}\nonumber \\ \end{aligned}$$
(38)

The diagonal terms in the double sum over t and r, i.e., where \(t = r\), are

$$\begin{aligned} \begin{aligned} D&= \frac{1}{P^2} \sum _{t=1}^P \sum _{g=1}^m {\mathbb {E}}_{\textbf{p}}\left( \mu _2 (p_t)\mu _1(p_t)\right) = \frac{m}{P} \langle \mu _1, \mu _2 \rangle _{L^2({\mathcal {D}})}. \end{aligned}\nonumber \\ \end{aligned}$$
(39)

The off-diagonal terms \(t \ne r\) give (using the fact that the \(p_i\) are i.i.d.):

$$\begin{aligned} \begin{aligned} N&=\frac{1}{P^2} \sum _{t=1}^P \sum _{g=1}^m\sum _{r=1, r\ne t}^P {\mathbb {E}}_{\textbf{p}}\left( e^{j\langle \omega _g,p_t-p_r\rangle }\mu _2 (p_t)\mu _1(p_r)\right) \\&=\frac{P-1}{P} \sum _{g=1}^m \left( {\mathbb {E}}_{\textbf{p}} e^{j\langle \omega _g,p_1\rangle }\mu _2 (p_1)\right) \left( {\mathbb {E}}_{\textbf{p}}e^{-j \langle \omega _g,p_1\rangle } \mu _1(p_1)\right) \\&=\frac{P-1}{P} \sum _{g=1}^m ({\mathcal {S}}\mu _2)_g^*({\mathcal {S}}\mu _1)_g \\&= \frac{P-1}{P}\langle {\mathcal {S}}\mu _1, {\mathcal {S}}\mu _2 \rangle . \end{aligned}\nonumber \\ \end{aligned}$$
(40)

\(\square \)
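
The identity (37) lends itself to a similar numerical check. The sketch below assumes the same kind of toy densities and Gaussian frequencies as above (illustrative choices only, not taken from the paper); the inner product \(\langle a,b\rangle =\sum _g b_g^* a_g\) used throughout the proof corresponds to numpy.vdot(b, a).

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, P, n_trials = 2, 8, 32, 5000
omegas = rng.normal(scale=5.0, size=(m, d))                     # hypothetical frequencies
mu1 = lambda x: np.prod(1.0 + 0.5 * np.cos(2 * np.pi * x), axis=-1)  # toy densities on [0,1]^d
mu2 = lambda x: np.prod(1.0 + 0.5 * np.sin(2 * np.pi * x), axis=-1)

def B(points):
    """Matrix B_p of the Lemma, shape (m, P), entries exp(-j <omega_l, p_i>) / P."""
    return np.exp(-1j * points @ omegas.T).T / len(points)

# Reference quantities estimated by dense Monte Carlo on [0,1]^d.
x = rng.random((400_000, d))
S_mu1 = np.exp(-1j * x @ omegas.T).T @ mu1(x) / len(x)          # S mu1
S_mu2 = np.exp(-1j * x @ omegas.T).T @ mu2(x) / len(x)          # S mu2
cross_L2 = np.mean(mu1(x) * mu2(x))                             # <mu1, mu2>_{L^2(D)}

# Left-hand side of (37): expectation over random discretizations p.
lhs = np.mean([np.vdot(B(p) @ mu2(p), B(p) @ mu1(p))            # <B_p mu1(p), B_p mu2(p)>
               for p in rng.random((n_trials, P, d))])
rhs = m / P * cross_L2 + (P - 1) / P * np.vdot(S_mu2, S_mu1)    # right-hand side of (37)
print(lhs, rhs)                                                 # agree up to Monte Carlo error
```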

We also calculate the variance of the unbiased estimator of the gradient of \(G\) with the following Lemma.

Lemma 6.3

Consider \({\mathcal {S}}\) constructed with frequencies \((\omega _l)_{l=1}^m\). Let \(B_{\textbf{p}} \in {\mathbb {C}}^{m\times P}\) with general term \(B_{{\textbf{p}},l,i} = \frac{e^{-j \langle \omega _l,p_i \rangle }}{P}\) and \(\mu _1,\mu _2 \in {\mathcal {M}}({\mathcal {D}})\). We have

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \vert ^2 \\&\qquad - \vert {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \vert ^2 \\&\quad = \frac{1}{P^2} \langle \vert \mu _1\vert ^2,\vert \mu _2 \vert ^2\rangle _{L^2,\vert {\mathcal {S}}^* {\textbf{1}} \vert ^2} + \frac{1}{P} C(\mu _1,\mu _2,z) \end{aligned} \end{aligned}$$
(41)

where we define \({\mathcal {S}}^*z: p \rightarrow \sum _g z_ge^{j \langle \omega _g,p \rangle }\), and for a kernel h (a function from \({\mathcal {D}}\) to \({\mathbb {R}}\)) and two measures \(\nu _1, \nu _2\), \(\langle \nu _1, \nu _2 \rangle _{L^2({\mathcal {D}}),h}:= \int _{x,y} \nu _1(x)\nu _2(y)h(x-y) dxdy\), where

$$\begin{aligned} \begin{aligned}&C(\mu _1,\mu _2,z)\\&\quad := \frac{P-1}{P} ( \langle \vert {\mathcal {S}}^*{\mathcal {S}}\mu _1 \vert ^2, \vert \mu _2 \vert ^2\rangle _{L^2} + \langle \vert {\mathcal {S}}^*{\mathcal {S}}\mu _2 \vert ^2, \vert \mu _1 \vert ^2\rangle _{L^2}) \\&\qquad + {\mathcal {R}}e\langle \vert {\mathcal {S}}^*z\vert ^2 -2{\mathcal {S}}^*z({\mathcal {S}}^*{\mathcal {S}}\mu _2)^*, \vert \mu _1 \vert ^2\rangle _{L^2}\\&\qquad + 2{\mathcal {R}}e\left( \langle {\mathcal {S}}\mu _1,z \rangle \langle {\mathcal {S}}\mu _1,{\mathcal {S}}\mu _2 \rangle ^*\right) - \vert \langle {\mathcal {S}}\mu _1,z \rangle \vert ^2 \\&\qquad - \frac{2P-1}{P} \vert \langle {\mathcal {S}}\mu _1,{\mathcal {S}}\mu _2\rangle \vert ^2. \end{aligned} \end{aligned}$$
(42)

Proof

We need to calculate a few terms separately. For \(y\in {\mathbb {C}}^m\), using the fact that

$$\begin{aligned} \langle B_{\textbf{p}}\mu _1({\textbf{p}}), z\rangle = \frac{1}{P}\sum _{g=1}^m\sum _{t=1}^P e^{j\langle \omega _g,p_t\rangle }\mu _1 (p_t)z_g, \end{aligned}$$

we have

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}}} \langle B_{\textbf{p}}\mu _1({\textbf{p}}), z\rangle \langle B_{\textbf{p}}\mu _1({\textbf{p}}), y\rangle ^* \\&\quad =\frac{1}{P^2} \sum _{g,t} \sum _{{\tilde{g}},{\tilde{t}}} {\mathbb {E}}_{{\textbf{p}}} \Big (e^{j\langle \omega _g,p_t\rangle -j\langle \omega _{{\tilde{g}}},p_{{\tilde{t}}}\rangle } \mu _1 (p_t) \mu _1 (p_{{\tilde{t}}})z_g y_{{\tilde{g}}}^*\Big ) \\&\quad =\frac{1}{P^2} \sum _{g,{\tilde{g}}} z_g y_{{\tilde{g}}}^* \sum _{t,{\tilde{t}}} {\mathbb {E}}_{{\textbf{p}}}\Big ( e^{j\langle \omega _g,p_t\rangle -j\langle \omega _{{\tilde{g}}},p_{{\tilde{t}}}\rangle }\mu _1 (p_t) \mu _1 (p_{{\tilde{t}}})\Big ). \\ \end{aligned}\nonumber \\ \end{aligned}$$
(43)

As the \(p_t\) are i.i.d., we have

$$\begin{aligned} \begin{aligned}&\sum _{t,{\tilde{t}}} {\mathbb {E}}_{{\textbf{p}}} e^{j\langle \omega _g,p_t\rangle -j\langle \omega _{{\tilde{g}}},p_{{\tilde{t}}}\rangle }\mu _1 (p_t) \mu _1 (p_{{\tilde{t}}}) \quad \\&\quad = P {\mathbb {E}}_{{\textbf{p}}} [e^{-j\langle \omega _{{\tilde{g}}}-\omega _g,p_1\rangle } \vert \mu _1 (p_1)\vert ^2]\\&\qquad + P(P-1) ({\mathcal {S}}\mu _1)_g^*({\mathcal {S}}\mu _1)_{{\tilde{g}}}. \end{aligned} \end{aligned}$$
(44)

We obtain

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}}} \langle B_{\textbf{p}}\mu _1({\textbf{p}}), z\rangle \langle B_{\textbf{p}}\mu _1({\textbf{p}}), y\rangle ^*\\&\quad = \frac{1}{P^2} \sum _{g,{\tilde{g}}} \Big ( z_g y_{{\tilde{g}}}^* P {\mathbb {E}}_{{\textbf{p}}} [e^{-j\langle \omega _{{\tilde{g}}}-\omega _g,p_{1}\rangle } \vert \mu _1 (p_1)\vert ^2] \\&\qquad + P(P-1) ({\mathcal {S}}\mu _1)_g^*({\mathcal {S}}\mu _1)_{{\tilde{g}}}z_g y_{{\tilde{g}}}^* \Big )\\&\quad = \frac{1}{P}{\mathbb {E}}_{{\textbf{p}}} \left[ ({\mathcal {S}}^*z)(p_1)\,\left( ({\mathcal {S}}^*y)(p_1)\right) ^* \vert \mu _1 (p_1)\vert ^2\right] \\&\qquad +\frac{P-1}{P} \langle {\mathcal {S}}\mu _1,z \rangle \langle {\mathcal {S}}\mu _1,y \rangle ^*\\&\quad = \frac{1}{P}\langle {\mathcal {S}}^*z({\mathcal {S}}^*y)^*, \vert \mu _1 \vert ^2\rangle _{L^2}+\frac{P-1}{P} \langle {\mathcal {S}}\mu _1,z \rangle \langle {\mathcal {S}}\mu _1,y \rangle ^* \end{aligned} \end{aligned}$$
(45)

where \({\mathcal {S}}^*z\) is defined in the hypotheses of the Lemma. For \(y= z\), we obtain

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), z \rangle \vert ^2 \\&\quad = \frac{1}{P}\langle \vert {\mathcal {S}}^*z\vert ^2, \vert \mu _1 \vert ^2\rangle _{L^2}+\frac{P-1}{P} \vert \langle {\mathcal {S}}\mu _1,z \rangle \vert ^2. \end{aligned} \end{aligned}$$
(46)

We now calculate the following expectation:

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) \rangle \vert ^2 \\&\quad =\frac{1}{P^4} {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \sum _{g=1}^m\sum _{t=1}^P \sum _{r=1}^P e^{j\langle \omega _g,p_t-q_r\rangle }\mu _1 (p_t)\mu _2(q_r) \vert ^2\\&\quad =\frac{1}{P^4} \sum _{g,t,r} \sum _{{\tilde{g}},{\tilde{t}},{\tilde{r}}} {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}}\Big ( e^{j\langle \omega _g,p_t-q_r\rangle }e^{-j\langle \omega _{{\tilde{g}}},p_{{\tilde{t}}}-q_{{\tilde{r}}}\rangle }\\&\quad \; \mu _1 (p_t)\mu _2(q_r) \mu _1 (p_{{\tilde{t}}})\mu _2(q_{{\tilde{r}}})\Big ). \end{aligned} \end{aligned}$$
(47)

As \({\textbf{p}}\) and \({\textbf{q}}\) are i.i.d.,

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) \rangle \vert ^2 \\&=\frac{1}{P^4} \sum _{g,t,r} \sum _{{\tilde{g}},{\tilde{t}},{\tilde{r}}} \Big ( {\mathbb {E}}_{{\textbf{p}}} e^{j\langle \omega _g,p_t\rangle -j\langle \omega _{{\tilde{g}}},p_{{\tilde{t}}}\rangle } \mu _1 (p_t) \mu _1 (p_{{\tilde{t}}})\Big ) \\&\quad \;{\mathbb {E}}_{{\textbf{q}}}\Big (e^{-j\langle \omega _g,q_r\rangle +j\langle \omega _{{\tilde{g}}},q_{{\tilde{r}}}\rangle } \mu _2(q_r) \mu _2(q_{{\tilde{r}}})\Big )\\&=\frac{1}{P^4} \sum _{g,{\tilde{g}}} \sum _{t,{\tilde{t}}} {\mathbb {E}}_{{\textbf{p}}}\left( e^{j\langle \omega _g,p_t\rangle -j\langle \omega _{{\tilde{g}}},p_{{\tilde{t}}}\rangle } \mu _1 (p_t) \mu _1 (p_{{\tilde{t}}}) \right) \\&\quad \;\sum _{r,{\tilde{r}}}{\mathbb {E}}_{{\textbf{q}}}\left( e^{-j\langle \omega _g,q_r\rangle +j\langle \omega _{{\tilde{g}}},q_{{\tilde{r}}}\rangle } \mu _2(q_r) \mu _2(q_{{\tilde{r}}})\right) \\&= \frac{1}{P^4} \sum _{g,{\tilde{g}}} A_{1,g,{\tilde{g}}} A_{2,g,{\tilde{g}}}^* \\ \end{aligned} \end{aligned}$$
(48)

where, decomposing the sum into diagonal and off-diagonal terms,

$$\begin{aligned} \begin{aligned} A_{i,g,{\tilde{g}}}&= \sum _{t,{\tilde{t}}} {\mathbb {E}}_{{\textbf{p}}} \left( e^{j\langle \omega _g,p_t\rangle -j\langle \omega _{{\tilde{g}}},p_{{\tilde{t}}}\rangle } \mu _i (p_t) \mu _i (p_{{\tilde{t}}})\right) \\&= P {\mathbb {E}}_{{\textbf{p}}} \left( e^{-j\langle \omega _{{\tilde{g}}}-\omega _g,p_{t}\rangle } \vert \mu _i (p_t)\vert ^2 \right) \\&\quad \;+ P(P-1) ({\mathcal {S}}\mu _i)_g^*({\mathcal {S}}\mu _i)_{{\tilde{g}}}. \end{aligned} \end{aligned}$$
(49)

We obtain

$$\begin{aligned} \begin{aligned}&P^2{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) \rangle \vert ^2 \\&\quad = \sum _{g,{\tilde{g}}} \Big ( {\mathbb {E}}_{{\textbf{p}}} (e^{-j\langle \omega _{{\tilde{g}}}-\omega _g,p_{t}\rangle } \vert \mu _1 (p_t)\vert ^2 ) {\mathbb {E}}_{{\textbf{q}}} (e^{j\langle \omega _{{\tilde{g}}}-\omega _g,q_{r}\rangle } \vert \mu _2(q_r)\vert ^2)\\&\qquad + (P-1) ({\mathcal {S}}\mu _1)_g^*({\mathcal {S}}\mu _1)_{{\tilde{g}}} {\mathbb {E}}_{{\textbf{q}}} (e^{j\langle \omega _{{\tilde{g}}}-\omega _g,q_{r}\rangle } \vert \mu _2(q_r)\vert ^2 )\\&\qquad + (P-1) ({\mathcal {S}}\mu _2)_g({\mathcal {S}}\mu _2)_{{\tilde{g}}}^* {\mathbb {E}}_{{\textbf{p}}} (e^{-j\langle \omega _{{\tilde{g}}}-\omega _g,p_{t}\rangle } \vert \mu _1 (p_t)\vert ^2 )\\&\qquad +(P-1)^2 ({\mathcal {S}}\mu _1)_g^*({\mathcal {S}}\mu _1)_{{\tilde{g}}} ({\mathcal {S}}\mu _2)_g({\mathcal {S}}\mu _2)_{{\tilde{g}}}^* \Big ) \end{aligned} \end{aligned}$$
(50)

with

$$\begin{aligned} \begin{aligned}&\sum _{g,{\tilde{g}}} {\mathbb {E}}_{{\textbf{p}}} (e^{-j\langle \omega _{{\tilde{g}}}-\omega _g,p_{t}\rangle } \vert \mu _1 (p_t)\vert ^2 ) {\mathbb {E}}_{{\textbf{q}}} (e^{j\langle \omega _{{\tilde{g}}}-\omega _g,q_{r}\rangle } \vert \mu _2(q_r)\vert ^2)\\&\quad ={\mathbb {E}}_{{\textbf{p}}} {\mathbb {E}}_{{\textbf{q}}} \left( \sum _{g,{\tilde{g}}}e^{-j\langle \omega _{{\tilde{g}}}-\omega _g,p_{t}\rangle } e^{j\langle \omega _{{\tilde{g}}}-\omega _g,q_{r}\rangle } \right) \vert \mu _1 (p_t)\vert ^2\vert \mu _2(q_r)\vert ^2. \end{aligned} \end{aligned}$$
(51)

We have inside the expectation,

$$\begin{aligned} \begin{aligned}&\left( \sum _{g,{\tilde{g}}}e^{j\langle \omega _{{\tilde{g}}},q_r-p_t\rangle } e^{-j\langle \omega _g,q_{r}-p_t\rangle } \right) \vert \mu _1 (p_t)\vert ^2\vert \mu _2(q_r)\vert ^2\\&\quad = \left( \sum _{{\tilde{g}}}e^{j\langle \omega _{{\tilde{g}}},q_r-p_t\rangle }\right) \left( \sum _{g}e^{-j\langle \omega _g,q_{r}-p_t\rangle } \right) \vert \mu _1 (p_t)\vert ^2\vert \mu _2(q_r)\vert ^2\\&\quad = \vert [{\mathcal {S}}^* {\textbf{1}}] (q_r-p_t)\vert ^2\, \vert \mu _1 (p_t)\vert ^2\vert \mu _2(q_r)\vert ^2. \end{aligned} \end{aligned}$$
(52)

This gives, for the first term of the right-hand side of (50),

$$\begin{aligned} \begin{aligned}&\sum _{g,{\tilde{g}}} {\mathbb {E}}_{{\textbf{p}}} (e^{-j\langle \omega _{{\tilde{g}}}-\omega _g,p_{t}\rangle } \vert \mu _1 (p_t)\vert ^2 ) {\mathbb {E}}_{{\textbf{q}}} (e^{j\langle \omega _{{\tilde{g}}}-\omega _g,q_{r}\rangle } \vert \mu _2(q_r)\vert ^2)\\&\quad = {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert [{\mathcal {S}}^* {\textbf{1}}] (q_r-p_t)\vert ^2 \vert \mu _1(p_t)\vert ^2\vert \mu _2(q_r) \vert ^2\\&\quad = \langle \vert \mu _1\vert ^2,\vert \mu _2 \vert ^2\rangle _{L^2({\mathcal {D}}),\vert {\mathcal {S}}^* {\textbf{1}} \vert ^2} \end{aligned} \end{aligned}$$
(53)

where, as in the statement of the Lemma, \(\langle \nu _1, \nu _2 \rangle _{L^2({\mathcal {D}}), {h}} = \int _{x,y} \nu _1(x)\nu _2(y)h(x-y) dxdy\) for a kernel h (a function from \({\mathcal {D}}\) to \({\mathbb {R}}\)) and two measures \(\nu _1, \nu _2\).

We calculate the second term (and similarly the third term) of the right-hand side of (50).

$$\begin{aligned} \begin{aligned}&\sum _{g,{\tilde{g}}} ({\mathcal {S}}\mu _1)_g^*({\mathcal {S}}\mu _1)_{{\tilde{g}}} {\mathbb {E}}_{{\textbf{q}}} ( e^{j\langle \omega _{{\tilde{g}}}-\omega _g,q_{r}\rangle } \vert \mu _2(q_r)\vert ^2)\\&\quad = {\mathbb {E}}_{{\textbf{q}}} \sum _{g,{\tilde{g}}} e^{-j\langle \omega _g,q_{r}\rangle }({\mathcal {S}}\mu _1)_g^*e^{j\langle \omega _{{\tilde{g}}},q_{r}\rangle }({\mathcal {S}}\mu _1)_{{\tilde{g}}} \vert \mu _2(q_r)\vert ^2\\&\quad = {\mathbb {E}}_{{\textbf{q}}} \left( \vert ({\mathcal {S}}^*{\mathcal {S}}\mu _1)(q_r) \vert ^2\vert \mu _2(q_r)\vert ^2\right) = \langle \vert {\mathcal {S}}^*{\mathcal {S}}\mu _1 \vert ^2, \vert \mu _2 \vert ^2\rangle _{L^2({\mathcal {D}})}. \end{aligned} \end{aligned}$$
(54)

The fourth term of the right-hand side of (50) yields

$$\begin{aligned} \begin{aligned}&\sum _{g,{\tilde{g}}} ({\mathcal {S}}\mu _1)_g^*({\mathcal {S}}\mu _1)_{{\tilde{g}}} ({\mathcal {S}}\mu _2)_g({\mathcal {S}}\mu _2)_{{\tilde{g}}}^* \\&\quad = \sum _g ({\mathcal {S}}\mu _1)_g^* ({\mathcal {S}}\mu _2)_g\langle {\mathcal {S}}\mu _1, {\mathcal {S}}\mu _2\rangle = \vert \langle {\mathcal {S}}\mu _1, {\mathcal {S}}\mu _2\rangle \vert ^2. \end{aligned} \end{aligned}$$
(55)

Going back to (48) and using expressions (53), (54) and (55) in (50), we obtain

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) \rangle \vert ^2\\&\quad = \frac{1}{P^2} \langle \vert \mu _1\vert ^2,\vert \mu _2 \vert ^2\rangle _{L^2({\mathcal {D}}),\vert {\mathcal {S}}^* {\textbf{1}} \vert ^2}\\&\qquad + \frac{P-1}{P^2} \langle \vert {\mathcal {S}}^*{\mathcal {S}}\mu _1 \vert ^2, \vert \mu _2 \vert ^2\rangle _{L^2({\mathcal {D}})} \\&\qquad + \frac{P-1}{P^2} \langle \vert {\mathcal {S}}^*{\mathcal {S}}\mu _2 \vert ^2, \vert \mu _1 \vert ^2\rangle _{L^2({\mathcal {D}})} \\&\qquad + \frac{(P-1)^2}{P^2} \vert \langle {\mathcal {S}}\mu _1, {\mathcal {S}}\mu _2\rangle \vert ^2. \end{aligned} \end{aligned}$$
(56)

By expanding the expressions, we have

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \\&\qquad - {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \vert ^2 \\&\quad = {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \vert ^2- \vert \langle {\mathcal {S}}\mu _1, {\mathcal {S}} \mu _2 - z \rangle \vert ^2 \\&\quad ={\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}})\rangle \vert ^2 \\&\qquad - 2 {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} {\mathcal {R}}e\left( \langle B_{\textbf{p}}\mu _1({\textbf{p}}), z \rangle \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}})\rangle ^*\right) \\&\qquad +{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), z \rangle \vert ^2- \vert \langle {\mathcal {S}}\mu _1, {\mathcal {S}} \mu _2 - z \rangle \vert ^2 \\&\quad ={\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}})\rangle \vert ^2 \\&\qquad - 2 {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} {\mathcal {R}}e\left( \langle B_{\textbf{p}}\mu _1({\textbf{p}}), z \rangle \langle B_{\textbf{p}}\mu _1({\textbf{p}}), {\mathcal {S}}\mu _2\rangle ^*\right) \\&\qquad +{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), z \rangle \vert ^2- \vert \langle {\mathcal {S}}\mu _1, {\mathcal {S}} \mu _2 - z \rangle \vert ^2. \end{aligned} \end{aligned}$$
(57)

Using Eq. (56), Eq. (45), and the fact that \( {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} {\mathcal {R}}e\langle B_{\textbf{p}}\mu _1({\textbf{p}}), z \rangle = {\mathcal {R}}e\langle {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} B_{\textbf{p}}\mu _1({\textbf{p}}), z \rangle = {\mathcal {R}}e\langle {\mathcal {S}}\mu _1, z \rangle \) (Lemma 6.1), we have

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \\&\qquad - {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \vert ^2 \\&\quad = \frac{1}{P^2} \langle \vert \mu _1\vert ^2,\vert \mu _2 \vert ^2\rangle _{L^2({\mathcal {D}}),\vert {\mathcal {S}}^* {\textbf{1}} \vert ^2}\\&\qquad + \frac{P-1}{P^2} \langle \vert {\mathcal {S}}^*{\mathcal {S}}\mu _1 \vert ^2, \vert \mu _2 \vert ^2\rangle _{L^2({\mathcal {D}})} \\&\qquad + \frac{P-1}{P^2} \langle \vert {\mathcal {S}}^*{\mathcal {S}}\mu _2 \vert ^2, \vert \mu _1 \vert ^2\rangle _{L^2({\mathcal {D}})} \\&\qquad + \frac{(P-1)^2}{P^2} \vert \langle {\mathcal {S}}\mu _1, {\mathcal {S}}\mu _2\rangle \vert ^2\\&\qquad -2\frac{1}{P} {\mathcal {R}}e\langle ({\mathcal {S}}^*z)({\mathcal {S}}^*{\mathcal {S}}\mu _2)^*, \vert \mu _1 \vert ^2\rangle _{L^2}\\&\qquad -2 \frac{P-1}{P} {\mathcal {R}}e\left( \langle {\mathcal {S}}\mu _1,z \rangle \langle {\mathcal {S}}\mu _1,{\mathcal {S}}\mu _2 \rangle ^*\right) \\&\qquad + \frac{1}{P}\langle \vert {\mathcal {S}}^*z\vert ^2, \vert \mu _1 \vert ^2\rangle _{L^2({\mathcal {D}})}\\&\qquad +\frac{P-1}{P} \vert \langle {\mathcal {S}}\mu _1,z \rangle \vert ^2\\&\qquad - \vert \langle {\mathcal {S}}\mu _1,z \rangle \vert ^2 \\&\qquad + 2 {\mathcal {R}}e\left( \langle {\mathcal {S}}\mu _1,z \rangle \langle {\mathcal {S}}\mu _1,{\mathcal {S}}\mu _2 \rangle ^*\right) \\&\qquad - \vert \langle {\mathcal {S}}\mu _1,{\mathcal {S}}\mu _2\rangle \vert ^2. \end{aligned} \end{aligned}$$
(58)

Regrouping terms yields

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \vert \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \\&\qquad - {\mathbb {E}}_{{\textbf{p}},{\textbf{q}}} \langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \vert ^2 \\&\quad = \frac{1}{P^2} \langle \vert \mu _1\vert ^2,\vert \mu _2 \vert ^2\rangle _{L^2,\vert {\mathcal {S}}^* {\textbf{1}} \vert ^2}\\&\qquad + \frac{P-1}{P^2} \left( \langle \vert {\mathcal {S}}^*{\mathcal {S}}\mu _1 \vert ^2, \vert \mu _2 \vert ^2\rangle _{L^2} + \langle \vert {\mathcal {S}}^*{\mathcal {S}}\mu _2 \vert ^2, \vert \mu _1 \vert ^2\rangle _{L^2} \right) \\&\qquad + \frac{1}{P} {\mathcal {R}}e\langle \vert {\mathcal {S}}^*z\vert ^2 -2{\mathcal {S}}^*z({\mathcal {S}}^*{\mathcal {S}}\mu _2)^*, \vert \mu _1 \vert ^2\rangle _{L^2}\\&\qquad + \frac{2}{P} {\mathcal {R}}e\left( \langle {\mathcal {S}}\mu _1,z \rangle \langle {\mathcal {S}}\mu _1,{\mathcal {S}}\mu _2 \rangle ^*\right) \\&\qquad - \frac{1}{P}\vert \langle {\mathcal {S}}\mu _1,z \rangle \vert ^2 \\&\qquad - \frac{2P-1}{P^2} \vert \langle {\mathcal {S}}\mu _1,{\mathcal {S}}\mu _2\rangle \vert ^2. \end{aligned} \end{aligned}$$
(59)

\(\square \)

This shows that the variance converges to 0 at the typical rate \(1/P\).
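
A quick Monte Carlo experiment illustrates this \(1/P\) behaviour. The snippet below is again a sketch with hypothetical toy densities, random frequencies and target vector z (none taken from the paper); it estimates the variance of \(\langle B_{\textbf{p}}\mu _1({\textbf{p}}), B_{\textbf{q}}\mu _2({\textbf{q}}) -z \rangle \) for increasing \(P\) and prints \(P\) times the empirical variance, which stays approximately constant as predicted by Lemma 6.3.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 2, 8
omegas = rng.normal(scale=5.0, size=(m, d))                     # hypothetical frequencies
mu1 = lambda x: np.prod(1.0 + 0.5 * np.cos(2 * np.pi * x), axis=-1)
mu2 = lambda x: np.prod(1.0 + 0.5 * np.sin(2 * np.pi * x), axis=-1)
z = rng.standard_normal(m) + 1j * rng.standard_normal(m)        # plays the role of the target sketch

def sample_inner(P, n_trials=4000):
    """Samples of <B_p mu1(p), B_q mu2(q) - z> for independent discretizations p and q."""
    vals = np.empty(n_trials, dtype=complex)
    for k in range(n_trials):
        p, q = rng.random((P, d)), rng.random((P, d))
        Bp_mu1 = np.exp(-1j * p @ omegas.T).T @ mu1(p) / P      # B_p mu1(p)
        Bq_mu2 = np.exp(-1j * q @ omegas.T).T @ mu2(q) / P      # B_q mu2(q)
        vals[k] = np.vdot(Bq_mu2 - z, Bp_mu1)
    return vals

for P in (16, 32, 64, 128):
    v = sample_inner(P)
    var = np.mean(np.abs(v - v.mean()) ** 2)                    # empirical variance
    print(P, P * var)                                           # roughly constant: variance ~ 1/P
```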

Cite this article

Shi, H., Traonmilin, Y. & Aujol, JF. Batch-Less Stochastic Gradient Descent for Compressive Learning of Deep Regularization for Image Denoising. J Math Imaging Vis 66, 464–477 (2024). https://doi.org/10.1007/s10851-024-01178-x
