
Dual subgradient algorithms for large-scale nonsmooth learning problems

  • Full Length Paper
  • Series B

Mathematical Programming

Abstract

“Classical” First Order (FO) algorithms of convex optimization, such as the Mirror Descent algorithm or Nesterov’s optimal algorithm for smooth convex optimization, are well known to have optimal (theoretical) complexity estimates which do not depend on the problem dimension. However, to attain this optimality, the domain of the problem should admit a “good proximal setup”. The latter essentially means that (1) the problem domain should satisfy certain geometric conditions of “favorable geometry”, and (2) the practical use of these methods is conditioned on our ability to compute the proximal transformation at each iteration at a moderate cost. More often than not these two conditions are satisfied in optimization problems arising in computational learning, which explains why proximal-type FO methods have recently become the methods of choice for solving various learning problems. Yet, they meet their limits in several important problems, such as multi-task learning with a large number of tasks, where the problem domain does not exhibit favorable geometry, and learning and matrix completion problems with a nuclear norm constraint, where the numerical cost of computing the proximal transformation becomes prohibitive in the large-scale case. We propose a novel approach to solving nonsmooth optimization problems arising in learning applications where a Fenchel-type representation of the objective function is available. The approach is based on applying FO algorithms to the dual problem and using the accuracy certificates supplied by the method to recover the primal solution. While suboptimal in terms of accuracy guarantees, the proposed approach does not rely upon a “good proximal setup” for the primal problem, but requires the problem domain to admit a Linear Optimization oracle, that is, the ability to efficiently maximize a linear form on the domain of the primal problem.
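
To make the last point concrete, consider the nuclear-norm ball mentioned above. The sketch below is our illustration rather than code from the paper; it assumes NumPy/SciPy and a dense matrix argument of size at least \(2\times 2\). It contrasts a Linear Optimization oracle for the ball, which only needs the leading singular pair of the linear form (cheap, e.g., by Lanczos iterations), with the Euclidean projection used by proximal methods, which needs a full singular value decomposition.

import numpy as np
from scipy.sparse.linalg import svds

def lo_oracle_nuclear_ball(eta, R):
    """LO oracle for X = {x : ||x||_nuclear <= R}: maximize <eta, x> over X.
    Only the top singular triple of eta is needed."""
    u, s, vt = svds(np.asarray(eta, dtype=float), k=1)
    return R * np.outer(u[:, 0], vt[0])      # rank-one maximizer R * u1 * v1^T

def projection_nuclear_ball(y, R):
    """Proximal counterpart: Frobenius-norm projection onto the same ball.
    Needs a full SVD plus a projection of the spectrum onto the l1-ball."""
    U, s, Vt = np.linalg.svd(y, full_matrices=False)
    if s.sum() > R:                          # otherwise y is already feasible
        d = np.sort(s)[::-1]
        cs = np.cumsum(d)
        k = np.nonzero(d * np.arange(1, d.size + 1) > cs - R)[0][-1]
        s = np.maximum(s - (cs[k] - R) / (k + 1.0), 0.0)
    return (U * s) @ Vt

For \(p\times p\) matrices the LO step costs a few matrix–vector products per Lanczos iteration, whereas the projection is dominated by the full SVD; this is the per-iteration bottleneck of proximal methods referred to in the abstract.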


Notes

  1. This is a natural normalization: indeed, \(\Vert e_j\Vert \ll 1\) means that the \(j\)th coordinate of \(x\) has a large Lipschitz constant w.r.t. \(\Vert \cdot \Vert \), in spite of the fact that this coordinate is “perfectly well behaved” on \(X\): its variation on the set is just 2.

  2. We assume here that \(g_\tau \ne 0\) for all \(\tau \le t\). In the opposite case, the situation is trivial: when \(g(y_{\tau _*})=0\) for some \(\tau _*\le t\), setting \(\lambda ^t_\tau =0\) for \(\tau \ne \tau _*\) and \(\lambda ^t_{\tau _*}=1\), we ensure that \(\epsilon (y^t,\lambda ^t)=0\).

  3. We assume that \(f'(y_\tau )\ne 0\), for \(\tau \le t\); otherwise, as we remember, the situation is trivial.

  4. Note that to ensure the nonemptiness of \(Y^s_0\), it suffices to set \(h_{0,j}^s(\cdot )=h^s(\cdot )\), so that \(h_{0,j}^s(y)>\ell _s\) for \(y\in \mathop {\hbox {Argmax}}_Yh^s(\cdot )\); recall that \(f_s=\max _{y\in Y} h^s(y)>0\).

  5. In typical applications, \(\psi \) is just linear, so that computing \(y(x)\) is as easy as computing the value of the prox-mapping associated with \(Y\) and \(\omega (\cdot )\).

  6. If our only goal were to approximate \(f_*\) by a smooth concave function, we could use other options, most notably the famous Moreau–Yosida regularization [14, 25]. The advantage of the smoothing based on the Fenchel-type representation (1) of \(f_*\) and a proximal setup for \(Y\) is that in many cases it indeed allows for computationally cheap approximations, which is usually not the case for Moreau–Yosida regularization.

  7. Provided the parameters \(\gamma \in (0,1)\), \(\theta \in (0,1)\) in (29), (37) are treated as absolute constants.

  8. The restriction \(R\ge 1\) is quite natural. Indeed, with the optimal choice of \(x\), we want most of the terms \(\left[ 1 -\epsilon _j\left[ \langle x,z_j\rangle +b\right] \right] _+\) to be \(\ll 1\); assuming that the numbers of examples with \(\epsilon _j=-1\) and with \(\epsilon _j=1\) are both of order \(N\), this condition can be met only when \(|\langle x,z_j\rangle |\) is at least of order 1 for most \(j\). The latter, in view of (48), implies that \(\Vert \sigma (x)\Vert _{1}\) should be at least \(O(1)\).

  9. To arrive at this situation, one can augment \(z_j\) with an additional entry equal to 1 and redefine \(x\): the new \(x\) is the old \([x,b]\).

  10. In our straightforward software implementation of the algorithm, handling memory of depth \(m\) requires storing in RAM up to \(m+1\) matrices of size \(p\times p\), which makes the implementation too space-consuming when \(p\) and \(m\) are large; this is why in our experiments the larger \(p\), the smaller the allowed value of \(m\). Note that with a more sophisticated software implementation, handling memory \(m\) would require storing just \(m+1\) rank-one \(p\times p\) matrices, dramatically reducing the required space (an illustrative sketch of this storage scheme follows these notes).
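
As an illustration of the storage scheme just mentioned (a minimal sketch under our own conventions; the class and method names are hypothetical and the algorithmic updates themselves are omitted), one can keep only the vector factors of the rank-one matrices and form matrix operations on demand:

import numpy as np

class RankOneMemory:
    """Keep up to depth+1 rank-one p x p matrices g = u v^T via their factors,
    reducing storage from O(m p^2) floats to O(m p)."""
    def __init__(self, depth):
        self.depth = depth
        self.factors = []                    # list of (u, v) pairs

    def push(self, u, v):
        self.factors.append((np.asarray(u, float), np.asarray(v, float)))
        if len(self.factors) > self.depth + 1:
            self.factors.pop(0)              # drop the oldest matrix

    def inner(self, k, Y):
        """<u_k v_k^T, Y> = u_k^T Y v_k, computed without forming u_k v_k^T."""
        u, v = self.factors[k]
        return u @ (Y @ v)

    def combine(self, weights):
        """Weighted sum of the stored rank-one matrices, formed only when needed."""
        p = self.factors[0][0].size
        S = np.zeros((p, p))
        for w, (u, v) in zip(weights, self.factors):
            S += w * np.outer(u, v)
        return S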

References

  1. Amit, Y., Fink, M., Srebro, N., Ullman, S.: Uncovering shared structures in multiclass classification. In: Proceedings of the 24th International Conference on Machine Learning, pp. 17–24. ACM (2007)

  2. Ben-Tal, A., Nemirovski, A.: Non-Euclidean restricted memory level method for large-scale convex optimization. Math. Program. 102(3), 407–456 (2005)

  3. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)

  4. Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3(3), 538–543 (1993)

  5. Combettes, P.L., Dũng, D., Vũ, B.C.: Dualization of signal recovery problems. Set-Valued Var. Anal. 18(3–4), 373–404 (2010)

  6. Crammer, K., Singer, Y.: On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2, 265–292 (2002)

  7. Demyanov, V., Rubinov, A.: Approximate Methods in Optimization Problems. Elsevier, Amsterdam (1970)

  8. Dunn, J.C., Harshbarger, S.: Conditional gradient algorithms with open loop step size rules. J. Math. Anal. Appl. 62(2), 432–444 (1978)

  9. Fan, J., Liao, Y., Mincheva, M.: High dimensional covariance matrix estimation in approximate factor models. Ann. Stat. 39(6), 3320–3356 (2011)

  10. Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Logist. Q. 3(1–2), 95–110 (1956)

  11. Juditsky, A., Nemirovski, A.: First order methods for nonsmooth large-scale convex minimization, I: General purpose methods; II: Utilizing problem’s structure. In: Sra, S., Nowozin, S., Wright, S.J. (eds.) Optimization for Machine Learning, pp. 121–254. MIT Press, Cambridge, MA (2011)

  12. Lemaréchal, C., Nemirovskii, A., Nesterov, Y.: New variants of bundle methods. Math. Program. 69(1–3), 111–147 (1995)

  13. Moreau, J.-J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. CR Acad. Sci. Paris Sér. A Math. 255, 2897–2899 (1962)

  14. Moreau, J.-J.: Proximité et dualité dans un espace hilbertien. Bulletin de la Société mathématique de France 93, 273–299 (1965)

  15. Nemirovski, A.: Prox-method with rate of convergence \( {O}(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex–concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)

  16. Nemirovski, A., Onn, S., Rothblum, U.G.: Accuracy certificates for computational problems with convex structure. Math. Oper. Res. 35(1), 52–78 (2010)

  17. Nemirovskii, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Wiley, Chichester (1983)

  18. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence \( {O}(1/k^2)\). Soviet Math. Dokl. 27(2), 372–376 (1983)

  19. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Springer, Berlin (2004)

  20. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)

  21. Nesterov, Y.: Dual extrapolation and its applications to solving variational inequalities and related problems. Math. Program. 109(2–3), 319–344 (2007)

  22. Nesterov, Y.: Primal-dual subgradient methods for convex problems. Math. Program. 120(1), 221–259 (2009)

  23. Nesterov, Y., Nemirovski, A.: Some first order algorithms for \(\ell _1\)/nuclear norm minimization. Acta Numerica 22, 509–575 (2013)

  24. Pshenichnyi, B.N., Danilin, Y.M.: Numerical Methods in Extremal Problems. Mir Publishers, Moscow (1978)

  25. Yosida, K.: Functional Analysis. Springer, Berlin (1964)

Author information

Corresponding author

Correspondence to Anatoli Juditsky.

Additional information

Research of the third author was supported by the ONR Grant N000140811104 and the NSF Grants DMS 0914785, CMMI 1232623.

Appendix: Proofs

We need the following technical result originating from [4] (for proof, see [4], or section 2.1 and Lemma A.2 of [23]).

Lemma 7.1

Let \(Y\) be a nonempty closed and bounded subset of a Euclidean space \(E_y\), and let \(\Vert \cdot \Vert \), \(\omega (\cdot )\) be the corresponding proximal setup. Let, further, \(U\) be a closed convex subset of \(Y\) intersecting the relative interior of \(Y\), and let \(p\in E_y\).

  1. (i)

    The optimization problem

    $$\begin{aligned} \min _{y\in U} h_p(y):=\left[ \langle p,y\rangle + \omega (y)\right] \end{aligned}$$

    has a unique solution \(y_*\). This solution is fully characterized by the inclusion \(y_*\in U\cap Y^o\), \(Y^o=\{y\in Y:\partial \omega (y)\ne \emptyset \}\), coupled with the relation

    $$\begin{aligned} \langle p+\omega '(y_*),u-y_*\rangle \ge 0\;\;\forall u\in U. \end{aligned}$$
    (63)
  2. (ii)

    When \(U\) is cut off \(Y\) by a system of linear inequalities \(e_i(y)\le 0\), \(i=1,\ldots ,m\), there exist Lagrange multipliers \(\lambda _i\ge 0\) such that \(\lambda _ie_i(y_*)=0,\;1\le i\le m\), and

    $$\begin{aligned} \forall u\in Y:\;\;\left\langle p+\omega '(y_*)+\sum _{i=1}^m\lambda _ie_i^\prime ,u-y_*\right\rangle \ge 0. \end{aligned}$$
    (64)
  3. (iii)

    In the situation of (ii), assuming \(p=\xi -\omega '(y)\) for some \(y\in Y^o\), we have

    $$\begin{aligned} \forall u\in Y: \langle \xi ,y_*-u\rangle -\sum _i\lambda _ie_i(u)\le V_y(u)-V_{y_*}(u)-V_{y}(y_*). \end{aligned}$$
    (65)
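
To see what the lemma gives in the simplest case (this illustration is ours), take the Euclidean setup \(\omega (y)={1\over 2}\langle y,y\rangle \), so that \(V_y(u)={1\over 2}\Vert u-y\Vert _2^2\) and, with \(p=\xi -\omega '(y)=\xi -y\), the minimizer \(y_*\) in (i) is the Euclidean projection of \(y-\xi \) onto \(U\). Taking \(U=Y\) (so that all \(\lambda _i=0\)), (65) becomes the familiar three-point inequality for projections:

$$\begin{aligned} \forall u\in Y:\;\langle \xi ,y_*-u\rangle \le {1\over 2}\Vert u-y\Vert _2^2-{1\over 2}\Vert u-y_*\Vert _2^2-{1\over 2}\Vert y_*-y\Vert _2^2. \end{aligned}$$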

1.1 Proof of Proposition 3.2

1\(^0\). Observe that when \(t\) is a non-terminal step of the algorithm, the level set \(U_t\) is a closed convex subset of \(Y\) which intersects the relative interior of \(Y\); indeed, by (25) and due to \(\epsilon _t>0\) for a non-terminal \(t\), there exists \(y\in Y\) such that \(h_\tau (y)\ge \epsilon _t>\ell _t=\gamma \epsilon _t\) for all \(\tau \in I_t\), that is, \(U_t\) is cut off \(Y\) by a system of constraints satisfying the Slater condition. Denoting \(I_t=\{\tau _1,\tau _2,\ldots ,\tau _k\}\) and invoking item (iii) of Lemma 7.1 with \(y=\widehat{y}_t\), \(\xi =0\) and \(e_i(\cdot )=\gamma \epsilon _t-h_{\tau _i}(\cdot )\) (so that \(U_t=\{u\in Y: e_i(u)\le 0, 1\le i\le k\}\)), we get \(y_*=y_{t+1}\) and

$$\begin{aligned} -\sum _i\lambda _ie_i(u) \le V_{\widehat{y}_t}(u)-V_{y_{t+1}}(u)-V_{\widehat{y}_t}(y_{t+1}). \end{aligned}$$

with some \(\lambda _i\ge 0\). When \(u\in U_t\), we have \(e_i(u)\le 0\), that is,

$$\begin{aligned} \forall u\in U_t: V_{y_{t+1}}(u)\le V_{\widehat{y}_t}(u) - V_{\widehat{y}_t}(y_{t+1}). \end{aligned}$$
(66)

2\(^0\). When \(t\) starts a phase, we have \(\widehat{y}_t=y_\omega =y_1\), and clearly \(1\in I_t\), whence \(h_\tau (\widehat{y}_t)\le 0\) for some \(\tau \in I_t\) (specifically, for \(\tau =1\)). When \(t\) does not start a phase, we have \(\widehat{y}_t=y_t\) and \(t\in I_t\), so that here again \(h_\tau (\widehat{y}_t)\le 0\) for some \(\tau \in I_t\). On the other hand, \(h_\tau (y_{t+1})\ge \gamma \epsilon _t\) for all \(\tau \in I_t\) due to \(y_{t+1}\in U_t\). Thus, when passing from \(\widehat{y}_t\) to \(y_{t+1}\), at least one of the \(h_\tau (\cdot )\) grows by at least \(\gamma \epsilon _t\). Taking into account that \(h_\tau (z)=\langle g(y_\tau ),y_\tau -z\rangle \) is Lipschitz continuous with constant \(L[g]\) w.r.t. \(\Vert \cdot \Vert \) (by (14)), for this \(\tau \) we get \(\gamma \epsilon _t\le h_\tau (y_{t+1})-h_\tau (\widehat{y}_t)\le L[g]\Vert \widehat{y}_t-y_{t+1}\Vert \), and we conclude that \(\Vert \widehat{y}_t-y_{t+1}\Vert \ge \gamma \epsilon _t/L[g]\). With this in mind, (66) combines with (8) to imply that

$$\begin{aligned} \forall u\in U_t: V_{y_{t+1}}(u)\le V_{\widehat{y}_t}(u) - {1\over 2}\Vert \widehat{y}_t-y_{t+1}\Vert ^2\le V_{\widehat{y}_t}(u)-{\gamma ^2\epsilon _t^2\over 2 L^2[g]}. \end{aligned}$$
(67)

3\(^0\). Let the algorithm perform phase \(s\), let \(t_s\) be the first step of this phase, and \(r\) be another step of the phase. We claim that all level sets \(U_t\), \(t_s\le t\le r\), have a point in common, specifically, (any) \(u\in \mathop {\hbox {Argmax}}_{y\in Y}\min _{\tau \in I_r} h_\tau (y)\). Indeed, since \(r\) belongs to phase \(s\), we have

$$\begin{aligned} \gamma \Delta _s<\epsilon _r=\max _{y\in Y}\min _{\tau \in I_r} h_\tau (y)=\min _{\tau \in I_r} h_\tau (u) \end{aligned}$$

and \(\Delta _s=\epsilon _{t_s}=\max _{y\in Y}\min _{\tau \in I_{t_s}} h_\tau (y)\) (see (25) and the definition of \(\Delta _s\)). Besides this, \(r\) belongs to phase \(s\), and within a phase, sets \(I_t\) extend as \(t\) grows, so that \(I_{t_s}\subset I_t\subset I_r\) when \(t_s\le t\le r\), implying that \(\epsilon _{t_s}\ge \epsilon _{t_s+1}\ge \cdots \ge \epsilon _r\). Thus, for \(t\in \{t_s,t_s+1,\ldots ,r\}\) we have

$$\begin{aligned} \min _{\tau \in I_t}h_\tau (u)\ge \min _{\tau \in I_r}h_\tau (u)\ge \gamma \Delta _s=\gamma \epsilon _{t_s}\ge \gamma \epsilon _t, \end{aligned}$$

implying that \(u\in U_t\).

With \(u\) defined in this way, let us look at the quantities \(v_t:=V_{\widehat{y}_t}(u)\), \(t_s\le t\le r\). We have \(v_{t_s}\le {1\over 2}\Omega ^2\) due to \(\widehat{y}_{t_s}=y_\omega \) and (11), and

$$\begin{aligned} 0\le v_{t+1}\le v_t -{\gamma ^2\epsilon _t^2\over 2L^2[g]}\le v_t-{\gamma ^4\Delta _s^2\over 2L^2[g]} \end{aligned}$$

when \(t_s\le t<r\) (due to (67) combined with \(\widehat{y}_t=y_t\) when \(t_s<t\le r\)). We conclude that \((r-t_s)\gamma ^4\Delta _s^2\le \Omega ^2L^2[g]\). Thus, the number \(T_s\) of steps of phase \(s\) admits the bound

$$\begin{aligned} T_s\le {\Omega ^2L^2[g]\over \gamma ^4\Delta _s^2}+1\le {2\Omega ^2L^2[g]\over \gamma ^4\Delta _s^2}, \end{aligned}$$
(68)

where the concluding inequality follows from \(\Delta _s\le \Delta _1=\max _{y\in Y} \langle g(y_\omega ),y_\omega -y\rangle \le L[g]\Omega \), see (11), combined with \(\gamma \in (0,1)\).

4\(^0\). Assume that MDL does not terminate in the course of the first \(T\ge 1\) steps, and let \(s_*\) be the index of the phase to which step \(T\) belongs. Then \(\Delta _{s_*}>\epsilon \) (otherwise we would terminate no later than at the first step of phase \(s_*\)); and besides this, by construction, \(\Delta _{s+1}\le \gamma \Delta _s\) whenever phase \(s+1\) takes place. Therefore

$$\begin{aligned} T\le \sum _{s=1}^{s_*} T_s&\le {2\Omega ^2L^2[g]\over \gamma ^4}\sum _{r=0}^{s_*-1}\Delta _{s_*-r}^{-2} \le {2\Omega ^2L^2[g]\over \gamma ^4\Delta _{s_*}^2}\sum _{r=0}^{s_*-1}\gamma ^{2r}\nonumber \\&\le {2\Omega ^2L^2[g]\over \gamma ^4(1-\gamma ^2)\Delta _{s_*}^2}\le {2\Omega ^2L^2[g]\over \gamma ^4(1-\gamma ^2)\epsilon ^2}. \end{aligned}$$

\(\square \)
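
For a concrete reading of this bound (our numerical illustration), take \(\gamma ={1\over 2}\): then

$$\begin{aligned} T\le {2\Omega ^2L^2[g]\over \gamma ^4(1-\gamma ^2)\epsilon ^2}={2\over (1/16)(3/4)}\cdot {\Omega ^2L^2[g]\over \epsilon ^2}={128\over 3}\,{\Omega ^2L^2[g]\over \epsilon ^2}\approx 43\,{\Omega ^2L^2[g]\over \epsilon ^2}, \end{aligned}$$

so MDL reaches accuracy \(\epsilon \) in at most \(O(1)\,\Omega ^2L^2[g]/\epsilon ^2\) steps with a moderate absolute constant.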

1.2 Proof of Proposition 3.3

Observe that the algorithm can terminate only in case A of B.2.2.3, and in this case the output is indeed as claimed in the proposition. Thus, all we need to prove is the upper bound (36) on the number of steps before termination.

1\(^0\). Let us bound from above the number of steps at an arbitrary phase \(s\). Assume that phase \(s\) did not terminate in the course of the first \(T\) steps, so that \(u_1,\ldots ,u_T\) are well defined. We claim that then

$$\begin{aligned} \Vert u_\tau -u_{\tau +1}\Vert \ge \ell _s/L[g],\,1\le \tau <T. \end{aligned}$$
(69)

Indeed, by construction \(h^s_{\tau -1,m+1}(y):=\langle g(u_\tau ),u_\tau -y\rangle \) is \(\ge \ell _s=\gamma f_s\) when \(y=u_{\tau +1}\) (due to \(u_{\tau +1}\in Y^s_\tau \)). Since \(\Vert g(u)\Vert _*\le L[g]\) for all \(u\in Y\), (69) follows.

Now let us look at what happens with the quantities \(\omega (u_\tau )\) as \(\tau \) grows. By strong convexity of \(\omega \) we have

$$\begin{aligned} \omega (u_{\tau +1})-\omega (u_\tau )\ge \langle \omega ^\prime (u_\tau ),u_{\tau +1}-u_\tau \rangle +{1\over 2}\Vert u_\tau -u_{\tau +1}\Vert ^2 \end{aligned}$$

The first term on the right hand side is \(\ge 0\), since \(u_\tau \) is the minimizer of \(\omega (\cdot )\) over \(Y^s_{\tau -1}\), while \(u_{\tau +1}\in Y^s_\tau \subset Y^s_{\tau -1}\). The second term on the right hand side is \(\ge {\ell _s^2\over 2L^2[g]}\) by (69). Thus, \(\omega (u_{\tau +1})-\omega (u_\tau )\ge {\ell _s^2\over 2L^2[g]}\), whence \( \omega (u_T)-\omega (u_1)\ge (T-1){\ell _s^2\over 2L^2[g]}=(T-1){\gamma ^2f_s^2\over 2L^2[g]}. \) Recalling the definition of \(\Omega \) and that \(u_1=y_\omega \), the left hand side of this inequality is \(\le {1\over 2}\Omega ^2\). It follows that whenever phase \(s\) does not terminate in the course of the first \(T\) steps, one has \( T\le {\Omega ^2L^2[g]\over \gamma ^2f_s^2}+1, \) so that the total number of steps at phase \(s\), provided this phase exists, is at most \( T_s={\Omega ^2L^2[g]\over \gamma ^2f_s^2}+2. \) Now, we have

$$\begin{aligned} f_s\le f_1=\max \limits _{y\in Y}\langle g(y_\omega ),y-y_\omega \rangle \le L[g]\max \limits _{y\in Y}\Vert y-y_\omega \Vert \le \Omega L[g] \end{aligned}$$

(recall that \(\Vert g(y)\Vert _*\le L[g]\) and see (11)). Thus

$$\begin{aligned} T_s={\Omega ^2L^2[g]\over \gamma ^2f_s^2}+2\le {(1+2\gamma ^2)\over \gamma ^2}{\Omega ^2 L^2[g]\over f_s^2} \end{aligned}$$

for all \(s\) such that the \(s\)th phase exists. By construction, we have \(f_s\ge \epsilon \) and \(f_s\le (\gamma +(1-\gamma )\theta ) f_{s-1}\), whence the method eventually terminates (since \(\gamma +(1-\gamma )\theta <1\)). Assuming that the termination happens at phase \({s_*}\), we have \(f_s\ge (\gamma +(1-\gamma )\theta )^{s-{s_*}}f_{{s_*}}\) for \(1\le s\le {s_*}\), so that the total number of steps is bounded by

$$\begin{aligned}&{\sum _{s=1}^{{s_*}}{(1+2\gamma ^{2})\over \gamma ^{2}}{\Omega ^{2}L^{2}[g]\over f_s^2} \le \sum _{s=1}^{{s_*}} {(1+2\gamma ^{2})\over \gamma ^{2}}{\Omega ^{2}L^{2}[g](\gamma +(1-\gamma )\theta )^{2({s_{*}}-s)}\over f_{{s_*}}^2}}\\&\quad \le \sum _{s=1}^{{s_{*}}} {(1+2\gamma ^{2})\over \gamma ^{2}}{\Omega ^{2}L^{2}[g](\gamma +(1-\gamma )\theta )^{2({s_{*}}-s)}\over \epsilon ^{2}}\\&\quad \le {(1+2\gamma ^{2})\over \gamma ^{2}[1-(\gamma +(1-\gamma )\theta )^{2}]}{\Omega ^{2}L^{2}[g]\over \epsilon ^{2}}, \end{aligned}$$

as claimed. \(\square \)
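
The two step-count bounds established above are easy to evaluate numerically. The snippet below is our own sanity check with arbitrary illustrative parameter values; it simply plugs numbers into the bounds of Propositions 3.2 and 3.3.

# Illustrative evaluation of the complexity bounds (parameter values are arbitrary).
Omega, L, eps = 2.0, 10.0, 1e-3      # setup "radius" Omega, Lipschitz bound L[g], target accuracy
gamma, theta = 0.5, 0.5              # algorithm parameters from (29), (37)

bound_prop32 = 2 * Omega**2 * L**2 / (gamma**4 * (1 - gamma**2) * eps**2)
rho = gamma + (1 - gamma) * theta    # per-phase contraction factor, < 1
bound_prop33 = (1 + 2 * gamma**2) / (gamma**2 * (1 - rho**2)) * Omega**2 * L**2 / eps**2

print(f"Proposition 3.2 bound: {bound_prop32:.2e} steps")   # about 1.7e+10
print(f"Proposition 3.3 bound: {bound_prop33:.2e} steps")   # about 5.5e+09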

Cite this article

Cox, B., Juditsky, A. & Nemirovski, A. Dual subgradient algorithms for large-scale nonsmooth learning problems. Math. Program. 148, 143–180 (2014). https://doi.org/10.1007/s10107-013-0725-1

