Abstract
We contribute improvements to a Lagrangian dual solution approach applied to large-scale optimization problems whose objective functions are convex, continuously differentiable, and possibly nonlinear, while the non-relaxed constraint set is compact but not necessarily convex. Such problems arise, for example, in the split-variable deterministic reformulation of stochastic mixed-integer optimization problems. We adapt the augmented Lagrangian method framework to address the presence of nonconvexity in the non-relaxed constraint set and to enable efficient parallelization. Our approach is most naturally compared with the development of proximal bundle methods, especially their use of serious step conditions; however, we deviate from these developments in ways that improve the efficiency with which parallelization can be exploited. Pivotal to our modification of the augmented Lagrangian method is an integration of the simplicial decomposition method and the nonlinear block Gauss–Seidel method. An adaptation of a serious step condition associated with proximal bundle methods allows the approximation tolerance to be adjusted automatically. Under mild conditions, optimal dual convergence is proven, and we report computational results on test instances from the stochastic optimization literature. We demonstrate improvement in parallel speedup over a baseline parallel approach.
Notes
Implementable is used here in the sense of being able to implement the required functionality of an algorithm, not in the sense specific to stochastic optimization.
References
Birge, J.R., Louveaux, F.: Introduction to Stochastic Programming. Springer, Berlin (2011)
Shor, N.: Minimization Methods for Non-differentiable Functions. Springer, New York (1985)
Bertsekas, D.: Nonlinear Programming. Athena Scientific, Belmont (1999)
Ruszczyński, A.: Nonlinear Optimization. Princeton University Press, Princeton (2006)
Carøe, C.C., Schultz, R.: Dual decomposition in stochastic integer programming. Oper. Res. Lett. 24(1), 37–45 (1999)
Bertsekas, D.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, London (1982)
Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4, 303–320 (1969)
Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization. Academic Press, New York (1969)
Rockafellar, R.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)
Lemaréchal, C.: An Extension of Davidon Methods to Nondifferentiable Problems, pp. 95–109. Springer, Berlin, Heidelberg (1975)
de Oliveira, W., Sagastizábal, C.: Bundle methods in the XXIst century: a bird’s-eye view. Pesqui. Oper. 34, 647–670 (2014)
Hare, W., Sagastizábal, C., Solodov, M.: A proximal bundle method for nonsmooth nonconvex functions with inexact information. Comput. Optim. Appl. 63, 1–28 (2016)
de Oliveira, W., Sagastizábal, C., Lemaréchal, C.: Convex proximal bundle methods in depth: a unified analysis for inexact oracles. Math. Program. 148(1), 241–277 (2014)
Amor, H.B., Desrosiers, J., Frangioni, A.: On the choice of explicit stabilizing terms in column generation. Discrete Appl. Math. 157(6), 1167–1184 (2009)
Bertsekas, D.: Incremental aggregated proximal and augmented Lagrangian algorithms. arXiv preprint arXiv:1509.09257 (2015)
Fischer, F., Helmberg, C.: A parallel bundle framework for asynchronous subspace optimization of nonsmooth convex functions. SIAM J. Optim. 24(2), 795–822 (2014)
Lubin, M., Martin, K., Petra, C., Sandıkçı, B.: On parallelizing dual decomposition in stochastic integer programming. Oper. Res. Lett. 41(3), 252–258 (2013)
Bertsekas, D.: Convex Optimization Algorithms. Athena Scientific, Belmont (2015)
Holloway, C.: An extension of the Frank and Wolfe method of feasible directions. Math. Program. 6(1), 14–27 (1974)
Von Hohenbalken, B.: Simplicial decomposition in nonlinear programming algorithms. Math. Program. 13(1), 49–68 (1977)
Bonettini, S.: Inexact block coordinate descent methods with application to non-negative matrix factorization. IMA J. Numer. Anal. 31(4), 1431–1452 (2011)
Grippo, L., Sciandrone, M.: On the convergence of the block nonlinear Gauss–Seidel method under convex constraints. Oper. Res. Lett. 26(3), 127–136 (2000)
Hildreth, C.: A quadratic programming procedure. Nav. Res. Logist. Q. 4, 79–85 (1957). Erratum, ibid. 4, 361 (1957)
Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109, 475–494 (2001)
Warga, J.: Minimizing certain convex functions. SIAM J. Appl. Math. 11, 588–593 (1963)
Eckstein, J.: A practical general approximation criterion for methods of multipliers based on Bregman distances. Math. Program. 96(1), 61–86 (2003)
Eckstein, J., Silva, P.: A practical relative error criterion for augmented Lagrangians. Math. Program. 141(1), 319–348 (2013)
Hamdi, A., Mahey, P., Dussault, J.P.: A new decomposition method in nonconvex programming via a separable augmented Lagrangian. In: Recent Advances in Optimization: Proceedings of the 8th French–German Conference on Optimization, Trier, July 21–26, 1996, pp. 90–104. Springer, Berlin, Heidelberg (1997)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2, 17–40 (1976)
Glowinski, R., Marrocco, A.: Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires. Revue Française d'Automatique, Informatique, et Recherche Opérationnelle 9, 41–76 (1975)
Chatzipanagiotis, N., Dentcheva, D., Zavlanos, M.: An augmented Lagrangian method for distributed optimization. Math. Program. 152(1), 405–434 (2014)
Tappenden, R., Richtárik, P., Büke, B.: Separable approximations and decomposition methods for the augmented Lagrangian. Optim. Methods Softw. 30(3), 643–668 (2015)
Mulvey, J., Ruszczyński, A.: A diagonal quadratic approximation method for large scale linear programs. Oper. Res. Lett. 12(4), 205–215 (1992)
Ruszczyński, A.: On convergence of an augmented Lagrangian decomposition method for sparse convex optimization. Math. Oper. Res. 20(3), 634–656 (1995)
Kiwiel, K., Rosa, C., Ruszczyński, A.: Proximal decomposition via alternating linearization. SIAM J. Optim. 9(3), 668–689 (1999)
Lin, X., Pham, M., Ruszczyński, A.: Alternating linearization for structured regularization problems. J. Mach. Learn. Res. 15, 3447–3481 (2014)
Chen, G., Teboulle, M.: A proximal-based decomposition method for convex minimization problems. Math. Program. 64, 81–101 (1994)
He, B., Liao, L.Z., Han, D., Yang, H.: A new inexact alternating directions method for monotone variational inequalities. Math. Program. 92, 103–118 (2002)
Boland, N., Christiansen, J., Dandurand, B., Eberhard, A., Linderoth, J., Luedtke, J., Oliveira, F.: Progressive hedging with a Frank–Wolfe based method for computing stochastic mixed-integer programming Lagrangian dual bounds. Optimization Online (2016). http://www.optimization-online.org/DB_HTML/2016/03/5391.html
Eckstein, J., Bertsekas, D.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1–3), 293–318 (1992)
Eckstein, J., Yao, W.: Understanding the Convergence of the Alternating Direction Method of Multipliers: Theoretical and Computational Perspectives. Tech. rep., Rutgers University, New Brunswick (2014)
Mahey, P., Oualibouch, S., Tao, P.D.: Proximal decomposition on the graph of a maximal monotone operator. SIAM J. Optim. 5(2), 454–466 (1995)
Feizollahi, M.J., Costley, M., Ahmed, S., Grijalva, S.: Large-scale decentralized unit commitment. Int. J. Electr. Power Energy Syst. 73, 97–106 (2015)
Gade, D., Hackebeil, G., Ryan, S.M., Watson, J.P., Wets, R.J.B., Woodruff, D.L.: Obtaining lower bounds from the progressive hedging algorithm for stochastic mixed-integer programs. Math. Program. 157(1), 47–67 (2016)
Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)
Rockafellar, R.: Convex Analysis. Princeton University Press, Princeton (1970)
Hathaway, R.J., Bezdek, J.C.: Grouped coordinate minimization using Newton’s method for inexact minimization in one vector coordinate. J. Optim. Theory Appl. 71(3), 503–516 (1991)
Kiwiel, K.C.: Approximations in proximal bundle methods and decomposition of convex programs. J. Optim. Theory Appl. 84(3), 529–548 (1995)
Bodur, M., Dash, S., Günlük, O., Luedtke, J.: Strengthened Benders cuts for stochastic integer programs with continuous recourse (2014). http://www.optimization-online.org/DB_FILE/2014/03/4263.pdf. Last Accessed 13 Jan 2015
Ahmed, S., Garcia, R., Kong, N., Ntaimo, L., Parija, G., Qiu, F., Sen, S.: SIPLIB: A stochastic integer programming test problem library (2015). http://www.isye.gatech.edu/sahmed/siplib
Ntaimo, L.: Decomposition algorithms for stochastic combinatorial optimization: Computational experiments and extensions. Ph.D. thesis (2004)
The MathWorks, Inc., Natick, MA: MATLAB R2012b (2014)
IBM Corporation: IBM ILOG CPLEX Optimization Studio CPLEX Users Manual. http://www.ibm.com/support/knowledgecenter/en/SSSA5P_12.6.1/ilog.odms.studio.help/pdf/usrcplex.pdf. Last Accessed 22 Aug 2016
IBM Corporation: IBM ILOG CPLEX V12.5. http://www-01.ibm.com/software/commerce/optimization/cplex-optimizer/. Last Accessed 28 Jan 2016
COmputational INfrastructure for Operations Research. http://www.coin-or.org/. Last Accessed 28 Jan 2016
National Computing Infrastructure (NCI): NCI Website. http://www.nci.org.au. Last Accessed 19 Nov 2016
Gertz, E., Wright, S.: Object-oriented software for quadratic programming. ACM Trans. Math. Softw. 29(1), 58–81 (2003)
Lubin, M., Petra, C., Anitescu, M., Zavala, V.: Scalable stochastic optimization of complex energy systems. In: Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 64:1–64:10. ACM, Seattle, WA (2011)
Clarke, F.: Optimization and Nonsmooth Analysis. Society for Industrial and Applied Mathematics (1990)
Additional information
This work was supported by the Australian Research Council (ARC) Grant DP140100985.
A Technical lemmas for establishing optimal convergence of SDM-GS
Given initial \((x^0,z^0) \in X \times Z \subset \mathbb {R}^n \times \mathbb {R}^q\), we consider the generation of the sequence \(\left\{ (x^k,z^k) \right\} \) with iterations computed using Algorithm 4, whose target problem is given by
\[ \min _{x,z} \left\{ F(x,z) \;:\; x \in X,\ z \in Z \right\} , \qquad (32) \]
where \((x,z) \mapsto F(x,z)\) is convex and continuously differentiable over \(X \times Z\), the sets X and Z are closed and convex, X is bounded, and \(z \mapsto F(x,z)\) is inf-compact for each \(x \in X\).
We define the directional derivative with respect to x as
\[ F_x'(x,z;d) := \lim _{\alpha \downarrow 0} \frac{F(x+\alpha d,z) - F(x,z)}{\alpha }. \]
Of interest is the satisfaction of the following local stationarity condition at \(x \in X\):
\[ F_x'(x,z;d) \ge 0 \quad \text{for all } d \in X - \left\{ x \right\} \qquad (33) \]
for any limit point \((x,z)=(\bar{x},\bar{z})\) of some sequence \(\left\{ (x^k,z^k) \right\} \) of feasible solutions to problem (32). For the sake of nontriviality, we shall assume that the x-stationarity condition (33) never holds at \((x,z)=(x^k,z^k)\) for any \(k \ge 0\). Thus, for each \(x^k\), \(k \ge 0\), there always exists a \(d^k \in X-\left\{ x^k \right\} \) for which \(F_x'(x^k,z^k;d^k) < 0\).
Direction Assumptions (DAs) For each iteration \(k \ge 0\), given \(x^k \in X\) and \(z^k \in Z\), we have \(d^k\) chosen so that (1) \(x^k + d^k \in X\); and (2) \(F_x'(x^k,z^k;d^k) < 0\).
Gradient Related Assumption (GRA) Given a sequence \(\left\{ (x^k,z^k) \right\} \) with \(\lim _{k \rightarrow \infty } (x^k,z^k) = (\overline{x},\overline{z})\), and a bounded sequence \(\left\{ d^k \right\} \) of directions, the existence of a direction \(\overline{d} \in X - \left\{ \overline{x} \right\} \) such that \(F_{x}'(\overline{x},\overline{z};\overline{d}) < 0\) implies that
\[ \limsup _{k \rightarrow \infty } F_x'(x^k,z^k ; d^k) < 0. \]
In this case, we say that \(\left\{ d^k \right\} \) is gradient related to \(\left\{ x^k \right\} \). This gradient related condition is similar to the one defined in [3]. The sequence of directions \(\left\{ d^k \right\} \) is typically gradient related to \(\left\{ x^k \right\} \) by construction. (See Lemma 7.)
To state the last assumption, we require the notion of an Armijo rule step length \(\alpha ^k \in (0,1]\) given \((x^k,z^k,d^k)\) and parameters \(\beta ,\sigma \in (0,1)\).
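For concreteness, the standard backtracking form of the Armijo rule with these parameters, which we take to be the rule implemented in Lines 2–5 of Algorithm 4, selects the largest admissible power of \(\beta \):
\[ \alpha ^k = \max \left\{ \beta ^j \;:\; j \in \left\{ 0,1,2,\dots \right\} ,\ F(x^k + \beta ^j d^k, z^k) \le F(x^k,z^k) + \sigma \beta ^j F_x'(x^k,z^k ; d^k) \right\} . \]
Remark 6 below addresses the finite termination of this backtracking search, so that \(\alpha ^k \in (0,1]\).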
Remark 6
Under mild assumptions on F, such as continuity, that guarantee the existence of a finite \(F_x'(x,z;d)\) for all \((x,z,d)\) with \(x \in X\), \(d \in X-\left\{ x \right\} \), and \(z \in Z\), we may assume that the while loop of Lines 3–5 terminates after a finite number of iterations. Thus, we have \(\alpha ^k \in (0,1]\) for each \(k \ge 1\).
The last significant assumption is stated as follows.
Sufficient Decrease Assumption (SDA) For sequences \(\left\{ (x^k,z^k,d^k) \right\} \) and step lengths \(\left\{ \alpha ^k \right\} \) computed according to Algorithm 4, we assume, for each \(k \ge 0\), that \((x^{k+1},z^{k+1})\) satisfies
\[ F(x^{k+1},z^{k+1}) \le F(x^k + \alpha ^k d^k, z^k). \]
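One simple way in which the SDA can be met, stated here only as an illustrative special case (Algorithm 4 itself may satisfy the SDA through an inexact or approximate update), is to take the full step in x and then minimize exactly in z:
\[ x^{k+1} := x^k + \alpha ^k d^k, \qquad z^{k+1} \in \mathop {\mathrm{arg\,min}}_{z \in Z} F(x^{k+1},z), \]
which is well defined by the inf-compactness of \(z \mapsto F(x^{k+1},z)\) and yields \(F(x^{k+1},z^{k+1}) \le F(x^{k+1},z^k) = F(x^k + \alpha ^k d^k, z^k)\).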
Lemma 6
For problem (32), let \(F : \mathbb {R}^{n_x} \times \mathbb {R}^{n_z} \mapsto \mathbb {R}\) be convex and continuously differentiable, \(X \subset \mathbb {R}^{n_x}\) convex and compact, and \(Z \subseteq \mathbb {R}^{n_z}\) closed and convex. Furthermore, assume for each \(x \in X\) that \(z \mapsto F(x,z)\) is inf-compact. If a sequence \(\left\{ (x^k,z^k,d^k) \right\} \) satisfies the DA, the GRA, and the SDA for some fixed \(\beta ,\sigma \in (0,1)\), then the sequence \(\left\{ (x^k,z^k) \right\} \) has limit points \((\overline{x},\overline{z})\), each of which satisfies the stationarity condition (33).
Proof
The existence of limit points \((\overline{x},\overline{z})\) follows from the compactness of X, the inf-compactness of \(z \mapsto F(x,z)\) for each \(x \in X\), and the SDA. In generating \(\left\{ \alpha ^k \right\} \) according to the Armijo rule as implemented in Lines 2–5 of Algorithm 4, we have
\[ F(x^k + \alpha ^k d^k, z^k) - F(x^k,z^k) \le \sigma \alpha ^k F_x'(x^k,z^k ; d^k). \qquad (35) \]
By the DA, \(F_x'(x^k,z^k ; d^k) < 0\), and since \(\alpha ^k > 0\) for each \(k \ge 1\) by Remark 6, we infer from (35) that \(F(x^k + \alpha ^k d^k,z^k) < F(x^k,z^k)\). By construction, we have \(F(x^{k+1},z^{k+1}) \le F(x^k + \alpha ^k d^k,z^k) < F(x^k,z^k)\). By this monotonicity and since F is bounded from below on \(X \times Z\), we have \(\lim _{k \rightarrow \infty } F(x^k,z^k) = \bar{F} > -\infty \). Therefore,
\[ \lim _{k \rightarrow \infty } \left[ F(x^k,z^k) - F(x^{k+1},z^{k+1}) \right] = 0, \qquad (36) \]
which implies
\[ \lim _{k \rightarrow \infty } \alpha ^k F_x'(x^k,z^k ; d^k) = 0. \qquad (37) \]
We assume for the sake of contradiction that \(\lim _{k \rightarrow \infty } (x^k,z^k) = (\overline{x},\overline{z})\) does not satisfy the stationarity condition (33). By the GRA, \(\left\{ d^k \right\} \) is gradient related to \(\left\{ x^k \right\} \); that is,
\[ \limsup _{k \rightarrow \infty } F_x'(x^k,z^k ; d^k) < 0. \]
Thus, it follows from (35)–(37) that \(\lim _{k \rightarrow \infty } \alpha ^k = 0\).
Consequently, after a certain iteration \(\bar{k}\) we have \(\alpha ^k \le \beta \), and so we can define \(\left\{ \bar{\alpha }^k \right\} \), \(\bar{\alpha }^k = \alpha ^k / \beta \), with \(\bar{\alpha }^k \le 1\) for \(k \ge \bar{k}\). Since the trial step length \(\bar{\alpha }^k\) is rejected by the Armijo rule for \(k \ge \bar{k}\), we have
\[ \sigma \bar{\alpha }^k F_x'(x^k,z^k ; d^k) < F(x^k + \bar{\alpha }^k d^k, z^k) - F(x^k,z^k). \qquad (38) \]
Since F is continuously differentiable, the mean value theorem may be applied to the right-hand side of (38); after dividing through by \(\bar{\alpha }^k > 0\), this gives
\[ \sigma F_x'(x^k,z^k ; d^k) < F_x'(x^k + \widetilde{\alpha }^k d^k, z^k ; d^k) \qquad (39) \]
for some \(\widetilde{\alpha }^k \in [0,\overline{\alpha }^k]\).
Again, using the assumption \(\limsup _{k \rightarrow \infty } F_x'(x^k,z^k ; d^k) < 0\), and also the compactness of \(X - X\), we take a limit point \(\overline{d}\) of \(\left\{ d^k \right\} \), with its associated subsequence index set denoted by \(\mathcal {K}\), such that \(F_x'(\overline{x},\overline{z};\overline{d}) < 0\). Taking the limits over the subsequence indexed by \(\mathcal {K}\), we have \(\lim _{k \rightarrow \infty , k \in \mathcal {K}} F_x'(x^k,z^k ; d^k) = F_x'(\overline{x},\overline{z} ; \overline{d})\) and \(\lim _{k \rightarrow \infty , k \in \mathcal {K}} F_x'(x^k +\widetilde{\alpha }^k d^k, z^k ; d^k) = F_x'(\overline{x},\overline{z} ; \overline{d})\). These two limits hold since (1) \((x,z) \mapsto F_x'(x,z;d)\) for each \(d \in X - X\) is continuous and (2) \(d \mapsto F_x'(x,z;d)\) is locally Lipschitz continuous for each \((x,z) \in X \times Z\) (e.g., Proposition 2.1.1 of [60]); these two facts together imply that \((x,z;d) \mapsto F_x'(x,z;d)\) is continuous. Then, in the limit as \(k \rightarrow \infty \), \(k \in \mathcal {K}\), inequality (39) becomes
\[ \sigma F_x'(\overline{x},\overline{z} ; \overline{d}) \le F_x'(\overline{x},\overline{z} ; \overline{d}), \quad \text{equivalently,} \quad 0 \le (1-\sigma ) F_x'(\overline{x},\overline{z} ; \overline{d}). \]
Since \((1-\sigma ) > 0\) and \(F_x'(\overline{x},\overline{z} ; \overline{d})<0\), we have a contradiction. Thus, \(\overline{x}\) must satisfy the stationarity condition (33). \(\square \)
Remark 7
Noting that \(F_x'(x^k,z^k;d^k) = \nabla _x F(x^k,z^k) ^\top d^k\) under the assumption of continuous differentiability of F, one means of constructing \(\left\{ d^k \right\} \) is as follows:
\[ d^k \in \mathop {\mathrm{arg\,min}} \left\{ \nabla _x F(x^k,z^k) ^\top d \;:\; d \in X - \left\{ x^k \right\} \right\} . \qquad (40) \]
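Equivalently, writing \(d^k = \hat{x}^k - x^k\), this construction amounts to the linearized subproblem familiar from Frank–Wolfe-type and simplicial decomposition methods; the following restatement is given only as an illustration, with \(\hat{x}^k\) introduced here as notation for a minimizer of the linearization:
\[ \hat{x}^k \in \mathop {\mathrm{arg\,min}}_{x \in X} \nabla _x F(x^k,z^k) ^\top x, \qquad d^k := \hat{x}^k - x^k , \]
which is a linear program whenever X is polyhedral.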
Lemma 7
Given a sequence \(\left\{ (x^k,z^k) \right\} \) with \(\lim _{k \rightarrow \infty } (x^k,z^k) = (\overline{x},\overline{z})\), let each \(d^k\), \(k \ge 1\), be generated as in (40). Then \(\left\{ d^k \right\} \) is gradient related to \(\left\{ x^k \right\} \).
Proof
By the construction of \(d^k\), \(k \ge 1\), we have
\[ F_x'(x^k,z^k ; d^k) \le F_x'(x^k,z^k ; x - x^k) \quad \text{for all } x \in X. \]
Taking the limit, we have
\[ \limsup _{k \rightarrow \infty } F_x'(x^k,z^k ; d^k) \le \limsup _{k \rightarrow \infty } F_x'(x^k,z^k ; x - x^k) \le F_x'(\overline{x},\overline{z} ; x - \overline{x}) \quad \text{for all } x \in X, \]
where the last inequality follows from the upper semicontinuity of the function \((x,z,d) \mapsto F_x'(x,z;d)\), which holds in our setting due, primarily, to Proposition 2.1.1 (b) of [60] given that F is assumed to be convex and continuous on \(\mathbb {R}^n\). Taking
\[ \overline{d} = \overline{x}^* - \overline{x}, \quad \text{where } \overline{x}^* \in \mathop {\mathrm{arg\,min}}_{x \in X} F_x'(\overline{x},\overline{z} ; x - \overline{x}), \]
we have by the assumed nonstationarity that \(F_x'(\overline{x},\overline{z};\overline{d}) < 0\). Thus, \(\limsup _{k \rightarrow \infty } F_x'(x^k, z^k ; d^k) < 0\), and so GRA holds. \(\square \)
Cite this article
Boland, N., Christiansen, J., Dandurand, B. et al. A parallelizable augmented Lagrangian method applied to large-scale non-convex-constrained optimization problems. Math. Program. 175, 503–536 (2019). https://doi.org/10.1007/s10107-018-1253-9
Keywords
- Augmented Lagrangian method
- Proximal bundle method
- Nonlinear block Gauss–Seidel method
- Simplicial decomposition method
- Parallel computing