
A Multi-Scale Method for Distributed Convex Optimization with Constraints

Published in: Journal of Optimization Theory and Applications

Abstract

This paper proposes a multi-scale method for designing a continuous-time distributed algorithm for constrained convex optimization problems, using multiple agents over a Markov switching network with noisy inter-agent communication. Unlike most previous work, which focuses on fixed network topologies, this paper tackles the challenging problem of analyzing the joint effects of stochastic networks and inter-agent communication noise on the distributed optimization dynamics, which has not been systematically studied in the literature. Also, in sharp contrast to previous work on constrained optimization, we depart from the projected gradient flow, which is non-smooth and hard to analyze; instead, we design smooth optimization dynamics that admit an easier convergence analysis and more efficient numerical simulation. Moreover, the multi-scale method presented in this paper generalizes previously known distributed convex optimization algorithms from fixed network topologies to the switching case, and the stochastic averaging obtained in this paper generalizes the existing deterministic averaging.



Acknowledgements

The first author would like to thank Professor Zhong-Ping Jiang for discussions on this paper during his visit to New York University. This work is supported by the NSFC of China under Grants 61663026, 61963028, 62066026, and 61866023, and by the Jiangxi NSF under Grant 20192BAB207025.

Author information


Corresponding author

Correspondence to Wei Ni.

Additional information

Communicated by Xiaoqi Yang.


Appendices

Appendix A: Derivation of the SDE (6)

To characterize the noisy term \(\omega _i:=\sum \nolimits _{j\in {\mathscr {N}}_i(t)}\sigma _{ji}\xi _{ji}(x_j-x_i)\) in Eqs. (5a)–(5b), define \(\xi _i=[\xi _{1i}, \xi _{2i}, \cdots , \xi _{Ni}]^T\), \(i=1, \cdots , N\). Also, for each switching mode \(s\in {\mathbb {S}}\), define \({\mathcal {M}}^i_{s}=[a^{i1}_{s}\sigma _{1i}(x_1-x_i), \cdots , a^{iN}_{s}\sigma _{Ni}(x_N-x_i)]\). Then the noisy term \(\omega _i\) above can be written as \(\omega _i={\mathcal {M}}^i_{s} \xi _i\). Therefore, Eq. (5a) becomes

where \(\xi _i dt= d w_i\) with \(w_i\) an N-dimensional standard Brownian motion on the probability space \((\varOmega , {\mathcal {F}}, {\mathbb {P}}, \{{\mathcal {F}}_t\}_{t\ge 0})\). Let \({\varvec{x}}=\mathrm{col}\{x_1, \cdots , x_N\}\). Define the stacked functions \(G({\varvec{x}})=\mathrm{col}\{\mathrm{col}\{g_{ij}(x_i)\}_{j=1}^{r_i}\}_{i=1}^N \in \mathbb {R}^r\), \(H({\varvec{x}})=\mathrm{col}\{\mathrm{col}\{h_{ij}(x_i)\}_{j=1}^{s_i}\}_{i=1}^N \in \mathbb {R}^s\) and the stacked gradients \(\nabla F({\varvec{x}})=\mathrm{col}\{\nabla f_{i}(x_i)\}_{i=1}^{N}\in \mathbb {R}^{nN}\), \(\nabla G({\varvec{x}})=\mathrm{col}\big \{\mathrm{col}\{\nabla g_{ij}(x_i)\}_{j=1}^{r_i}\big \}_{i=1}^{N} \in \mathbb {R}^{rn}\), \(\nabla H({\varvec{x}})=\mathrm{col}\big \{\mathrm{col}\{\nabla h_{ij}(x_i)\}_{j=1}^{s_i}\big \}_{i=1}^{N}\in \mathbb {R}^{sn}\). For each switching mode \(s\in {\mathbb {S}}\), set \({\mathcal {M}}_s=\mathrm{diag}\{{\mathcal {M}}_s^i\}_{i=1}^{N}\) \(\in \mathbb {R}^{nN\times N^2}\). In addition, let \({\varvec{w}}=\mathrm{col} \{w_i\}_{i=1}^{N} \in \mathbb {R}^{N^2}\) and define \(r=\sum _{i=1}^{N}r_i\), \(s=\sum _{i=1}^{N}s_i\). Set

the stacked drift and diffusion matrices as in the displayed equations (rendered as images in the original and not reproduced here). Define \({\varvec{\theta }}=\mathrm{col} \{\theta _1, \cdots , \theta _N\}\in {\mathbb {R}}^{nN}\).

With these, Eq. (5) can be written in a compact form as in (6).
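To illustrate the kind of dynamics Eq. (6) describes, the following sketch simulates a noisy consensus flow with Markov mode switching by Euler–Maruyama discretization. The graphs, noise intensities, switching rate, and gain below are illustrative assumptions, not the paper's parameters, and the gradient and multiplier terms of (5) are omitted so that only the stochastic networking effects remain.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N scalar agents, two switching modes with 0/1
# adjacency matrices, and per-link noise intensities sigma_ji.
N = 4
A = [np.array([[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]], float),   # mode 0: a 4-cycle
     np.array([[0, 1, 1, 0],
               [1, 0, 0, 1],
               [1, 0, 0, 1],
               [0, 1, 1, 0]], float)]   # mode 1: another 4-cycle
sigma = 0.1 * np.ones((N, N))           # noise intensities sigma_ji
c, dt, T = 1.0, 1e-3, 5.0

x = rng.normal(size=N)                  # initial states
s = 0                                   # initial Markov mode
for _ in range(int(T / dt)):
    # Markov switching: jump to the other mode with rate 1
    if rng.random() < 1.0 * dt:
        s = 1 - s
    Ls = np.diag(A[s].sum(1)) - A[s]    # Laplacian of the current mode
    drift = -c * Ls @ x
    # Diffusion: each link (j, i) carries independent Brownian noise
    # weighted by sigma_ji * (x_j - x_i), mirroring the term omega_i.
    dw = rng.normal(scale=np.sqrt(dt), size=(N, N))
    diffusion = np.array([
        sum(A[s][i, j] * sigma[j, i] * (x[j] - x[i]) * dw[j, i]
            for j in range(N))
        for i in range(N)])
    x = x + drift * dt + c * diffusion

# The noise is proportional to the disagreement x_j - x_i, so it
# vanishes as the agents approach consensus.
print(np.ptp(x))
```

Because the diffusion is state-dependent and vanishes at consensus, the spread of the states contracts despite the switching topology and the communication noise.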

Appendix B: Proof of the Inequality (10)

First, for \(V_1\), define \({\varvec{\chi }}= \nabla F({\varvec{x}})+\varvec{\lambda } \odot \nabla G({\varvec{x}})+\varvec{\nu } \odot \nabla H({\varvec{x}})\) and \({\varvec{\chi }}^*= \nabla F({\varvec{x}}^*)+\varvec{\lambda }^* \odot \nabla G({\varvec{x}}^*)+\varvec{\nu }^* \odot \nabla H({\varvec{x}}^*)\), and note that \({\varvec{\theta }}^*=-{\varvec{\chi }}^*\) in view of (7). The action of the infinitesimal operator \(\mathcal A\) on \(V_1\) can then be calculated as

$$\begin{aligned} \mathcal {A}V_1&=\underbrace{({\varvec{x}}-{\varvec{x}}^*)^T[-c\varvec{{\mathcal {L}}}{\varvec{x}}-{\varvec{\theta }}-\nabla F({\varvec{x}})]+\frac{3}{2}c^2tr({\mathcal {M}}^T{\mathcal {M}})}_{A} \nonumber \\&\quad -\sum _{i=1}^{N}\sum _{j=1}^{r_i}\lambda _{ij}(x_i-x^*)^T \nabla g_{ij}(x_i)+\underbrace{[({\varvec{x}}-{\varvec{x}}^*)+({\varvec{\theta }}-{\varvec{\theta }}^*)]^T[-({\varvec{\theta }}-{\varvec{\theta }}^*)-({\varvec{\chi }}-{\varvec{\chi }}^*)]}_{B}\nonumber \\&\quad -\sum _{i=1}^{N}\sum _{j=1}^{s_i}\nu _{ij}(x_i-x^*)^T \nabla h_{ij}(x_i). \end{aligned}$$
(19)

The trace \(\mathrm{tr}({\mathcal {M}}^T{\mathcal {M}})\) complicates the convergence analysis. To remove this difficulty, we estimate \(\mathrm{tr}({\mathcal {M}}^T{\mathcal {M}})\) as follows:

$$\begin{aligned} \mathrm{tr}({\mathcal {M}}^T{\mathcal {M}})&=\sum \limits _{i=1}^N \mathrm{tr}[({\mathcal {M}}^i)^T{\mathcal {M}}^i]=\sum \limits _{i=1}^N \sum \limits _{j=1}^N(a^{ij})^2(\sigma _{ji})^2(x_j-x_i)^T(x_j-x_i)\nonumber \\&\le \kappa ^2\sum \limits _{i=1}^N \sum \limits _{j=1}^Na^{ij}(x_j-x_i)^T(x_j-x_i) =\kappa ^2 {\varvec{x}}^T \varvec{{\mathcal {L}}} {\varvec{x}}. \end{aligned}$$
(20)
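The key step in (20) is elementary: since \(a^{ij}\in \{0,1\}\) gives \((a^{ij})^2=a^{ij}\) and the noise intensities are uniformly bounded by \(\kappa \), each summand can be bounded termwise. The following numerical spot-check, with a randomly generated graph, noise intensities, and states (all placeholders, not the paper's data), verifies the resulting bound on the double sum:

```python
import numpy as np

rng = np.random.default_rng(1)

# tr(M^T M) = sum_i sum_j (a_ij * sigma_ji)^2 |x_j - x_i|^2 is bounded
# by kappa^2 * sum_i sum_j a_ij |x_j - x_i|^2 when a_ij is 0/1 and
# sigma_ji <= kappa.  All quantities below are illustrative.
N, n = 5, 3
A = np.triu((rng.random((N, N)) < 0.5).astype(float), 1)
A = A + A.T                                # symmetric 0/1 adjacency
sigma = 0.3 * rng.random((N, N))           # noise intensities sigma_ji
kappa = sigma.max()                        # uniform bound on sigma_ji
x = rng.normal(size=(N, n))                # agent states x_i in R^n

tr_MTM = sum(A[i, j] ** 2 * sigma[j, i] ** 2 * np.sum((x[j] - x[i]) ** 2)
             for i in range(N) for j in range(N))
bound = kappa ** 2 * sum(A[i, j] * np.sum((x[j] - x[i]) ** 2)
                         for i in range(N) for j in range(N))
print(tr_MTM <= bound + 1e-12)
```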

As for Part A in (19), noting \({\varvec{x}}^T \varvec{{\mathcal {L}}} {\varvec{x}}= ({\varvec{x}}-{\varvec{x}}^*)^T \varvec{{\mathcal {L}}} ({\varvec{x}}-{\varvec{x}}^*)\), Part A can be estimated as \(A\le ({\varvec{x}}-{\varvec{x}}^*)^T[-\hbar \varvec{{\mathcal {L}}}{\varvec{x}}-{\varvec{\theta }}-\nabla F({\varvec{x}})]-\hbar ({\varvec{x}}-{\varvec{x}}^*)^T \varvec{{\mathcal {L}}}({\varvec{x}}-{\varvec{x}}^*)\). Noting that the square bracket is exactly the negative gradient of \(\varPsi ({\varvec{x}}, {\varvec{\theta }})\) (cf. Eq. (9)) with respect to \({\varvec{x}}\), and denoting by \(\lambda ^{{\mathcal {G}}}_2\) the smallest nonzero eigenvalue of the Laplacian \({\mathcal {L}}\), one obtains \(A \le ({\varvec{x}}^*-{\varvec{x}})^T \nabla _{{\varvec{x}}} \varPsi ({\varvec{x}}, {\varvec{\theta }})-\hbar \lambda ^{{\mathcal {G}}}_2({\varvec{x}}-{\varvec{x}}^*)^T({\varvec{x}}-{\varvec{x}}^*)\le \varPsi ({\varvec{x}}^*, {\varvec{\theta }})-\varPsi ({\varvec{x}},{\varvec{\theta }})-\hbar \lambda ^{{\mathcal {G}}}_2({\varvec{x}}-{\varvec{x}}^*)^T({\varvec{x}}-{\varvec{x}}^*)\), where the last inequality uses the fact that \( \varPsi ({\varvec{x}}, {\varvec{\theta }})\) is convex in its first argument. As for Part B in (19), the inequality \(B\le ({\varvec{x}}-{\varvec{x}}^*)^T({\varvec{\theta }}-{\varvec{\theta }}^*)+({\varvec{x}}-{\varvec{x}}^*)^T({\varvec{x}}-{\varvec{x}}^*)\) can be proved by rewriting \(B\) as a quadratic form in \({\varvec{x}}-{\varvec{x}}^*, {\varvec{\theta }}-{\varvec{\theta }}^*, {\varvec{\chi }}-{\varvec{\chi }}^*\) and showing the corresponding matrix to be positive semidefinite.
In view of \(({\varvec{x}}-{\varvec{x}}^*)^T({\varvec{\theta }}-{\varvec{\theta }}^*)=\varPsi ({\varvec{x}}, {\varvec{\theta }})-\varPsi ({\varvec{x}}, {\varvec{\theta }}^*)\), one obtains \(A+B\le \left[ \varPsi ({\varvec{x}}^*, {\varvec{\theta }})-\varPsi ({\varvec{x}}, {\varvec{\theta }}^*)\right] -(\hbar \lambda ^{{\mathcal {G}}}_2-1)({\varvec{x}}-{\varvec{x}}^*)^T({\varvec{x}}-{\varvec{x}}^*)\). Now,

$$\begin{aligned} {\mathcal {A}}V_1&\le \varPsi ({\varvec{x}}^*, {\varvec{\theta }})- \varPsi ({\varvec{x}}, {\varvec{\theta }}^*)-(\hbar \lambda ^{{\mathcal {G}}}_2-1)({\varvec{x}}-{\varvec{x}}^*)^T({\varvec{x}}-{\varvec{x}}^*)+\sum \limits _{i=1}^{N}\sum \limits _{j=1}^{r_i}\lambda _{ij}g_{ij}(x^*)\nonumber \\&-\sum \limits _{i=1}^{N}\sum \limits _{j=1}^{r_i}\lambda _{ij}g_{ij}(x_i)+\sum \limits _{i=1}^{N}\sum \nolimits _{j=1}^{s_i}\nu _{ij}h_{ij}(x^*)-\sum \limits _{i=1}^{N}\sum \limits _{j=1}^{s_i}\nu _{ij}h_{ij}(x_i), \end{aligned}$$

where we use the convexity of the functions \(g_{ij}\) and \(h_{ij}\).

Secondly, we calculate the action of the infinitesimal operator \(\mathcal A\) on \(V_2+V_3\). Note first that, by the definition of the Bregman divergence, \(V_3=\sum \nolimits _{i=1}^{N} \sum \nolimits _{j=1}^{r_i}(\lambda _{ij}-\lambda _{ij}^*)-\sum _{(i,j)\in \varOmega }\lambda _{ij}^*(\ln \lambda _{ij}-\ln \lambda _{ij}^*)\). Therefore,

$$\begin{aligned} {\mathcal {A}}V_2 + \mathcal {A}V_3&=\sum \limits _{i=1}^{N}\sum \limits _{j=1}^{r_i} \frac{\eta _{ij}\lambda _{ij}(\lambda _{ij}-\lambda _{ij}^*)}{1+\eta _{ij}\lambda _{ij}} g_{ij}(x_i)+\sum \limits _{i=1}^{N}\sum \limits _{j=1}^{r_i} \frac{\lambda _{ij}}{1+\eta _{ij}\lambda _{ij}} g_{ij}(x_i)\\&\quad -\sum \limits _{(i,j)\in \varOmega } \frac{\lambda _{ij}^*}{1+\eta _{ij}\lambda _{ij}} g_{ij}(x_i)=\sum \limits _{i=1}^{N}\sum \limits _{j=1}^{r_i} (\lambda _{ij}-\lambda _{ij}^*)g_{ij}(x_i). \end{aligned}$$

Furthermore, the action of the infinitesimal operator \(\mathcal A\) on \(V_4\) is easily calculated as \({\mathcal {A}}V_4=\sum \nolimits _{i=1}^{N}\sum \nolimits _{j=1}^{s_i} (\nu _{ij}-\nu _{ij}^*) h_{ij}(x_i)\). Collecting the above results for \(\mathcal {A}V_i, i=1,2,3,4\), and recalling the definition of \( \varPhi \) in (8), one obtains the inequality (10).

Appendix C: Proof of the Saddle Point Conditions (11)

Due to convexity and affinity, the following four results hold: \(\varSigma _{i=1}^N f_i(x_i)+\hbar {\varvec{x}}^T{\varvec{\mathcal {L}}}{\varvec{x}}\ge \varSigma _{i=1}^N f_i(x^*)+\varSigma _{i=1}^N \nabla f_i(x^*)^T(x_i-x^*)\), \(g_{ij}(x_i)\ge g_{ij}(x^*)+\nabla g_{ij}(x^*)^T(x_i-x^*)\), \(h_{ij}(x_i) = h_{ij}(x^*)+ \nabla h_{ij}(x^*)^T(x_i-x^*)\), and \({\varvec{x}}-{\varvec{x}}^*={\varvec{x}}-{\varvec{x}}^*\). Multiplying them by \(1, \lambda _{ij}^*, \nu _{ij}^*, {\varvec{\theta }}^*\), respectively (for the fourth identity, the multiplication takes the form \(({\varvec{x}}-{\varvec{x}}^*)^T {\varvec{\theta }}^*\)), and summing gives \(\varPhi ({\varvec{x}}, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\ge \varPhi ({\varvec{x}}^*, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)+[{\varvec{\theta }}^*+ \nabla F({\varvec{x}}^*) +\varvec{\lambda }^* \odot \nabla G({\varvec{x}}^*)+\varvec{\nu }^* \odot \nabla H({\varvec{x}}^*)]^T({\varvec{x}}-{\varvec{x}}^*)\). In view of Eq. (7), the terms in the square bracket sum to zero. Therefore, \(\varPhi ({\varvec{x}}, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*) \ge \varPhi ({\varvec{x}}^*, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\).

On the other hand, noting that \(\lambda _{ij}g_{ij}(x_i^*)\le 0=\lambda _{ij}^*g_{ij}(x_i^*)\) and \(h_{ij}(x^*)=0\), it can be directly checked that \(\varPhi ({\varvec{x}}^*, {\varvec{\theta }}, {\varvec{\lambda }}, {\varvec{\nu }})\le \varPhi ({\varvec{x}}^*, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\). This inequality, together with the inequality derived in the last paragraph, gives rise to the saddle point condition (11).
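Saddle-point inequalities of this kind can be sanity-checked on a scalar toy problem. The problem below (minimize \(f(x)=x^2\) subject to \(g(x)=1-x\le 0\), with KKT point \(x^*=1\), \(\lambda ^*=2\)) is an illustrative assumption, not an example from the paper; it checks that the Lagrangian is minimized over \(x\) and maximized over \(\lambda \ge 0\) at the KKT point.

```python
import numpy as np

# Toy Lagrangian L(x, lam) = x^2 + lam*(1 - x) for the illustrative
# problem min x^2 s.t. 1 - x <= 0, whose KKT point is x* = 1, lam* = 2.
def lagr(x, lam):
    return x**2 + lam * (1 - x)

x_star, lam_star = 1.0, 2.0
xs = np.linspace(-3, 3, 601)       # grid over primal variable
lams = np.linspace(0, 10, 601)     # grid over nonnegative multiplier

# Saddle conditions: L(x, lam*) >= L(x*, lam*) for all x, and
# L(x*, lam) <= L(x*, lam*) for all lam >= 0.
ok = (lagr(xs, lam_star).min() >= lagr(x_star, lam_star) - 1e-12 and
      lagr(x_star, lams).max() <= lagr(x_star, lam_star) + 1e-12)
print(ok)
```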

Appendix D: Proof of “\({\mathcal {A}}V=0 \Rightarrow ({\varvec{x}}, {\varvec{\theta }}, {\varvec{\lambda }}, {\varvec{\nu }})=({\varvec{x}}^*, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\)”

Letting \({\mathcal {A}}V=0\) gives \(({\varvec{x}}-{\varvec{x}}^*)^T ({\varvec{x}}-{\varvec{x}}^*)=0\) and \(\varPhi ({\varvec{x}}^*, {\varvec{\theta }}, {\varvec{\lambda }}, {\varvec{\nu }})= \varPhi ({\varvec{x}}, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\). The former implies \({\varvec{x}}={\varvec{x}}^*\); the latter, together with the fact that \(({\varvec{x}}^*, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\) is a saddle point of the Lagrangian \( \varPhi \), implies \( \varPhi ({\varvec{x}}^*, {\varvec{\theta }}, {\varvec{\lambda }}, {\varvec{\nu }})= \varPhi ({\varvec{x}}, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)=\varPhi ({\varvec{x}}^*, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\). Recall that \({\varvec{x}}^*={\mathbf {1}}_N \otimes x^*\), with \(x^*\) the optimal solution, which satisfies \(g_{ij}(x^*)\le 0\) in the KKT condition (2a). Inserting \(x_i=x^*\) into (5c) yields \(\dot{\lambda }_{ij}=\frac{\lambda _{ij}g_{ij}(x^*)}{1+\eta _{ij}\lambda _{ij}}\). If \(g_{ij}(x^*)=0\), then \(\dot{\lambda }_{ij}=0\), so \(\lambda _{ij}(t)\) stays positive for all \(t\ge 0\) since its initial value is chosen positive. If \(g_{ij}(x^*)<0\), write \(g_{ij}(x^*)=-a\) with \(a>0\); then \(\dot{\lambda }_{ij}=\frac{-a\lambda _{ij}}{1+\eta _{ij}\lambda _{ij}}\), whose trajectory, by elementary analysis, satisfies \(\lambda _{ij}(t)\ge 0\) for all \(t\ge 0\) since the initial value of \(\lambda _{ij}\) is chosen positive. In short, in both cases, for Eq. (5c) with \(x_i=x^*\), one has \(\lambda _{ij}(t)\ge 0\) for all \(t\ge 0\) provided the initial value is positive, and consequently \(\dot{\lambda }_{ij}(t)\le 0\) since \(g_{ij}(x^*)\le 0\). Therefore, \(\lambda _{ij}\ge \lambda _{ij}^*\).
On the other hand, the fact that \( \varPhi ({\varvec{x}}^*, {\varvec{\theta }}, {\varvec{\lambda }}, {\varvec{\nu }})= \varPhi ({\varvec{x}}^*, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\) yields \(\sum _{i=1}^{N}\sum _{j=1}^{r_i}(\lambda _{ij}-\lambda _{ij}^*)g_{ij}(x^*)=0\). Since \(g_{ij}(x^*)\le 0\) and \(\lambda _{ij}\ge \lambda _{ij}^*\), each term in this sum is non-positive, and therefore \((\lambda _{ij}-\lambda _{ij}^*)g_{ij}(x^*)=0\), i.e., \(\lambda _{ij}g_{ij}(x^*)=\lambda _{ij}^*g_{ij}(x^*)=0\). This implies that the right-hand side of Eq. (5c) with \(x_i=x^*\) is zero, and thus \( \lambda _{ij}=\lambda _{ij}^{\dag }\) for some constant \(\lambda _{ij}^{\dag }\ge 0\); hence \(\lambda _{ij}^{\dag }g_{ij}(x^*)=0\). Inserting \({\varvec{x}}^*={\mathbf {1}}_N\otimes x^*\) into Eqs. (5b) and (5d) gives \(\theta _i=\theta _i^{\dag }\) and \(\nu _{ij}=\nu _{ij}^{\dag }\) for some constants \(\theta _i^{\dag }\) and \(\nu _{ij}^{\dag }\). Inserting \(({\varvec{x}}^*, \theta _i^{\dag }, \lambda _{ij}^{\dag }, \nu _{ij}^{\dag })\) into (5a) gives \(\sum _{i=1}^N\nabla f_i(x^*)+\sum _{i=1}^N\sum _{j=1}^{r_i}\lambda _{ij}^{\dag }\nabla g_{ij}(x^*)+\sum _{i=1}^N\sum _{j=1}^{s_i}\nu _{ij}^{\dag }\nabla h_{ij}(x^*)=0\). In conclusion, the KKT conditions (2) are satisfied at the point \(({\varvec{x}}^*, {\varvec{\lambda }}^\dag , {\varvec{\nu }}^\dag )\). Then, by the uniqueness of multipliers in (4), \(\lambda _{ij}^{\dag }=\lambda _{ij}^*\) and \(\nu _{ij}^{\dag }=\nu _{ij}^*\).
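The elementary analysis invoked for \(\dot{\lambda }_{ij}=-a\lambda _{ij}/(1+\eta _{ij}\lambda _{ij})\) can be illustrated numerically: \(\lambda =0\) is an equilibrium, so a trajectory started at a positive value decays toward zero without crossing it. The constants \(a\), \(\eta \), and the step size below are arbitrary illustrative choices, not values from the paper.

```python
# Explicit-Euler integration of lambda' = -a*lambda / (1 + eta*lambda)
# with a > 0: the multiplier decays monotonically but stays positive,
# since each step multiplies lambda by a factor strictly between 0 and 1.
a, eta, dt = 2.0, 0.5, 1e-4
lam = 1.0                        # positive initial multiplier
for _ in range(100_000):         # integrate to t = 10
    lam += dt * (-a * lam / (1 + eta * lam))
lam_final = lam
print(lam_final)
```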

Also, differentiating both sides of \( \varPhi ({\varvec{x}}, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)= \varPhi ({\varvec{x}}^*, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\) with respect to \({\varvec{x}}\) and then setting \({\varvec{x}}={\varvec{x}}^*\), one obtains \({\varvec{\theta }}+ \nabla F({\varvec{x}}^*) +\varvec{\lambda }^* \odot \nabla G({\varvec{x}}^*)+\varvec{\nu }^* \odot \nabla H({\varvec{x}}^*)=0\). Combined with Eq. (7), this gives \({\varvec{\theta }}={\varvec{\theta }}^*\). In conclusion, \({\mathcal {A}}V=0\) gives rise to \(({\varvec{x}}, {\varvec{\theta }}, {\varvec{\lambda }}, {\varvec{\nu }})=({\varvec{x}}^*, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\).


Cite this article

Ni, W., Wang, X. A Multi-Scale Method for Distributed Convex Optimization with Constraints. J Optim Theory Appl 192, 379–400 (2022). https://doi.org/10.1007/s10957-021-01982-0

