
A Multi-Scale Method for Distributed Convex Optimization with Constraints

Published in: Journal of Optimization Theory and Applications

Abstract

This paper proposes a multi-scale method for designing a continuous-time distributed algorithm for constrained convex optimization problems, using multiple agents over a Markov switching network with noisy inter-agent communication. Unlike most previous work, which focuses on fixed network topologies, this paper tackles the challenging problem of analyzing the joint effects of stochastic networks and inter-agent communication noise on the distributed optimization dynamics, which has not been systematically studied in the literature. Also, in sharp contrast to previous work on constrained optimization, we depart from the projected gradient flow, which is non-smooth and hard to analyze; instead, we design smooth optimization dynamics that admit an easier convergence analysis and more efficient numerical simulation. Moreover, the multi-scale method presented in this paper generalizes previously known distributed convex optimization algorithms from fixed network topologies to the switching case, and the stochastic averaging obtained in this paper generalizes the existing deterministic averaging.



Acknowledgements

The first author would like to thank Professor Zhong-Ping Jiang for discussions on this paper during his visit to New York University. This work is supported by the NSFC of China under Grants 61663026, 61963028, 62066026, and 61866023, and by the Jiangxi NSF under Grant 20192BAB207025.

Author information


Corresponding author

Correspondence to Wei Ni.

Additional information

Communicated by Xiaoqi Yang.


Appendices

Appendix A: Derivation of the SDE (6)

To characterize the noisy term \(\omega _i:=\sum \nolimits _{j\in {\mathscr {N}}_i(t)}\sigma _{ji}\xi _{ji}(x_j-x_i)\) in Eqs. (5a)–(5b), define \(\xi _i=[\xi _{1i}, \xi _{2i}, \cdots , \xi _{Ni}]^T\), \(i=1, \cdots , N\). Also, for each switching mode \(s\in {\mathbb {S}}\), define \({\mathcal {M}}^i_{s}=[a^{i1}_{s}\sigma _{1i}(x_1-x_i), \cdots , a^{iN}_{s}\sigma _{Ni}(x_N-x_i)]\). Then the noisy term \(\omega _i\) above can be written as \(\omega _i={\mathcal {M}}^i_{s} \xi _i\). Therefore, Eq. (5a) becomes

where \(\xi _i dt= d w_i\) with \(w_i\) an N-dimensional standard Brownian motion on the probability space \((\varOmega , {\mathcal {F}}, {\mathbb {P}}, \{{\mathcal {F}}_t\}_{t\ge 0})\). Let \({\varvec{x}}=\mathrm{col}\{x_1, \cdots , x_N\}\). Define the stacked functions \(G({\varvec{x}})=\mathrm{col}\{\mathrm{col}\{g_{ij}(x_i)\}_{j=1}^{r_i}\}_{i=1}^N \in \mathbb {R}^r\), \(H({\varvec{x}})=\mathrm{col}\{\mathrm{col}\{h_{ij}(x_i)\}_{j=1}^{s_i}\}_{i=1}^N \in \mathbb {R}^s\) and the stacked gradients \(\nabla F({\varvec{x}})=\mathrm{col}\{\nabla f_{i}(x_i)\}_{i=1}^{N}\in \mathbb {R}^{nN}\), \(\nabla G({\varvec{x}})=\mathrm{col}\big \{\mathrm{col}\{\nabla g_{ij}(x_i)\}_{j=1}^{r_i}\big \}_{i=1}^{N} \in \mathbb {R}^{rn}\), \(\nabla H({\varvec{x}})=\mathrm{col}\big \{\mathrm{col}\{\nabla h_{ij}(x_i)\}_{j=1}^{s_i}\big \}_{i=1}^{N}\in \mathbb {R}^{sn}\). For each switching mode \(s\in {\mathbb {S}}\), set \({\mathcal {M}}_s=\mathrm{diag}\{{\mathcal {M}}_s^i\}_{i=1}^{N}\) \(\in \mathbb {R}^{nN\times N^2}\). In addition, let \({\varvec{w}}=\mathrm{col} \{w_i\}_{i=1}^{N} \in \mathbb {R}^{N^2}\) and define \(r=\sum _{i=1}^{N}r_i\), \(s=\sum _{i=1}^{N}s_i\). Set

the stacked drift and diffusion matrices as in the displayed equations (rendered as images in the original and not reproduced here). Define \({\varvec{\theta }}=\mathrm{col} \{\theta _1, \cdots , \theta _N\}\in {\mathbb {R}}^{nN}\).

With these, Eq. (5) can be written in a compact form as in (6).
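To illustrate the kind of dynamics Eq. (6) describes, the following sketch simulates a noisy consensus flow with Markov mode switching by Euler–Maruyama discretization. The graphs, noise intensities, switching rate, and gain below are illustrative assumptions, not the paper's parameters, and the gradient and multiplier terms of (5) are omitted so that only the stochastic networking effects remain.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N scalar agents, two switching modes with 0/1
# adjacency matrices, and per-link noise intensities sigma_ji.
N = 4
A = [np.array([[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]], float),   # mode 0: a 4-cycle
     np.array([[0, 1, 1, 0],
               [1, 0, 0, 1],
               [1, 0, 0, 1],
               [0, 1, 1, 0]], float)]   # mode 1: another 4-cycle
sigma = 0.1 * np.ones((N, N))           # noise intensities sigma_ji
c, dt, T = 1.0, 1e-3, 5.0

x = rng.normal(size=N)                  # initial states
s = 0                                   # initial Markov mode
for _ in range(int(T / dt)):
    # Markov switching: jump to the other mode with rate 1
    if rng.random() < 1.0 * dt:
        s = 1 - s
    Ls = np.diag(A[s].sum(1)) - A[s]    # Laplacian of the current mode
    drift = -c * Ls @ x
    # Diffusion: each link (j, i) carries independent Brownian noise
    # weighted by sigma_ji * (x_j - x_i), mirroring the term omega_i.
    dw = rng.normal(scale=np.sqrt(dt), size=(N, N))
    diffusion = np.array([
        sum(A[s][i, j] * sigma[j, i] * (x[j] - x[i]) * dw[j, i]
            for j in range(N))
        for i in range(N)])
    x = x + drift * dt + c * diffusion

# The noise is proportional to the disagreement x_j - x_i, so it
# vanishes as the agents approach consensus.
print(np.ptp(x))
```

Because the diffusion is state-dependent and vanishes at consensus, the spread of the states contracts despite the switching topology and the communication noise.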

Appendix B: Proof of the Inequality (10)

First, for \(V_1\), define \({\varvec{\chi }}= \nabla F({\varvec{x}})+\varvec{\lambda } \odot \nabla G({\varvec{x}})+\varvec{\nu } \odot \nabla H({\varvec{x}})\) and \({\varvec{\chi }}^*= \nabla F({\varvec{x}}^*)+\varvec{\lambda }^* \odot \nabla G({\varvec{x}}^*)+\varvec{\nu }^* \odot \nabla H({\varvec{x}}^*)\), and note that \({\varvec{\theta }}^*=-{\varvec{\chi }}^*\) in view of (7). The action of the infinitesimal operator \(\mathcal A\) on \(V_1\) can then be calculated as

$$\begin{aligned} \mathcal {A}V_1&=\underbrace{({\varvec{x}}-{\varvec{x}}^*)^T[-c\varvec{{\mathcal {L}}}{\varvec{x}}-{\varvec{\theta }}-\nabla F({\varvec{x}})]+\frac{3}{2}c^2tr({\mathcal {M}}^T{\mathcal {M}})}_{A} \nonumber \\&\quad -\sum _{i=1}^{N}\sum _{j=1}^{r_i}\lambda _{ij}(x_i-x^*)^T \nabla g_{ij}(x_i)+\underbrace{[({\varvec{x}}-{\varvec{x}}^*)+({\varvec{\theta }}-{\varvec{\theta }}^*)]^T[-({\varvec{\theta }}-{\varvec{\theta }}^*)-({\varvec{\chi }}-{\varvec{\chi }}^*)]}_{B}\nonumber \\&\quad -\sum _{i=1}^{N}\sum _{j=1}^{s_i}\nu _{ij}(x_i-x^*)^T \nabla h_{ij}(x_i). \end{aligned}$$
(19)

The trace \(\mathrm{tr}({\mathcal {M}}^T{\mathcal {M}})\) complicates the convergence analysis. To remove this difficulty, we estimate \(\mathrm{tr}({\mathcal {M}}^T{\mathcal {M}})\) as follows:

$$\begin{aligned} \mathrm{tr}({\mathcal {M}}^T{\mathcal {M}})&=\sum \limits _{i=1}^N \mathrm{tr}[({\mathcal {M}}^i)^T{\mathcal {M}}^i]=\sum \limits _{i=1}^N \sum \limits _{j=1}^N(a^{ij})^2(\sigma _{ji})^2(x_j-x_i)^T(x_j-x_i)\nonumber \\&\le \kappa ^2\sum \limits _{i=1}^N \sum \limits _{j=1}^Na^{ij}(x_j-x_i)^T(x_j-x_i) =\kappa ^2 {\varvec{x}}^T \varvec{{\mathcal {L}}} {\varvec{x}}. \end{aligned}$$
(20)
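The key step in (20) is elementary: since \(a^{ij}\in \{0,1\}\) gives \((a^{ij})^2=a^{ij}\) and the noise intensities are uniformly bounded by \(\kappa \), each summand can be bounded termwise. The following numerical spot-check, with a randomly generated graph, noise intensities, and states (all placeholders, not the paper's data), verifies the resulting bound on the double sum:

```python
import numpy as np

rng = np.random.default_rng(1)

# tr(M^T M) = sum_i sum_j (a_ij * sigma_ji)^2 |x_j - x_i|^2 is bounded
# by kappa^2 * sum_i sum_j a_ij |x_j - x_i|^2 when a_ij is 0/1 and
# sigma_ji <= kappa.  All quantities below are illustrative.
N, n = 5, 3
A = np.triu((rng.random((N, N)) < 0.5).astype(float), 1)
A = A + A.T                                # symmetric 0/1 adjacency
sigma = 0.3 * rng.random((N, N))           # noise intensities sigma_ji
kappa = sigma.max()                        # uniform bound on sigma_ji
x = rng.normal(size=(N, n))                # agent states x_i in R^n

tr_MTM = sum(A[i, j] ** 2 * sigma[j, i] ** 2 * np.sum((x[j] - x[i]) ** 2)
             for i in range(N) for j in range(N))
bound = kappa ** 2 * sum(A[i, j] * np.sum((x[j] - x[i]) ** 2)
                         for i in range(N) for j in range(N))
print(tr_MTM <= bound + 1e-12)
```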

As for Part A in (19), noting \({\varvec{x}}^T \varvec{{\mathcal {L}}} {\varvec{x}}= ({\varvec{x}}-{\varvec{x}}^*)^T \varvec{{\mathcal {L}}} ({\varvec{x}}-{\varvec{x}}^*)\), Part A can be estimated as \(A\le ({\varvec{x}}-{\varvec{x}}^*)^T[-\hbar \varvec{{\mathcal {L}}}{\varvec{x}}-{\varvec{\theta }}-\nabla F({\varvec{x}})]-\hbar ({\varvec{x}}-{\varvec{x}}^*)^T \varvec{{\mathcal {L}}}({\varvec{x}}-{\varvec{x}}^*)\). Noting that the square bracket is exactly the negative gradient of \(\varPsi ({\varvec{x}}, {\varvec{\theta }})\) (cf. Eq. (9)) with respect to \({\varvec{x}}\), and denoting by \(\lambda ^{{\mathcal {G}}}_2\) the smallest nonzero eigenvalue of the Laplacian \({\mathcal {L}}\), one obtains \(A \le ({\varvec{x}}^*-{\varvec{x}})^T \nabla _{{\varvec{x}}} \varPsi ({\varvec{x}}, {\varvec{\theta }})-\hbar \lambda ^{{\mathcal {G}}}_2({\varvec{x}}-{\varvec{x}}^*)^T({\varvec{x}}-{\varvec{x}}^*)\le \varPsi ({\varvec{x}}^*, {\varvec{\theta }})-\varPsi ({\varvec{x}},{\varvec{\theta }})-\hbar \lambda ^{{\mathcal {G}}}_2({\varvec{x}}-{\varvec{x}}^*)^T({\varvec{x}}-{\varvec{x}}^*)\), where the last inequality uses the fact that \( \varPsi ({\varvec{x}}, {\varvec{\theta }})\) is convex in its first argument. As for Part B in (19), the inequality \(B\le ({\varvec{x}}-{\varvec{x}}^*)^T({\varvec{\theta }}-{\varvec{\theta }}^*)+({\varvec{x}}-{\varvec{x}}^*)^T({\varvec{x}}-{\varvec{x}}^*)\) can be proved by rewriting \(B\) as a quadratic form in \({\varvec{x}}-{\varvec{x}}^*, {\varvec{\theta }}-{\varvec{\theta }}^*, {\varvec{\chi }}-{\varvec{\chi }}^*\) and showing the corresponding matrix to be positive semidefinite.
In view of \(({\varvec{x}}-{\varvec{x}}^*)^T({\varvec{\theta }}-{\varvec{\theta }}^*)=\varPsi ({\varvec{x}}, {\varvec{\theta }})-\varPsi ({\varvec{x}}, {\varvec{\theta }}^*)\), one obtains \(A+B\le \left[ \varPsi ({\varvec{x}}^*, {\varvec{\theta }})-\varPsi ({\varvec{x}}, {\varvec{\theta }}^*)\right] -(\hbar \lambda ^{{\mathcal {G}}}_2-1)({\varvec{x}}-{\varvec{x}}^*)^T({\varvec{x}}-{\varvec{x}}^*)\). Now,

$$\begin{aligned} {\mathcal {A}}V_1&\le \varPsi ({\varvec{x}}^*, {\varvec{\theta }})- \varPsi ({\varvec{x}}, {\varvec{\theta }}^*)-(\hbar \lambda ^{{\mathcal {G}}}_2-1)({\varvec{x}}-{\varvec{x}}^*)^T({\varvec{x}}-{\varvec{x}}^*)+\sum \limits _{i=1}^{N}\sum \limits _{j=1}^{r_i}\lambda _{ij}g_{ij}(x^*)\nonumber \\&-\sum \limits _{i=1}^{N}\sum \limits _{j=1}^{r_i}\lambda _{ij}g_{ij}(x_i)+\sum \limits _{i=1}^{N}\sum \nolimits _{j=1}^{s_i}\nu _{ij}h_{ij}(x^*)-\sum \limits _{i=1}^{N}\sum \limits _{j=1}^{s_i}\nu _{ij}h_{ij}(x_i), \end{aligned}$$

where we use the convexity of the functions \(g_{ij}\) and \(h_{ij}\).

Secondly, we calculate the action of the infinitesimal operator \(\mathcal A\) on \(V_2+V_3\). Note first that, by the definition of the Bregman divergence, \(V_3=\sum \nolimits _{i=1}^{N} \sum \nolimits _{j=1}^{r_i}(\lambda _{ij}-\lambda _{ij}^*)-\sum _{(i,j)\in \varOmega }\lambda _{ij}^*(\ln \lambda _{ij}-\ln \lambda _{ij}^*)\). Therefore,

$$\begin{aligned} {\mathcal {A}}V_2 + \mathcal {A}V_3&=\sum \limits _{i=1}^{N}\sum \limits _{j=1}^{r_i} \frac{\eta _{ij}\lambda _{ij}(\lambda _{ij}-\lambda _{ij}^*)}{1+\eta _{ij}\lambda _{ij}} g_{ij}(x_i)+\sum \limits _{i=1}^{N}\sum \limits _{j=1}^{r_i} \frac{\lambda _{ij}}{1+\eta _{ij}\lambda _{ij}} g_{ij}(x_i)\\&\quad -\sum \limits _{(i,j)\in \varOmega } \frac{\lambda _{ij}^*}{1+\eta _{ij}\lambda _{ij}} g_{ij}(x_i)=\sum \limits _{i=1}^{N}\sum \limits _{j=1}^{r_i} (\lambda _{ij}-\lambda _{ij}^*)g_{ij}(x_i). \end{aligned}$$

Furthermore, the action of the infinitesimal operator \(\mathcal A\) on \(V_4\) is easily calculated as \({\mathcal {A}}V_4=\sum \nolimits _{i=1}^{N}\sum \nolimits _{j=1}^{s_i} (\nu _{ij}-\nu _{ij}^*) h_{ij}(x_i)\). Collecting the above results for \(\mathcal {A}V_i, i=1,2,3,4\), and recalling the definition of \( \varPhi \) in (8), one obtains the inequality (10).

Appendix C: Proof of the Saddle Point Conditions (11)

Due to convexity and affinity, the following four results hold: \(\varSigma _{i=1}^N f_i(x_i)+\hbar {\varvec{x}}^T{\varvec{\mathcal {L}}}{\varvec{x}}\ge \varSigma _{i=1}^N f_i(x^*)+\varSigma _{i=1}^N \nabla f_i(x^*)^T(x_i-x^*)\), \(g_{ij}(x_i)\ge g_{ij}(x^*)+\nabla g_{ij}(x^*)^T(x_i-x^*)\), \(h_{ij}(x_i) = h_{ij}(x^*)+ \nabla h_{ij}(x^*)^T(x_i-x^*)\), and \({\varvec{x}}-{\varvec{x}}^*={\varvec{x}}-{\varvec{x}}^*\). Multiplying them by \(1, \lambda _{ij}^*, \nu _{ij}^*, {\varvec{\theta }}^*\), respectively (for the fourth identity, the multiplication takes the form \(({\varvec{x}}-{\varvec{x}}^*)^T {\varvec{\theta }}^*\)), and summing gives \(\varPhi ({\varvec{x}}, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\ge \varPhi ({\varvec{x}}^*, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)+[{\varvec{\theta }}^*+ \nabla F({\varvec{x}}^*) +\varvec{\lambda }^* \odot \nabla G({\varvec{x}}^*)+\varvec{\nu }^* \odot \nabla H({\varvec{x}}^*)]^T({\varvec{x}}-{\varvec{x}}^*)\). In view of Eq. (7), the terms in the square bracket sum to zero. Therefore, \(\varPhi ({\varvec{x}}, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*) \ge \varPhi ({\varvec{x}}^*, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\).

On the other hand, noting that \(\lambda _{ij}g_{ij}(x_i^*)\le 0=\lambda _{ij}^*g_{ij}(x_i^*)\) and \(h_{ij}(x^*)=0\), it can be directly checked that \(\varPhi ({\varvec{x}}^*, {\varvec{\theta }}, {\varvec{\lambda }}, {\varvec{\nu }})\le \varPhi ({\varvec{x}}^*, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\). This inequality, together with the inequality derived in the last paragraph, gives rise to the saddle point condition (11).
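Saddle-point inequalities of this kind can be sanity-checked on a scalar toy problem. The problem below (minimize \(f(x)=x^2\) subject to \(g(x)=1-x\le 0\), with KKT point \(x^*=1\), \(\lambda ^*=2\)) is an illustrative assumption, not an example from the paper; it checks that the Lagrangian is minimized over \(x\) and maximized over \(\lambda \ge 0\) at the KKT point.

```python
import numpy as np

# Toy Lagrangian L(x, lam) = x^2 + lam*(1 - x) for the illustrative
# problem min x^2 s.t. 1 - x <= 0, whose KKT point is x* = 1, lam* = 2.
def lagr(x, lam):
    return x**2 + lam * (1 - x)

x_star, lam_star = 1.0, 2.0
xs = np.linspace(-3, 3, 601)       # grid over primal variable
lams = np.linspace(0, 10, 601)     # grid over nonnegative multiplier

# Saddle conditions: L(x, lam*) >= L(x*, lam*) for all x, and
# L(x*, lam) <= L(x*, lam*) for all lam >= 0.
ok = (lagr(xs, lam_star).min() >= lagr(x_star, lam_star) - 1e-12 and
      lagr(x_star, lams).max() <= lagr(x_star, lam_star) + 1e-12)
print(ok)
```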

Appendix D: Proof of “\({\mathcal {A}}V=0 \Rightarrow ({\varvec{x}}, {\varvec{\theta }}, {\varvec{\lambda }}, {\varvec{\nu }})=({\varvec{x}}^*, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\)”

Letting \({\mathcal {A}}V=0\) gives \(({\varvec{x}}-{\varvec{x}}^*)^T ({\varvec{x}}-{\varvec{x}}^*)=0\) and \(\varPhi ({\varvec{x}}^*, {\varvec{\theta }}, {\varvec{\lambda }}, {\varvec{\nu }})= \varPhi ({\varvec{x}}, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\). The former implies \({\varvec{x}}={\varvec{x}}^*\); the latter, together with the fact that \(({\varvec{x}}^*, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\) is a saddle point of the Lagrangian \( \varPhi \), implies \( \varPhi ({\varvec{x}}^*, {\varvec{\theta }}, {\varvec{\lambda }}, {\varvec{\nu }})= \varPhi ({\varvec{x}}, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)=\varPhi ({\varvec{x}}^*, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\). Recall that \({\varvec{x}}^*={\mathbf {1}}_N \otimes x^*\), with \(x^*\) the optimal solution, which satisfies \(g_{ij}(x^*)\le 0\) in the KKT condition (2a). Inserting \(x_i=x^*\) into (5c) yields \(\dot{\lambda }_{ij}=\frac{\lambda _{ij}g_{ij}(x^*)}{1+\eta _{ij}\lambda _{ij}}\). If \(g_{ij}(x^*)=0\), then \(\dot{\lambda }_{ij}=0\), so \(\lambda _{ij}(t)\) stays positive for all \(t\ge 0\) since its initial value is chosen positive. If \(g_{ij}(x^*)<0\), write \(g_{ij}(x^*)=-a\) with \(a>0\); then \(\dot{\lambda }_{ij}=\frac{-a\lambda _{ij}}{1+\eta _{ij}\lambda _{ij}}\), whose trajectory, by elementary analysis, satisfies \(\lambda _{ij}(t)\ge 0\) for all \(t\ge 0\) since the initial value of \(\lambda _{ij}\) is chosen positive. In short, in both cases, for Eq. (5c) with \(x_i=x^*\), one has \(\lambda _{ij}(t)\ge 0\) for all \(t\ge 0\) provided the initial value is positive, and consequently \(\dot{\lambda }_{ij}(t)\le 0\) since \(g_{ij}(x^*)\le 0\). Therefore, \(\lambda _{ij}\ge \lambda _{ij}^*\).
On the other hand, the fact that \( \varPhi ({\varvec{x}}^*, {\varvec{\theta }}, {\varvec{\lambda }}, {\varvec{\nu }})= \varPhi ({\varvec{x}}^*, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\) yields \(\sum _{i=1}^{N}\sum _{j=1}^{r_i}(\lambda _{ij}-\lambda _{ij}^*)g_{ij}(x^*)=0\). Since \(g_{ij}(x^*)\le 0\) and \(\lambda _{ij}\ge \lambda _{ij}^*\), each term in this sum is non-positive, and therefore \((\lambda _{ij}-\lambda _{ij}^*)g_{ij}(x^*)=0\), i.e., \(\lambda _{ij}g_{ij}(x^*)=\lambda _{ij}^*g_{ij}(x^*)=0\). This implies that the right-hand side of Eq. (5c) with \(x_i=x^*\) is zero, and thus \( \lambda _{ij}=\lambda _{ij}^{\dag }\) for some constant \(\lambda _{ij}^{\dag }\ge 0\); hence \(\lambda _{ij}^{\dag }g_{ij}(x^*)=0\). Inserting \({\varvec{x}}^*={\mathbf {1}}_N\otimes x^*\) into Eqs. (5b) and (5d) gives \(\theta _i=\theta _i^{\dag }\) and \(\nu _{ij}=\nu _{ij}^{\dag }\) for some constants \(\theta _i^{\dag }\) and \(\nu _{ij}^{\dag }\). Inserting \(({\varvec{x}}^*, \theta _i^{\dag }, \lambda _{ij}^{\dag }, \nu _{ij}^{\dag })\) into (5a) gives \(\sum _{i=1}^N\nabla f_i(x^*)+\sum _{i=1}^N\sum _{j=1}^{r_i}\lambda _{ij}^{\dag }\nabla g_{ij}(x^*)+\sum _{i=1}^N\sum _{j=1}^{s_i}\nu _{ij}^{\dag }\nabla h_{ij}(x^*)=0\). In conclusion, the KKT conditions (2) are satisfied at the point \(({\varvec{x}}^*, {\varvec{\lambda }}^\dag , {\varvec{\nu }}^\dag )\). Then, by the uniqueness of multipliers in (4), \(\lambda _{ij}^{\dag }=\lambda _{ij}^*\) and \(\nu _{ij}^{\dag }=\nu _{ij}^*\).
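The elementary analysis invoked for \(\dot{\lambda }_{ij}=-a\lambda _{ij}/(1+\eta _{ij}\lambda _{ij})\) can be illustrated numerically: \(\lambda =0\) is an equilibrium, so a trajectory started at a positive value decays toward zero without crossing it. The constants \(a\), \(\eta \), and the step size below are arbitrary illustrative choices, not values from the paper.

```python
# Explicit-Euler integration of lambda' = -a*lambda / (1 + eta*lambda)
# with a > 0: the multiplier decays monotonically but stays positive,
# since each step multiplies lambda by a factor strictly between 0 and 1.
a, eta, dt = 2.0, 0.5, 1e-4
lam = 1.0                        # positive initial multiplier
for _ in range(100_000):         # integrate to t = 10
    lam += dt * (-a * lam / (1 + eta * lam))
lam_final = lam
print(lam_final)
```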

Also, differentiating both sides of \( \varPhi ({\varvec{x}}, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)= \varPhi ({\varvec{x}}^*, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\) with respect to \({\varvec{x}}\) and then setting \({\varvec{x}}={\varvec{x}}^*\), one obtains \({\varvec{\theta }}+ \nabla F({\varvec{x}}^*) +\varvec{\lambda }^* \odot \nabla G({\varvec{x}}^*)+\varvec{\nu }^* \odot \nabla H({\varvec{x}}^*)=0\). Combined with Eq. (7), this gives \({\varvec{\theta }}={\varvec{\theta }}^*\). In conclusion, \({\mathcal {A}}V=0\) gives rise to \(({\varvec{x}}, {\varvec{\theta }}, {\varvec{\lambda }}, {\varvec{\nu }})=({\varvec{x}}^*, {\varvec{\theta }}^*, {\varvec{\lambda }}^*, {\varvec{\nu }}^*)\).


Cite this article

Ni, W., Wang, X. A Multi-Scale Method for Distributed Convex Optimization with Constraints. J Optim Theory Appl 192, 379–400 (2022). https://doi.org/10.1007/s10957-021-01982-0

