1 Introduction

Throughout this paper, the set of nonnegative integers is denoted by \({\mathbb {N}}\), the set of positive integers by \({\mathbb {N}}^* {:}{=}{\mathbb {N}}{\setminus }\{0\}\), the set of real numbers by \({\mathbb {R}}\), the set of nonnegative real numbers by \({\mathbb {R}}_+\), and the set of positive real numbers by \({\mathbb {R}}_{++}\). We use \({\mathbb {R}}^d\) to denote the d-dimensional Euclidean space with inner product \(\langle \cdot , \cdot \rangle \) and Euclidean norm \(\Vert \cdot \Vert \). For a given matrix M, \({\text {Im}}(M)\) denotes its image.

We consider the linearly constrained composite optimization problem

$$\begin{aligned} \min _{x_i \in {\mathbb {R}}^{d_i}, y\in {\mathbb {R}}^q} ~F(\textbf{x},y) {:}{=}\sum _{i=1}^m f_i(x_i)+H(y)+P(\textbf{x})-G(\textbf{x}) \quad \text {s.t.}~~ \sum _{i=1}^{m}A_ix_i + By=b, \end{aligned}$$
(1)

where \(\textbf{x}=(x_1,\dots ,x_m)\); for each \(i\in \{1,\dots ,m\}\), \(f_i: {\mathbb {R}}^{d_i} \rightarrow (-\infty ,+\infty ]\) is a proper lower semicontinuous function; \(H:{\mathbb {R}}^q \rightarrow {\mathbb {R}}\) is a differentiable (possibly nonconvex) function whose gradient is Lipschitz continuous; \(P:{\mathbb {R}}^{d_1}\times \dots \times {\mathbb {R}}^{d_m} \rightarrow {\mathbb {R}}\) is another differentiable (possibly nonconvex) function whose gradient is Lipschitz continuous; \(G:{\mathbb {R}}^{d_1}\times \dots \times {\mathbb {R}}^{d_m} \rightarrow {\mathbb {R}}\) is a continuous (possibly nonsmooth) weakly convex function; \(A_i \in {\mathbb {R}}^{p\times d_i}\), \(B\in {\mathbb {R}}^{p\times q}\), and \(b \in {\mathbb {R}}^p\). Problem (1) covers two important models in the literature. Firstly, when \(G \equiv 0\), problem (1) reduces to the form

$$\begin{aligned} \min _{x_i \in {\mathbb {R}}^{d_i}, y\in {\mathbb {R}}^q} ~\sum _{i=1}^m f_i(x_i)+H(y)+P(\textbf{x}) \quad \text {s.t.}~~ \sum _{i=1}^{m}A_ix_i + By=b, \end{aligned}$$
(2)

which is studied in [27]. This class of problems, together with its special cases considered in [11, 28], has a broad range of useful applications in image processing, matrix decomposition, statistical learning, and other areas [27, 29]. In [27], the authors studied the multi-block alternating direction method of multipliers (ADMM) in the nonconvex nonsmooth setting, and introduced new assumptions such as restricted prox-regularity and the Lipschitz subminimization path to establish global convergence. A special case of (2) in which \(P\equiv 0\) was considered in [30], where proximal terms and a dual stepsize were incorporated into the ADMM to solve the problem. An inertial proximal splitting method was also developed in [26] to revisit that problem. Other, simpler special cases of (2) can be addressed by variants of the three-block ADMM [14, 15, 28] or the two-block ADMM [11, 13, 25], whose convergence is well studied in the literature. Secondly, when \(P \equiv 0\), \(m =1\), \(x_1 =x \in {\mathbb {R}}^d\), and \(A_1 =A \in {\mathbb {R}}^{p\times d}\), problem (1) reduces to

$$\begin{aligned} \min _{x \in {\mathbb {R}}^d,y \in {\mathbb {R}}^q}~ f(x)-G(x)+H(y) \quad \text {s.t.~} Ax+By=b. \end{aligned}$$
(3)

This is known as the linearly constrained difference-of-convex problem [23, 24], in which f, H, and G are all assumed to be convex. Problem (3) arises from the use of difference-of-convex regularization terms, which have shown very promising performance [10, 31]. One practical engineering problem that can be solved via problem (3) is the total variation image restoration problem [20, 23, 24], which is one of the most fundamental problems in imaging science. The authors of [23] developed a variant of the ADMM which incorporates Bregman distances into the updating steps for solving problem (3) in the case when f, G, and H are convex functions, with H being differentiable with a Lipschitz continuous gradient. A hybrid Bregman ADMM was then proposed in [24] to revisit problem (3) under the same assumptions on f, G, and H. This version introduces an additional step to evaluate the convex conjugate of G, allowing the flexibility of updating the solution via either a subgradient step or a proximal step, with a control parameter that activates or deactivates the additional step. With this strategy, the algorithm includes the one in [23] as a special case. However, to the best of the authors' knowledge, further research on more general forms of problem (3) is still not available in the literature.

Inspired by the success of those previous studies, we aim to study a class of problems that covers both (2) and (3). We develop a splitting algorithm based on the ADMM to solve problem (1), which not only includes problem (2) as a special case, but also extends problem (3) to a multi-block, nonsmooth, and nonconvex setting. Bregman distances are also incorporated into the algorithm, making it more general and allowing flexibility in choosing appropriate proximal terms so that the subproblems can be solved efficiently. The sequence generated by our algorithm is bounded, and each of its cluster points is a stationary point of the problem. We also prove the convergence of the full sequence under the Kurdyka–Łojasiewicz property of a suitable function.

Compared with the two closely related works [23, 24], we do not require convexity of \(f_i\), H can be nonconvex, and G need only be weakly convex. In other words, the convexity requirements in our work are weaker, allowing a more general class of structured optimization problems. For the convergence of the full sequence, we do not require the restricted prox-regularity assumption on \(f_i\) which was used in [27]. Also, although we need the assumption that B has full column rank, we do not require any assumptions on the column rank of the matrices \(A_i\).

In the following examples, we present two practical problems which motivate our research.

Example 1.1

(Robust principal component analysis (RPCA) [14]) The RPCA problem is formulated as

$$\begin{aligned} \min _{L,S \in {\mathbb {R}}^{m\times d}} \Vert L\Vert _{*} + \tau \Vert S\Vert _1 + \frac{\gamma }{2}\Vert L+S-M\Vert _F^2, \end{aligned}$$

where M is a given \(m\times d\) observation matrix, \(\Vert \cdot \Vert _{*}\) is the nuclear norm, \(\Vert \cdot \Vert _1\) is the component-wise \(L_1\) norm (defined as the sum of absolute values of all entries) which controls the sparsity of the solution, \(\Vert \cdot \Vert _F\) is the standard Frobenius norm which controls the noise level, and \(\tau \) and \(\gamma \) are positive penalty constants. By introducing a new variable \(T=L+S\), the problem is rewritten as

$$\begin{aligned} \min _{L,S,T \in {\mathbb {R}}^{m\times d}} \Vert L\Vert _{*} + \tau \Vert S\Vert _1 + \frac{\gamma }{2}\Vert T-M\Vert _F^2 \quad \text {s.t.~} T=L+S. \end{aligned}$$

This problem can be addressed by the three-block ADMM as shown in [14]. Recently, \(L_1-L_2\) regularization has been shown to have better performance than \(L_1\) alone (see [17] and the references therein). That motivates us to investigate the effect of the regularizer \(\Vert \cdot \Vert _1-\Vert \cdot \Vert \) on this problem, where \(\Vert \cdot \Vert \) is the spectral norm. With this modified regularization term, the optimization problem fits into the structure of (1). We will later solve this problem using our proposed algorithm in Sect. 4.
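To make the resulting subproblems concrete, the following sketch (our illustration only, not the implementation used in Sect. 4) shows the standard closed-form building blocks: entrywise soft-thresholding for \(\tau \Vert \cdot \Vert _1\), singular value thresholding for \(\Vert \cdot \Vert _{*}\), and a subgradient of the spectral norm, which plays the role of the vector \(g_n\) by which our algorithm linearizes the concave part \(-\Vert \cdot \Vert \).

```python
import numpy as np

def prox_l1(S, t):
    # Prox of t*||.||_1: entrywise soft-thresholding.
    return np.sign(S) * np.maximum(np.abs(S) - t, 0.0)

def prox_nuclear(L, t):
    # Prox of t*||.||_*: soft-thresholding of the singular values.
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt

def spectral_norm_subgrad(S):
    # A subgradient of the spectral norm ||S||: u1 v1^T for a leading
    # singular pair (u1, v1) of S.
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    return np.outer(U[:, 0], Vt[0, :])
```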

Example 1.2

(Direct current optimal power flow (DC-OPF) [2]) Optimal power flow is an important problem in power system engineering [1]. A simplified formulation of the problem is as follows. Let \(x_i \in {\mathbb {R}}^{d_i}\) be the variable associated with the \(i^{th}\) node in the network. The mathematical model is then given by

$$\begin{aligned} \min _{x_i \in {\mathbb {R}}^{d_i}}~~ \sum _{i=1}^{m} \left( \frac{1}{2}x_i^\top Q_i x_i + q_i^\top x_i\right) \quad \text {s.t.~} \sum _{i=1}^{m}A_ix_i \le b, \end{aligned}$$

where \(Q_i\), \(A_i\), \(q_i\), and b are coefficient matrices and vectors. The authors of [2] have successfully used the multi-block ADMM to solve this problem with promising results. As modern electrical grids require the integration of renewable energy resources such as photovoltaic (PV) systems, it is necessary to find their optimal locations in the network while satisfying the power flow constraints. This leads to the introduction of binary variables into the OPF formulation. Binary relaxation and reformulation into a difference-of-convex program have been successfully used to address this problem [21]. However, the algorithm proposed in [21] has to rely on external solvers for the subproblems. Therefore, it is tempting to investigate whether ADMM-based algorithms can further decompose the problem and reduce the computational complexity. In Sect. 4 we will also give the detailed formulation to show how it fits into the structure of (1), and address it with our algorithm.
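To indicate how binary variables lead to a difference-of-convex structure, we sketch the standard penalty reformulation (a generic illustration of the technique behind [21], not the exact model used there). A binary constraint \(u \in \{0,1\}^k\) is equivalent to \(u \in [0,1]^k\) together with \(\langle u, \textbf{1}-u \rangle \le 0\), and penalizing the latter with a parameter \(t \in {\mathbb {R}}_{++}\) gives

$$\begin{aligned} \min _{u\in [0,1]^k}~ f(u) +t\langle u, \textbf{1}-u \rangle =\min _{u\in [0,1]^k}~ \left( f(u) +t\langle \textbf{1}, u \rangle \right) -t\Vert u\Vert ^2, \end{aligned}$$

whose objective is a difference of convex functions whenever f is convex; the subtracted term \(G(u) =t\Vert u\Vert ^2\) is convex (hence 0-weakly convex) and plays the role of G in (1). Such penalizations are known to be exact for sufficiently large t under suitable conditions.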

The rest of the paper is organized as follows. In Sect. 2, the preliminary materials used in this work are presented. In Sect. 3, we introduce our algorithm, and prove the subsequential and full sequential convergence of the proposed algorithm under suitable assumptions. Section 4 presents the numerical results of the algorithm. Finally, the conclusions are given in Sect. 5.

2 Preliminaries

Let \(f:{\mathbb {R}}^d\rightarrow \left( -\infty ,+\infty \right] \). The domain of f is \({\text {dom}}f {:}{=}\{x\in {\mathbb {R}}^d: f(x) <+\infty \}\) and the epigraph of f is \({\text {epi}}f {:}{=} \{(x,\xi )\in {\mathbb {R}}^d\times {\mathbb {R}}: f(x)\le \xi \}\). The function f is said to be proper if \({\text {dom}}f \ne \varnothing \) and it never takes the value \(-\infty \), lower semicontinuous if its epigraph is closed, and convex if its epigraph is convex. The function f is said to be \(\alpha \)-strongly convex (\(\alpha \in {\mathbb {R}}_+\)) if \(f-\frac{\alpha }{2}\Vert \cdot \Vert ^2\) is convex, and \(\beta \)-weakly convex (\(\beta \in {\mathbb {R}}_+\)) if \(f+\frac{\beta }{2}\Vert \cdot \Vert ^2\) is convex. We say that f is coercive if \(f(x)\rightarrow +\infty \) as \(\Vert x\Vert \rightarrow +\infty \).
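For instance, any convex function is 0-weakly convex, and any differentiable function with \(\ell \)-Lipschitz continuous gradient is \(\ell \)-weakly convex. A concrete example of a weakly convex function that is not convex is

$$\begin{aligned} f =-\frac{\beta }{2}\Vert \cdot \Vert ^2 \quad \text {with~} \beta \in {\mathbb {R}}_{++}, \quad \text {for which~} f +\frac{\beta }{2}\Vert \cdot \Vert ^2 \equiv 0 \text {~is convex}. \end{aligned}$$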

Let \(x\in {\mathbb {R}}^d\) with \(|f(x)| <+\infty \). The regular (or Fréchet) subdifferential of f at x is defined by

$$\begin{aligned} \widehat{\partial } f(x) {:}{=}\left\{ x^*\in {\mathbb {R}}^d:\; \liminf _{y\rightarrow x}\frac{f(y)-f(x)-\langle x^*,y-x\rangle }{\Vert y-x\Vert }\ge 0\right\} \end{aligned}$$

and the limiting (or Mordukhovich) subdifferential of f at x is defined by

$$\begin{aligned} \partial _L f(x) {:}{=}\left\{ x^*\in {\mathbb {R}}^d:\; \exists x_n{\mathop {\rightarrow }\limits ^{f}}x,\; x_n^*\rightarrow x^* \text {~~with}~~ x_n^*\in \widehat{\partial } f(x_n)\right\} , \end{aligned}$$

where the notation \(y{\mathop {\rightarrow }\limits ^{f}}x\) means \(y\rightarrow x\) with \(f(y)\rightarrow f(x)\). In the case where \(|f(x) |=+\infty \), both regular subdifferential and limiting subdifferential of f at x are defined to be the empty set. The domain of \(\partial _L f\) is given by \({\text {dom}}\partial _L f {:}{=}\{x\in {\mathbb {R}}^d: \partial _L f(x) \ne \varnothing \}\). We now collect some important properties of the limiting subdifferential.
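To illustrate the difference between the two subdifferentials, consider \(f(x) =-|x|\) on \({\mathbb {R}}\). Then

$$\begin{aligned} \widehat{\partial } f(0) =\varnothing \quad \text {while}\quad \partial _L f(0) =\{-1,1\}, \end{aligned}$$

since \(\nabla f(x) =-1\) for \(x >0\) and \(\nabla f(x) =1\) for \(x <0\), and both values survive in the limiting process.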

Lemma 2.1

(Calculus rules) Let \(f, g:{\mathbb {R}}^d \rightarrow (-\infty , +\infty ]\) be proper lower semicontinuous functions, and let \(\overline{x} \in {\mathbb {R}}^d\).

  1. (i)

    (Sum rule) Suppose that f is finite at \(\overline{x}\) and g is locally Lipschitz around \(\overline{x}\). Then \(\partial _L (f + g)(\overline{x}) \subseteq \partial _L f(\overline{x})+\partial _L g(\overline{x})\). Moreover, if g is strictly differentiable at \(\overline{x}\), then \(\partial _L(f + g)(\overline{x}) = \partial _Lf(\overline{x}) + \nabla g(\overline{x})\).

  2. (ii)

    (Separable sum rule) If \(f(\textbf{x})=\sum _{i=1}^{m}f_i(x_i)\) with \(\textbf{x}=(x_1,\dots ,x_m)\), then \(\partial _L f(\textbf{x}) = \partial _L f_1(x_1) \times \dots \times \partial _L f_m(x_m)\).

Proof

(i) follows from [18, Proposition 1.107(ii) and Theorem 3.36], while (ii) follows from [22, Proposition 10.5]. \(\square \)

Lemma 2.2

(Upper semicontinuity of subdifferential) Let \(f:{\mathbb {R}}^d \rightarrow [-\infty , +\infty ]\) be Lipschitz continuous around \(x \in {\mathbb {R}}^d\) with \(|f(x)| <+\infty \), and consider sequences \((x_n)_{n\in {\mathbb {N}}}\) and \((x_n^*)_{n\in {\mathbb {N}}}\) in \({\mathbb {R}}^d\) such that \(x_n \rightarrow x\) and, for all \({n\in {\mathbb {N}}}\), \(x_n^* \in \partial _L f(x_n)\). Then \((x_n^*)_{n\in {\mathbb {N}}}\) is bounded and its cluster points are contained in \(\partial _L f(x)\).

Proof

This follows from [21, Lemma 2.2]. \(\square \)

We recall the well-known concept of the Bregman distance in a real Hilbert space \({\mathcal {H}}\), which was introduced in [7]. Given a differentiable convex function \(\phi : {\mathcal {H}} \rightarrow {\mathbb {R}}\), the Bregman distance associated with \(\phi \) is defined as

$$\begin{aligned} \forall u, v\in {\mathcal {H}},\quad D_{\phi }(u,v)=\phi (u)-\phi (v)-\langle \nabla \phi (v),u-v\rangle . \end{aligned}$$

When \(\phi (x)=\Vert x\Vert ^2\), we have \(D_{\phi }(u,v)=\Vert u-v\Vert ^2\). When \(\phi (x)=x^\top M x\), where M is a symmetric positive semidefinite matrix, we have \(D_{\phi }(u,v) = \Vert u-v\Vert ^2_M {:}{=}(u-v)^\top M (u-v)\). When \({\mathcal {H}}={\mathbb {R}}^{p\times d}\) and \(\phi (X)=\Vert X\Vert _F^2\), we have \(D_{\phi }(U,V) =\Vert U-V\Vert _F^2\) [9].
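The quadratic-kernel identity above can also be checked numerically; the following sketch (illustrative only) implements the definition of \(D_{\phi }\) directly.

```python
import numpy as np

def bregman(phi, grad_phi, u, v):
    # D_phi(u, v) = phi(u) - phi(v) - <grad_phi(v), u - v>.
    return phi(u) - phi(v) - np.dot(grad_phi(v), u - v)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
M = A.T @ A                        # symmetric positive semidefinite
phi = lambda x: x @ M @ x          # phi(x) = x^T M x
grad_phi = lambda x: 2.0 * M @ x
u, v = rng.standard_normal(3), rng.standard_normal(3)
assert np.isclose(bregman(phi, grad_phi, u, v), (u - v) @ M @ (u - v))
```

Some useful properties of the Bregman distance are listed in the following proposition.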

Proposition 2.1

Let \(\phi : {\mathcal {H}} \rightarrow {\mathbb {R}}\) be a differentiable convex function and let \(D_{\phi }\) be the Bregman distance associated with \(\phi \). Then the following hold:

  1. (i)

    For all \(u, v \in {\mathcal {H}}\), \(D_{\phi }(u, v)\ge 0\), and \(D_{\phi }(u, v)=0\) if and only if \(u = v\).

  2. (ii)

    For each \(v \in {\mathcal {H}}\), \(D_{\phi }(\cdot , v)\) is convex.

  3. (iii)

    If \(\phi \) is \(\alpha \)-strongly convex, then, for all \(u, v \in {\mathcal {H}}\), \(D_{\phi }(u, v) \ge \frac{\alpha }{2} \Vert u-v\Vert ^2\).

  4. (iv)

    If \(\nabla \phi \) is \(\ell _{\phi }\)-Lipschitz continuous, then, for all \(u, v \in {\mathcal {H}}\), \(D_{\phi }(u,v)\le \frac{\ell _{\phi }}{2}\Vert u-v\Vert ^2\).

Proof

(i) is given in [7, Section 1]. (ii) holds since \(D_{\phi }(\cdot , v)\) is the sum of the convex function \(\phi \) and an affine function. (iii): if \(\phi \) is \(\alpha \)-strongly convex, then \(\phi -\frac{\alpha }{2}\Vert \cdot \Vert ^2\) is convex, so the gradient inequality gives \(\phi (u) \ge \phi (v) +\langle \nabla \phi (v), u-v\rangle +\frac{\alpha }{2}\Vert u-v\Vert ^2\), that is, \(D_{\phi }(u,v) \ge \frac{\alpha }{2}\Vert u-v\Vert ^2\). (iv) follows by applying the descent lemma (Lemma 2.3(i) below) to \(\phi \). \(\square \)

We end this section with the following lemma, which will be instrumental in proving the convergence results in the next section.

Lemma 2.3

Let \(h:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) be a differentiable function with \(\ell \)-Lipschitz continuous gradient. Then the following hold:

  1. (i)

    For all \(x, y\in {\mathbb {R}}^d\), \(|h(y) -h(x) -\langle \nabla h(x), y -x\rangle |\le \frac{\ell }{2}\Vert y -x\Vert ^2\).

  2. (ii)

    If h is bounded below and \(\ell >0\), then

    $$\begin{aligned} \inf _{x\in {\mathbb {R}}^d} \left( h(x) -\frac{1}{2\ell }\Vert \nabla h(x)\Vert ^2\right) >-\infty . \end{aligned}$$

Proof

(i): This follows from [19, Lemma 1.2.3].

(ii): Let \(x\in {\mathbb {R}}^d\). Following the argument of [19, Equation (1.2.19)], we apply (i) with \(y =x -\frac{1}{\ell }\nabla h(x)\) to obtain that

$$\begin{aligned} h\left( x -\frac{1}{\ell }\nabla h(x)\right)&\le h(x) +\langle \nabla h(x), y -x\rangle +\frac{\ell }{2}\Vert y -x\Vert ^2 \\ &= h(x) -\frac{1}{\ell }\Vert \nabla h(x)\Vert ^2 +\frac{1}{2\ell }\Vert \nabla h(x)\Vert ^2 \\ &= h(x) -\frac{1}{2\ell }\Vert \nabla h(x)\Vert ^2, \end{aligned}$$

which together with the fact that h is bounded below implies the conclusion. \(\square \)

3 Proposed Algorithm and Convergence Analysis

We first fix some notation used throughout the rest of the paper. Let \(\textbf{A}=[A_1|A_2|\dots |A_m]\) and let \(\textbf{I}_{q\times q}\) be the \(q\times q\) identity matrix. Then \({\textbf{A}}{\textbf{x}}=\sum _{i=1}^{m}A_ix_i\). We also denote by \(\partial _L^{x}\) the limiting subdifferential with respect to the x-variable, and by \(\nabla _i f\) the \(i^{th}\) block of the gradient vector \(\nabla f\). Given a matrix M, \(\lambda _{\min }(M)\) denotes its smallest eigenvalue while \(\lambda _{\max }(M)\) denotes its largest eigenvalue. The following assumptions are used in our convergence analysis.

Assumption 3.1

(Standing assumptions)  

  1. (i)

    \(\nabla H\) is \(\ell _H\)-Lipschitz continuous, \(\nabla P\) is \(\ell _P\)-Lipschitz continuous, and G is \(\beta \)-weakly convex.

  2. (ii)

    \({\text {Im}}(\textbf{A}) \cup \{b \}\subseteq {\text {Im}}(B)\).

  3. (iii)

    \(\lambda {:}{=}\lambda _{\min }(B^\top B) >0\) (equivalently, B has full column rank).

We note from [27, Section 4.1] that Assumption 3.1(ii) is crucial for the convergence of ADMM methods in nonconvex settings and generally cannot be completely removed. For this assumption to hold, a sufficient condition is that B has full row rank.

Recall that the augmented Lagrangian of problem (1) is given by

$$\begin{aligned} {\mathcal {L}}_\rho (\textbf{x},y,z)&=\sum _{i=1}^m f_i(x_i)+H(y)+P(\textbf{x})-G(\textbf{x}) + \langle z, {\textbf{A}}{\textbf{x}}+By-b \rangle \\&\quad +\frac{\rho }{2}\Vert {\textbf{A}}{\textbf{x}}+By-b\Vert ^2, \end{aligned}$$

where \(z\in {\mathbb {R}}^p\) and \(\rho \in {\mathbb {R}}_{++}\). Now, we present our splitting algorithm with guaranteed global convergence to a stationary point \((\overline{\textbf{x}},\overline{y},\overline{z})\) of \({\mathcal {L}}_\rho \), i.e., \(0 \in \partial _L {\mathcal {L}}_\rho (\overline{\textbf{x}},\overline{y},\overline{z})\), or equivalently,

$$\begin{aligned} \begin{aligned}&0 \in \partial _L\left( \sum _{i=1}^{m}f_i -G\right) (\overline{\textbf{x}}) +\nabla P(\overline{\textbf{x}}) +\textbf{A}^\top \overline{z},\\ &0 =\nabla H(\overline{y}) +B^\top \overline{z}, \\&0 =\mathbf {A\overline{x}} +B\overline{y} -b. \end{aligned} \end{aligned}$$
(4)
(Algorithm 3.1: see Remark 3.1(i) for the explicit updating scheme.)

Remark 3.1

(Discussion of the algorithm structure) Some comments on Algorithm 3.1 are in order.

  1. (i)

    For each \(n\in {\mathbb {N}}\) and \(g_n\in \partial _L G(\textbf{x}_n)\), we define the linearized augmented Lagrangian as

    $$\begin{aligned} {\mathcal {L}}_{\rho ,n}(\textbf{x},y,z)&=\sum _{i=1}^m f_i(x_i)+H(y)+\langle \nabla P(\textbf{x}_n)-g_{n}, \textbf{x} \rangle + \langle z, {\textbf{A}}{\textbf{x}}+By-b \rangle \\&\quad +\frac{\rho }{2}\Vert {\textbf{A}}{\textbf{x}}+By-b\Vert ^2. \end{aligned}$$

    Then the updating scheme given in Step 2 of Algorithm 3.1 can be written as

    $$\begin{aligned} \begin{array}{ll} x_{i,n+1}\!\!\! & \in \mathop {{\text {argmin}}}\limits _{x_i \in {\mathbb {R}}^{d_i}}\left( {\mathcal {L}}_{\rho ,n}(\textbf{u}_{i,n+1}(x_i),y_n,z_n) + \mu _n D_{\phi _i}(x_i,x_{i,n})\right) , \\ y_{n+1} & \in \mathop {{\text {argmin}}}\limits _{y\in {\mathbb {R}}^q} \left( {\mathcal {L}}_{\rho ,n}(\textbf{x}_{n+1},y,z_n) +\nu _n D_{\psi }(y,y_{n})\right) ,\\ z_{n+1} & =z_n+\rho ({\textbf{A}}{\textbf{x}}_{n+1}+By_{n+1}-b). \end{array} \end{aligned}$$
  2. (ii)

    In the case where \(P \equiv 0\), \(m =1\), \(x_1 =x \in {\mathbb {R}}^d\), \(A_1 =A \in {\mathbb {R}}^{p\times d}\), and \(\mu _n =\nu _n =1\) for all \(n\in {\mathbb {N}}\), the updates in Step 2 of Algorithm 3.1 become

    $$\begin{aligned} \begin{array}{ll} & x_{n+1} \in \mathop {{\text {argmin}}}\limits _{x \in {\mathbb {R}}^{d}} \left( f(x) -\langle g_n, x \rangle + \langle z_n, Ax\rangle +\frac{\rho }{2}\Vert Ax+By_n-b\Vert ^2 + D_{\phi }(x,x_n)\right) ,\\ & y_{n+1} \in \mathop {{\text {argmin}}}\limits _{y \in {\mathbb {R}}^q} \left( H(y) + \langle z_n, By\rangle +\frac{\rho }{2}\Vert Ax_{n+1} + By-b \Vert ^2 + D_{\psi }(y,y_n) \right) ,\\ & z_{n+1}=z_n+\rho (Ax_{n+1}+By_{n+1}-b). \end{array} \end{aligned}$$

    In turn, Algorithm 3.1 reduces to the one introduced in [23], in which f, H, and G are all convex, and both \(\phi \) and \(\psi \) are assumed to be strongly convex functions with Lipschitz continuous gradients. A self-contained toy instantiation of this two-block scheme is sketched after this remark.

  3. (iii)

    It is also worth noting that since the problems considered in [11, 13, 15, 25, 26, 28, 30] are all special cases of (2), they can also be solved by our proposed algorithm.

  4. (iv)

    The performance of the ADMM relies heavily on whether the \(x_i\)-updates and the \(y\)-update can be solved efficiently. The ideal situation is that those updates can be written in tractable forms, such as quadratic forms or closed-form proximal operators (see Sect. 4 for two examples). In such cases, the ADMM can converge faster than some algorithms which need external solvers. For further discussion of various closed-form proximal operators, we refer the readers to [21, Remark 3.1]. Incorporating Bregman distances into the ADMM also brings many benefits in terms of flexibility and performance, as shown in [13, 23–25], when appropriate choices of Bregman distance are used.
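To complement the discussion in (ii) and (iv), the following self-contained sketch (our own toy illustration with assumed data, not the code used in Sect. 4) runs the two-block scheme of Remark 3.1(ii) with \(f =\Vert \cdot \Vert _1\), \(G \equiv 0\), \(H(y) =\frac{1}{2}\Vert y-w\Vert ^2\), \(A =I\), \(B =-I\), \(b =0\), and \(\phi =\psi =\Vert \cdot \Vert ^2\), so that \(D_{\phi }(u,v) =\Vert u-v\Vert ^2\). The constrained problem then amounts to evaluating the \(\ell _1\) proximal mapping at w, and the iterates should approach the soft-thresholded vector.

```python
import numpy as np

def soft(v, t):
    # Soft-thresholding: proximal operator of t*||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(1)
w = rng.standard_normal(5)
rho = 10.0                                      # penalty parameter
x, y, z = np.zeros(5), np.zeros(5), np.zeros(5)
for _ in range(2000):
    # x-update: the linear and quadratic terms collapse into one prox step.
    c = (rho * y + 2.0 * x - z) / (rho + 2.0)
    x = soft(c, 1.0 / (rho + 2.0))
    # y-update: smooth strongly convex subproblem, solved in closed form.
    y = (w + z + rho * x + 2.0 * y) / (rho + 3.0)
    # z-update: dual ascent step.
    z = z + rho * (x - y)
print(np.allclose(x, soft(w, 1.0), atol=1e-6))  # expected: True
```

At a fixed point, the three updates reduce to \(x^* =y^*\) and \(0 \in \partial \Vert x^*\Vert _1 +x^* -w\), i.e., \(x^*\) is the soft thresholding of w at level 1.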

The following lemma will be useful for our convergence analysis.

Lemma 3.1

Let L be a symmetric matrix in \({\mathbb {R}}^{d\times d}\) and let M be a matrix in \({\mathbb {R}}^{p\times d}\). Then the following hold:

  1. (i)

    For all \(x\in {\mathbb {R}}^d\), \(\sqrt{\lambda _{\min }(M^\top M)}\Vert x\Vert \le \Vert Mx\Vert \).

  2. (ii)

    For all \(x\in {\mathbb {R}}^d\), \(\gamma (x^\top Lx) \le \Vert Lx\Vert ^2\), where \(\gamma =\min \{|\lambda |: \lambda \) is a nonzero eigenvalue of \(L\}\). Consequently, for all \(z\in {\text {Im}}(M)\), \(\sqrt{\lambda _*(M^\top M)}\Vert z\Vert \le \Vert M^\top z\Vert \), where \(\lambda _*(M^\top M)\) is the smallest strictly positive eigenvalue of \(M^\top M\).

Proof

(i): This follows from the fact that \(\Vert Mx\Vert ^2 =x^\top (M^\top M)x\).

(ii): The first assertion follows from [6, Lemma 4.5] and its proof. Now, let \(z\in {\text {Im}}(M)\). Then there exists \(x\in {\mathbb {R}}^d\) such that \(z =Mx\). Since \(M^\top M\) is a symmetric matrix in \({\mathbb {R}}^{d\times d}\) with all eigenvalues being nonnegative, we have from the first assertion that

$$\begin{aligned} \lambda _*(M^\top M)\Vert z\Vert ^2 =\lambda _*(M^\top M)(x^\top M^\top Mx) \le \Vert M^\top Mx\Vert ^2 =\Vert M^\top z\Vert ^2, \end{aligned}$$

which completes the proof. \(\square \)

We move on to prove the main results of the paper, starting with the guaranteed subsequential convergence.

Theorem 3.1

(Subsequential convergence) Suppose that Assumption 3.1 holds. Let \((\textbf{x}_n,y_n,z_n)_{n\in {\mathbb {N}}}\) be the sequence generated by Algorithm 3.1 and, for each \(n\in {\mathbb {N}}^*\), define

$$\begin{aligned} L_n ={\mathcal {L}}_\rho (\textbf{x}_{n},y_{n},z_{n}) +c\Vert y_n-y_{n-1}\Vert ^2, \text {~where~} c=\frac{(\ell _H+2\nu \ell _{\psi })\nu \ell _{\psi }}{\lambda \rho }. \end{aligned}$$

Then the following hold:

  1. (i)

    For all \(n\in {\mathbb {N}}^*\),

    $$\begin{aligned} L_{n+1} +\delta _{\textbf{x}}\Vert \textbf{x}_{n+1}-\textbf{x}_{n}\Vert ^2 +\delta _y\Vert y_{n+1}-y_n\Vert ^2 \le L_n, \end{aligned}$$
    (5)

    where \(\delta _{\textbf{x}} =\frac{\mu \alpha -\ell _P -\beta }{2} > 0\) and \(\delta _y =\frac{\lambda \rho }{2} -\frac{(\ell _H+2\nu \ell _{\psi })^2}{\lambda \rho } -\frac{\ell _H}{2} >0\).

  2. (ii)

    Suppose that \(z_0\in {\text {Im}}(B)\), that \(\sum _{i=1}^m f_i(x_i)+P(\textbf{x})-G(\textbf{x})\) is coercive, and that H is bounded below. Then the sequence \((\textbf{x}_n,{y}_n,{z}_n)_{n\in {\mathbb {N}}}\) is bounded and \(\textbf{x}_{n+1}-\textbf{x}_n\rightarrow 0\), \(y_{n+1}-y_n\rightarrow 0\), \(z_{n+1}-z_n\rightarrow 0\), and \({\textbf{A}}{\textbf{x}}_n +By_n \rightarrow b\) as \(n\rightarrow +\infty \). Furthermore, for every cluster point \((\overline{\textbf{x}},\overline{y},\overline{z})\) of \((\textbf{x}_n,{y}_n,{z}_n)_{n\in {\mathbb {N}}}\), it holds that

    $$\begin{aligned} \textbf{A}\overline{\textbf{x}} +B\overline{y}&=b \text {~~and}~~\\ \lim _{n \rightarrow +\infty } F(\textbf{x}_{n },y_{n})&=\lim _{n \rightarrow +\infty }{\mathcal {L}}_\rho (\textbf{x}_{n },y_{n},z_{n}) ={\mathcal {L}}_\rho (\overline{\textbf{x}},\overline{y},\overline{z})\\&=F(\overline{\textbf{x}},\overline{y}), \end{aligned}$$

    and that if G is strictly differentiable and, for each \(i \in \{1,\dots ,m\}\), \(\nabla \phi _i\) is \(\ell _{\phi }\)-Lipschitz continuous, then \((\overline{\textbf{x}},\overline{y},\overline{z})\) is a stationary point of \({\mathcal {L}}_\rho \).

Proof

(i): Let \(n\in {\mathbb {N}}\). The update of \(z_{n+1}\) yields

$$\begin{aligned} z_{n+1} -z_n =\rho ({\textbf{A}}{\textbf{x}}_{n+1}+By_{n+1}-b). \end{aligned}$$
(6)

By the optimality condition for the update of \(y_{n+1}\) in Step 2 of Algorithm 3.1,

$$\begin{aligned} 0&= \nabla H(y_{n+1}) + B^\top z_n +\rho B^\top ({\textbf{A}}{\textbf{x}}_{n+1} +By_{n+1} -b) +\nu _n(\nabla \psi (y_{n+1}) - \nabla \psi (y_n)), \end{aligned}$$

which combined with (6) yields

$$\begin{aligned} B^\top z_{n+1} =-\nabla H(y_{n+1}) -\nu _n(\nabla \psi (y_{n+1}) -\nabla \psi (y_n)). \end{aligned}$$
(7)

From the definition of the augmented Lagrangian, we have that

$$\begin{aligned} {\mathcal {L}}_\rho (\textbf{x}_{n+1},y_{n+1},z_{n+1}) -{\mathcal {L}}_\rho (\textbf{x}_{n+1},y_{n+1},z_{n})&= \langle z_{n+1}-z_n, {\textbf{A}}{\textbf{x}}_{n+1}+By_{n+1}-b \rangle \nonumber \\&= \frac{1}{\rho }\Vert z_{n+1}-z_n\Vert ^2 \end{aligned}$$
(8)

and that

$$\begin{aligned} {\mathcal {L}}_\rho (\textbf{x}_{n+1},y_{n+1},z_n) -{\mathcal {L}}_\rho (\textbf{x}_{n+1},y_n,z_n)&= H(y_{n+1}) - H(y_n) + \langle z_n, B(y_{n+1}-y_n) \rangle \\ &\quad + \frac{\rho }{2}(\Vert {\textbf{A}}{\textbf{x}}_{n+1}+By_{n+1}-b\Vert ^2\\ &\quad - \Vert {\textbf{A}}{\textbf{x}}_{n+1}+By_{n}-b\Vert ^2). \end{aligned}$$

Writing \({\textbf{A}}{\textbf{x}}_{n+1}+By_{n}-b =({\textbf{A}}{\textbf{x}}_{n+1}+By_{n+1}-b) -(By_{n+1}- By_n)\), we obtain that

$$\begin{aligned}&\Vert {\textbf{A}}{\textbf{x}}_{n+1}+By_{n+1}-b\Vert ^2 - \Vert {\textbf{A}}{\textbf{x}}_{n+1}+By_{n}-b\Vert ^2 \\ &= -\Vert B(y_{n+1}-y_n)\Vert ^2 +2\langle {\textbf{A}}{\textbf{x}}_{n+1}+By_{n+1}-b, B(y_{n+1}-y_n)\rangle \\&= -\Vert B(y_{n+1}-y_n)\Vert ^2 +\frac{2}{\rho }\langle z_{n+1}-z_{n}, B(y_{n+1}-y_n)\rangle , \end{aligned}$$

and so

$$\begin{aligned}&{\mathcal {L}}_\rho (\textbf{x}_{n+1},y_{n+1},z_n) -{\mathcal {L}}_\rho (\textbf{x}_{n+1},y_n,z_n) = H(y_{n+1}) - H(y_n)\\&\quad +\langle z_{n+1}, B(y_{n+1}-y_n) \rangle -\frac{\rho }{2}\Vert B(y_{n+1}-y_n)\Vert ^2. \end{aligned}$$

In view of (7),

$$\begin{aligned} \langle z_{n+1}, B(y_{n+1}-y_n) \rangle&=\langle B^\top z_{n+1}, y_{n+1}-y_n \rangle \\ &=-\langle \nabla H(y_{n+1}), y_{n+1}-y_n \rangle -\nu _n\langle \nabla \psi (y_{n+1}) \\&\qquad - \nabla \psi (y_n), y_{n+1}-y_n \rangle \\ &\le -H(y_{n+1}) +H(y_n) +\frac{\ell _H}{2}\Vert y_{n+1}-y_n\Vert ^2, \end{aligned}$$

where the last inequality is from the \(\ell _H\)-Lipschitz continuity of \(\nabla H\) and Lemma 2.3(i), and from the monotonicity of \(\nabla \psi \). We therefore obtain that

$$\begin{aligned} {\mathcal {L}}_\rho (\textbf{x}_{n+1},y_{n+1},z_n) -{\mathcal {L}}_\rho (\textbf{x}_{n+1},y_n,z_n) \le \frac{\ell _H}{2}\Vert y_{n+1}-y_n\Vert ^2 -\frac{\rho }{2}\Vert B(y_{n+1}-y_n)\Vert ^2. \end{aligned}$$
(9)

Next, by the update of \(x_{i,n+1}\) in Step 2 of Algorithm 3.1, for all \(i \in \{1,\dots ,m\}\) and all \(x_i \in {\mathbb {R}}^{d_i}\),

$$\begin{aligned}&f_i(x_{i,n+1}) +\langle \nabla _i P(\textbf{x}_n) -g_{i,n}, x_{i,n+1} \rangle +\langle z_n, A_{i}x_{i,n+1} \rangle +\frac{\rho }{2}\Vert \textbf{A}\textbf{u}_{i,n+1}(x_{i,n+1})\nonumber \\&\qquad +By_n -b\Vert ^2 +\mu _n D_{\phi _i}(x_{i,n+1},x_{i,n}) \nonumber \\ &\quad \le f_i(x_i) +\langle \nabla _i P(\textbf{x}_n) -g_{i,n}, x_i \rangle +\langle z_n, A_{i}x_i\rangle \nonumber \\&\qquad +\frac{\rho }{2}\Vert \textbf{A}\textbf{u}_{i,n+1}(x_i) +By_n -b\Vert ^2 +\mu _n D_{\phi _i}(x_i,x_{i,n}). \end{aligned}$$
(10)

Letting \(x_i =x_{i,n}\), using Proposition 2.1(i) and (iii), and summing up over \(i \in \{1,\dots ,m\}\), we derive that

$$\begin{aligned}&\sum _{i=1}^m f_i(x_{i,n+1}) +\langle \nabla P(\textbf{x}_n) -g_n, \textbf{x}_{n+1} \rangle +\langle z_n, {\textbf{A}}{\textbf{x}}_{n+1} \rangle \nonumber \\&\quad +\frac{\rho }{2}\Vert \textbf{A}\textbf{x}_{n+1} +By_n -b\Vert ^2 +\frac{\mu \alpha }{2}\Vert \textbf{x}_{n+1}-\textbf{x}_{n}\Vert ^2 \nonumber \\&\le \sum _{i=1}^m f_i(x_{i,n}) +\langle \nabla P(\textbf{x}_n) -g_n, \textbf{x}_{n} \rangle +\langle z_n, {\textbf{A}}{\textbf{x}}_{n} \rangle +\frac{\rho }{2}\Vert \textbf{A}\textbf{x}_n +By_n -b\Vert ^2. \end{aligned}$$
(11)

On the other hand,

$$\begin{aligned}&{\mathcal {L}}_\rho (\textbf{x}_{n+1},y_{n},z_n) -{\mathcal {L}}_\rho (\textbf{x}_{n},y_n,z_n) \nonumber \\ &= \sum _{i=1}^m f_i(x_{i,n+1}) -\sum _{i=1}^m f_i(x_{i,n}) +P(\textbf{x}_{n+1}) -P(\textbf{x}_{n}) -G(\textbf{x}_{n+1}) +G(\textbf{x}_{n}) \nonumber \\&\quad +\langle z_n, {\textbf{A}}{\textbf{x}}_{n+1}-{\textbf{A}}{\textbf{x}}_{n}\rangle +\frac{\rho }{2}\Vert {\textbf{A}}{\textbf{x}}_{n+1}+By_{n}-b\Vert ^2 -\frac{\rho }{2}\Vert {\textbf{A}}{\textbf{x}}_{n}+By_{n}-b\Vert ^2 \nonumber \\&\le \sum _{i=1}^m f_i(x_{i,n+1}) -\sum _{i=1}^m f_i(x_{i,n}) +\langle \nabla P(\textbf{x}_n), \textbf{x}_{n+1}-\textbf{x}_n \rangle \nonumber \\&\quad + \frac{\ell _P}{2}\Vert \textbf{x}_{n+1}-\textbf{x}_n\Vert ^2 - \langle g_n, \textbf{x}_{n+1} -\textbf{x}_{n} \rangle \nonumber \\&\quad + \frac{\beta }{2}\Vert \textbf{x}_{n+1}-\textbf{x}_{n}\Vert ^2 +\langle z_n, {\textbf{A}}{\textbf{x}}_{n+1}-{\textbf{A}}{\textbf{x}}_{n}\rangle + \frac{\rho }{2}\Vert {\textbf{A}}{\textbf{x}}_{n+1}\nonumber \\&\quad +By_{n}-b\Vert ^2 - \frac{\rho }{2}\Vert {\textbf{A}}{\textbf{x}}_{n}+By_{n}-b\Vert ^2 \nonumber \\&\le -\frac{\mu \alpha -\ell _P -\beta }{2}\Vert \textbf{x}_{n+1}-\textbf{x}_{n}\Vert ^2 =-\delta _{\textbf{x}}\Vert \textbf{x}_{n+1}-\textbf{x}_{n}\Vert ^2, \end{aligned}$$
(12)

where the first inequality is obtained by using Lemma 2.3(i) on the \(\ell _P\)-Lipschitz continuity of \(\nabla P\) and [5, Lemma 4.1] on the \(\beta \)-weak convexity of G, while the last inequality follows from (11). Summing up the three relations (8), (9), and (12) yields

$$\begin{aligned}&{\mathcal {L}}_\rho (\textbf{x}_{n+1},y_{n+1},z_{n+1}) - {\mathcal {L}}_\rho (\textbf{x}_{n},y_{n},z_{n}) \nonumber \\ &\le -\delta _{\textbf{x}}\Vert \textbf{x}_{n+1}-\textbf{x}_{n}\Vert ^2 +\frac{\ell _H}{2}\Vert y_{n+1}-y_n\Vert ^2 -\frac{\rho }{2}\Vert B(y_{n+1}-y_n)\Vert ^2 +\frac{1}{\rho }\Vert z_{n+1}-z_n\Vert ^2. \end{aligned}$$
(13)

Now, since \({\text {Im}}(\textbf{A}) \cup \{b\} \subseteq {\text {Im}}(B)\), it follows from (6) that

$$\begin{aligned} z_{n+1}-z_n \in {\text {Im}}(B), \end{aligned}$$
(14)

which, by Lemma 3.1(ii), yields

$$\begin{aligned} \Vert z_{n+1} -z_n\Vert \le \frac{1}{\sqrt{\lambda }}\Vert B^\top z_{n+1} -B^\top z_n\Vert . \end{aligned}$$

Let \(n\in {\mathbb {N}}^*\). We derive from (7) and the Lipschitz continuity of \(\nabla H\) and \(\nabla \psi \) that

$$\begin{aligned}&\Vert B^\top z_{n+1} -B^\top z_n\Vert \nonumber \\&\quad \le \Vert \nabla H(y_{n+1}) -\nabla H(y_n)\Vert +\nu _n\Vert \nabla \psi (y_{n+1})\nonumber \\&\qquad -\nabla \psi (y_n)\Vert +\nu _{n-1}\Vert \nabla \psi (y_n) - \nabla \psi (y_{n-1})\Vert \nonumber \\&\quad \le (\ell _H +\nu \ell _{\psi })\Vert y_{n+1} -y_n\Vert +\nu \ell _{\psi }\Vert y_n -y_{n-1}\Vert , \end{aligned}$$
(15)

and thus,

$$\begin{aligned}&\Vert z_{n+1} -z_n\Vert \le \frac{1}{\sqrt{\lambda }}\Vert B^\top z_{n+1} -B^\top z_n\Vert \nonumber \\&\quad \le \frac{1}{\sqrt{\lambda }}((\ell _H +\nu \ell _{\psi })\Vert y_{n+1} -y_n\Vert +\nu \ell _{\psi }\Vert y_n -y_{n-1}\Vert ). \end{aligned}$$
(16)

Applying the Cauchy–Schwarz inequality to the two vectors \(u =(\sqrt{\ell _H+\nu \ell _{\psi }},\sqrt{\nu \ell _{\psi }})\) and \(v =(\sqrt{\ell _H+\nu \ell _{\psi }}\Vert y_{n+1} -y_n\Vert ,\sqrt{\nu \ell _{\psi }}\Vert y_n -y_{n-1}\Vert )\), we obtain that

$$\begin{aligned} \Vert z_{n+1} -z_n\Vert ^2 \le \frac{\ell _H +2\nu \ell _{\psi }}{\lambda }((\ell _H +\nu \ell _{\psi })\Vert y_{n+1} -y_n\Vert ^2 +\nu \ell _{\psi }\Vert y_n -y_{n-1}\Vert ^2). \end{aligned}$$

By combining this with (13) and using Lemma 3.1(i),

$$\begin{aligned}&{\mathcal {L}}_\rho (\textbf{x}_{n+1},y_{n+1},z_{n+1}) - {\mathcal {L}}_\rho (\textbf{x}_{n},y_{n},z_{n}) \nonumber \\ &\le -\delta _{\textbf{x}}\Vert \textbf{x}_{n+1}-\textbf{x}_{n}\Vert ^2 +\left( \frac{(\ell _H+2\nu \ell _{\psi })(\ell _H+\nu \ell _{\psi })}{\lambda \rho } +\frac{\ell _H}{2}\right) \Vert y_{n+1}-y_n\Vert ^2 \\ &\qquad -\frac{\rho }{2}\Vert B(y_{n+1}-y_n)\Vert ^2 +\frac{(\ell _H+2\nu \ell _{\psi })\nu \ell _{\psi }}{\lambda \rho }\Vert y_n-y_{n-1}\Vert ^2 \\ &\le -\delta _{\textbf{x}}\Vert \textbf{x}_{n+1}-\textbf{x}_{n}\Vert ^2 -\left( \frac{\lambda \rho }{2} -\frac{(\ell _H+2\nu \ell _{\psi })(\ell _H+\nu \ell _{\psi })}{\lambda \rho } -\frac{\ell _H}{2}\right) \Vert y_{n+1}-y_n\Vert ^2 \\ &\qquad +\frac{(\ell _H+2\nu \ell _{\psi })\nu \ell _{\psi }}{\lambda \rho }\Vert y_n-y_{n-1}\Vert ^2, \end{aligned}$$

or equivalently,

$$\begin{aligned}&{\mathcal {L}}_\rho (\textbf{x}_{n+1},y_{n+1},z_{n+1}) +\frac{(\ell _H +2\nu \ell _{\psi })\nu \ell _{\psi }}{\lambda \rho }\Vert y_{n+1}-y_{n}\Vert ^2\\&\quad +\delta _{\textbf{x}}\Vert \textbf{x}_{n+1}-\textbf{x}_{n}\Vert ^2 +\delta _y\Vert y_{n+1}-y_n\Vert ^2 \\ &\le {\mathcal {L}}_\rho (\textbf{x}_{n},y_{n},z_{n}) +\frac{(\ell _H +2\nu \ell _{\psi })\nu \ell _{\psi }}{\lambda \rho }\Vert y_n-y_{n-1}\Vert ^2, \end{aligned}$$

which proves (5). Here, we note that \(\delta _{\textbf{x}} =\frac{\mu \alpha -\ell _P -\beta }{2} > 0\) since \(\mu > \frac{\ell _P+\beta }{\alpha }\) and that \(\delta _y =\frac{\lambda \rho }{2} -\frac{(\ell _H+2\nu \ell _{\psi })^2}{\lambda \rho } -\frac{\ell _H}{2} >0\) since \(\rho > \frac{\ell _H +\sqrt{\ell _H^2 +8(\ell _H +2\nu \ell _{\psi })^2}}{2\lambda }\).

(ii): Let \(n\in {\mathbb {N}}^*\). As \(z_0 \in {\text {Im}}(B)\), we have from (14) that \(z_n \in {\text {Im}}(B)\). Using Lemma 3.1(ii), (7), and the Lipschitz continuity of \(\nabla \psi \) yields

$$\begin{aligned}&\sqrt{\lambda }\Vert z_n\Vert \le \Vert B^\top z_n\Vert \le \Vert \nabla H(y_n)\Vert + \nu _n\Vert \nabla \psi (y_{n}) \\&\quad -\nabla \psi (y_{n-1})\Vert \le \Vert \nabla H(y_n)\Vert + \nu \ell _{\psi }\Vert y_{n}-y_{n-1}\Vert . \end{aligned}$$

For \(\omega \in {\mathbb {R}}_{++}\) to be chosen later, invoking the Cauchy–Schwarz inequality with the two vectors \(u =(\sqrt{\omega },\sqrt{\nu \ell _{\psi }})\) and \(v =(\frac{1}{\sqrt{\omega }}\Vert \nabla H(y_n)\Vert ,\sqrt{\nu \ell _{\psi }}\Vert y_n-y_{n-1}\Vert )\), we have that

$$\begin{aligned} \Vert z_n\Vert ^2&\le \frac{\omega +\nu \ell _{\psi }}{\lambda } \left( \frac{1}{\omega }\Vert \nabla H(y_n)\Vert ^2 +\nu \ell _{\psi }\Vert y_n-y_{n-1}\Vert ^2\right) \nonumber \\ &= \frac{1}{\lambda }\left( 1 +\frac{\nu \ell _{\psi }}{\omega }\right) \Vert \nabla H(y_n)\Vert ^2 +\frac{(\omega +\nu \ell _{\psi })\nu \ell _{\psi }}{\lambda }\Vert y_n-y_{n-1}\Vert ^2. \end{aligned}$$
(17)

Combining this with the fact that

$$\begin{aligned}&\langle z_n, {\textbf{A}}{\textbf{x}}_n+By_n-b \rangle +\frac{\rho }{2}\Vert {\textbf{A}}{\textbf{x}}_n+By_n-b\Vert ^2 \\&\quad =\frac{\rho }{2}\left\| \frac{z_n}{\rho } +{\textbf{A}}{\textbf{x}}_n+By_n-b \right\| ^2 -\frac{1}{2\rho }\Vert z_n\Vert ^2, \end{aligned}$$

we deduce that

$$\begin{aligned} L_n&={\mathcal {L}}_\rho (\textbf{x}_n,y_n,z_n) + c\Vert y_n-y_{n-1}\Vert ^2 \\ &\ge \sum _{i=1}^m f_i(x_{i,n}) +H(y_n) +P(\textbf{x}_n) -G(\textbf{x}_n) \\ &\quad +\frac{\rho }{2}\left\| \frac{z_n}{\rho } + {\textbf{A}}{\textbf{x}}_n+By_n-b \right\| ^2 -\frac{1}{2\lambda \rho }\left( 1 +\frac{\nu \ell _{\psi }}{\omega }\right) \Vert \nabla H(y_n)\Vert ^2 \\&\quad +\left( c -\frac{(\omega +\nu \ell _{\psi })\nu \ell _{\psi }}{2\lambda \rho }\right) \Vert y_n-y_{n-1}\Vert ^2 \\ &= \left( \sum _{i=1}^m f_i(x_{i,n}) +P(\textbf{x}_n) -G(\textbf{x}_n)\right) +\left( H(y_n) -\frac{1}{2\ell _H}\Vert \nabla H(y_n)\Vert ^2\right) \\&\quad +\frac{\rho }{2}\left\| \frac{z_n}{\rho } + {\textbf{A}}{\textbf{x}}_n+By_n-b \right\| ^2 \\ &\quad +\left( \frac{1}{2\ell _H} -\frac{1}{2\lambda \rho }\left( 1 +\frac{\nu \ell _{\psi }}{\omega }\right) \right) \Vert \nabla H(y_n)\Vert ^2 \\&\quad +\frac{\nu \ell _{\psi }}{2\lambda \rho }(2\ell _H +3\nu \ell _{\psi } -\omega )\Vert y_n-y_{n-1}\Vert ^2. \end{aligned}$$

Now, we note that

$$\begin{aligned} \rho > \frac{\ell _H +\sqrt{\ell _H^2 +8(\ell _H +2\nu \ell _{\psi })^2}}{2\lambda } \ge \frac{\ell _H +\sqrt{\ell _H^2 +8\ell _H^2}}{2\lambda } =\frac{2\ell _H}{\lambda }, \end{aligned}$$

and so \(\frac{1}{2\ell _H} >\frac{1}{\lambda \rho }\). Choosing \(\omega =1\) if \(\nu \ell _{\psi } =0\), and \(\omega =\nu \ell _{\psi }\) if \(\nu \ell _{\psi } >0\), it holds that

$$\begin{aligned}&\theta {:}{=}\frac{1}{2\ell _H} -\frac{1}{2\lambda \rho }\left( 1 +\frac{\nu \ell _{\psi }}{\omega }\right) \ge \frac{1}{2\ell _H} -\frac{1}{2\lambda \rho }(1 +1) >0 \text {~~and}~~ \\&\frac{\nu \ell _{\psi }}{2\lambda \rho }(2\ell _H +3\nu \ell _{\psi } -\omega ) \ge \frac{\nu \ell _{\psi }}{2\lambda \rho }(3\nu \ell _{\psi } -\omega ) =\frac{\nu ^2\ell _{\psi }^2}{\lambda \rho }. \end{aligned}$$

This together with (i) yields

$$\begin{aligned} L_1 \ge L_n&\ge \left( \sum _{i=1}^m f_i(x_{i,n}) +P(\textbf{x}_n) -G(\textbf{x}_n)\right) +\left( H(y_n) -\frac{1}{2\ell _H}\Vert \nabla H(y_n)\Vert ^2\right) \nonumber \\&\quad \, +\frac{\rho }{2}\left\| \frac{z_n}{\rho } + {\textbf{A}}{\textbf{x}}_n+By_n-b \right\| ^2 \nonumber \\ &\quad \, +\theta \Vert \nabla H(y_n)\Vert ^2 +\frac{\nu ^2\ell _{\psi }^2}{\lambda \rho }\Vert y_n-y_{n-1}\Vert ^2, \end{aligned}$$
(18)

in which the first term on the right hand side is bounded below since \(\sum _{i=1}^m f_i(x_i)+P(\textbf{x})-G(\textbf{x})\) is coercive, the second term is bounded below due to the assumption on H and Lemma 2.3(ii), while the remaining terms are nonnegative due to their positive coefficients. Therefore, all the terms are bounded. Again, as \(\sum _{i=1}^m f_i(x_i)+P(\textbf{x})-G(\textbf{x})\) is coercive, the boundedness of the first term implies that \((\textbf{x}_{n})_{n\in {\mathbb {N}}}\) is bounded. Since the last two terms are bounded, it follows from (17) that \((z_n)_{n\in {\mathbb {N}}}\) is bounded. Since \(z_{n+1}=z_n+\rho ({\textbf{A}}{\textbf{x}}_{n+1}+By_{n+1}-b)\) and B has full column rank, we also obtain that \((y_n)_{n\in {\mathbb {N}}}\) is bounded. Consequently, the sequence \((\textbf{x}_n,y_n,z_n)_{n\in {\mathbb {N}}}\) is bounded.

On the other hand, since all the terms on the right hand side of (18) are bounded below, so is the sequence \((L_n)_{n\in {\mathbb {N}}}\). Combining this with the nonincreasing property of \((L_n)_{n\in {\mathbb {N}}}\) from (i), it follows that \((L_n)_{n\in {\mathbb {N}}}\) is convergent. Rearranging (5) and telescoping, we then have that

$$\begin{aligned}&\sum _{n=1}^{+\infty }\delta _y \Vert y_{n+1}-y_n\Vert ^2 + \delta _{\textbf{x}}\sum _{n=1}^{+\infty }\Vert \textbf{x}_{n+1}-\textbf{x}_{n}\Vert ^2\\&\quad \le \sum _{n=1}^{+\infty }(L_n - L_{n+1}) = L_1 - \lim _{n\rightarrow +\infty } L_n < +\infty , \end{aligned}$$

which implies that \(\sum _{n=1}^{+\infty }\Vert \textbf{x}_{n+1}-\textbf{x}_n\Vert ^2 < +\infty \), \(\sum _{n=1}^{+\infty }\Vert y_{n+1}-y_n\Vert ^2 <+\infty \), and by (16), \(\sum _{n=1}^{+\infty }\Vert z_{n+1}-z_n\Vert ^2 < +\infty \). As a result, \(\textbf{x}_{n+1}-\textbf{x}_{n} \rightarrow 0\), \(y_{n+1}-y_n \rightarrow 0\), \(z_{n+1}-z_n \rightarrow 0\), and \({\textbf{A}}{\textbf{x}}_n +By_n =(z_n -z_{n-1})/\rho +b \rightarrow b\) as \(n \rightarrow +\infty \). It then follows from the definition of \(L_n\) that the sequence \(({\mathcal {L}}_\rho (\textbf{x}_{n},y_{n},z_{n}))_{n\in {\mathbb {N}}}\) is also convergent with \(\lim _{n\rightarrow +\infty } {\mathcal {L}}_\rho (\textbf{x}_{n},y_{n},z_{n}) =\lim _{n\rightarrow +\infty }L_n\).

Let \((\overline{\textbf{x}},\overline{y},\overline{z})\) be a cluster point of the sequence \((\textbf{x}_n,y_n,z_n)_{n\in {\mathbb {N}}}\). Then there exists a subsequence \((\textbf{x}_{k_n},y_{k_n},z_{k_n})_{n\in {\mathbb {N}}}\) that converges to \((\overline{\textbf{x}},\overline{y},\overline{z})\). In view of (10) with n replaced by \(k_n\), using Proposition 2.1(iii) and (iv), we obtain that, for all \(i \in \{1,\dots ,m\}\) and all \(x_i\in {\mathbb {R}}^{d_i}\),

$$\begin{aligned}&f_i(x_{i,k_n+1}) +\langle \nabla _i P(\textbf{x}_{k_n})- g_{i,k_n}, x_{i,k_n+1} -x_i \rangle + \langle z_{k_n}, A_i x_{i,k_n+1} - A_ix_i\rangle \nonumber \\&\quad +\frac{\rho }{2}\Vert \textbf{A}\textbf{u}_{i,k_n+1}(x_{i,k_n+1}) +By_{k_n} -b \Vert ^2 \nonumber \\ &\quad - \frac{\rho }{2}\Vert \textbf{A}\textbf{u}_{i,k_n+1}(x_{i}) +By_{k_n} -b \Vert ^2 \nonumber \\&\quad +\frac{\mu \alpha }{2}\Vert x_{i,k_n+1}-x_{i,k_n}\Vert ^2 - \frac{\nu \ell _{\phi }}{2}\Vert x_{i}-x_{i,k_n}\Vert ^2 \le f_i(x_{i}). \end{aligned}$$
(19)

Since \(\nabla P\) is continuous, \(\nabla P(\textbf{x}_{k_n}) \rightarrow \nabla P(\overline{\textbf{x}})\) as \(n\rightarrow +\infty \). As \(G +\frac{\beta }{2}\Vert \cdot \Vert ^2\) is a continuous convex function, it follows from [22, Example 9.14] that G is locally Lipschitz continuous. Since \(\textbf{x}_{k_n} \rightarrow \overline{\textbf{x}}\) as \(n\rightarrow +\infty \), using Lemma 2.2 and passing to a subsequence if necessary, we can and do assume that \(g_{k_n}\rightarrow \overline{g} \in \partial _L G (\overline{\textbf{x}})\) as \(n\rightarrow +\infty \). Now, for each \(i \in \{1,\dots ,m\}\), letting \(x_i =\overline{x}_i\) and \(n\rightarrow +\infty \) in (19) with noting that \(x_{i,k_n+1} \rightarrow \overline{x}_i\), \(y_{k_n} \rightarrow \overline{y}\), and \(z_{k_n} \rightarrow \overline{z}\), we have that \(\limsup _{n \rightarrow +\infty } f_i(x_{i,k_n+1}) \le f_i(\overline{x}_i)\), which together with the lower semicontinuity of \(f_i\) yields

$$\begin{aligned} \lim _{n \rightarrow +\infty } f_i(x_{i,k_n+1}) = f_i(\overline{x}_i). \end{aligned}$$

As H, P, and G are continuous, it follows that \(\lim _{n \rightarrow +\infty }{\mathcal {L}}_\rho (\textbf{x}_{k_n +1 },y_{k_n+1},z_{k_n +1}) = {\mathcal {L}}_\rho (\overline{\textbf{x}},\overline{y},\overline{z})\). Since the sequence \(({\mathcal {L}}_\rho (\textbf{x}_{n},y_{n},z_{n}))_{n\in {\mathbb {N}}}\) is convergent, we deduce that

$$\begin{aligned} \lim _{n \rightarrow +\infty }{\mathcal {L}}_\rho (\textbf{x}_{n },y_{n},z_{n}) ={\mathcal {L}}_\rho (\overline{\textbf{x}},\overline{y},\overline{z}). \end{aligned}$$

We next show that \((\overline{\textbf{x}},\overline{y},\overline{z})\) is a stationary point of \({\mathcal {L}}_\rho \). In (6) and (7) with n replaced by \(k_n-1\), letting \(n\rightarrow +\infty \) and using the continuity of \(\nabla H\) and the Lipschitz continuity of \(\nabla \psi \), we obtain that \(0 =\mathbf {A\overline{x}} + B\overline{y} -b\) and \(0 =\nabla H(\overline{y}) + B^\top \overline{z}\). In turn, letting \(n \rightarrow +\infty \) in (19), we have for all \(i \in \{1,\dots ,m\}\) and all \(x_i\in {\mathbb {R}}^{d_i}\) that

$$\begin{aligned}&f_i(\overline{x}_i) +\langle \nabla _i P(\overline{\textbf{x}})-\overline{g}_i, \overline{x}_i-x_i \rangle + \langle \overline{z}, A_i\overline{x}_i-A_ix_i \rangle \\ &\quad +\frac{\rho }{2}\Vert A_1\overline{x}_1 +\dots +A_{i-1}\overline{x}_{i-1} +A_i\overline{x}_i +A_{i+1}\overline{x}_{i+1}+\dots +A_m \overline{x}_{m} +B\overline{y} -b \Vert ^2 \\ &\quad -\frac{\rho }{2}\Vert A_1\overline{x}_1 +\dots +A_{i-1}\overline{x}_{i-1} +A_ix_i +A_{i+1}\overline{x}_{i+1}+\dots +A_m \overline{x}_{m}\\&\quad +B\overline{y} -b \Vert ^2 -\frac{\nu \ell _{ \phi }}{2}\Vert x_i-\overline{x}_i\Vert ^2 \le f_i(x_i), \end{aligned}$$

which can be written as

$$\begin{aligned} h_i(\overline{x}_i) \le h_i(x_i), \end{aligned}$$

where \(h_i(x_i) =f_i(x_i) +\langle \nabla _i P(\overline{\textbf{x}}) -\overline{g}_i, x_i \rangle + \langle \overline{z}, A_ix_i \rangle +\frac{\rho }{2}\Vert A_1\overline{x}_1 +\dots +A_{i-1}\overline{x}_{i-1} +A_ix_i +A_{i+1}\overline{x}_{i+1}+\dots +A_m \overline{x}_{m} +B\overline{y} -b \Vert ^2 +\frac{\nu \ell _{\phi }}{2}\Vert x_i-\overline{x}_i\Vert ^2\). It follows that, for all \(i \in \{1,\dots ,m\}\),

$$\begin{aligned} 0&\in \partial _L h_i(\overline{x}_i) =\partial _L (f_i)(\overline{x}_i) +\nabla _i P(\overline{\textbf{x}})- \overline{g}_i +A_i^\top \overline{z} +\rho A_i^\top (\textbf{A}\overline{\textbf{x}}+B\overline{y}-b). \end{aligned}$$

Since \(\textbf{A}\overline{\textbf{x}}+B\overline{y}-b =0\), we obtain that

$$\begin{aligned} 0 \in \partial _L (f_1)(\overline{x}_1)\times \cdots \times \partial _L (f_m)(\overline{x}_m) +\nabla P(\overline{\textbf{x}}) -\overline{g} +\textbf{A}^\top \overline{z}, \end{aligned}$$

which completes the proof due to Lemma 2.1(i) and (ii) and the strict differentiability of G. \(\square \)

Remark 3.2

(Conditions for the boundedness of the generated sequence) In order to establish the boundedness of the sequence \((\textbf{x}_n,y_n,z_n)_{n\in {\mathbb {N}}}\), the authors of [11, 23, 24] require that there exists \(\sigma \in (0, +\infty )\) such that

$$\begin{aligned} \inf _{y\in {\mathbb {R}}^q} \left( H(y) -\sigma \Vert \nabla H(y)\Vert ^2\right) >-\infty . \end{aligned}$$

In practice, it can be challenging to verify the existence of such a \(\sigma \). In our analysis in Theorem 3.1(ii), we replace this condition with the requirement that H is bounded below, which is much easier to verify; indeed, by Lemma 2.3(ii), if H is bounded below and \(\ell _H >0\), then the above condition automatically holds with \(\sigma =\frac{1}{2\ell _H}\).

We now establish the convergence of the full sequence generated by Algorithm 3.1. Recall that a proper lower semicontinuous function \({\mathcal {F}}:{\mathcal {H}}\rightarrow \left( -\infty , +\infty \right] \) satisfies the Kurdyka–Łojasiewicz (KL) property [12, 16] at \(\overline{x} \in {\text {dom}}\partial _L {\mathcal {F}}\) if there are \(\eta \in (0, +\infty ]\), a neighborhood V of \(\overline{x}\), and a continuous concave function \(\varphi : \left[ 0, \eta \right) \rightarrow {\mathbb {R}}_+\) such that \(\varphi \) is continuously differentiable with \(\varphi ' > 0\) on \((0, \eta )\), \(\varphi (0) = 0\), and, for all \(x \in V\) with \( {\mathcal {F}}(\overline{x})< {\mathcal {F}}(x) < {\mathcal {F}}(\overline{x}) + \eta \),

$$\begin{aligned} \varphi '({\mathcal {F}}(x) -{\mathcal {F}}(\overline{x})) {\text {dist}}(0, \partial _L {\mathcal {F}}(x)) \ge 1. \end{aligned}$$

We say that \({\mathcal {F}}\) is a KL function if it satisfies the KL property at any point in \({\text {dom}}\partial _L {\mathcal {F}}\). If \({\mathcal {F}}\) satisfies the KL property at \(\overline{x} \in {\text {dom}}\partial _L {\mathcal {F}}\), in which the corresponding function \(\varphi \) can be chosen as \(\varphi (t) = c t ^{1 - \kappa }\) for some \(c \in {\mathbb {R}}_{++}\) and \(\kappa \in [0, 1)\), then \({\mathcal {F}}\) is said to satisfy the KL property at \(\overline{x}\) with exponent \(\kappa \). The function \({\mathcal {F}}\) is called a KL function with exponent \(\kappa \) if it is a KL function and has the same exponent \(\kappa \) at any \(x \in {\text {dom}}\partial _L {\mathcal {F}}\).
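As a standard example, every strongly convex quadratic is a KL function with exponent \(\frac{1}{2}\): for \({\mathcal {F}}(x) =\frac{1}{2}x^\top Qx\) with Q symmetric positive definite and \(\overline{x} =0\),

$$\begin{aligned} {\text {dist}}(0, \partial _L {\mathcal {F}}(x)) =\Vert Qx\Vert \ge \lambda _{\min }(Q)\Vert x\Vert \quad \text {and}\quad {\mathcal {F}}(x) \le \frac{\lambda _{\max }(Q)}{2}\Vert x\Vert ^2, \end{aligned}$$

so the KL inequality holds with \(\varphi (t) =c\sqrt{t}\) for \(c {:}{=}\sqrt{2\lambda _{\max }(Q)}/\lambda _{\min }(Q)\). Now we will present the proof of the full sequential convergence in the following theorem.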

Theorem 3.2

(Full sequential convergence) Suppose that Assumption 3.1 holds, that \(\sum _{i=1}^m f_i(x_i)+P(\textbf{x})-G(\textbf{x})\) is coercive, that H is bounded below, that G is differentiable with \(\ell _G\)-Lipschitz continuous gradient, and that, for all \(i\in \{1, \dots , m\}\), \(\nabla {\phi _i}\) is \(\ell _{\phi }\)-Lipschitz continuous. Let \((\textbf{x}_{n},y_{n},z_{n})_{n\in {\mathbb {N}}}\) be the sequence generated by Algorithm 3.1 with \(z_0\in {\text {Im}}(B)\) and \(\limsup _{n\rightarrow +\infty } \mu _n =\overline{\mu } <+\infty \). Define

$$\begin{aligned} {\mathcal {F}}(\textbf{x},y,z,t){:}{=}{\mathcal {L}}_\rho (\textbf{x},y,z) +c\Vert y-t\Vert ^2, \text {~~where~} c=\frac{(\ell _H+2\nu \ell _{\psi })\nu \ell _{\psi }}{\lambda \rho }. \end{aligned}$$
  1. (i)

    Suppose that \({\mathcal {F}}\) is a KL function. Then the sequence \((\textbf{x}_{n},y_{n},z_{n})_{n\in {\mathbb {N}}}\) converges to a stationary point \((\textbf{x}^{*},y^{*},z^{*})\) of \({\mathcal {L}}_\rho \) and

    $$\begin{aligned} \sum _{n=0}^{+\infty } \Vert (\textbf{x}_{n+1},y_{n+1},z_{n+1})-(\textbf{x}_{n},y_{n},z_{n})\Vert < +\infty . \end{aligned}$$
  2. (ii)

    Suppose that \({\mathcal {F}}\) is a KL function with exponent \(\kappa \in [0,1)\). Then the following hold:

    1. (a)

      If \(\kappa =0\), then \((\textbf{x}_{n},y_{n},z_{n})_{n\in {\mathbb {N}}}\) converges to \((\textbf{x}^{*},y^{*},z^{*})\) in a finite number of steps.

    2. (b)

      If \(\kappa \in (0,\frac{1}{2}]\), then there exist \(\overline{\gamma }\in {\mathbb {R}}_{++}\) and \(\zeta \in \left( 0,1\right) \) such that, for all \(n\in {\mathbb {N}}\),

      $$\begin{aligned}&\Vert (\textbf{x}_{n},y_{n},z_{n})-(\textbf{x}^{*},y^{*},z^{*})\Vert \le \overline{\gamma }\zeta ^{\frac{n}{2}}, \\&\Vert {\textbf{A}}{\textbf{x}}_n+By_n-b\Vert \le \overline{\gamma } \zeta ^{\frac{n}{2}}, \\&|{\mathcal {L}}_\rho (\textbf{x}_{n},y_{n},z_{n}) -{\mathcal {L}}_\rho (\textbf{x}^{*},y^{*},z^{*}) |\le \overline{\gamma }\zeta ^n, \\ \text {and~}&|F(\textbf{x}_n,y_n)-F(\textbf{x}^{*},y^{*})| \le \overline{\gamma } \zeta ^{\frac{n}{2}}. \end{aligned}$$
    3. (c)

      If \(\kappa \in (\frac{1}{2},1)\), then there exists \(\overline{\gamma }\in {\mathbb {R}}_{++}\) such that, for all \(n\in {\mathbb {N}}\),

      $$\begin{aligned}&\Vert (\textbf{x}_{n},y_{n},z_{n})-(\textbf{x}^{*},y^{*},z^{*})\Vert \le \overline{\gamma } n^{-\frac{1-\kappa }{2\kappa -1}}, \\&\Vert {\textbf{A}}{\textbf{x}}_n+By_n-b\Vert \le \overline{\gamma } n^{-\frac{1-\kappa }{2\kappa -1}}, \\ &|{\mathcal {L}}_\rho (\textbf{x}_{n},y_{n},z_{n}) -{\mathcal {L}}_\rho (\textbf{x}^{*},y^{*},z^{*})|\le \overline{\gamma } n^{-\frac{2-2\kappa }{2\kappa -1}}, \\ \text {and~}&|F(\textbf{x}_n,y_n)-F(\textbf{x}^{*},y^{*})| \le \overline{\gamma } n^{-\frac{1-\kappa }{2\kappa -1}}. \end{aligned}$$

Proof

For each \(n\in {\mathbb {N}}\), set \(w_n=(\textbf{x}_{n+1},y_{n+1},z_{n+1},y_n)\) and \(\varDelta _n =\Vert \textbf{x}_{n+2}-\textbf{x}_{n+1}\Vert +\Vert y_{n+2}-y_{n+1}\Vert \). Let \(n \in {\mathbb {N}}^*\). According to Theorem 3.1, we have that

$$\begin{aligned} {\mathcal {F}}(w_{n+1}) +\delta _{\textbf{x}}\Vert \textbf{x}_{n+2}-\textbf{x}_{n+1}\Vert ^2 + \delta _y \Vert y_{n+2}-y_{n+1}\Vert ^2 \le {\mathcal {F}}(w_{n}), \end{aligned}$$
(20)

that \(w_{n+1} - w_n \rightarrow 0\) as \(n\rightarrow +\infty \), that \((w_n)_{n\in {\mathbb {N}}}\) is bounded, that, for any of its cluster points \(\overline{w}=(\overline{\textbf{x}},\overline{y},\overline{z},\overline{y})\), the point \((\overline{\textbf{x}},\overline{y},\overline{z})\) is a stationary point of \({\mathcal {L}}_\rho \), and also that \({\mathcal {F}}(w_{n}) \rightarrow {\mathcal {F}}(\overline{w}) = {\mathcal {L}}_{\rho }(\overline{\textbf{x}},\overline{y},\overline{z})\) as \(n\rightarrow +\infty \). Notice that

$$\begin{aligned} \varDelta _n^2 \le \left( \frac{1}{\delta _{\textbf{x}}} +\frac{1}{\delta _y}\right) \left( \delta _{\textbf{x}}\Vert \textbf{x}_{n+2}-\textbf{x}_{n+1}\Vert ^2 + \delta _y \Vert y_{n+2}-y_{n+1}\Vert ^2\right) , \end{aligned}$$

which together with (20) yields

$$\begin{aligned} {\mathcal {F}}(w_{n+1}) +C_1\varDelta _n^2 \le {\mathcal {F}}(w_{n}), \text {~~where~} C_1 =\frac{\delta _{\textbf{x}}\delta _y}{\delta _{\textbf{x}}+\delta _y}. \end{aligned}$$

Next, we have that

$$\begin{aligned} \partial _L ^\textbf{x} {\mathcal {F}}(w_n)&= \partial _L \left( \sum _{i=1}^{m} f_i\right) (\textbf{x}_{n+1}) + \nabla P(\textbf{x}_{n+1}) - \nabla G(\textbf{x}_{n+1}) \\&\quad + \textbf{A}^\top z_{n+1} + \rho \textbf{A}^\top ({\textbf{A}}{\textbf{x}}_{n+1} + By_{n+1} -b) \end{aligned}$$

and, for all \(i \in \{1,\dots ,m\}\),

$$\begin{aligned} \partial _L ^{x_i} {\mathcal {F}}(w_n)&= \partial _L f_i(x_{i,n+1})+\nabla _i P(\textbf{x}_{n+1}) - \nabla _i G(\textbf{x}_{n+1}) \\&\quad + A_i^\top z_{n+1} + \rho A_i^\top ({\textbf{A}}{\textbf{x}}_{n+1} + By_{n+1} -b). \end{aligned}$$

By invoking the optimality condition for the update of \(x_{i,n+1}\) in Step 2 of Algorithm 3.1,

$$\begin{aligned} 0&\in \partial _L f_i(x_{i,n+1}) +\nabla _i P(\textbf{x}_{n}) - \nabla _i G(\textbf{x}_n) + A_i^\top z_n +\rho A_i^\top (\textbf{A}\textbf{u}_{i,n+1}(x_{i,n+1} )\\&\quad +By_n -b ) +\mu _n(\nabla \phi _i (x_{i,n+1})- \nabla \phi _i (x_{i,n})), \end{aligned}$$

which combined with the above equality implies that

$$\begin{aligned}&(\nabla _i P(\textbf{x}_{n+1}) -\nabla _i P(\textbf{x}_{n})) -(\nabla _i G(\textbf{x}_{n+1}) -\nabla _i G(\textbf{x}_n))\\&\quad +A_i^\top (z_{n+1}-z_n) -\mu _n(\nabla \phi _i (x_{i,n+1}) -\nabla \phi _i (x_{i,n})) \\&\quad +\rho A_i^\top (A_{i+1}x_{i+1,n+1} +\dots +A_mx_{m,n+1} -A_{i+1}x_{i+1,n} -\dots -A_mx_{m,n} \\&\quad +By_{n+1} -By_n) \in \partial _L ^{x_i} {\mathcal {F}}(w_n). \end{aligned}$$

Therefore,

$$\begin{aligned}&{\text {dist}}(0,\partial _L ^{x_i} {\mathcal {F}}(w_n)) \nonumber \\ &\le \Vert \nabla _i P(\textbf{x}_{n+1}) -\nabla _i P(\textbf{x}_{n})\Vert +\Vert \nabla _i G(\textbf{x}_{n+1}) -\nabla _i G(\textbf{x}_n)\Vert \nonumber \\ &\quad +\Vert A_i^\top (z_{n+1}-z_n)\Vert +\mu _n\Vert \nabla \phi _i (x_{i,n+1}) -\nabla \phi _i (x_{i,n})\Vert \nonumber \\ &\quad +\rho \Vert A_i^\top (A_{i+1}x_{i+1,n+1} +\dots +A_mx_{m,n+1} -A_{i+1}x_{i+1,n} -\dots \nonumber \\&\quad -A_mx_{m,n} +By_{n+1} -By_n)\Vert \nonumber \\&\le \ell _P\Vert \textbf{x}_{n+1} - \textbf{x}_{n}\Vert +\ell _G\Vert \textbf{x}_{n} - \textbf{x}_{n+1}\Vert +\Vert A_i^\top \Vert \Vert z_{n+1}-z_n\Vert +\mu _n \ell _{\phi }\Vert x_{i,n+1}-x_{i,n}\Vert \nonumber \\ &\quad +\rho \sum _{j=i+1}^{m} \Vert A_i^\top \Vert \Vert A_j\Vert \Vert x_{j,n+1}-x_{j,n}\Vert +\rho \Vert A_i^\top \Vert \Vert B\Vert \Vert y_{n+1}-y_n\Vert \nonumber \\ &\le \left( \ell _P +\ell _G +\mu _n\ell _{\phi } +\rho \sum _{j=i+1}^{m} \Vert A_i^\top \Vert \Vert A_j\Vert \right) \Vert \textbf{x}_{n+1}-\textbf{x}_{n}\Vert +\Vert A_i^\top \Vert \Vert z_{n+1}-z_n\Vert \nonumber \\ &\quad +\rho \Vert A_i^\top \Vert \Vert B\Vert \Vert y_{n+1}-y_n\Vert . \end{aligned}$$
(21)

We now see that

$$\begin{aligned} \partial _L ^y {\mathcal {F}}(w_n) =\nabla H(y_{n+1}) + B^\top z_{n+1} +\rho B^\top ({\textbf{A}}{\textbf{x}}_{n+1} + By_{n+1} -b) +2c(y_{n+1}-y_{n}) \end{aligned}$$

and from the optimality condition for the update of \(y_{n+1}\) in Algorithm 3.1 that

$$\begin{aligned} 0&= \nabla H(y_{n+1}) + B^\top z_n +\rho B^\top ({\textbf{A}}{\textbf{x}}_{n+1} +By_{n+1} -b) +\nu _n(\nabla \psi (y_{n+1}) - \nabla \psi (y_n)). \end{aligned}$$

This leads to

$$\begin{aligned} \partial _L ^y {\mathcal {F}}(w_n)&=B^{\top }(z_{n+1}-z_n) -\nu _n(\nabla \psi (y_{n+1}) -\nabla \psi (y_n)) + 2c(y_{n+1}-y_{n}), \end{aligned}$$

and so

$$\begin{aligned} {\text {dist}}(0,\partial _L ^{y} {\mathcal {F}}(w_n))&\le \Vert B^{\top }(z_{n+1}-z_n)\Vert +\nu _n\Vert \nabla \psi (y_{n+1}) -\nabla \psi (y_n)\Vert \nonumber \\ &\quad + 2c\Vert y_{n+1}-y_{n}\Vert \nonumber \\ &\le \Vert B^{\top }(z_{n+1}-z_n)\Vert +(\nu \ell _\psi +2c)\Vert y_{n+1}-y_{n}\Vert . \end{aligned}$$
(22)

Since \(\partial _L ^z {\mathcal {F}}(w_n) ={\textbf{A}}{\textbf{x}}_{n+1} + By_{n+1} -b =\frac{1}{\rho }(z_{n+1} -z_n)\) and \(\partial _L ^t {\mathcal {F}}(w_n) =-2c(y_{n+1}-y_{n})\), it holds that

$$\begin{aligned} {\text {dist}}(0,\partial _L ^{z} {\mathcal {F}}(w_n)) =\frac{1}{\rho }\Vert z_{n+1}-z_n\Vert \text {~~and}~~ {\text {dist}}(0,\partial _L ^{t} {\mathcal {F}}(w_n)) =2c\Vert y_{n+1}-y_n\Vert . \end{aligned}$$
(23)

Combining (21), (22), and (23), noting that \(\limsup _{n\rightarrow +\infty } \mu _n =\overline{\mu } <+\infty \), and using (15) and (16) in the proof of Theorem 3.1, we derive that there exist \(n_0\in {\mathbb {N}}^*\) and \(C_2 \in {\mathbb {R}}_{++}\) such that, for all \(n\ge n_0\),

$$\begin{aligned}&{\text {dist}}(0, \partial _L {\mathcal {F}}(w_n)) \le C_2 \left( \Vert \textbf{x}_{n+1}-\textbf{x}_{n}\Vert +\Vert y_{n+1}-y_n\Vert +\Vert y_{n}-y_{n-1}\Vert \right) \\&\le C_2(\varDelta _{n-1} +\varDelta _{n-2}). \end{aligned}$$

(i): We see that all the conditions in the abstract convergence framework [5, Theorem 5.1] are satisfied with \(I =\{1,2\}\), \(\lambda _1=\lambda _2=1/2\), \(\alpha _n \equiv C_1\), \(\beta _n \equiv 1/(2C_2)\), and \(\varepsilon _n \equiv 0 \). By [5, Theorem 5.1(i)],

$$\begin{aligned} \sum _{n=0}^{+\infty } (\Vert \textbf{x}_{n+2}-\textbf{x}_{n+1}\Vert +\Vert y_{n+2}-y_{n+1}\Vert ) =\sum _{n=0}^{+\infty } \varDelta _n < +\infty , \end{aligned}$$

which implies that \(\sum _{n=0}^{+\infty } \Vert \textbf{x}_{n+1}-\textbf{x}_{n}\Vert <+\infty \) and \(\sum _{n=0}^{+\infty } \Vert y_{n+1}-y_{n}\Vert < +\infty \). Together with (16), we derive that \(\sum _{n=0}^{+\infty } \Vert z_{n+1}-z_n\Vert <+\infty \), and hence

$$\begin{aligned} \sum _{n=0}^{+\infty } \Vert (\textbf{x}_{n+1},y_{n+1},z_{n+1})-(\textbf{x}_{n},y_{n},z_{n})\Vert < +\infty , \end{aligned}$$

which yields the convergence of \((\textbf{x}_{n},y_{n},z_{n})_{n\in {\mathbb {N}}}\) to \((\textbf{x}^{*},y^{*},z^{*})\). In view of Theorem 3.1(ii), \((\textbf{x}^{*},y^{*},z^{*})\) is a stationary point of \({\mathcal {L}}_\rho \).

(ii)(a): This follows from the arguments in the proof of [5, Theorem 5.1] and [4, Theorem 2(i)].

(ii)(b): According to [5, Theorem 5.1(iv)], there exist \(\gamma _0\in {\mathbb {R}}_{++}\) and \(\zeta \in \left( 0,1\right) \) such that, for all \(n\in {\mathbb {N}}^*\),

$$\begin{aligned} \Vert (\textbf{x}_{n},y_{n})-(\textbf{x}^{*},y^{*})\Vert \le \gamma _0 \zeta ^{\frac{n}{2}} \text {~~and}~~ |{\mathcal {F}}(\textbf{x}_{n},y_{n},z_{n},y_{n-1})-{\mathcal {F}}(\textbf{x}^{*},y^{*},z^{*},y^*)| \le \gamma _0\zeta ^n. \end{aligned}$$

We derive that, for all \(n\in {\mathbb {N}}^*\), \(\Vert \textbf{x}_n-\textbf{x}^{*}\Vert \le \gamma _0\zeta ^{\frac{n}{2}}\) and \(\Vert y_n-y^{*}\Vert \le \gamma _0\zeta ^{\frac{n}{2}}\), which yields

$$\begin{aligned} \Vert y_n-y_{n-1}\Vert \le \Vert y_n-y^{*}\Vert +\Vert y_{n-1}-y^{*}\Vert \le (1+\zeta ^{-\frac{1}{2}})\gamma _0\zeta ^{\frac{n}{2}}. \end{aligned}$$

By passing to the limit in (7), \(B^\top z^{*}=-\nabla H(y^{*})\). Following the same steps as in (15) and (16), we obtain that, for all \(n\in {\mathbb {N}}^*\),

$$\begin{aligned}&\Vert z_n-z^{*}\Vert \le \frac{\ell _H}{\sqrt{\lambda }}\Vert y_n-y^{*}\Vert + \frac{\nu \ell _{\psi }}{\sqrt{\lambda }}\Vert y_{n}-y_{n-1}\Vert \le \gamma _1\zeta ^{\frac{n}{2}}, \\&\quad \text {~~where~} \gamma _1 {:}{=}\frac{\ell _H\gamma _0+(1+\zeta ^{-\frac{1}{2}})\nu \ell _{\psi }\gamma _0}{\sqrt{\lambda }}. \end{aligned}$$

Consequently,

$$\begin{aligned}&\Vert (\textbf{x}_{n},y_{n},z_{n})-(\textbf{x}^{*},y^{*},z^{*})\Vert \le \Vert \textbf{x}_{n}-\textbf{x}^{*}\Vert + \Vert y_{n}-y^{*}\Vert + \Vert z_{n}-z^{*}\Vert \le (2\gamma _0 +\gamma _1)\zeta ^{\frac{n}{2}}, \\&\Vert {\textbf{A}}{\textbf{x}}_n+By_n-b\Vert =\frac{1}{\rho }\Vert z_n-z_{n-1}\Vert \le \frac{1}{\rho }(\Vert z_n-z^{*}\Vert +\Vert z_{n-1}-z^{*}\Vert ) \le \frac{(1+\zeta ^{-\frac{1}{2}})\gamma _1}{\rho } \zeta ^{\frac{n}{2}}, \\ \text {and~}&\Vert z_n\Vert \le \Vert z_n-z^{*}\Vert + \Vert z^{*}\Vert \le \gamma _1 \zeta ^{\frac{n}{2}} + \Vert z^{*}\Vert . \end{aligned}$$

We now deduce from the definition of \({\mathcal {F}}\) that

$$\begin{aligned}&|{\mathcal {L}}_{\rho }(\textbf{x}_{n},y_{n},z_{n})- {\mathcal {L}}_{\rho }(\textbf{x}^{*},y^{*},z^{*})|\\&= \left| {\mathcal {F}}(\textbf{x}_{n},y_{n},z_{n},y_{n-1})-{\mathcal {F}}(\textbf{x}^{*},y^{*},z^{*},y^*) - c\Vert y_n-y_{n-1}\Vert ^2\right| \\&\le |{\mathcal {F}}(\textbf{x}_{n},y_{n},z_{n},y_{n-1})-{\mathcal {F}}(\textbf{x}^{*},y^{*},z^{*},y^*)| + c\Vert y_n-y_{n-1}\Vert ^2\\ &\le \gamma _2\zeta ^n, \text {~~where~} \gamma _2 {:}{=}\gamma _0 + c(1+\zeta ^{-\frac{1}{2}})^2\gamma _0^2 \end{aligned}$$

and from the definition of \({\mathcal {L}}_\rho \) that

$$\begin{aligned}&|F(\textbf{x}_n,y_n)-F(\textbf{x}^{*},y^{*})| = |{\mathcal {L}}_{\rho }(\textbf{x}_n,y_n,z_n)-{\mathcal {L}}_{\rho }(\textbf{x}^{*},y^{*},z^{*})-\langle z_n, {\textbf{A}}{\textbf{x}}_n \\&\quad + By_n-b\rangle - \frac{\rho }{2}\Vert {\textbf{A}}{\textbf{x}}_n + By_n-b\Vert ^2| \\&\quad \le |{\mathcal {L}}_{\rho }(\textbf{x}_n,y_n,z_n)-{\mathcal {L}}_{\rho }(\textbf{x}^{*},y^{*},z^{*})| + \Vert z_n\Vert \Vert {\textbf{A}}{\textbf{x}}_n \\&\quad + By_n-b\Vert + \frac{\rho }{2}\Vert {\textbf{A}}{\textbf{x}}_n + By_n-b\Vert ^2 \\&\quad \le \gamma _2\zeta ^n + \left( \gamma _1 \zeta ^{\frac{n}{2}} + \Vert z^{*}\Vert \right) \frac{(1+\zeta ^{-\frac{1}{2}})\gamma _1}{\rho } \zeta ^{\frac{n}{2}}+ \frac{(1+\zeta ^{-\frac{1}{2}})^2\gamma _1^2}{2\rho }\zeta ^n \\&\quad \le \left( \gamma _2 +\frac{(3+\zeta ^{-\frac{1}{2}})(1+\zeta ^{-\frac{1}{2}})\gamma _1^2}{2\rho } +\frac{\Vert z^{*}\Vert (1+\zeta ^{-\frac{1}{2}})\gamma _1}{\rho }\right) \zeta ^{\frac{n}{2}}, \end{aligned}$$

where the last inequality follows from the fact that \(\zeta ^n \le \zeta ^{\frac{n}{2}}\) since \(\zeta \in (0,1)\). By letting

$$\begin{aligned} \overline{\gamma } =\max \left\{ 2\gamma _0+\gamma _1, \frac{(1+\zeta ^{-\frac{1}{2}})\gamma _1}{\rho }, \gamma _2 +\frac{(3+\zeta ^{-\frac{1}{2}})(1+\zeta ^{-\frac{1}{2}})\gamma _1^2}{2\rho } +\frac{\Vert z^{*}\Vert (1+\zeta ^{-\frac{1}{2}})\gamma _1}{\rho } \right\} \end{aligned}$$

and increasing it if necessary, we arrive at the conclusion.

(ii)(c): Following the arguments in [5, Theorem 5.1(iv)] and [4, Theorem 2(iii)], we find \(\gamma _0 \in {\mathbb {R}}_{++}\) such that, for all \(n\in {\mathbb {N}}^*\),

$$\begin{aligned}&\Vert (\textbf{x}_{n},y_{n})-(\textbf{x}^{*},y^{*})\Vert \le \gamma _0 n^{-\frac{1-\kappa }{2\kappa -1}} \text {~~and}~~\\&|{\mathcal {F}}(\textbf{x}_{n},y_{n},z_{n},y_{n-1})-{\mathcal {F}}(\textbf{x}^{*},y^{*},z^{*},y^*)| \le \gamma _0 n^{-\frac{2-2\kappa }{2\kappa -1}}. \end{aligned}$$

Similarly to (ii)(b), for all \(n\ge 2\), since \(n-1\ge \frac{1}{2}n\) and \(\frac{1-\kappa }{2\kappa -1} >0\), we derive that

$$\begin{aligned}&\Vert (\textbf{x}_{n},y_{n},z_{n})-(\textbf{x}^{*},y^{*},z^{*})\Vert \le (2\gamma _0+\gamma _1) n^{-\frac{1-\kappa }{2\kappa -1}},\\&\Vert {\textbf{A}}{\textbf{x}}_n+By_n-b\Vert \le \frac{(1 +2^{\frac{1-\kappa }{2\kappa -1}})\gamma _1}{\rho } n^{-\frac{1-\kappa }{2\kappa -1}},\\&|{\mathcal {L}}_{\rho }(\textbf{x}_{n},y_{n},z_{n})- {\mathcal {L}}_{\rho }(\textbf{x}^{*},y^{*},z^{*})|\le \gamma _2 n^{-\frac{2-2\kappa }{2\kappa -1}}, \end{aligned}$$

and that

$$\begin{aligned}&|F(\textbf{x}_n,y_n)-F(\textbf{x}^{*},y^{*})|\\&\quad \le \gamma _2 n^{-\frac{2-2\kappa }{2\kappa -1}}+ \left( \gamma _1 n^{-\frac{1-\kappa }{2\kappa -1}} + \Vert z^{*}\Vert \right) \frac{(1 +2^{\frac{1-\kappa }{2\kappa -1}})\gamma _1}{\rho } n^{-\frac{1-\kappa }{2\kappa -1}} + \frac{(1 +2^{\frac{1-\kappa }{2\kappa -1}})^2\gamma _1^2}{2\rho } n^{-\frac{2-2\kappa }{2\kappa -1}}\\&\quad \le \left( \gamma _2 +\frac{(3 +2^{\frac{1-\kappa }{2\kappa -1}})(1 +2^{\frac{1-\kappa }{2\kappa -1}})\gamma _1^2}{2\rho } +\frac{\Vert z^{*}\Vert (1 +2^{\frac{1-\kappa }{2\kappa -1}})\gamma _1}{\rho } \right) n^{-\frac{1-\kappa }{2\kappa -1}}, \end{aligned}$$

where \(\gamma _1 {:}{=}\frac{\ell _H\gamma _0 +(1 +2^{\frac{1-\kappa }{2\kappa -1}})\nu \ell _{\psi }\gamma _0}{\sqrt{\lambda }}\), \(\gamma _2 {:}{=}\gamma _0 +c(1 +2^{\frac{1-\kappa }{2\kappa -1}})^2\gamma _0^2\), and the last inequality follows from the fact that \(n^{-\frac{2-2\kappa }{2\kappa -1}}<n^{-\frac{1-\kappa }{2\kappa -1}}\) for \(\kappa \in (\frac{1}{2},1)\). By setting \(\overline{\gamma }\) in the same way as in (ii)(b), we obtain the conclusion. \(\square \)

In Theorem 3.2, to obtain the convergence of the full sequence generated by Algorithm 3.1, we require that \({\mathcal {F}}(\textbf{x},y,z,t) {:}{=}{\mathcal {L}}_\rho (\textbf{x},y,z) +\frac{(\ell _H+2\nu \ell _{\psi })\nu \ell _{\psi }}{\lambda \rho }\Vert y-t\Vert ^2\) is a KL function. It is worth mentioning that if the objective function \(F(\textbf{x},y)\) is a semi-algebraic function, then so is \({\mathcal {F}}(\textbf{x},y,z,t)\), and hence \({\mathcal {F}}(\textbf{x},y,z,t)\) is a KL function with exponent \(\kappa \in [0, 1)\); see, e.g., [4, Example 1].

4 Numerical Results

In this section, we provide numerical results for our proposed algorithm on two case studies: RPCA with modified regularization, and a DC-OPF problem with optimal photovoltaic (PV) system placement. All experiments are performed in MATLAB R2021b on a 64-bit laptop with an Intel(R) Core(TM) i7-1165G7 CPU (2.80 GHz) and 16 GB of RAM.

4.1 RPCA with Modified Regularization

We consider the following RPCA model introduced in Example 1.1

$$\begin{aligned} \min _{L,S,T \in {\mathbb {R}}^{m\times d}} \Vert L\Vert _{*} + \tau \Vert S\Vert _1 + \frac{\gamma }{2}\Vert T-M\Vert _F^2 \quad \text {s.t.~} T=L+S. \end{aligned}$$
(24)

We modify the problem (24) as follows

$$\begin{aligned} \min _{L,S,T \in {\mathbb {R}}^{m\times d}} \Vert L\Vert _{*} + \tau \Vert S\Vert _1 - \tau \Vert S\Vert + \frac{\gamma }{2}\Vert T-M\Vert _F^2 \quad \text {s.t.~} T=L+S, \end{aligned}$$
(25)

where \(\Vert \cdot \Vert _1 - \Vert \cdot \Vert \) is the modified regularization term. In order to use Algorithm 3.1, we let \(f_1(L)= \Vert L\Vert _{*}\), \(f_2(S)= \tau \Vert S\Vert _1\), \(H(T)=\frac{\gamma }{2}\Vert T-M\Vert _F^2\), \(P\equiv 0\), \(G(S)=\tau \Vert S\Vert \), \(D_{\phi _1}(L,L_n)=\frac{\alpha }{2}\Vert L-L_n\Vert _F^2\), and \(D_{\phi _2}(S,S_n)=\frac{\alpha }{2}\Vert S-S_n\Vert _F^2\). The coefficient matrices are defined by \(A_1=A_2=\textbf{I}_{m\times m}\) and \(B=-\textbf{I}_{m\times m}\). Note that \(b=0\) in this case, and we fix \(\mu _n=1\) and \(\nu _n=0\) for all \(n\in {\mathbb {N}}\). The problem (25) can then be solved by Algorithm 3.1 with the following steps

$$\begin{aligned} \begin{array}{l} L_{n+1} = \mathop {{\text {argmin}}}\limits _{L \in {\mathbb {R}}^{m\times d}}~ \Vert L\Vert _{*} - \langle \tau g_{1,n}, L \rangle + \langle Z_n, L \rangle + \frac{\rho }{2}\Vert L + S_n - T_n\Vert _F^2 + \frac{\alpha }{2}\Vert L-L_n\Vert ^2_F,\\ S_{n+1}= \mathop {{\text {argmin}}}\limits _{S\in {\mathbb {R}}^{m\times d}}~ \tau \Vert S\Vert _1 - \langle \tau g_{2,n}, S \rangle + \langle Z_n, S \rangle + \frac{\rho }{2}\Vert L_{n+1} + S - T_n\Vert _F^2+ \frac{\alpha }{2}\Vert S-S_n\Vert ^2_F, \\ T_{n+1} = \mathop {{\text {argmin}}}\limits _{T\in {\mathbb {R}}^{m\times d}}~ \frac{\gamma }{2}\Vert T-M\Vert ^2_F - \langle Z_n, T \rangle + \frac{\rho }{2}\Vert L_{n+1} + S_{n+1} - T\Vert ^2_F,\\ Z_{n+1}= Z_n + \rho (L_{n+1} + S_{n+1} - T_{n+1}), \end{array} \end{aligned}$$

where \(g_n=(g_{1,n},g_{2,n})\) is a limiting subgradient of \((L,S)\mapsto \Vert S\Vert \) at \((L_n,S_n)\). In this case, since \(G(S)=\tau \Vert S\Vert \) does not depend on \(L\), \(g_{1,n}\) is the zero matrix of size \(m\times d\). This updating scheme can be rewritten as

$$\begin{aligned} \begin{array}{l} L_{n+1} = \mathop {{\text {argmin}}}\limits _{L \in {\mathbb {R}}^{m\times d}} ~ \frac{1}{\rho +\alpha }\Vert L\Vert _{*} + \frac{1}{2}\left\| L - \left( \frac{-Z_n-\rho S_n +\rho T_n +\alpha L_n}{\rho +\alpha } \right) \right\| _F^2,\\ S_{n+1}=\mathop {{\text {argmin}}}\limits _{S \in {\mathbb {R}}^{m\times d}} ~ \frac{\tau }{\rho +\alpha }\Vert S\Vert _1 + \frac{1}{2}\left\| S - \left( \frac{\tau g_{2,n}-Z_n -\rho L_{n+1} +\rho T_n +\alpha S_n}{\rho +\alpha } \right) \right\| _F^2,\\ T_{n+1}=\frac{\gamma M +Z_n +\rho L_{n+1}+\rho S_{n+1}}{\gamma +\rho },\\ Z_{n+1}= Z_n + \rho (L_{n+1} + S_{n+1} - T_{n+1}). \end{array} \end{aligned}$$

Both the \(L\)- and \(S\)-subproblems have closed-form solutions, namely

$$\begin{aligned} L_{n+1}&= {\mathcal {P}}_{\frac{1}{\rho +\alpha }} \left( \frac{-Z_n-\rho S_n +\rho T_n +\alpha L_n}{\rho +\alpha }\right) ,~S_{n+1}\\&={\mathcal {S}}_{\frac{\tau }{\rho +\alpha }}\left( \frac{\tau g_{2,n}-Z_n -\rho L_{n+1} +\rho T_n +\alpha S_n}{\rho +\alpha }\right) , \end{aligned}$$

where \({\mathcal {P}}_c(\cdot )\) denotes the soft shrinkage operator applied to the singular values of the input matrix [8], \({\mathcal {S}}_c(\cdot )\) denotes the soft shrinkage operator applied to all entries of the input matrix [14], and \(c\) is the threshold value. The subgradient \(g_{2,n}\) is given by \(u_1 v_1^\top \), where \(u_1\) and \(v_1\) are the leading left and right singular vectors obtained from the singular value decomposition of \(S_n\). We randomly generate the ground truth matrices for this case study by the procedures described in Appendix A. We set \(\tau =1/\sqrt{\max (m,d)}\), \(\alpha =10^{-2}\), and \(\gamma =1\) for all test cases. To apply Algorithm 3.1, we choose \(\rho =2 + 10^{-10}\) (since \(\lambda =1\) and \(\ell _H = \gamma \)). We compare our proposed algorithm with the three-block ADMM (ADMM-3) used in [14], whose penalty parameter is set to \(\rho =2\). The ADMM-3 solves problem (24), while our algorithm solves problem (25). The noise parameter \(\varGamma \) is set to \(10^{-2}\) and \(2\times 10^{-2}\), respectively. The algorithms are terminated when the relative change between two consecutive iterates is less than \(10^{-6}\), specifically

$$\begin{aligned} \frac{\Vert (L_{n+1},S_{n+1},T_{n+1})-(L_{n},S_{n},T_{n})\Vert _F}{\Vert (L_{n},S_{n},T_{n})\Vert _F+1} \le 10^{-6}. \end{aligned}$$
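For concreteness, one iteration of the scheme above can be sketched in MATLAB; the function names rpca_step, svt, and soft are ours rather than the paper's, and the sketch simply transcribes the closed-form updates derived above.

```matlab
% Minimal sketch of one iteration of the updating scheme above
% (function names are ours; parameters as defined in the text).
function [L,S,T,Z] = rpca_step(L,S,T,Z,M,rho,alpha,tau,gamma)
    [u1,~,v1] = svds(S,1);       % leading singular pair of S_n
    g2 = u1*v1';                 % subgradient g_{2,n} of the spectral norm
    L = svt((-Z - rho*S + rho*T + alpha*L)/(rho+alpha), 1/(rho+alpha));
    S = soft((tau*g2 - Z - rho*L + rho*T + alpha*S)/(rho+alpha), tau/(rho+alpha));
    T = (gamma*M + Z + rho*(L+S))/(gamma+rho);
    Z = Z + rho*(L + S - T);
end

function X = svt(Y,c)            % P_c: soft shrinkage of the singular values
    [U,D,V] = svd(Y,'econ');
    X = U*max(D-c,0)*V';
end

function X = soft(Y,c)           % S_c: entrywise soft shrinkage
    X = sign(Y).*max(abs(Y)-c,0);
end
```

Note that svds only extracts the leading singular pair needed for the subgradient, whereas the full (economy-size) SVD inside svt dominates the per-iteration cost.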

We denote by \((\widehat{L}_n,\widehat{S}_n,\widehat{T}_n)\) the solution found at each iteration and by \((L_O,S_O,T_O)\) the ground truth matrices that we want to recover by solving the problem (25). The following relative error (RE) with respect to the ground truth is used to measure the quality of the solution

$$\begin{aligned} \text {RE}{:}{=}\frac{\Vert (\widehat{L}_n,\widehat{S}_n,\widehat{T}_n)-(L_O,S_O,T_O)\Vert _F}{\Vert (L_O,S_O,T_O)\Vert _F +1}. \end{aligned}$$

Each algorithm is run 30 times for each test case, with a maximum of 4000 iterations. In each run, the initial matrices \(L\) and \(S\) are generated randomly, \(T\) is initialized to \(M\), and \(Z\) is initialized as the zero matrix. We perform our experiments on both square matrices (\(100\times 100\) and \(1000\times 1000\)) and rectangular matrices (\(200\times 100\) and \(2000 \times 1000\), chosen arbitrarily). The average CPU time (in seconds), RE, rank, and sparsity of the solutions are reported in Tables 1 and 2. The pair \((r,s)\) in the tables shows the rank and sparsity ratio of the randomly generated ground truth. Although the proposed algorithm takes more time to run (mainly due to the singular value decomposition required to compute the subgradient at each iteration), it outperforms ADMM-3 in terms of solution quality in both test cases. Interestingly, the sparsity of the solutions found by our algorithm is also closer to the ground truth values, which indicates the effectiveness of the modified regularization. Our algorithm successfully recovers the rank of the ground truth matrices in all test cases for \(\varGamma =10^{-2}\), and it recovers a better sparsity level in almost all test cases. For larger test matrices, the CPU time of our algorithm remains comparable to that of ADMM-3, and it still outperforms ADMM-3 in terms of solution quality. When \(\varGamma =2\times 10^{-2}\) in the larger case, the rank and sparsity are more challenging to recover, yet the proposed algorithm still obtains a better sparsity level in almost all test cases.

Table 1 RPCA results on randomly generated square matrices
Table 2 RPCA results on randomly generated rectangular matrices

4.2 DC-OPF with Optimal Photovoltaic System Placement

We revisit the DC-OPF problem with optimal PV allocation based on the formulations given in [2, 21]. The details of the variables and parameters are given in Table 4 in the Appendix. The formulation is as follows

$$\begin{aligned} \min ~~&\left( \sum _{i\in N}Cu_i + \sum _{i\in N}\left( a_i(P_{i}^{G})^2+b_iP_{i}^{G}+c_i\right) \right) \end{aligned}$$
(OPF-1)
$$\begin{aligned} \text {subject to}~~&P_{i}^{PV}+P_{i}^{G}- \sum _{j \in M_i}b_{ij}(\varTheta _i-\varTheta _j)\ge D_i, \quad \forall i \in N \end{aligned}$$
(26a)
$$\begin{aligned}&\sum _{i\in N}u_i \ge \frac{\sum _{i\in N}D_i}{2\overline{P^{PV}}} \end{aligned}$$
(26b)
$$\begin{aligned}&b_{ij}(\varTheta _i-\varTheta _j)\le \overline{P}, \quad \forall i \in N,~j \in M_i \end{aligned}$$
(26c)
$$\begin{aligned}&\varTheta _i \in [0,2\pi ], \quad \forall i \in N \end{aligned}$$
(26d)
$$\begin{aligned}&0\le P_{i}^{PV}\le u_i\overline{P^{PV}}, \quad \forall i \in N \end{aligned}$$
(26e)
$$\begin{aligned}&0\le P_{i}^{G}\le \overline{P^G}, \quad \forall i \in N \end{aligned}$$
(26f)
$$\begin{aligned}&u_i \in \{0,1\}, \quad \forall i \in N. \end{aligned}$$
(26g)

The objective function minimizes the installation cost of the PV systems together with the operation cost of the conventional generators. Constraint (26a) ensures that the demand is met by the power flow from both conventional generators and renewable sources. Constraint (26b) guarantees that enough PV systems are installed to achieve a 50% renewable penetration rate. Constraint (26c) limits the power flow between any two nodes. The remaining constraints define the bounds of the decision variables. Using the binary relaxation technique from [21], we have that

$$\begin{aligned}&(\forall i\in N,\quad u_i \in \{0,1\}) \iff \left( \forall i\in N,\ u_i \in [0,1] \text {~and~} \sum _{i\in N} \left( u_i^2-u_i \right) \ge 0\right) . \end{aligned}$$
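To see why this equivalence holds, observe that each summand is nonpositive over the relaxed box:

$$\begin{aligned} u_i \in [0,1] \implies u_i^2-u_i =u_i(u_i-1) \le 0, \end{aligned}$$

so \(\sum _{i\in N}(u_i^2-u_i) \ge 0\) can hold only if every summand vanishes, that is, only if \(u_i\in \{0,1\}\) for all \(i\in N\).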

Taking the above equivalence into account, a plausible alternative optimization model for the OPF problem with PV is as follows

$$\begin{aligned} \min ~~&\left( \sum _{i\in N}Cu_i + \sum _{i\in N}\left( a_i(P_{i}^{G})^2+b_iP_{i}^{G}+c_i\right) -\gamma \sum _{i\in N}(u_i^2 - u_i) \right) \end{aligned}$$
(OPF-M)
$$\begin{aligned} \text {subject to}&\,\,\text {(26a)--(26f)}\end{aligned}$$
(27a)
$$\begin{aligned}&\,\, u_i \in [0,1], \quad \forall i \in N. \end{aligned}$$
(27b)

In order to apply our algorithm, we let \(\textbf{x}=(x_1,x_2,\dots ,x_{|N|})\), in which \(x_i=[P_i^{PV},P_i^G,\varTheta _i,u_i]^\top \) denotes the variables associated with bus \(i\), for \(i \in \{1,\dots ,|N|\}\); here, \(|N|\) denotes the cardinality of the set \(N\). Letting \(f_i(x_i)=Cu_i + a_i(P_{i}^{G})^2+b_iP_{i}^{G}+c_i\) be the cost function associated with bus \(i\) and \(G(\textbf{x})=\gamma \sum _{i\in N}(u_i^2 - u_i)\), we can reformulate the problem (OPF-M) as follows

$$\begin{aligned} \min _{x_i}~~&\left( \sum _{i \in N}f_i(x_i) -G(\textbf{x})\right) \end{aligned}$$
(OPF-M2)
$$\begin{aligned} \text {s.t.}~~ {\textbf{A}}{\textbf{x}} \le b, \end{aligned}$$
(28a)

where the \(A_i\) are coefficient matrices associated with each \(x_i\), and \(b\) is the vector formed by the right-hand sides of the constraints of the problem (OPF-M). We then introduce slack variables, which are penalized to transform all inequalities into equalities as in [2]. Two points worth noting are that our DC-OPF problem is nonconvex due to the presence of binary variables (and later due to the reformulation into difference-of-convex form), whereas the DC-OPF problem considered in [2] is convex, and that our algorithm is more general and also covers their ADMM version. Let \(p=9|N|+\sum _{i\in N}|M_i| +1\) be the number of inequality constraints. By letting \(y=(y_1,y_2,\dots ,y_p)\) be the slack variables, the problem (OPF-M2) can be equivalently reformulated as follows

$$\begin{aligned} \min _{x_i}~~&\left( \sum _{i \in N}f_i(x_i) -G(\textbf{x})\right) \text {s.t.}~~ {\textbf{A}}{\textbf{x}} +y = b,~ y \in {\mathbb {R}}_+^p. \end{aligned}$$
(OPF-M2')

Here, we observe that

$$\begin{aligned} y \in {\mathbb {R}}_+^p \iff {\text {dist}}(y,{\mathbb {R}}^{p}_{+}) =0 \iff {\text {dist}}^2(y,{\mathbb {R}}^{p}_{+}) \le 0 \end{aligned}$$

and that \(y\mapsto {\text {dist}}^2(y,{\mathbb {R}}^{p}_{+})\) is a differentiable function whose gradient \(\nabla {\text {dist}}^2(y,{\mathbb {R}}^{p}_{+}) =2(y -{\text {Proj}}_{{\mathbb {R}}_+^p}(y)) =2\min (y,0)\) (componentwise) is 2-Lipschitz continuous. The final relaxed problem is then given as follows

$$\begin{aligned} \min _{x_i}~~&\left( \sum _{i \in N}f_i(x_i) +\frac{\eta }{2}{\text {dist}}^2(y,{\mathbb {R}}^{p}_{+})-G(\textbf{x})\right) \text {s.t.}~~ {\textbf{A}}{\textbf{x}} +y = b, \end{aligned}$$
(OPF-M3)

where \(\eta \) is a positive parameter. Now, (OPF-M3) is a special case of problem (1), in which \(f_i(x_i)=Cu_i + a_i(P_{i}^{G})^2+b_iP_{i}^{G}+c_i\), \(H(y)=\frac{\eta }{2}{\text {dist}}^2(y,{\mathbb {R}}^{p}_{+})\), \(G(\textbf{x})=\gamma \sum _{i\in N}(u_i^2 - u_i)\), \(P(\textbf{x})\equiv 0\), \(B=\textbf{I}_{p\times p}\), and \(A_i\) is constructed as described in Appendix C. Each function \(f_i(x_i)\) can be rewritten in quadratic form as \(f_i(x_i)=\frac{1}{2}x_i^\top Q_ix_i +q_i^\top x_i +c_i\), in which

$$\begin{aligned} Q_i=\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 2a_i& 0 & 0 \\ 0& 0& 0& 0 \\ 0& 0& 0& 0 \end{bmatrix},~ q_i=\begin{bmatrix} 0\\ b_i\\ 0\\ C \end{bmatrix}. \end{aligned}$$

We use Algorithm 3.1 with \(D_{\phi _i}(x_i,x_{i,n})=\frac{\alpha }{2}\Vert x_i-x_{i,n}\Vert ^2\), where \(\alpha =10^{-2}\), and fix \(\mu _n=1\) and \(\nu _n=0\) for all \(n\in {\mathbb {N}}\). The updates of the variables are then given as follows

$$\begin{aligned} \begin{array}{ll} x_{i,n+1}=-\left( Q_i+\rho A_i^\top A_i+\alpha \textbf{I}_{4\times 4}\right) ^{-1}\left( q_i-\nabla _iG(\textbf{x}_n)+\rho A_i^\top \left( \sum _{k=1}^{i-1}A_kx_{k,n+1}+\sum _{k=i+1}^{|N|}A_kx_{k,n}+ \right. \right. \\ \left. \left. \qquad \qquad \quad y_n+\frac{z_n}{\rho }-b\right) -\alpha x_{i,n}\right) ,~i\in \{1,\dots ,|N|\},\\ y_{j,n+1}=\max \left\{ 0,\left( -{\textbf{A}}{\textbf{x}}_{n+1}-\frac{z_n}{\rho } +b \right) _j \right\} + \frac{\rho }{\eta +\rho } \min \left\{ 0,\left( -{\textbf{A}}{\textbf{x}}_{n+1}-\frac{z_n}{\rho } +b \right) _j\right\} ,~j\in \{1,\dots ,p\},\\ z_{n+1}=z_n+\rho ({\textbf{A}}{\textbf{x}}_{n+1}+y_{n+1}-b). \end{array} \end{aligned}$$
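For illustration, one Gauss–Seidel sweep of these updates can be sketched in MATLAB as follows. The per-bus data Qi, qi, Ai (cell arrays), the stacked vectors, and the helper gradGi are our own notation, assumed to be assembled from the construction in Appendix C; this is a sketch under those assumptions, not the exact implementation used in the experiments.

```matlab
% One Gauss-Seidel sweep of the x-, y-, and z-updates above
% (Qi, qi, Ai are per-bus data; names and layout are our assumptions).
s = zeros(p,1);
for i = 1:nbus, s = s + Ai{i}*x{i}; end      % s = A*x_n
for i = 1:nbus
    s = s - Ai{i}*x{i};                      % exclude block i from the sum
    rhs = qi{i} - gradGi(x{i},gamma) ...
          + rho*(Ai{i}'*(s + y + z/rho - b)) - alpha*x{i};
    x{i} = -((Qi{i} + rho*(Ai{i}'*Ai{i}) + alpha*eye(4)) \ rhs);
    s = s + Ai{i}*x{i};                      % re-include the updated block
end
v = b - s - z/rho;                           % v = -A*x_{n+1} - z_n/rho + b
y = max(v,0) + (rho/(eta+rho))*min(v,0);     % closed-form y-update
z = z + rho*(s + y - b);                     % dual update

function g = gradGi(xi,gamma)  % block gradient of G(x) = gamma*sum_i (u_i^2 - u_i)
    g = [0; 0; 0; gamma*(2*xi(4)-1)];        % only the u-component is nonzero
end
```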

Note that the closed-form update of \(y\) can be derived with the same approach used in [2, Theorem 1]. The proposed algorithm is run for a maximum of 4000 iterations and is terminated when

$$\begin{aligned} \frac{\Vert (\textbf{x}_{n+1},y_{n+1},z_{n+1})-(\textbf{x}_{n},y_{n},z_{n})\Vert }{\Vert (\textbf{x}_{n},y_{n},z_{n})\Vert } < 10^{-5}. \end{aligned}$$

The test systems considered here are the 14-bus system used in [21, Case study 4.2] and the 141-bus test case taken from MATPOWER [32]. We set \((\eta , \rho , \gamma )=(900, 1800+10^{-10},80)\) for the 14-bus case and \((\eta , \rho , \gamma )=(3000, 6000+10^{-10},80)\) for the 141-bus case. We compare the performance of our proposed algorithm with the generalized proximal point algorithm (GPPA) [3] and the proximal subgradient algorithm with extrapolation (PSAe) [21]. We run all algorithms 30 times, initialized at the lower bounds of the variables, for a maximum of 4000 iterations. The results are given in Table 3. The objective function values are calculated using (OPF-1).

Table 3 Comparison of GPPA, PSAe, and the proposed algorithm on 30 runs of the DC-OPF model
Fig. 1 Best solution found for the 14-bus case (the blue color indicates a PV system at the bus)

Fig. 2 Best solution found for the 141-bus case

The best solution found in the 14-bus case by the proposed BPL-ADMM algorithm is approximately the same as the one found in [21], owing to the relaxation used during the modelling process. However, the CPU time is shorter, thanks to the availability of closed-form solutions of the subproblems, which makes each iteration of the proposed algorithm less computationally expensive than those of the other algorithms. Moreover, for the larger test case, the proposed algorithm again benefits from these closed-form solutions and converges to a stationary point within an acceptable running time. The best solutions found by the proposed algorithm are shown in Figs. 1 and 2, respectively.

5 Conclusion

In this paper, we have proposed a splitting algorithm with linearization, based on the ADMM, to solve a class of structured nonconvex and nonsmooth optimization problems covering two important classes of problems in the literature. Our problem structure and the proposed algorithm impose fewer convexity restrictions than existing algorithms in the literature. The convergence of the whole sequence generated by our algorithm is proved under mild additional assumptions together with the KL property. The efficiency of the proposed algorithm is illustrated on two important nonconvex optimization problems, on which it shows competitive results in comparison with existing algorithms.