Article

A Flexible Framework for Decentralized Composite Optimization with Compressed Communication

1 School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
2 School of Mathematics, Southeast University, Nanjing 211189, China
* Author to whom correspondence should be addressed.
Fractal Fract. 2024, 8(12), 721; https://doi.org/10.3390/fractalfract8120721
Submission received: 9 October 2024 / Revised: 25 November 2024 / Accepted: 4 December 2024 / Published: 5 December 2024
(This article belongs to the Section Optimization, Big Data, and AI/ML)
Figure 1. Distribution of samples across agents for the a9a dataset (left) and ijcnn1 (right).
Figure 2. Random communication graph of the network with 10 agents.
Figure 3. Performance comparison of distributed logistic regression on the a9a dataset: distance error versus iteration number (left) and total communication bits (right).
Figure 4. Performance comparison of distributed logistic regression on the ijcnn1 dataset: distance error versus iteration number (left) and total communication bits (right).
Figure 5. Performance comparison of distributed ridge regression on the a9a dataset: distance error versus iteration number (left) and total communication bits (right).
Figure 6. Performance comparison of distributed ridge regression on the ijcnn1 dataset: distance error versus iteration number (left) and total communication bits (right).
Figure 7. Performance comparison of distributed LASSO on the a9a dataset: distance error versus iteration number (left) and total communication bits (right).
Figure 8. Performance comparison of distributed LASSO on the ijcnn1 dataset: distance error versus iteration number (left) and total communication bits (right).

Abstract

This paper addresses the decentralized composite optimization problem, where a network of agents cooperatively minimizes the sum of their local objective functions with non-differentiable terms. We propose a novel communication-efficient decentralized ADMM framework, termed CE-DADMM, by combining the ADMM framework with the three-point compressor (3PC) communication mechanism. This framework not only covers existing mainstream communication-efficient algorithms but also yields a series of new ones. A key feature of CE-DADMM is its flexibility: it adapts to different communication and computation needs, balancing communication efficiency against computational overhead. Notably, when employing quasi-Newton updates, CE-DADMM becomes the first compression-based communication-efficient second-order algorithm that can efficiently handle composite optimization problems. Theoretical analysis shows that, even in the presence of compression errors, the proposed algorithm maintains exact linear convergence when the local objective functions are strongly convex. Finally, numerical experiments demonstrate the algorithm's impressive communication efficiency.

1. Introduction

The recent increase in the number of mobile devices with enhanced computing and communication capabilities has led to significant development of multiagent systems [1,2]. Within these systems, many applications, including smart grid management [3,4], wireless communications [5], multi-robot coordination [6,7], and large-scale machine learning [8], can be cast as decentralized optimization problems, in which a network of nodes cooperatively solves a finite-sum optimization problem using local information.
A vast number of decentralized optimization algorithms have been proposed since the pioneering work DGD [9], in which each node performs gradient descent and simultaneously communicates its decision vector with its neighbors for consensus. As DGD requires a diminishing stepsize, which may slow the convergence rate, gradient-tracking (GT)-based algorithms using constant stepsizes were then developed [10,11] and have been extensively investigated under various scenarios [12,13,14,15], to name a few. However, GT-based methods require transmitting both the decision vector and an additional gradient-estimation vector, which increases the communication cost. In parallel, another type of decentralized algorithm based on the alternating direction method of multipliers (ADMM) has been proposed and analyzed [16,17]. Compared with GT-based algorithms, ADMM-type algorithms can achieve the same convergence rate while requiring the transmission of only the decision vector, which can be more communication-efficient. Following this line, some decentralized optimization algorithms accelerate the convergence rate by introducing second-order information [18,19,20,21]. More recently, Ref. [22] proposed a family of decentralized curvature-aided primal-dual algorithms, which includes gradient, Newton, and BFGS types of updates.
In decentralized algorithms, improving communication efficiency is of great significance. The methods can be classified into two types. One is to adopt a compressed communication scheme, using quantization [23] or sparsification [24,25] techniques to reduce the communication overhead per transmission. Recently, compressed communication has been combined with DGD [26,27], GT-based algorithms [28,29], and ADMM-type algorithms [30,31]. The other is to employ an intermittent communication scheme, which aims to reduce the communication frequency. This type includes event-triggered communication [32,33,34], the lazy aggregation scheme (LAG) [35], etc. Besides, some works combine both methods [36,37,38]. In particular, Ref. [39] combined event-triggered and compressed communication with an ADMM-type algorithm and proposed a communication-efficient decentralized second-order optimization algorithm, which improves both computation and communication efficiency. It is worth noting that the information distortion caused by compressed communication may negatively affect the convergence performance of decentralized optimization algorithms. To overcome this shortcoming, Ref. [40] developed an error-feedback communication scheme (EF21) to avoid the negative effect of information distortion. Recently, a more general efficient communication scheme termed the three-point compressor (3PC) was proposed in [41], which provides a unified method including EF21 and LAG as special cases. However, the 3PC scheme has only been investigated for distributed gradient descent algorithms under the parameter-server framework.
Despite this progress, communication-efficient decentralized optimization over general networks still lacks an in-depth analysis, especially for objectives with a non-differentiable part, i.e., decentralized composite optimization problems. Note that such problems have wide applications in machine learning due to the presence of non-differentiable regularization terms. Some decentralized composite optimization algorithms have been proposed [22,42,43,44], but without employing efficient communication schemes. Moreover, to the best of our knowledge, no work has been reported on communication-efficient decentralized composite optimization using second-order information. To fill this gap, in this paper, we incorporate the general efficient communication scheme 3PC [41] into the ADMM-based decentralized optimization framework [22], which results in a family of communication-efficient decentralized composite optimization algorithms with theoretical guarantees. It is worth noting that such an incorporation is not trivial, as we need to overcome the negative effect arising from the propagation of communication error over the network. The main contributions of this work can be summarized in the following two aspects:
  • First, we propose a flexible framework termed CE-DADMM for communication-efficient decentralized composite optimization problems. The framework not only encompasses some existing algorithms, such as COLA [32] and CC-DQM [39], but also introduces several new algorithms. Specifically, by incorporating quasi-Newton updates into CE-DADMM, we derive CE-DADMM-BFGS, the first communication-efficient decentralized second-order algorithm for composite optimization. Compared with CC-DQM, it avoids computing the Hessian matrix and its inverse, significantly reducing the computational cost. Compared with DRUID [22], CE-DADMM reduces the communication cost thanks to the efficient communication scheme.
  • Second, we theoretically prove that CE-DADMM achieves exact linear convergence under the assumption of strong convexity, by carefully analyzing the mixing error arising from the efficient communication scheme and the disagreement of the decision vectors. The dependence of the convergence rate on the parameters of the compression mechanism is also established. Additionally, extensive numerical experiments substantiate the superior communication efficiency of our algorithms.
Notation. If not specified, ‖x‖ and ‖A‖ represent the Euclidean norm and the spectral norm, respectively. For a positive definite matrix P ≻ 0, let ‖x‖_P := √(x⊤Px). Use [n] to denote the set {1, …, n}. The proximal mapping of a function g(·): ℝ^d → ℝ is defined by prox_{g/μ}(v) = argmin_{θ∈ℝ^d} { g(θ) + (μ/2)‖θ − v‖² }. Let I_d represent the d-dimensional identity matrix, and A ⊗ B the Kronecker product of matrices A and B.
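The non-smooth terms used later in Section 5 are ℓ1 regularizers, for which this proximal mapping has the familiar soft-thresholding closed form. A minimal sketch (the function name is ours):

```python
import numpy as np

def prox_l1(v, gamma, mu):
    """prox_{g/mu}(v) for g(theta) = gamma * ||theta||_1, i.e.
    argmin_theta  gamma*||theta||_1 + (mu/2)*||theta - v||^2,
    which is coordinate-wise soft-thresholding at level gamma/mu."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma / mu, 0.0)
```

For example, with gamma/mu = 1 the entries of magnitude below 1 are zeroed and the rest shrink toward zero by 1.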

2. Problem Setting

In this paper, we study the decentralized composite optimization problem on an undirected connected network with n agents (or, nodes), which takes the form
min_{x∈ℝ^d} ∑_{i=1}^{n} f_i(x) + g(x), (1)
where x ∈ ℝ^d is the decision vector, f_i(·): ℝ^d → ℝ is a convex and smooth function accessible only by node i, and g(·): ℝ^d → ℝ is a convex (possibly non-smooth) regularizer.
Next, we equivalently reformulate problem (1) into a compact form at the level of the whole network, following the same idea as in [22]. Denote the communication graph by G = (V, E), where V := [n] is the set of agents and E ⊆ V × V is the set of edges, containing the pair (i, j) if and only if agent i can communicate with agent j. There are no self-loops in G, i.e., (i, i) ∉ E for any i ∈ [n]. The edges in E are enumerated in arbitrary order, with e_k := (i, j) ∈ E denoting the k-th edge, where k ∈ [m] and m := |E| is the number of edges. The neighbor set of agent i is N_i := { j ∈ V : (i, j) ∈ E }. Let x_i ∈ ℝ^d and z_k ∈ ℝ^d be the local decision vectors corresponding to the i-th node and the k-th edge, respectively. We assume that G is connected. Then, problem (1) is equivalent to the following constrained form:
min_{{x_i}, θ, {z_k}} ∑_{i=1}^{n} f_i(x_i) + g(θ), s.t. x_i = x_j = z_k, ∀ e_k = (i, j) ∈ E; x_l = θ for one arbitrary l ∈ V, (2)
where θ ∈ ℝ^d is an auxiliary variable for decoupling the smooth and non-smooth functions. Denote the optimal solutions of problems (1) and (2) by x⋆ and {x_i⋆, z_k⋆, θ⋆}, respectively. It is straightforward to verify that x⋆ = x_i⋆ = z_k⋆ = θ⋆ for all i ∈ [n] and k ∈ [m].
In what follows, define x̃ := [x_1; x_2; …; x_n] ∈ ℝ^{nd} and z̃ := [z_1; z_2; …; z_m] ∈ ℝ^{md}. In addition, define two matrices Â_s, Â_d ∈ ℝ^{m×n} as follows: the k-th rows of both Â_s and Â_d encode the k-th edge e_k; specifically, the entries [Â_s]_{ki} and [Â_d]_{kj} both equal 1 if and only if e_k = (i, j), and are 0 otherwise. We also define S := s_l ⊗ I_d ∈ ℝ^{nd×d}, where s_l ∈ ℝ^n is the vector with a 1 at its l-th position and 0 elsewhere. Clearly, the matrix S extracts the component of x̃ corresponding to agent l, meaning that S⊤x̃ = x_l. Let F(x̃) := ∑_{i=1}^{n} f_i(x_i). Then, problem (2) can be written as
min_{x̃, θ, z̃} F(x̃) + g(θ), s.t. A x̃ = B z̃, S⊤x̃ = θ, where A = [Â_s ⊗ I_d; Â_d ⊗ I_d] and B = [I_{md}; I_{md}]. (3)
Note that problem (3) is written at the network level; it will be the basis for designing our algorithm.

3. Algorithm Formulation

In this section, we first introduce the basic iterations of our algorithm based on the ADMM method. Then, by combining compressed communication techniques with the ADMM-based algorithm, we devise our algorithm and discuss its relationship to existing algorithms.

3.1. Background: ADMM-Based Algorithm

ADMM is a powerful tool for solving optimization problems with several blocks of variables. To apply ADMM to problem (3), define its augmented Lagrangian as
L(x̃, θ, z̃; ν, λ) := F(x̃) + g(θ) + ν⊤(A x̃ − B z̃) + λ⊤(S⊤x̃ − θ) + (μ_z/2)‖A x̃ − B z̃‖² + (μ_θ/2)‖S⊤x̃ − θ‖², (4)
where μ_z and μ_θ are positive constants, and ν ∈ ℝ^{2md} and λ ∈ ℝ^d are Lagrange multipliers. Then, the t-th iteration of ADMM is written as
x̃_{t+1} = argmin_{x̃} L(x̃, θ_t, z̃_t; ν_t, λ_t), (5a)
θ_{t+1} = argmin_{θ} L(x̃_{t+1}, θ, z̃_t; ν_t, λ_t), (5b)
z̃_{t+1} = argmin_{z̃} L(x̃_{t+1}, θ_{t+1}, z̃; ν_t, λ_t), (5c)
ν_{t+1} = ν_t + μ_z (A x̃_{t+1} − B z̃_{t+1}), (5d)
λ_{t+1} = λ_t + μ_θ (S⊤x̃_{t+1} − θ_{t+1}). (5e)
Define Ê_s = Â_s − Â_d, Ê_u = Â_s + Â_d, L̂_s = Ê_s⊤Ê_s, L̂_u = Ê_u⊤Ê_u, and D̂ = ½(L̂_s + L̂_u). In addition, define E_s = Ê_s ⊗ I_d, and similarly for E_u, L_s, L_u, and D. As in [22], if we initialize the multiplier ν_t := [α_t; β_t] ∈ ℝ^{2md} with α_0 = −β_0 and E_u x̃_0 = 2 z̃_0, then E_u x̃_t = 2 z̃_t for all t. Let φ_t = E_s⊤α_t, and approximate the augmented Lagrangian L(·) in (5a) by a second-order expansion at x̃_t as
L̂(x̃, θ_t, z̃_t; ν_t, λ_t) = L_t(x̃_t) + (x̃ − x̃_t)⊤∇_{x̃}L_t(x̃_t) + (1/2)‖x̃ − x̃_t‖²_{H_t},
where L_t(x̃_t) is short for L(x̃_t, θ_t, z̃_t; ν_t, λ_t) and H_t is an invertible matrix representing the approximated Hessian of the augmented Lagrangian. The iteration (5) can then be simplified as
x̃_{t+1} = x̃_t − H_t^{-1} ( ∇F(x̃_t) + φ_t + S λ_t + (μ_z/2) L_s x̃_t + μ_θ S (S⊤x̃_t − θ_t) ), (6a)
θ_{t+1} = prox_{g/μ_θ} ( S⊤x̃_{t+1} + μ_θ^{-1} λ_t ), (6b)
φ_{t+1} = φ_t + (μ_z/2) L_s x̃_{t+1}, (6c)
λ_{t+1} = λ_t + μ_θ ( S⊤x̃_{t+1} − θ_{t+1} ). (6d)
Compared with (5), the iteration (6) contains fewer vectors: the vector z̃ is eliminated and ν is replaced by φ, halving the dimension of ν. Note that (6) is written in terms of the whole network. To implement (6) in a decentralized manner, we require the matrix H_t to be block-diagonal, so that each block can be computed independently by each agent. Here, we assume that H_t = diag{H_{1,t}, H_{2,t}, …, H_{n,t}}; the choice of H_{i,t} will be discussed later. Then, agent i performs the following iteration:
x_{i,t+1} = x_{i,t} − H_{i,t}^{-1} ( ∇f_i(x_{i,t}) + φ_{i,t} + (μ_z/2) ∑_{j∈N_i} (x_{i,t} − x_{j,t}) + δ_{il} ( λ_t + μ_θ (x_{i,t} − θ_t) ) ), (7a)
φ_{i,t+1} = φ_{i,t} + (μ_z/2) ∑_{j∈N_i} (x_{i,t+1} − x_{j,t+1}), (7b)
θ_{t+1} = δ_{il} prox_{g/μ_θ} ( x_{i,t+1} + μ_θ^{-1} λ_t ), (7c)
λ_{t+1} = δ_{il} ( λ_t + μ_θ (x_{i,t+1} − θ_{t+1}) ), (7d)
where δ_{il} = 1 if i = l, and δ_{il} = 0 otherwise.

3.2. Communication-Efficient Decentralized ADMM

Recalling iterations (7a) and (7b), agent i must communicate its x information to its neighbors at each iteration. However, such communication might not be feasible in scenarios with limited communication resources. To reduce the communication overhead, we introduce the compressed communication scheme into iteration (7), which results in our algorithm, termed communication-efficient decentralized ADMM for composite optimization (CE-DADMM).
To compress the communication, we first give the definition of a compressor.
Definition 1
(Compressor). A randomized map C: ℝ^d → ℝ^d is called a compressor if there exists a constant δ ∈ [0, 1) such that E‖C(x) − x‖² ≤ (1 − δ)‖x‖² holds for any x ∈ ℝ^d.
In Definition 1, the compressor is characterized by the relationship between the compression error and the original state; δ can be viewed as the compression ratio, and we also call C a δ-compressor. In our algorithm, we do not apply C(·) directly to the transmitted x, as doing so leads to a non-vanishing compression error. Instead, we adopt a general compressor termed the three-point compressor (3PC) [41], whose definition is given below.
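For concreteness, the Top-K sparsifier used in the experiments of Section 5 satisfies Definition 1 deterministically with δ = K/d, since the discarded d − K entries are the smallest in magnitude. A minimal sketch (the function name is ours):

```python
import numpy as np

def top_k(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest.
    Deterministically satisfies ||top_k(x) - x||^2 <= (1 - k/d) * ||x||^2,
    where d = len(x), i.e. Definition 1 with delta = k/d."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]  # indices of k largest |x_i|
    out[idx] = x[idx]
    return out
```

Only k values (plus their indices) need to be transmitted per message, which is the source of the bit savings reported later.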
Definition 2
(Three-Point Compressor, see [41]). A randomized map C_{h,y}: ℝ^d × ℝ^d × ℝ^d → ℝ^d is called a three-point compressor (3PC) if there exist constants 0 < A ≤ 1 and B ≥ 0 such that
E‖C_{h,y}(x) − x‖² ≤ (1 − A)‖h − y‖² + B‖x − y‖², ∀ x, y, h ∈ ℝ^d,
where y and h are parameters of the compressor.
A 3PC can be realized using a δ-compressor C. Two examples of C_{h,y} are given below [41]:
EF21: C_{h,y}(x) := h + C(x − h), (9)
CLAG: C_{h,y}(x) := h + C(x − h) if ‖x − h‖² > ς‖x − y‖², and C_{h,y}(x) := h otherwise. (10)
It can be checked that A := 1 − δ and B := δ/(1 − δ) for EF21, and A := 1 − δ and B := max{δ/(1 − δ), ς} for CLAG.
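Both examples wrap an arbitrary δ-compressor C. A sketch using a Top-K base compressor (helper and parameter names are ours; `thresh` plays the role of ς):

```python
import numpy as np

def top_k(x, k=3):
    """A simple delta-compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def ef21(h, y, x, c=top_k):
    """EF21 (9): compress the difference from the last transmitted state h."""
    return h + c(x - h)

def clag(h, y, x, c=top_k, thresh=1.0):
    """CLAG (10): compress and send only if x drifted far enough from h."""
    if np.sum((x - h) ** 2) > thresh * np.sum((x - y) ** 2):
        return h + c(x - h)
    return h  # communication skipped this round
```

Note that EF21 always communicates (a compressed difference), while CLAG additionally saves rounds by staying silent when the state has barely moved.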
Next, we formulate our algorithm based on 3PC; its pseudocode is presented in Algorithm 1. We introduce a new state y_{i,t} associated with x_{i,t}, which is the estimate of x_{i,t} held by agent i's neighbors. The computation of H_{i,t}(y_{i,t})^{-1} relies on y_{i,t}; since y_{i,t} consists of compressed information, this significantly reduces the computational cost associated with inverting H_{i,t}(y_{i,t}). Then, at iteration t, agent i transmits the compressed vector C_{y_{i,t−1}, x_{i,t−1}}(x_{i,t}) rather than x_{i,t} to its neighbors, which leads to the new iterations below:
x_{i,t+1} = x_{i,t} − (H_{i,t}(y_{i,t}))^{-1} ( ∇f_i(x_{i,t}) + φ_{i,t} + (μ_z/2) ∑_{j∈N_i} (y_{i,t} − y_{j,t}) + δ_{il} ( λ_t + μ_θ (y_{i,t} − θ_t) ) ), (11a)
φ_{i,t+1} = φ_{i,t} + (μ_z/2) ∑_{j∈N_i} (y_{i,t+1} − y_{j,t+1}), (11b)
θ_{t+1} = δ_{il} prox_{g/μ_θ} ( y_{l,t+1} + μ_θ^{-1} λ_t ), (11c)
λ_{t+1} = δ_{il} ( λ_t + μ_θ (y_{l,t+1} − θ_{t+1}) ). (11d)
Algorithm 1 CE-DADMM
1: Initialization:  x_{i,0} = 0, y_{i,0} = 0, φ_{i,0} = 0, θ_0 = 0, λ_0 = 0, p_{i,0} = ∇f_i(0).
2: for t = 0,1,… do
3:     for agent i do
4:         Compute H_{i,t}^{-1} using (13), (14), or (15) according to the chosen update;
5:         Compute x_{i,t+1} using (11a);
6:         y_{i,t+1} = C_{y_{i,t}, x_{i,t}}(x_{i,t+1}) // Compressing information
7:         Broadcast y_{i,t+1} to neighbors
8:         φ_{i,t+1} = φ_{i,t} + (μ_z/2) ∑_{j∈N_i} (y_{i,t+1} − y_{j,t+1})
9:         if i = l then // Dealing with the non-smooth function
10:             θ_{t+1} = prox_{g/μ_θ}(y_{l,t+1} + (1/μ_θ) λ_t)
11:             λ_{t+1} = λ_t + μ_θ(y_{l,t+1} − θ_{t+1})
12:         end if
13:     end for
14: end for
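To make the data flow of Algorithm 1 concrete, below is a minimal single-process simulation of one round of iterations (11a)-(11d) with gradient updates (Case 1), for toy local objectives f_i(x) = ½‖x − a_i‖² and, by default, g = 0 and exact communication. All names and parameter values are illustrative, not from the paper:

```python
import numpy as np

def ce_dadmm_round(x, y, phi, theta, lam, a, nbrs, l, mu_z, mu_th, eps,
                   compress=lambda h, y_prev, x_new: x_new,
                   prox=lambda v: v):
    """One round of (11a)-(11d), gradient updates (Case 1), for the toy
    objectives f_i(x) = 0.5*||x - a_i||^2 and (by default) g = 0.
    `compress` is a 3PC map C_{h,y}(x); the default is exact communication."""
    n = x.shape[0]
    x_new = np.empty_like(x)
    for i in range(n):
        # H_{i,t} = (mu_z*|N_i| + delta_il*mu_th + eps) * I_d  (eq. (13))
        h_i = mu_z * len(nbrs[i]) + (mu_th if i == l else 0.0) + eps
        grad = x[i] - a[i]                                  # grad f_i(x_i)
        mix = 0.5 * mu_z * sum(y[i] - y[j] for j in nbrs[i])
        corr = lam + mu_th * (y[i] - theta) if i == l else 0.0
        x_new[i] = x[i] - (grad + phi[i] + mix + corr) / h_i          # (11a)
    y_new = np.stack([compress(y[i], x[i], x_new[i]) for i in range(n)])
    phi_new = np.stack([phi[i] + 0.5 * mu_z * sum(y_new[i] - y_new[j]
                                                  for j in nbrs[i])
                        for i in range(n)])                           # (11b)
    theta_new = prox(y_new[l] + lam / mu_th)                          # (11c)
    lam_new = lam + mu_th * (y_new[l] - theta_new)                    # (11d)
    return x_new, y_new, phi_new, theta_new, lam_new
```

A quick sanity check follows from the optimality conditions of Lemma 1: with x_i = x̄ (the average of the a_i), φ_i = a_i − x̄, θ = x̄, λ = 0, and exact communication, one round leaves every state unchanged.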

3.3. Discussion

Our algorithm CE-DADMM is a flexible framework that accommodates gradient updates, Newton updates, and quasi-Newton updates, depending on the choice of the matrix H_{i,t}^{-1}. Here, H_{i,t}^{-1} has the following general structure:
H_{i,t}^{-1} = ( J_{i,t} + (μ_z|N_i| + δ_{il}μ_θ + ϵ) I_d )^{-1}, (12)
where ϵ > 0 provides additional robustness and J_{i,t} is a matrix to be determined. A detailed discussion is presented below.
Case 1: Gradient updates. Choosing J_{i,t} ≡ 0, (12) reduces to
H_{i,t}^{-1} = ( (μ_z|N_i| + δ_{il}μ_θ + ϵ) I_d )^{-1}. (13)
Clearly, H_t is diagonal, and computing H_t^{-1} requires only an O(d) computational cost. Compared with COLA [45], CE-DADMM accounts for the presence of the non-smooth term g(·) and allows more options in the choice of the compression mechanism C. When g(·) is excluded, only the lazy-aggregation compression mechanism is applied, and gradient updates are used, CE-DADMM aligns with the form of COLA.
Case 2: Newton updates. Choosing J_{i,t} = ∇²f_i(y_{i,t}), (12) becomes
H_{i,t}^{-1} = ( ∇²f_i(y_{i,t}) + (μ_z|N_i| + δ_{il}μ_θ + ϵ) I_d )^{-1}. (14)
According to the definition of F(ỹ_t), ∇²F(ỹ_t) is a block-diagonal matrix whose i-th block is ∇²f_i(y_{i,t}). The computation of H_t^{-1} incurs an O(d³) computational cost. When CE-DADMM uses Newton updates, excludes g(·), and adopts the same communication compression approach as CC-DQM [39], it recovers the form of CC-DQM.
Case 3: Quasi-Newton updates. Inspired by the distributed BFGS scheme in [22], we derive a novel decentralized algorithm, termed CE-DADMM-BFGS, which combines the BFGS method with communication-efficient mechanisms. Based on the secant condition, each agent i builds a model of the inverse Hessian directly from the pairs {q_{i,t}, s_{i,t}} defined as
q_{i,t} := ∇f_i(y_{i,t+1}) − ∇f_i(y_{i,t}) + (μ_z|N_i| + δ_{il}μ_θ + ϵ) s_{i,t} and s_{i,t} := y_{i,t+1} − y_{i,t}.
The inverse Hessian approximation is then iteratively updated as
H_{i,t+1}^{-1} = ( I_d − s_{i,t}q_{i,t}⊤/(q_{i,t}⊤s_{i,t}) ) H_{i,t}^{-1} ( I_d − q_{i,t}s_{i,t}⊤/(q_{i,t}⊤s_{i,t}) ) + s_{i,t}s_{i,t}⊤/(q_{i,t}⊤s_{i,t}). (15)
Notably, H_{i,t+1} itself is never formed or inverted explicitly; (15) maintains the inverse approximation directly. Consequently, the computational cost for each agent is reduced from O(d³) to O(d²).
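A sketch of update (15), together with the secant property it enforces (the updated inverse maps q_{i,t} back to s_{i,t}); the function name is ours:

```python
import numpy as np

def bfgs_inverse_update(H_inv, s, q):
    """BFGS update (15) on the inverse Hessian approximation: never forms
    or inverts the Hessian itself. Requires the curvature condition
    q^T s > 0 for the update to stay positive definite."""
    rho = 1.0 / (q @ s)
    d = s.shape[0]
    V = np.eye(d) - rho * np.outer(q, s)
    # (I - rho*s*q^T) H_inv (I - rho*q*s^T) + rho*s*s^T
    return V.T @ H_inv @ V + rho * np.outer(s, s)
```

The update costs only a few matrix-vector and rank-one operations, which is where the O(d²) per-agent cost quoted above comes from.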

4. Convergence Analysis

In this section, we propose a unified framework to analyze the proposed algorithms, which incorporate gradient, Newton, and BFGS updates along with a communication-efficient mechanism. First, we make the following assumptions, which are used throughout the paper.
Assumption 1.
Each f_i is twice continuously differentiable, m_f-strongly convex, and M_f-smooth, i.e., m_f I_d ⪯ ∇²f_i(x_i) ⪯ M_f I_d, where M_f ≥ m_f > 0. g(·): ℝ^d → ℝ is proper, closed, and convex, i.e., (x − y)⊤(s_x − s_y) ≥ 0 holds for any subgradients s_x ∈ ∂g(x) and s_y ∈ ∂g(y).
Assumption 2.
Each ∇²f_i is Lipschitz continuous with constant L_f, i.e., ‖∇²f_i(x) − ∇²f_i(y)‖ ≤ L_f‖x − y‖ holds for any x, y ∈ ℝ^d.
Assumption 3.
Each H_{i,t} is uniformly upper bounded, i.e., there exists a constant ψ > 0 such that H_{i,t} ⪯ ψ I_d for all t ≥ 0.
It is worth noting that Assumption 3 is required only in the quasi-Newton update case. Next, we introduce the optimality condition of problem (3), which is independent of the algorithm and has been proved in [22]. The result is given below.
Lemma 1
(Optimality condition, see Lemma 2 in [22]). A tuple (x̃⋆, z̃⋆, α⋆, θ⋆, λ⋆) is a primal-dual optimal pair of problem (3) if and only if the following holds:
∇F(x̃⋆) + E_s⊤α⋆ + S λ⋆ = 0,
∂g(θ⋆) − λ⋆ ∋ 0,
E_s x̃⋆ = 0,
E_u x̃⋆ = 2 z̃⋆,
S⊤x̃⋆ = θ⋆.
Moreover, there exists a unique dual optimal pair [α⋆; λ⋆] ∈ ℝ^{(m+1)d} that lies in the column space of C := [E_s; S⊤] ∈ ℝ^{(m+1)d × nd}.
Now, we are ready to analyze our algorithm CE-DADMM. First, we write (11) in the compact form:
x̃_{t+1} = x̃_t − H_t^{-1} ( ∇F(x̃_t) + φ_t + (μ_z/2) L_s ỹ_t + S λ_t + μ_θ S (S⊤ỹ_t − θ_t) ), (17a)
φ_{t+1} = φ_t + (μ_z/2) L_s ỹ_{t+1}, (17b)
θ_{t+1} = prox_{g/μ_θ} ( S⊤ỹ_{t+1} + μ_θ^{-1} λ_t ), (17c)
λ_{t+1} = λ_t + μ_θ ( S⊤ỹ_{t+1} − θ_{t+1} ). (17d)
According to the discussion in Section 3.1, E_u x̃_t = 2 z̃_t and φ_t = E_s⊤α_t hold in (17). Then, it follows from Lemma 1 that the convergence of CE-DADMM can be established by showing that (x̃_t, z̃_t, α_t, θ_t, λ_t) converges to (x̃⋆, z̃⋆, α⋆, θ⋆, λ⋆).
Due to the efficient communication scheme, we need to analyze the impact of the communication error on the convergence of CE-DADMM. Define ẽ := [e_1; e_2; …; e_n], where e_i := y_i − x_i for each agent i. Clearly, ẽ describes the error caused by the efficient communication scheme. Regarding ẽ, the following result holds.
Lemma 2.
The error ẽ_{t+1} in CE-DADMM satisfies ‖ẽ_{t+1}‖² ≤ (1 − A)‖ẽ_t‖² + B‖x̃_{t+1} − x̃_t‖², where A and B are the parameters of the 3PC.
Proof. 
According to the definition of the 3PC, we have
‖ẽ_{t+1}‖² = ‖ỹ_{t+1} − x̃_{t+1}‖² ≤ (1 − A)‖ẽ_t‖² + B‖x̃_{t+1} − x̃_t‖²,
which completes the proof. □
Clearly, if the 3PC is set as EF21 (9), it follows from Lemma 2 that
‖ẽ_{t+1}‖² ≤ δ‖ẽ_t‖² + (δ/(1 − δ))‖x̃_{t+1} − x̃_t‖².
If the 3PC is set as CLAG (10), we have
‖ẽ_{t+1}‖² ≤ δ‖ẽ_t‖² + max{δ/(1 − δ), ς}‖x̃_{t+1} − x̃_t‖².
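The recursion above implies that, unlike direct one-shot compression, the EF21 error ‖ẽ_t‖ contracts to zero once the iterates settle down. A small numeric illustration with a Top-K base compressor applied to a geometrically converging sequence (all values are illustrative):

```python
import numpy as np

def top_k(x, k):
    """Delta-compressor: keep the k largest-magnitude entries of x."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

rng = np.random.default_rng(0)
d, k = 10, 2
x_bar, u = rng.normal(size=d), rng.normal(size=d)
h = np.zeros(d)                    # EF21 state = receiver's estimate
for t in range(200):
    x_t = x_bar + 0.5 ** t * u     # a sequence converging to x_bar
    h = h + top_k(x_t - h, k)      # EF21: only top_k(x_t - h) is transmitted
err = np.linalg.norm(h - x_t)      # residual compression error
```

Even though only 2 of the 10 coordinates are transmitted per round, the final estimation error `err` is numerically negligible, matching the vanishing-error behavior predicted by the recursion.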
Then, to characterize the suboptimality of the iterates when (5a) is replaced by (17a), we introduce the following error term:
r_t := ∇F(x̃_t) − ∇F(x̃_{t+1}) + J_t(x̃_{t+1} − x̃_t). (18)
The bound on the error term (18) is given below; it is important for our main result.
Lemma 3.
It holds that ‖r_t‖ ≤ τ_t‖x̃_{t+1} − x̃_t‖ + γ_t(‖ẽ_{t+1}‖ + ‖ẽ_t‖), where τ_t and γ_t depend on the update case as follows:
Case 1: τ_t = M_f, γ_t = 0. (19)
Case 2: τ_t = min{ 2M_f, (L_f/2)‖x̃_{t+1} − x̃_t‖ + L_f‖ẽ_t‖ }, γ_t = 0. (20)
Case 3: τ_t ≡ 2ψ, γ_t ≡ 2(M_f + ψ). (21)
Proof. 
See Appendix A. □
Lemma 3 extends the results in [17,18] by providing an upper bound on the error introduced when the exact sub-minimization step (5a) is replaced by the one-step update (17a) using the compressed variable. Under Newton updates, as the error ‖ẽ_t‖ approaches zero, the term (L_f/2)‖x̃_{t+1} − x̃_t‖ + L_f‖ẽ_t‖ becomes smaller than 2M_f in (20). In the case of quasi-Newton updates, the error ‖r_t‖ remains bounded by a constant, ensuring that it does not grow indefinitely.
Let σ_max(L_u) and σ_min(L_u) denote the maximum and minimum eigenvalues of L_u, respectively, and let σ_max(L_s) denote the maximum eigenvalue of L_s. Denote by σ_min⁺ the smallest positive eigenvalue of C⊤C, where C is given in Lemma 1. Define ṽ_t := [x̃_t; z̃_t; α_t; θ_t; λ_t], ṽ⋆ := [x̃⋆; z̃⋆; α⋆; θ⋆; λ⋆], and H_1 = diag[ϵ, 2μ_z, 2μ_z, μ_θ, 1/μ_θ]. Consider the following Lyapunov function:
V_t = ‖ṽ_t − ṽ⋆‖²_{H_1} + ζ‖ẽ_t‖².
Clearly, V_t converging to zero implies that ṽ_t converges to the optimal solution. We will use V_t to establish the convergence result of our algorithm, which is given below.
Theorem 1.
Suppose Assumptions 1–3 hold. Let μ_z = 2μ_θ, c_1 = 7σ_max(L_s)μ_θ + 25μ_θ, c_2 = 3σ_max(L_s)μ_θ + 17μ_θ, and c_3 = 2μ_θ²(1 + (σ_max(L_s))²) + 4γ_t². If ζ satisfies
ζ ≤ min{ (2m_f M_f/(m_f + M_f))μ_θ, (ϵ + B(6c_1 − μ_θ))/(τ_t² + 2Bγ_t²), (4c_2 + (1 − A)(6c_1 − μ_θ))/(2(2 − A)τ_t²) }, (22)
then the iterates generated by CE-DADMM satisfy V_{t+1} ≤ (1/(1 + η_t)) V_t, where
η_t = min{ (ϵ + B(6c_1 − μ_θ) − ζ(τ_t² + 2Bγ_t²)) / (8B + (7/(μ_θσ_min⁺))(ϵ² + 2τ_t² + c_3 B)), ((2m_f M_f/(m_f + M_f))(1 − ζ/μ_θ)) / (ϵ + 4μ_θ + σ_max(L_u)μ_θ), 3σ_min⁺/σ_max(L_u), (4c_2 + (1 − A)(6c_1 − μ_θ) − 2(2 − A)ζτ_t²) / ((1 − A)(8 + (7/(μ_θσ_min⁺))(2 − A)c_3)), μ_θσ_min⁺/(7(m_f + M_f)), σ_min⁺/7, 3/16 }. (23)
Proof. 
See Appendix B. □
Clearly, Theorem 1 implies that CE-DADMM achieves exact linear convergence at a rate of O(ρ^t) with ρ = 1/(1 + η_t): the larger η_t is, the faster CE-DADMM converges. To ensure a positive η_t in (23), the step size μ_θ should not be too large. Besides, excessive compression of x should be avoided: when the compression ratio A is very small, B approaches infinity, and according to (23), as B → ∞, the first term in (23) becomes negative. It is worth noting that when τ_t = γ_t = 0, the compression ratio A can be arbitrarily small, making the approach highly applicable in scenarios with extremely limited bandwidth; in this situation, we also obtain the fastest convergence rate and the smallest communication cost. However, achieving τ_t = γ_t = 0 amounts to solving the subproblem (5a) exactly, which may incur a high computational cost. This reveals a trade-off between communication cost and computation cost in decentralized optimization.

5. Numerical Experiments

In this section, CE-DADMM is compared with existing state-of-the-art algorithms, including DRUID [22], PG-EXTRA [42], P2D2 [43], and CC-DQM [39], on distributed logistic-, ridge-, and LASSO-regression problems. Note that CC-DQM does not support non-smooth terms.
Datasets. We use real-world datasets from the LIBSVM library (Available Online: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html, accessed on 3 December 2024): a9a (32,561 samples, 123 dimensions) and ijcnn1 (49,990 samples, 22 dimensions). The samples are evenly distributed across n agents. The distribution of samples across agents for the a9a dataset (left) and ijcnn1 (right) is shown in Figure 1. Our experiments are implemented in Python 3.10.13.
Experimental setting. The communication graph over n = 10 agents is randomly generated, with each connection drawn from a Bernoulli distribution (p = 0.5), as shown in Figure 2. CE-DADMM employs the EF21 and CLAG compression mechanisms with a Top-K compressor that keeps 30 coordinates for the a9a dataset and 6 coordinates for the ijcnn1 dataset. We examine the algorithms from two perspectives: the number of iterations and the total communication bits. The number of iterations refers to the number of rounds the algorithm runs, while the total communication bits are computed from the cumulative number of bits of the variable y transmitted between agents. Additionally, we define err_t := ‖x_t − x⋆‖/‖x_0 − x⋆‖ to measure the algorithm's convergence.

5.1. Distributed Logistic Regression

Distributed logistic regression solves problem (1) with g(x) = γ_2‖x‖_1 and f_i(·): ℝ^d → ℝ defined as
f_i(x) := (1/m_i) ∑_{j=1}^{m_i} log(1 + e^{−b_{ij} a_{ij}⊤x}) + (γ_1/2)‖x‖², (24)
where a_{ij} ∈ ℝ^d represents the feature vector, b_{ij} ∈ {−1, 1} denotes the label, and m_i is the number of samples held by agent i. The regularization parameters are γ_1 = 10^{−2} and γ_2 = 10^{−6}.
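For reference, the local objective (24) and the gradient each agent needs in update (11a) can be computed as follows (a sketch; function and variable names are ours):

```python
import numpy as np

def local_logistic(x, A, b, gamma1=1e-2):
    """f_i and grad f_i from (24); A is the (m_i, d) feature matrix and
    b the label vector with entries in {-1, +1}."""
    z = b * (A @ x)
    f = np.mean(np.log1p(np.exp(-z))) + 0.5 * gamma1 * (x @ x)
    sig = 1.0 / (1.0 + np.exp(z))                # sigmoid(-z)
    g = -(A.T @ (b * sig)) / A.shape[0] + gamma1 * x
    return f, g
```

The gradient expression is the standard one for the logistic loss, so it can be verified against a finite-difference approximation.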
In Figure 3 and Figure 4, when measured by the number of iterations, CE-DADMM with the EF21 and CLAG compression mechanisms performs on par with DRUID without compression, and it surpasses P2D2 and PG-EXTRA in both convergence speed and accuracy. Additionally, we observe that CE-DADMM with (quasi-)Newton updates significantly reduces the number of iterations required to reach a given accuracy compared to first-order methods. When measured by total communication bits, the introduction of EF21 and CLAG allows CE-DADMM to substantially reduce communication overhead compared to DRUID, even under the same update scheme. Notably, with quasi-Newton updates, CE-DADMM requires fewer communication bits to achieve the same accuracy than DRUID with Newton updates, and it also outperforms P2D2 and PG-EXTRA in this regard. For the convergence accuracy targets listed in Table 1, the detailed numerical results are presented in Table 2 and Table 3. Note that since P2D2 and PG-EXTRA fail to reach the predefined convergence accuracy, we use the number of iterations they require to converge to their own optimal values as the benchmark.

5.2. Distributed Ridge Regression

Distributed ridge regression solves problem (1), with f_i(·) the same as in (24) but with g(x) = 0.
In Figure 5 and Figure 6, we observe that, measured by the number of iterations, CE-DADMM with the EF21 and CLAG compression mechanisms performs similarly to DRUID without compression. Moreover, CE-DADMM with (quasi-)Newton updates converges faster than CC-DQM, which also employs a communication-efficient mechanism. On the other hand, when CE-DADMM employs the first-order method, its convergence is slower than that of the second-order method CC-DQM, as the latter benefits from additional Hessian information. Measured by total communication bits, CE-DADMM with (quasi-)Newton updates requires fewer bits to achieve the same accuracy than CC-DQM. Additionally, owing to its communication-efficient mechanisms, CE-DADMM needs substantially fewer communication bits than DRUID to reach the same convergence accuracy. The corresponding numerical results are also presented in Table 2 and Table 3.

5.3. Distributed LASSO

The distributed LASSO solves problem (1) with $g(x) = \gamma_2\|x\|_1$ and $f_i(\cdot): \mathbb{R}^d \to \mathbb{R}$ defined as
$$f_i(x) := \frac{1}{2m_i}\sum_{j=1}^{m_i}\big\|a_{ij}^\top x - b_{ij}\big\|^2 + \frac{\gamma_1}{2}\|x\|^2,$$
where $a_{ij}$, $b_{ij}$, and $m_i$ are as defined in Section 5.1. The regularization parameters are $\gamma_1 = 10^{-2}$ and $\gamma_2 = 10^{-6}$.
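A minimal sketch of these two pieces (function names are ours; the proximal operator of the $\ell_1$ term is the standard soft-thresholding map):

```python
import numpy as np

def local_lasso_loss(x, A, b, gamma1):
    """f_i(x) = 1/(2 m_i) * ||A x - b||^2 + (gamma1/2) * ||x||^2 for agent i."""
    m_i = A.shape[0]
    return np.sum((A @ x - b) ** 2) / (2 * m_i) + (gamma1 / 2) * np.dot(x, x)

def prox_l1(v, tau):
    """Soft-thresholding: prox of tau * ||.||_1, handling the nonsmooth g."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)
```

The smooth part is handled by the gradient, Newton, or quasi-Newton updates, while the nonsmooth $\gamma_2\|x\|_1$ enters only through `prox_l1`.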
In Figure 7 and Figure 8, when measured by the number of iterations, CE-DADMM performs similarly to DRUID and outperforms P2D2 and PG-EXTRA in both convergence speed and accuracy, indicating that introducing the 3PC compression mechanism does not degrade the convergence speed of CE-DADMM. When measured by total communication bits, CE-DADMM with the EF21 and CLAG compression mechanisms significantly reduces communication overhead compared with DRUID, while also outperforming P2D2 and PG-EXTRA, demonstrating that communication compression can substantially lower the communication cost between agents. The corresponding numerical results are likewise presented in Table 2 and Table 3.

6. Conclusions

This paper presents a communication-efficient ADMM algorithm for decentralized composite optimization, named CE-DADMM. The algorithm combines the ADMM framework with the 3PC communication mechanism, flexibly adapting to various communication and computational demands while balancing communication efficiency and computational cost. Notably, when employing quasi-Newton updates, CE-DADMM is the first compression-based communication-efficient algorithm with second-order updates. Theoretical analysis demonstrates that the proposed algorithm achieves linear convergence when the local objective functions are strongly convex. Numerical experiments further validate its effectiveness and superior performance. Future work will focus on extending the algorithm to fully asynchronous settings and stochastic problems.

Author Contributions

Z.C.: Conceptualization, writing—original draft and methodology; Z.Z.: Conceptualization, writing—original draft and methodology; S.Y.: writing—review and editing and supervision; J.C.: writing—review and editing and supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62176056, and in part by the Young Elite Scientists Sponsorship Program by the China Association for Science and Technology (CAST) under Grant 2021QNRC001.

Data Availability Statement

The data supporting the findings of this study are openly available in a9a and ijcnn1 at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html (accessed on 3 December 2024).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Appendix A. Proof of Lemma 3

Proof of Lemma 3.
First, according to (18), it follows from the triangle inequality and the Cauchy–Schwarz inequality that
$$\|r^t\| \le \|\nabla F(\tilde{x}^t) - \nabla F(\tilde{x}^{t+1})\| + \|J^t\|\,\|\tilde{x}^{t+1}-\tilde{x}^t\|. \tag{A1}$$
For case 1 (gradient updates), there is $J^t \equiv 0$. By Assumption 1, we obtain $\|r^t\| \le \|\nabla F(\tilde{x}^t)-\nabla F(\tilde{x}^{t+1})\| \le M_f\|\tilde{x}^{t+1}-\tilde{x}^t\|$.
For case 2 (Newton updates), there is $J^t = \nabla^2 F(\tilde{y}^t)$. Applying Assumption 1 and (A1) yields
$$\|r^t\| \le 2M_f\|\tilde{x}^{t+1}-\tilde{x}^t\|. \tag{A2}$$
In parallel, by the fundamental theorem of calculus, we can obtain
$$\nabla F(\tilde{x}^{t+1}) - \nabla F(\tilde{x}^t) = \int_0^1 \nabla^2 F\big(s\tilde{x}^{t+1}+(1-s)\tilde{x}^t\big)(\tilde{x}^{t+1}-\tilde{x}^t)\,ds = \int_0^1 \Big(\nabla^2 F\big(s\tilde{x}^{t+1}+(1-s)\tilde{x}^t\big) - \nabla^2 F(\tilde{y}^t)\Big)(\tilde{x}^{t+1}-\tilde{x}^t)\,ds + \nabla^2 F(\tilde{y}^t)(\tilde{x}^{t+1}-\tilde{x}^t), \tag{A3}$$
which implies that
$$\begin{aligned}
&\big\|\nabla F(\tilde{x}^{t+1}) - \nabla F(\tilde{x}^t) - \nabla^2 F(\tilde{y}^t)(\tilde{x}^{t+1}-\tilde{x}^t)\big\|\\
&\quad= \Big\|\int_0^1 \Big(\nabla^2 F\big(s\tilde{x}^{t+1}+(1-s)\tilde{x}^t\big) - \nabla^2 F(\tilde{y}^t)\Big)(\tilde{x}^{t+1}-\tilde{x}^t)\,ds\Big\|\\
&\quad\le \int_0^1 \big\|\nabla^2 F\big(s\tilde{x}^{t+1}+(1-s)\tilde{x}^t\big) - \nabla^2 F(\tilde{y}^t)\big\|\,\|\tilde{x}^{t+1}-\tilde{x}^t\|\,ds\\
&\quad\le \int_0^1 L_f\big\|s(\tilde{x}^{t+1}-\tilde{x}^t) + (\tilde{x}^t-\tilde{y}^t)\big\|\,\|\tilde{x}^{t+1}-\tilde{x}^t\|\,ds\\
&\quad\le \int_0^1 L_f s\|\tilde{x}^{t+1}-\tilde{x}^t\|^2\,ds + L_f\|\tilde{x}^t-\tilde{y}^t\|\,\|\tilde{x}^{t+1}-\tilde{x}^t\|\\
&\quad= \frac{L_f}{2}\|\tilde{x}^{t+1}-\tilde{x}^t\|^2 + L_f\|\tilde{e}^t\|\,\|\tilde{x}^{t+1}-\tilde{x}^t\|. \tag{A4}
\end{aligned}$$
The result for case 2 can be obtained by comparing (A2) and (A4).
For case 3 (quasi-Newton updates), according to Assumption 3, the secant condition $H^{t+1}s^t = q^t$, and the definition of $\{q^t, s^t\}$, we have
$$\begin{aligned}
\|r^t\| &= \big\|\nabla F(\tilde{x}^t) - \nabla F(\tilde{x}^{t+1}) + J^t(\tilde{x}^{t+1}-\tilde{x}^t)\big\|\\
&= \big\|\nabla F(\tilde{x}^t) - \nabla F(\tilde{y}^t) + J^t(\tilde{y}^t-\tilde{x}^t) - \big(\nabla F(\tilde{x}^{t+1}) - \nabla F(\tilde{y}^{t+1}) + J^t(\tilde{y}^{t+1}-\tilde{x}^{t+1})\big) + \nabla F(\tilde{y}^t) - \nabla F(\tilde{y}^{t+1}) + J^t(\tilde{y}^{t+1}-\tilde{y}^t)\big\|\\
&\le \big\|\nabla F(\tilde{x}^t) - \nabla F(\tilde{y}^t) + J^t(\tilde{y}^t-\tilde{x}^t)\big\| + \big\|\nabla F(\tilde{x}^{t+1}) - \nabla F(\tilde{y}^{t+1}) + J^t(\tilde{y}^{t+1}-\tilde{x}^{t+1})\big\| + \big\|\nabla F(\tilde{y}^t) - \nabla F(\tilde{y}^{t+1}) + J^t(\tilde{y}^{t+1}-\tilde{y}^t)\big\|\\
&\le 2(M_f+\psi)\big(\|\tilde{e}^{t+1}\| + \|\tilde{e}^t\|\big) + 2\psi\|\tilde{x}^{t+1}-\tilde{x}^t\|, \tag{A5}
\end{aligned}$$
which completes the proof of case 3. □

Appendix B. Proof of Theorem 1

Lemma A1.
Let $(\alpha^*, \lambda^*)$ be the unique dual optimal pair that lies in the column space of $C$, as established in Lemma 1. The following inequality holds:
$$\sigma_{\min}^{+}\big(\|\alpha^{t+1}-\alpha^*\|^2 + \|\lambda^{t+1}-\lambda^*\|^2\big) \le \big\|E_s(\alpha^{t+1}-\alpha^*) + S(\lambda^{t+1}-\lambda^*)\big\|^2. \tag{A6}$$
Proof. 
We rewrite (17b) and (17d) as
$$\begin{bmatrix}\alpha^{t+1}\\ \lambda^{t+1}\end{bmatrix} = \begin{bmatrix}\alpha^{t}\\ \lambda^{t}\end{bmatrix} + \underbrace{\begin{bmatrix}\frac{\mu_z}{2}E_s\\ \mu_\theta S\end{bmatrix}}_{:=N}\tilde{y}^{t+1} - \underbrace{\begin{bmatrix}0\\ \mu_\theta I_d\end{bmatrix}}_{:=M}\theta^{t+1}.$$
First, we show that the column space of $M$ belongs to the column space of $N$. Fix all $y_i$ as a common $y$, i.e., $\tilde{y} = [y; \dots; y]$; then it holds that $N\tilde{y} = My$, which shows $\mathrm{col}(M) \subseteq \mathrm{col}(N)$. By setting $\mu_z = 2\mu_\theta$, we conclude that $[\alpha^{t+1}-\alpha^*; \lambda^{t+1}-\lambda^*]$ lies in the column space of $C$. □
Lemma A2.
The following two inequalities hold:
$$(\lambda^{t+1}-\lambda^t)^\top(\theta^{t+1}-\theta^t) \ge 0, \tag{A7}$$
$$(\lambda^{t+1}-\lambda^*)^\top(\theta^{t+1}-\theta^*) \ge 0. \tag{A8}$$
Proof. 
From the definition of the proximal operator, it holds that
$$\theta^{t+1} = \arg\min_{\theta}\; g(\theta) + \frac{\mu_\theta}{2}\Big\|S\tilde{y}^{t+1} + \frac{1}{\mu_\theta}\lambda^t - \theta\Big\|^2. \tag{A9}$$
By the optimality condition of (A9) and the dual update (17d), we obtain
$$0 \in \partial g(\theta^{t+1}) - \mu_\theta\Big(S\tilde{y}^{t+1} + \frac{1}{\mu_\theta}\lambda^t - \theta^{t+1}\Big) = \partial g(\theta^{t+1}) - \lambda^{t+1},$$
which implies that $\lambda^t \in \partial g(\theta^t)$ for every $t$. Then, it follows from the convexity of $g(\cdot)$, i.e., the monotonicity of $\partial g$, that
$$(\lambda^{t+1}-\lambda^t)^\top(\theta^{t+1}-\theta^t) \ge 0.$$
Similarly, using (16b), we also have
$$(\lambda^{t+1}-\lambda^*)^\top(\theta^{t+1}-\theta^*) \ge 0.$$
The proof is completed. □
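The relation $\lambda^{t+1} \in \partial g(\theta^{t+1})$ used above can be checked numerically when $g = \gamma\|\cdot\|_1$, in which case the minimization in (A9) reduces to soft-thresholding. The concrete values of $\gamma$, $\mu_\theta$, and the input vector below are illustrative assumptions of ours:

```python
import numpy as np

gamma, mu = 0.5, 2.0
rng = np.random.default_rng(0)
v = rng.normal(size=6)                 # stands in for S*y_tilde + (1/mu)*lambda^t

# Prox step (A9) for g = gamma*||.||_1 is soft-thresholding with tau = gamma/mu.
theta = np.sign(v) * np.maximum(np.abs(v) - gamma / mu, 0.0)
lam = mu * (v - theta)                 # dual update: lambda^{t+1}

# lambda^{t+1} must lie in the subdifferential of gamma*||.||_1 at theta^{t+1}:
# equal to gamma*sign(theta_j) when theta_j != 0, and within [-gamma, gamma] otherwise.
in_subdiff = all(
    abs(l - gamma * np.sign(t)) < 1e-12 if t != 0 else abs(l) <= gamma + 1e-12
    for l, t in zip(lam, theta)
)
print(in_subdiff)  # True
```

This holds for any input by construction of the soft-thresholding map, which is exactly the inclusion the lemma relies on.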
Proof of Theorem 1.
To make the proof more concise, let $\Theta^{t+1} = \frac{m_f M_f}{m_f + M_f}\|\tilde{x}^{t+1}-\tilde{x}^*\|^2 + \frac{1}{m_f + M_f}\|\nabla F(\tilde{x}^{t+1})-\nabla F(\tilde{x}^*)\|^2$. As $F(x)$ is strongly convex with Lipschitz continuous gradient, the following inequality holds:
$$\begin{aligned}
\Theta^{t+1} &\le (\tilde{x}^{t+1}-\tilde{x}^*)^\top\big(\nabla F(\tilde{x}^{t+1})-\nabla F(\tilde{x}^*)\big)\\
&= (\tilde{x}^{t+1}-\tilde{x}^*)^\top r^t
\underbrace{-\,\epsilon(\tilde{x}^{t+1}-\tilde{x}^*)^\top(\tilde{x}^{t+1}-\tilde{x}^t)}_{\Xi_1}
\underbrace{-\,(\tilde{x}^{t+1}-\tilde{x}^*)^\top E_s(\alpha^{t+1}-\alpha^*)}_{\Xi_2}
\underbrace{-\,\mu_z(\tilde{x}^{t+1}-\tilde{x}^*)^\top E_u(\tilde{z}^{t+1}-\tilde{z}^t)}_{\Xi_3}\\
&\quad\underbrace{-\,(\tilde{x}^{t+1}-\tilde{x}^*)^\top S\big(\lambda^{t+1}-\lambda^*+\mu_\theta(\theta^{t+1}-\theta^t)\big)}_{\Xi_4}
\underbrace{+\,\frac{\mu_z}{2}(\tilde{x}^{t+1}-\tilde{x}^*)^\top L_s(\tilde{e}^{t+1}-\tilde{e}^t)}_{\Xi_5}
\underbrace{+\,\mu_\theta(\tilde{x}^{t+1}-\tilde{x}^*)^\top S^\top S(\tilde{e}^{t+1}-\tilde{e}^t)}_{\Xi_6}. \tag{A10}
\end{aligned}$$
For $\Xi_1$, we have
$$2\Xi_1 = \epsilon\big(\|\tilde{x}^t-\tilde{x}^*\|^2 - \|\tilde{x}^{t+1}-\tilde{x}^*\|^2 - \|\tilde{x}^{t+1}-\tilde{x}^t\|^2\big). \tag{A11}$$
For $\Xi_2$, since $E_s\alpha^{t+1} = E_s\alpha^t + \frac{\mu_z}{2}L_s\tilde{y}^{t+1}$, it follows from (16c) that
2 Ξ 2 = 2 ( y ˜ t + 1 e ˜ t + 1 ) E s ( α t + 1 α ) = 4 μ z ( α t + 1 α t ) ( α t + 1 α ) ) + 2 e t + 1 E s ( α t + 1 α ) = 2 μ z ( α t α 2 α t + 1 α 2 α t + 1 α t 2 ) + 2 e t + 1 E s ( α t + 1 α ) 2 μ z ( α t α 2 α t + 1 α 2 α t + 1 α t 2 ) + 4 μ θ e ˜ t + 1 E s E s e ˜ t + 1 + 1 4 μ θ α t + 1 α 2 .
For $\Xi_3$, using $\tilde{z}^t = \frac{1}{2}E_u\tilde{x}^t$ and (16d), there is
$$2\Xi_3 = -4\mu_z(\tilde{z}^{t+1}-\tilde{z}^*)^\top(\tilde{z}^{t+1}-\tilde{z}^t) = 2\mu_z\big(\|\tilde{z}^t-\tilde{z}^*\|^2 - \|\tilde{z}^{t+1}-\tilde{z}^*\|^2 - \|\tilde{z}^{t+1}-\tilde{z}^t\|^2\big). \tag{A13}$$
For Ξ 4 , using (16e), we have
2 Ξ 4 = 2 ( y ˜ t + 1 e ˜ t + 1 x ˜ ) S ( λ t + 1 λ + μ θ ( θ t + 1 θ t ) ) = 2 1 μ θ ( λ t + 1 λ t ) + θ t + 1 θ λ t + 1 λ + μ θ ( θ t + 1 θ t ) + 2 e ˜ t + 1 S λ t + 1 λ + μ θ ( θ t + 1 θ t ) = 2 μ θ ( λ t + 1 λ t ) ( λ t + 1 λ ) 2 ( λ t + 1 λ t ) ( θ t + 1 θ t ) 0 , using   ( A 7 ) 2 μ θ ( θ t + 1 θ ) ( θ t + 1 θ t ) 2 ( θ t + 1 θ ) ( λ t + 1 λ ) 0 , using   ( A 8 ) + 2 μ θ e ˜ t + 1 S 1 μ θ ( λ t + 1 λ ) + θ t + 1 θ t 1 μ θ ( λ t λ 2 λ t + 1 λ 2 λ t + 1 λ t 2 ) + μ θ ( θ t θ 2 θ t + 1 θ 2 θ t + 1 θ t 2 ) + 8 μ θ e ˜ t + 1 S S e ˜ t + 1 + 1 4 μ θ λ t + 1 λ 2 + μ θ 4 θ t + 1 θ t 2 .
For $\Xi_5$, using $E_s\alpha^{t+1} = E_s\alpha^t + \frac{\mu_z}{2}L_s\tilde{y}^{t+1}$ and (16c), we have
2 Ξ 5 = μ z ( y ˜ t + 1 e ˜ t + 1 ) L s ( e ˜ t + 1 e ˜ t ) = μ z y ˜ t + 1 E s E s ( e ˜ t + 1 e ˜ t ) μ z e ˜ t + 1 L s ( e ˜ t + 1 e ˜ t ) = 2 ( α t + 1 α t ) E s ( e ˜ t + 1 e ˜ t ) μ z e ˜ t + 1 L s ( e ˜ t + 1 e ˜ t ) μ θ ( e ˜ t + 1 e ˜ t ) E s E s ( e ˜ t + 1 e ˜ t ) + μ θ 1 α t + 1 α t 2 + μ z e ˜ t + 1 L s e ˜ t .
For Ξ 6 , using (17d) and (16e), we have
2 Ξ 6 = 2 μ θ y ˜ t + 1 S S ( e ˜ t + 1 e ˜ t ) 2 μ θ e ˜ t + 1 S S ( e ˜ t + 1 e ˜ t ) 2 μ θ x ˜ S S ( e ˜ t + 1 e ˜ t ) = 2 μ θ 1 μ θ ( λ t + 1 λ t ) + θ t + 1 θ S ( e ˜ t + 1 e ˜ t ) 2 μ θ e ˜ t + 1 S S ( e ˜ t + 1 e ˜ t ) 8 μ θ ( e ˜ t + 1 e ˜ t ) S S ( e ˜ t + 1 e ˜ t ) + 1 4 μ θ λ t + 1 λ t 2 + μ θ 4 θ t + 1 θ 2 + 2 μ θ e ˜ t + 1 S S e ˜ t .
Substituting (A11)–(A16) into (A10) yields
2 Θ t + 1 ϵ ( x ˜ t x ˜ 2 x ˜ t + 1 x ˜ 2 x ˜ t + 1 x ˜ t 2 ) + 2 μ z ( z ˜ t z ˜ 2 z ˜ t + 1 z ˜ 2 z ˜ t + 1 z ˜ t 2 ) + 2 μ z ( α t α 2 α t + 1 α 2 α t + 1 α t 2 ) + μ θ ( θ t θ 2 θ t + 1 θ 2 θ t + 1 θ t 2 ) + 1 μ θ ( λ t λ 2 λ t + 1 λ 2 λ t + 1 λ t 2 ) + 4 μ θ e ˜ t + 1 L s e ˜ t + 1 + μ θ ( e ˜ t + 1 e ˜ t ) L s ( e ˜ t + 1 e ˜ t ) + μ z e ˜ t + 1 L s e ˜ t Ξ 7 + 8 μ θ e ˜ t + 1 S S e ˜ t + 1 + 8 μ θ ( e ˜ t + 1 e ˜ t ) S S ( e ˜ t + 1 e ˜ t ) + 2 μ θ e ˜ t + 1 S S e ˜ t Ξ 8 + 1 4 μ θ α t + 1 α 2 + 1 μ θ α t + 1 α t 2 + μ θ 4 θ t + 1 θ 2 + μ θ 4 θ t + 1 θ t 2 + 1 4 μ θ λ t + 1 λ 2 + 4 1 μ θ λ t + 1 λ t 2 2 ( x ˜ t + 1 x ˜ ) r t ,
where $\Xi_7$ and $\Xi_8$ can be further estimated as
$$\Xi_7 \le \sigma_{\max}(L_s)\big(4\mu_\theta\|\tilde{e}^{t+1}\|^2 + \mu_\theta\|\tilde{e}^{t+1}-\tilde{e}^t\|^2 + \mu_z\|\tilde{e}^{t+1}\|\,\|\tilde{e}^t\|\big) \le \sigma_{\max}(L_s)\Big(\frac{\mu_z}{2}+6\mu_\theta\Big)\|\tilde{e}^{t+1}\|^2 + \sigma_{\max}(L_s)\Big(\frac{\mu_z}{2}+2\mu_\theta\Big)\|\tilde{e}^t\|^2, \tag{A18}$$
and
$$\Xi_8 \le 8\mu_\theta\|\tilde{e}^{t+1}\|^2 + 8\mu_\theta\|\tilde{e}^{t+1}-\tilde{e}^t\|^2 + 2\mu_\theta\|\tilde{e}^{t+1}\|\,\|\tilde{e}^t\| \le 25\mu_\theta\|\tilde{e}^{t+1}\|^2 + 17\mu_\theta\|\tilde{e}^t\|^2, \tag{A19}$$
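The constants 25 and 17 in (A19) come from two elementary estimates, spelled out here for readability:

```latex
8\mu_\theta\|\tilde{e}^{t+1}-\tilde{e}^{t}\|^{2}
  \le 16\mu_\theta\|\tilde{e}^{t+1}\|^{2} + 16\mu_\theta\|\tilde{e}^{t}\|^{2},
\qquad
2\mu_\theta\|\tilde{e}^{t+1}\|\,\|\tilde{e}^{t}\|
  \le \mu_\theta\|\tilde{e}^{t+1}\|^{2} + \mu_\theta\|\tilde{e}^{t}\|^{2},
```

so the total coefficient of $\|\tilde{e}^{t+1}\|^2$ is $8+16+1 = 25$ and that of $\|\tilde{e}^t\|^2$ is $16+1 = 17$.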
where (A19) uses the fact that the largest eigenvalue of $S^\top S$ is 1. Finally, substituting (A18) and (A19) into (A17), we obtain
2 Θ t + 1 ϵ ( x ˜ t x ˜ 2 x ˜ t + 1 x ˜ 2 x ˜ t + 1 x ˜ t 2 ) + 2 μ z ( z ˜ t z ˜ 2 z ˜ t + 1 z ˜ 2 z ˜ t + 1 z ˜ t 2 ) + 2 μ z α t α 2 ( 2 μ z 1 4 μ θ ) α t + 1 α 2 ( 2 μ z 1 μ θ ) α t + 1 α t 2 + μ θ θ t θ 2 3 μ θ 4 θ t + 1 θ 2 3 μ θ 4 θ t + 1 θ t 2 + 1 μ θ λ t λ 2 3 μ θ 4 λ t + 1 λ 2 3 μ θ 4 λ t + 1 λ t 2 2 ( x ˜ t + 1 x ˜ ) r t + c 1 e ˜ t + 1 2 + c 2 e ˜ t 2 .
Recall the definitions of v ˜ and H 1 , and define H 2 = diag [ ϵ , 4 μ θ , 3 4 μ θ , 3 μ θ 4 , 3 4 μ θ ] and H 3 = diag [ ϵ , 4 μ θ , 0 , 3 μ θ 4 , 3 4 μ θ ] . The inequalities H 1 H 2 and H 1 H 3 hold, along with the assumption that ξ > 0 . Thus, (A20) can be rewritten as
2 Θ t + 1 + v ˜ t + 1 v ˜ t H 3 2 + 2 ( x ˜ t + 1 x ˜ ) r t + ( ξ c 1 ) e ˜ t + 1 2 + ( ξ c 2 ) e ˜ t 2 ( 1 4 μ θ α t + 1 α 2 + μ θ 4 θ t + 1 θ 2 + 1 4 μ θ λ t + 1 λ 2 ) v ˜ t v ˜ H 1 2 v ˜ t + 1 v ˜ H 1 2 + ξ e ˜ t 2 ξ e ˜ t + 1 2 .
To establish linear convergence, we need to show that the following holds for some $\eta^t > 0$:
$$(1+\eta^t)\big(\|\tilde{v}^{t+1}-\tilde{v}^*\|_{H_1}^2 + \xi\|\tilde{e}^{t+1}\|^2\big) \le \|\tilde{v}^t-\tilde{v}^*\|_{H_1}^2 + \xi\|\tilde{e}^t\|^2. \tag{A22}$$
Note that
$$\eta^t\big(\|\tilde{v}^{t+1}-\tilde{v}^*\|_{H_1}^2 + \xi\|\tilde{e}^{t+1}\|^2\big) = \eta^t\Big(\epsilon\|\tilde{x}^{t+1}-\tilde{x}^*\|^2 + 2\mu_z\|\tilde{z}^{t+1}-\tilde{z}^*\|^2 + \frac{2}{\mu_z}\|\alpha^{t+1}-\alpha^*\|^2 + \mu_\theta\|\theta^{t+1}-\theta^*\|^2 + \frac{1}{\mu_\theta}\|\lambda^{t+1}-\lambda^*\|^2 + \xi\|\tilde{e}^{t+1}\|^2\Big). \tag{A23}$$
Next, we establish an upper bound for each component of (A23), primarily using the inequality $\big(\sum_{i=1}^n a_i\big)^2 \le n\sum_{i=1}^n a_i^2$ and $\mu_z = 2\mu_\theta$. First, according to Lemma A1, we obtain
$$\begin{aligned}
\frac{2}{\mu_z}\|\alpha^{t+1}-\alpha^*\|^2 + \frac{1}{\mu_\theta}\|\lambda^{t+1}-\lambda^*\|^2
&\le \frac{1}{\mu_\theta\sigma_{\min}^{+}}\big\|E_s(\alpha^{t+1}-\alpha^*) + S(\lambda^{t+1}-\lambda^*)\big\|^2\\
&= \frac{1}{\mu_\theta\sigma_{\min}^{+}}\big\|\nabla F(\tilde{x}^{t+1})-\nabla F(\tilde{x}^*) + \epsilon(\tilde{x}^{t+1}-\tilde{x}^t) + 2\mu_\theta E_u(\tilde{z}^{t+1}-\tilde{z}^t) + \mu_\theta S(\theta^{t+1}-\theta^t) - \mu_\theta L_s(\tilde{e}^{t+1}-\tilde{e}^t) - \mu_\theta S^\top S(\tilde{e}^{t+1}-\tilde{e}^t) + r^t\big\|^2\\
&\le \frac{7}{\mu_\theta\sigma_{\min}^{+}}\Big(\|\nabla F(\tilde{x}^{t+1})-\nabla F(\tilde{x}^*)\|^2 + \epsilon^2\|\tilde{x}^{t+1}-\tilde{x}^t\|^2 + 4\sigma_{\max}(L_u)\mu_\theta^2\|\tilde{z}^{t+1}-\tilde{z}^t\|^2 + \mu_\theta^2\|\theta^{t+1}-\theta^t\|^2 + 2\mu_\theta^2\big(1+(\sigma_{\max}(L_s))^2\big)\big(\|\tilde{e}^{t+1}\|^2+\|\tilde{e}^t\|^2\big) + \|r^t\|^2\Big). \tag{A24}
\end{aligned}$$
Next, since $\tilde{z}^{t+1}-\tilde{z}^* = \frac{1}{2}E_u(\tilde{x}^{t+1}-\tilde{x}^*)$, we have
$$2\mu_z\|\tilde{z}^{t+1}-\tilde{z}^*\|^2 \le \mu_\theta\sigma_{\max}(L_u)\|\tilde{x}^{t+1}-\tilde{x}^*\|^2. \tag{A25}$$
Then, from (17d) and (16e), we obtain
$$\mu_\theta\|\theta^{t+1}-\theta^*\|^2 = \mu_\theta\Big\|S(\tilde{x}^{t+1}-\tilde{x}^*+\tilde{e}^{t+1}) + \frac{1}{\mu_\theta}(\lambda^{t+1}-\lambda^t)\Big\|^2 \le 4\mu_\theta\|\tilde{x}^{t+1}-\tilde{x}^*\|^2 + 4\mu_\theta\|\tilde{e}^{t+1}\|^2 + \frac{2}{\mu_\theta}\|\lambda^{t+1}-\lambda^t\|^2. \tag{A26}$$
Finally, substituting (A24)–(A26) into (A23) and applying Lemma 2, we obtain
η t ( v ˜ t + 1 v ˜ H 1 2 + ξ e ˜ t + 1 2 ) η t { 7 μ θ σ min + ( F ( x ˜ t + 1 ) F ( x ˜ ) 2 + ϵ 2 x ˜ t + 1 x ˜ t 2 + 4 σ max L u μ θ 2 z ˜ t + 1 z ˜ t 2 + μ θ 2 θ t + 1 θ t 2 + 2 μ θ 2 ( 1 + ( σ max L s ) 2 ) ( e ˜ t + 1 2 + e ˜ t 2 ) + r t 2 ) + 2 μ θ λ t + 1 λ t 2 + ( ϵ + 4 μ θ + σ max L u μ θ ) x ˜ t + 1 x ˜ 2 + ( 4 + ξ ) e ˜ t + 1 2 } η t { 7 μ θ σ min + F ( x ˜ t + 1 ) F ( x ˜ ) 2 + 4 σ max L u μ θ 2 z ˜ t + 1 z ˜ t 2 + μ θ 2 θ t + 1 θ t 2 + 2 μ θ λ t + 1 λ t 2 + ( 1 A ) ( 4 + ξ ) + 7 c 3 ( 2 A ) μ θ σ min + e ˜ t 2 + B ( 4 + ξ ) + 7 μ θ σ min + ϵ 2 + 2 τ t 2 + c 3 B x ˜ t + 1 x ˜ t 2 + ( ϵ + 4 μ θ + μ z σ max L θ ) x ˜ t + 1 x ˜ 2 } 2 Θ t + 1 + v ˜ t + 1 v ˜ t H 3 2 + 2 ( x ˜ t + 1 x ˜ ) r t + ( ξ c 1 ) e ˜ t + 1 2 + ( ξ c 2 ) e ˜ t 2 1 4 μ θ α t + 1 α 2 μ θ 4 θ t + 1 θ 2 1 4 μ θ λ t + 1 λ 2 2 Θ t + 1 + F ( x ˜ t + 1 ) F ( x ˜ ) 2 + ϵ + ( ξ Ξ 7 ) B x ˜ t + 1 x ˜ t 2 + 4 μ θ z ˜ t + 1 z ˜ t 2 + ξ Ξ 8 + ( 1 A ) ( ξ Ξ 7 ) e ˜ t 2 + 2 ( x ˜ t + 1 x ˜ ) r t 1 4 μ θ α t + 1 α 2 + λ t + 1 λ 2 μ θ 4 θ t + 1 θ 2 + 3 μ θ 4 θ t + 1 θ t 2 + 3 4 μ θ λ t + 1 λ t 2 .
To determine an admissible coefficient $\eta^t$ for which inequality (A27) holds, we bound the remaining terms in (A27) separately:
$$2(\tilde{x}^{t+1}-\tilde{x}^*)^\top r^t \le \zeta\|r^t\|^2 + \frac{1}{\zeta}\|\tilde{x}^{t+1}-\tilde{x}^*\|^2 \le \zeta(\tau_t^2 + 2B\gamma_t^2)\|\tilde{x}^{t+1}-\tilde{x}^t\|^2 + 2(2-A)\zeta\tau_t^2\|\tilde{e}^t\|^2 + \frac{1}{\zeta}\|\tilde{x}^{t+1}-\tilde{x}^*\|^2, \tag{A28}$$
and
$$\frac{1}{4\mu_\theta}\big(\|\alpha^{t+1}-\alpha^*\|^2 + \|\lambda^{t+1}-\lambda^*\|^2\big) \le \frac{7}{4\mu_\theta\sigma_{\min}^{+}}\Big(\|\nabla F(\tilde{x}^{t+1})-\nabla F(\tilde{x}^*)\|^2 + \epsilon^2\|\tilde{x}^{t+1}-\tilde{x}^t\|^2 + 4\sigma_{\max}(L_u)\mu_\theta^2\|\tilde{z}^{t+1}-\tilde{z}^t\|^2 + \mu_\theta^2\|\theta^{t+1}-\theta^t\|^2 + 2\mu_\theta^2\big(1+(\sigma_{\max}(L_s))^2\big)\big(\|\tilde{e}^{t+1}\|^2+\|\tilde{e}^t\|^2\big) + \|r^t\|^2\Big), \tag{A29}$$
and
$$\frac{\mu_\theta}{4}\|\theta^{t+1}-\theta^*\|^2 \le \mu_\theta\|\tilde{x}^{t+1}-\tilde{x}^*\|^2 + \mu_\theta\|\tilde{e}^{t+1}\|^2 + \frac{1}{2\mu_\theta}\|\lambda^{t+1}-\lambda^t\|^2. \tag{A30}$$
Substituting (A28)–(A30) into (A27) and properly choosing $\eta^t$, we obtain
B μ θ + ζ ( τ t 2 + 2 B γ t 2 ) x ˜ t + 1 x ˜ t 2 + ( 1 A ) μ θ + 2 ( 2 A ) ζ τ t 2 e ˜ t 2 + ( μ θ + 1 ζ ) x ˜ t + 1 x ˜ 2 + 1 2 μ θ λ t + 1 λ t 2 + 7 4 μ θ σ min + ( F ( x ˜ t + 1 ) F ( x ˜ ) 2 + ϵ 2 + 2 τ t 2 + c 3 B x ˜ t + 1 x ˜ t 2 + 4 σ max L u μ θ 2 z ˜ t + 1 z ˜ t 2 + μ θ 2 θ t + 1 θ t 2 + 2 ( 2 A ) μ θ 2 + ( σ max L s ) 2 μ θ 2 + 2 γ t 2 e ˜ t 2 ) + η t { 7 μ θ σ min + F ( x ˜ t + 1 ) F ( x ˜ ) 2 + 4 σ max L u μ θ 2 z ˜ t + 1 z ˜ t 2 + μ θ 2 θ t + 1 θ t 2 + 2 μ θ λ t + 1 λ t 2 + ( 1 A ) ( 4 + ξ ) + 7 ( 2 A ) μ θ σ min + ( 2 μ θ 2 + 4 γ t 2 + σ max L s 2 μ z 2 2 ) e ˜ t 2 + B ( 4 + ξ ) + 7 μ θ σ min + ϵ 2 + 2 τ t 2 + c 3 B x ˜ t + 1 x ˜ t 2 + ( ϵ + 4 μ θ + σ max L u μ θ ) x ˜ t + 1 x ˜ 2 } 2 Θ t + 1 + F ( x ˜ t + 1 ) F ( x ˜ ) 2 + 4 μ θ z ˜ t + 1 z ˜ t 2 + ϵ + ( ξ c 1 ) B x ˜ t + 1 x ˜ t 2 + 3 μ θ 4 θ t + 1 θ t 2 + 3 4 μ θ λ t + 1 λ t 2 + ( ξ c 2 + ( 1 A ) ( ξ c 1 ) ) e ˜ t 2 .
To make (A31) hold, η t is chosen such that
η t B ( 4 + ξ ) + 7 μ θ σ min + ( ϵ 2 + 2 τ t 2 + B Ξ 3 ) ϵ + ( ξ Ξ 1 μ θ ) B ζ ( τ t 2 + 2 B γ t 2 ) 7 4 μ θ σ min + ( ϵ 2 + 2 τ t 2 + B Ξ 3 ) , η t ( 1 A ) ( 4 + ξ ) + 7 μ θ σ min + ( 2 A ) Ξ 3 ξ Ξ 2 + ( 1 A ) ( ξ Ξ 1 μ θ ) 2 ( 2 A ) ζ τ t 2 7 4 μ θ σ min + ( 2 A ) Ξ 3 , η t ϵ + 4 μ θ + σ max L u μ θ 2 m f M f m f + M f 1 ζ μ θ , η t 7 μ θ σ min + 2 m f + M f 7 4 μ θ σ min + , η t 28 σ max L u μ θ σ min + 4 μ θ 7 σ max L u μ θ σ min + , η t 7 μ θ σ min + 3 μ θ 4 7 μ θ 4 σ min + , η t 2 μ θ 3 8 μ θ ,
which implies (23). Then, we can prove that (A22) holds. The proof is completed. □

References

1. Olfati-Saber, R.; Fax, J.A.; Murray, R.M. Consensus and cooperation in networked multi-agent systems. Proc. IEEE 2007, 95, 215–233.
2. Yoo, S.J.; Park, B.S. Dynamic event-triggered prescribed-time consensus tracking of nonlinear time-delay multiagent systems by output feedback. Fractal Fract. 2024, 8, 545.
3. Liu, H.J.; Shi, W.; Zhu, H. Distributed voltage control in distribution networks: Online and robust implementations. IEEE Trans. Smart Grid 2017, 9, 6106–6117.
4. Molzahn, D.K.; Dorfler, F.; Sandberg, H.; Low, S.H.; Chakrabarti, S.; Baldick, R.; Lavaei, J. A survey of distributed optimization and control algorithms for electric power systems. IEEE Trans. Smart Grid 2017, 8, 2941–2962.
5. Liu, Y.F.; Chang, T.H.; Hong, M.; Wu, Z.; So, A.M.C.; Jorswieck, E.A.; Yu, W. A survey of recent advances in optimization methods for wireless communications. IEEE J. Sel. Areas Commun. 2024, 42, 2992–3031.
6. Huang, J.; Zhou, S.; Tu, H.; Yao, Y.; Liu, Q. Distributed optimization algorithm for multi-robot formation with virtual reference center. IEEE/CAA J. Autom. Sin. 2022, 9, 732–734.
7. Yang, X.; Zhao, W.; Yuan, J.; Chen, T.; Zhang, C.; Wang, L. Distributed optimization for fractional-order multi-agent systems based on adaptive backstepping dynamic surface control technology. Fractal Fract. 2022, 6, 642.
8. Liu, J.; Zhang, C. Distributed learning systems with first-order methods. Found. Trends Databases 2020, 9, 1–100.
9. Nedic, A.; Ozdaglar, A. Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 2009, 54, 48–61.
10. Nedic, A.; Olshevsky, A.; Shi, W. Achieving geometric convergence for distributed optimization over time-varying graphs. SIAM J. Optim. 2017, 27, 2597–2633.
11. Xu, J.; Zhu, S.; Soh, Y.C.; Xie, L. Convergence of asynchronous distributed gradient methods over stochastic networks. IEEE Trans. Autom. Control 2018, 63, 434–448.
12. Wen, X.; Luan, L.; Qin, S. A continuous-time neurodynamic approach and its discretization for distributed convex optimization over multi-agent systems. Neural Netw. 2021, 143, 52–65.
13. Feng, Z.; Xu, W.; Cao, J. Alternating inertial and overrelaxed algorithms for distributed generalized Nash equilibrium seeking in multi-player games. Fractal Fract. 2021, 5, 62.
14. Che, K.; Yang, S. A snapshot gradient tracking for distributed optimization over digraphs. In Proceedings of the CAAI International Conference on Artificial Intelligence, Beijing, China, 27–28 August 2022; pp. 348–360.
15. Zhou, S.; Wei, Y.; Liang, S.; Cao, J. A gradient tracking protocol for optimization over Nabla fractional multi-agent systems. IEEE Trans. Signal Inf. Process. Over Netw. 2024, 10, 500–512.
16. Shi, W.; Ling, Q.; Wu, G.; Yin, W. EXTRA: An exact first-order algorithm for decentralized consensus optimization. SIAM J. Optim. 2015, 25, 944–966.
17. Ling, Q.; Shi, W.; Wu, G.; Ribeiro, A. DLM: Decentralized linearized alternating direction method of multipliers. IEEE Trans. Signal Process. 2015, 63, 4051–4064.
18. Mokhtari, A.; Shi, W.; Ling, Q.; Ribeiro, A. DQM: Decentralized quadratically approximated alternating direction method of multipliers. IEEE Trans. Signal Process. 2016, 64, 5158–5173.
19. Eisen, M.; Mokhtari, A.; Ribeiro, A. A primal-dual quasi-Newton method for exact consensus optimization. IEEE Trans. Signal Process. 2019, 67, 5983–5997.
20. Mansoori, F.; Wei, E. A fast distributed asynchronous Newton-based optimization algorithm. IEEE Trans. Autom. Control 2019, 65, 2769–2784.
21. Jiang, X.; Qin, S.; Xue, X.; Liu, X. A second-order accelerated neurodynamic approach for distributed convex optimization. Neural Netw. 2022, 146, 161–173.
22. Li, Y.; Voulgaris, P.G.; Stipanović, D.M.; Freris, N.M. Communication efficient curvature aided primal-dual algorithms for decentralized optimization. IEEE Trans. Autom. Control 2023, 68, 6573–6588.
23. Alistarh, D.; Grubic, D.; Li, J.Z.; Tomioka, R.; Vojnovic, M. QSGD: Communication-efficient SGD via gradient quantization and encoding. In Proceedings of the 30th NeurIPS, Long Beach, CA, USA, 4–9 December 2017; pp. 1710–1721.
24. Wangni, J.; Wang, J.; Liu, J.; Zhang, T. Gradient sparsification for communication-efficient distributed optimization. In Proceedings of the 31st NeurIPS 2018, Montreal, QC, Canada, 2–8 December 2018; pp. 1306–1316.
25. Stich, S.U.; Cordonnier, J.B.; Jaggi, M. Sparsified SGD with memory. In Proceedings of the 31st NeurIPS 2018, Montreal, QC, Canada, 2–8 December 2018; pp. 4447–4458.
26. Doan, T.T.; Maguluri, S.T.; Romberg, J. Fast convergence rates of distributed subgradient methods with adaptive quantization. IEEE Trans. Autom. Control 2020, 66, 2191–2205.
27. Taheri, H.; Mokhtari, A.; Hassani, H.; Pedarsani, R. Quantized decentralized stochastic learning over directed graphs. In Proceedings of the 37th ICML, Virtual, 13–18 July 2020; pp. 9324–9333.
28. Song, Z.; Shi, L.; Pu, S.; Yan, M. Compressed gradient tracking for decentralized optimization over general directed networks. IEEE Trans. Signal Process. 2022, 70, 1775–1787.
29. Xiong, Y.; Wu, L.; You, K.; Xie, L. Quantized distributed gradient tracking algorithm with linear convergence in directed networks. IEEE Trans. Autom. Control 2022, 68, 5638–5645.
30. Zhu, S.; Hong, M.; Chen, B. Quantized consensus ADMM for multi-agent distributed optimization. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; pp. 4134–4138.
31. Elgabli, A.; Park, J.; Bedi, A.S.; Issaid, C.B.; Bennis, M.; Aggarwal, V. Q-GADMM: Quantized group ADMM for communication efficient decentralized machine learning. IEEE Trans. Commun. 2020, 69, 164–181.
32. Li, W.; Liu, Y.; Tian, Z.; Ling, Q. Communication-censored linearized ADMM for decentralized consensus optimization. IEEE Trans. Signal Inf. Process. Over Netw. 2020, 6, 18–34.
33. Gao, L.; Deng, S.; Li, H.; Li, C. An event-triggered approach for gradient tracking in consensus-based distributed optimization. IEEE Trans. Netw. Sci. Eng. 2021, 9, 510–523.
34. Zhang, Z.; Yang, S.; Xu, W.; Di, K. Privacy-preserving distributed ADMM with event-triggered communication. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 2835–2847.
35. Chen, T.; Giannakis, G.; Sun, T.; Yin, W. LAG: Lazily aggregated gradient for communication-efficient distributed learning. Adv. Neural Inf. Process. Syst. 2018, 31, 5050–5060.
36. Sun, J.; Chen, T.; Giannakis, G.; Yang, Z. Communication-efficient distributed learning via lazily aggregated quantized gradients. Adv. Neural Inf. Process. Syst. 2019, 32, 3370–3380.
37. Singh, N.; Data, D.; George, J.; Diggavi, S. SPARQ-SGD: Event-triggered and compressed communication in decentralized optimization. IEEE Trans. Autom. Control 2022, 68, 721–736.
38. Yang, X.; Yuan, J.; Chen, T.; Yang, H. Distributed adaptive optimization algorithm for fractional high-order multiagent systems based on event-triggered strategy and input quantization. Fractal Fract. 2023, 7, 749.
39. Zhang, Z.; Yang, S.; Xu, W. Decentralized ADMM with compressed and event-triggered communication. Neural Netw. 2023, 165, 472–482.
40. Richtárik, P.; Sokolov, I.; Fatkhullin, I. EF21: A new, simpler, theoretically better, and practically faster error feedback. In Proceedings of the 34th NeurIPS, Virtual, 6–14 December 2021; pp. 4384–4396.
41. Richtárik, P.; Sokolov, I.; Fatkhullin, I.; Gasanov, E.; Li, Z.; Gorbunov, E. 3PC: Three point compressors for communication-efficient distributed training and a better theory for lazy aggregation. In Proceedings of the 39th ICML, Baltimore, MD, USA, 17–23 July 2022; pp. 18596–18648.
42. Shi, W.; Ling, Q.; Wu, G.; Yin, W. A proximal gradient algorithm for decentralized composite optimization. IEEE Trans. Signal Process. 2015, 63, 6013–6023.
43. Alghunaim, S.; Yuan, K.; Sayed, A.H. A linearly convergent proximal gradient algorithm for decentralized optimization. In Proceedings of the 32nd NeurIPS, Vancouver, BC, Canada, 8–14 December 2019.
44. Guo, L.; Shi, X.; Yang, S.; Cao, J. DISA: A dual inexact splitting algorithm for distributed convex composite optimization. IEEE Trans. Autom. Control 2024, 69, 2995–3010.
45. Li, W.; Liu, Y.; Tian, Z.; Ling, Q. COLA: Communication-censored linearized ADMM for decentralized consensus optimization. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 5237–5241.
Figure 1. Distribution of samples across agents for the a9a dataset (left) and ijcnn1 (right).
Fractalfract 08 00721 g001
Figure 2. Random communication graph of network with 10 agents.
Fractalfract 08 00721 g002
Figure 3. Performance comparison of distributed logistic regression on the a9a dataset: Plots of iteration number (left) and total communication bits (right) versus distance error.
Fractalfract 08 00721 g003
Figure 4. Performance comparison of distributed logistic regression on the ijcnn1 dataset: Plots of iteration number (left) and total communication bits (right) versus distance error.
Fractalfract 08 00721 g004
Figure 5. Performance comparison of distributed ridge regression on the a9a dataset: Plots of iteration number (left) and total communication bits (right) versus distance error.
Fractalfract 08 00721 g005
Figure 6. Performance comparison of distributed ridge regression on the ijcnn1 dataset: Plots of iteration number (left) and total communication bits (right) versus distance error.
Fractalfract 08 00721 g006
Figure 7. Performance comparison of distributed LASSO on the a9a dataset: Plots of iteration number (left) and total communication bits (right) versus distance error.
Fractalfract 08 00721 g007
Figure 8. Performance comparison of distributed LASSO on the ijcnn1 dataset: Plots of iteration number (left) and total communication bits (right) versus distance error.
Fractalfract 08 00721 g008
Table 1. Convergence accuracy ($err_t$) for different experiments.

| | Logistic Regression (a9a) | Logistic Regression (ijcnn1) | Ridge Regression (a9a) | Ridge Regression (ijcnn1) | LASSO (a9a) | LASSO (ijcnn1) |
|---|---|---|---|---|---|---|
| Convergence error ($err_t$) | $1 \times 10^{-4.5}$ | $1 \times 10^{-5.5}$ | $1 \times 10^{-6.8}$ | $1 \times 10^{-6.8}$ | $1 \times 10^{-4.5}$ | $1 \times 10^{-5.5}$ |
Table 2. Comparison of iterations.

| Method | Logistic Regression (a9a) | Logistic Regression (ijcnn1) | Ridge Regression (a9a) | Ridge Regression (ijcnn1) | LASSO (a9a) | LASSO (ijcnn1) |
|---|---|---|---|---|---|---|
| P2D2 | 468 | 811 | - | - | 4230 | 409 |
| PG-EXTRA | 470 | 813 | - | - | 4231 | 410 |
| CC-DQM | - | - | 246 | 79 | - | - |
| DRUID-Gradient | 845 | 884 | 977 | 176 | 3348 | 445 |
| CE-DADMM-Gradient:EF21 | 845 | 884 | 977 | 171 | 3348 | 446 |
| CE-DADMM-Gradient:CLAG | 845 | 884 | 980 | 170 | 3349 | 446 |
| DRUID-Newton | 154 | 559 | 197 | 60 | 1910 | 318 |
| CE-DADMM-Newton:EF21 | 155 | 559 | 252 | 60 | 1912 | 318 |
| CE-DADMM-Newton:CLAG | 155 | 559 | 194 | 64 | 1912 | 318 |
| DRUID-BFGS | 684 | 566 | 330 | 77 | 2890 | 399 |
| CE-DADMM-BFGS:EF21 | 478 | 567 | 325 | 73 | 2890 | 400 |
| CE-DADMM-BFGS:CLAG | 478 | 566 | 327 | 69 | 2890 | 399 |
Table 3. Comparison of communication bits.

| Method | Logistic Regression (a9a) | Logistic Regression (ijcnn1) | Ridge Regression (a9a) | Ridge Regression (ijcnn1) | LASSO (a9a) | LASSO (ijcnn1) |
|---|---|---|---|---|---|---|
| P2D2 | 116,056,832 | 38,539,264 | - | - | 862,720,000 | 29,344,768 |
| PG-EXTRA | 125,088,000 | 30,080,000 | - | - | 861,520,000 | 20,101,504 |
| CC-DQM | - | - | 20,782,080 | 1,219,680 | - | - |
| DRUID-Gradient | 146,340,480 | 27,382,784 | 169,200,768 | 5,451,776 | 579,820,032 | 13,784,320 |
| CE-DADMM-Gradient:EF21 | 35,692,800 | 7,468,032 | 41,268,480 | 1,444,608 | 141,419,520 | 3,767,808 |
| CE-DADMM-Gradient:CLAG | 35,650,560 | 7,459,584 | 41,352,960 | 1,427,712 | 141,419,520 | 3,759,360 |
| DRUID-Newton | 26,670,336 | 17,315,584 | 34,117,248 | 1,858,560 | 330,781,440 | 9,850,368 |
| CE-DADMM-Newton:EF21 | 6,547,200 | 4,722,432 | 10,644,480 | 506,880 | 80,762,880 | 2,686,464 |
| CE-DADMM-Newton:CLAG | 6,504,960 | 4,713,984 | 8,152,320 | 532,224 | 80,720,640 | 2,678,016 |
| DRUID-BFGS | 118,457,856 | 17,532,416 | 57,150,720 | 2,385,152 | 500,501,760 | 12,359,424 |
| CE-DADMM-BFGS:EF21 | 20,190,720 | 4,790,016 | 13,728,000 | 616,704 | 122,073,600 | 3,379,200 |
| CE-DADMM-BFGS:CLAG | 20,148,480 | 4,773,120 | 13,770,240 | 574,464 | 122,031,360 | 3,362,304 |

Share and Cite

MDPI and ACS Style

Chang, Z.; Zhang, Z.; Yang, S.; Cao, J. A Flexible Framework for Decentralized Composite Optimization with Compressed Communication. Fractal Fract. 2024, 8, 721. https://doi.org/10.3390/fractalfract8120721

