Article

Construction of a Deep Learning Model for Unmanned Aerial Vehicle-Assisted Safe Lightweight Industrial Quality Inspection in Complex Environments

by Zhongyuan Jing 1,2,3,* and Ruyan Wang 1,2,3
1 School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2 The Advanced Network and Intelligent Connection Technology Key Laboratory, Chongqing 400065, China
3 Key Laboratory of Ubiquitous Sensing and Networking, Chongqing 400065, China
* Author to whom correspondence should be addressed.
Drones 2024, 8(12), 707; https://doi.org/10.3390/drones8120707
Submission received: 15 October 2024 / Revised: 19 November 2024 / Accepted: 22 November 2024 / Published: 27 November 2024
(This article belongs to the Special Issue Mobile Fog and Edge Computing in Drone Swarms)
Figure 1. Example of a workplace for UAV-assisted industrial quality inspection.
Figure 2. The architecture of CP-FL.
Figure 3. Algorithm flow diagram.
Figure 4. Schematic of knowledge distillation and federated learning processes. The teacher model generates soft labels for training the student model. Local updates are performed on the client, and the updated parameters are aggregated globally by the server.
Figure 5. Diagram of the ConvNeXt network model (using depth-separable convolution to decouple the fusion of spatial information and the fusion of channel information, expanding the overall width of the model).
Figure 6. Impact of different schemes on the performance of ResNet on MNIST.
Figure 7. Impact of different schemes on the performance of ResNet on CIFAR-10.
Figure 8. Performance test of models with different pruning rates based on the MNIST dataset.
Figure 9. Performance test of models with different pruning rates based on the CIFAR-10 dataset.
Figure 10. Layer-wise quantization bit statistics for VGG16.
Figure 11. Layer-wise compression ratio statistics for VGG16.
Figure 12. Classification accuracy of the CP-FL framework for participant models on the MNIST dataset.
Figure 13. Classification accuracy of the CP-FL framework for participant models on the DTB70 dataset.
Figure 14. Variation in accuracy on the CIFAR-10 dataset.
Figure 15. Variation in the loss function on the CIFAR-10 dataset.
Figure 16. Loss function vs. rounds for compression methods on MNIST.
Figure 17. Loss function vs. rounds for compression methods on CIFAR-10.

Abstract

With the development of mobile communication technology and the proliferation of Internet of Things (IoT) terminal devices, a large amount of data and many intelligent applications are emerging at the edge of the Internet, giving rise to the demand for edge intelligence. In this context, federated learning, as a new distributed machine learning method, has become one of the key technologies for realizing edge intelligence. Traditional edge intelligence networks usually rely on terrestrial communication base stations as parameter servers to manage communication and computation tasks among devices. However, this fixed infrastructure is difficult to adapt to complex and ever-changing heterogeneous network environments. Owing to their high degree of flexibility and mobility, unmanned aerial vehicles (UAVs) introduced into the federated learning framework can provide enhanced communication, computation, and caching services in edge intelligence networks, but the limited communication bandwidth and unreliable communication environment increase system uncertainty and may lead to a decrease in overall energy efficiency. To address the above problems, this paper designs a UAV-assisted federated learning method for privacy-preserving and efficient data sharing, Communication-efficient and Privacy-protection for FL (CP-FL). A network-sparsifying pruning training method based on a channel importance mechanism is proposed to transform the pruning training process into a constrained optimization problem. A quantization-aware training method is proposed to automate the learning of quantization bitwidths and improve the adaptability between features and data representation accuracy. In addition, differential privacy is applied to the uplink data on this basis to further protect data privacy. After the model parameters are aggregated on the pilot UAV, the model is subjected to knowledge distillation to reduce the amount of downlink data without affecting utility. Experiments on real-world datasets validate the effectiveness of the scheme. The experimental results show that, compared with other federated learning frameworks, the CP-FL approach can effectively mitigate both the communication overhead and the computation overhead, and it also achieves an outstanding balance between privacy and usability under differential privacy protection.

1. Introduction

With the rapid development of next-generation network systems such as 6G communication networks, high-performance unmanned aerial vehicles (UAVs) are regarded as a new type of airborne edge server due to their sensing, computing, and storage capabilities [1]. Compared to fixed-edge servers traditionally installed on ground base stations, UAVs can be deployed on-demand by virtue of their high degree of agility, flexibility, and mobility, thus significantly enhancing the coverage of the system [2].
In the case of nuclear power plants, for example, traditional inspection methods usually rely on manual labor, which is not only time-consuming and labor-intensive but also poses a serious threat to the safety of personnel when operating in a radioactive environment. However, through the introduction of drone technology, detailed inspections of the internal structures of nuclear power plants can be accomplished without the need for personnel to come into direct contact with hazardous environments. Similarly, oil refineries and natural gas processing facilities often cover large areas and contain numerous complex piping networks and large storage tanks, as shown in Figure 1. Traditional manual inspections are not only inefficient but can also pose safety hazards in certain regions. Drones can easily overcome these obstacles and perform inspections efficiently.
Multiple UAVs can maneuver in different regions, receive widely distributed user data, and complete complex mobile edge computing tasks collaboratively, thus training machine learning models with high availability and high real-time response capability, such as image classification models [3]. In addition, in federated learning mode, after completing local model training, drones only need to upload local model parameters to cloud servers for global model aggregation, thereby achieving the sharing of model training results and effectively protecting data privacy [4].
This decentralized learning approach ensures that each company’s data privacy is protected while also allowing the model to learn more comprehensive knowledge from more data, improving the model’s ability to generalize. Ultimately, each participant has access to a high-performance model trained on multi-party data for real-time monitoring of equipment status, predicting potential failures, and planning preventive maintenance [5]. The combination of UAVs and federated learning technology provides an efficient, real-time and intelligent solution for the inspection and monitoring of industrial equipment, which helps to improve the safety and efficiency of industrial production [6].
Although federated learning has significant advantages in data privacy protection, it also faces some challenges. In the process of joint modeling, each participant needs to send their trained model parameters or gradients to the central server, and these usually contain a large amount of data, especially for large and complex models [7]. In order to obtain accurate models, federated learning typically requires multiple training iterations, with each iteration requiring communication between participants and the central server, which significantly increases communication overhead. To address this issue, researchers have explored various strategies, including model compression, gradient quantization, sparse communication, and performing multiple rounds of local training per aggregation (reducing communication frequency) [8].
Federated learning requires participants to upload updated model parameters after each local training session, and these updated parameters still contain sensitive information from the participants. This means that although federated learning does not directly collect user privacy data, it faces new challenges in terms of potential privacy information leakage [9]. Current methods are mainly divided into two categories: one is encryption methods, such as secure multi-party computation, homomorphic encryption, etc.; the other is data perturbation methods, such as differential privacy. Encryption methods provide an effective means of protecting data privacy by encoding plaintext into ciphertext, allowing only specific personnel to decode, but this often requires a great deal of computational overhead and is difficult to apply in actual federated learning scenarios, while data perturbation methods are relatively lightweight [10].
UAV-assisted federated learning has obvious advantages such as wide communication coverage, low communication overhead, and instant response. However, it also faces challenges such as limited communication bandwidth, unreliable communication environment, and uncertainty in the flight environment, which may lead to low energy efficiency. This article focuses on promoting the exchange and fusion of diverse data, maximizing the value of data, and proposing a new multi-drone-assisted federated learning method, CP-FL, to ensure data privacy, maintain algorithm utility, and improve communication efficiency. The main contributions of this article are as follows:
(1)
We propose CP-FL, a method that addresses privacy, utility, and communication efficiency concerns by processing the data volume for both the uplink and downlink during model training and by introducing noise to parameters when they are uploaded from edge servers to the central server.
(2)
A network-sparsification pruning training method based on channel importance is introduced, converting the pruning process into a constrained optimization problem. Additionally, a quantization-aware training method is proposed to enable automated learning of quantization bitwidth, enhancing the adaptability between feature representation and data precision.
(3)
To further enhance privacy, after model parameter pruning and quantization, we employ differential privacy techniques to protect users’ uploaded data. Following the aggregation of model parameters at the central server, knowledge distillation is applied to the model, reducing the amount of data transmitted downstream without compromising utility.
The rest of the paper is organized as follows. Section 2 reviews the existing work in related fields; Section 3 details the architecture and implementation of the UAV-assisted federated learning system we designed; Section 4 verifies the system performance through experimental and simulation analysis; Section 5 delves into the problems identified during the research process and proposes directions for future research; and finally, in Section 6, we summarize the full paper and draw the conclusions of the study.

2. Related Work

2.1. UAV-Assisted Federated Learning Algorithms

In recent years, drones have been widely used in various fields such as environmental monitoring, emergency communication, traffic control, and remote sensing due to their flexibility and maneuverability. They can not only effectively collect data and provide communication services in edge networks but also provide necessary computing support [11]. Dai et al. proposed a multi-agent reinforcement learning method based on federated learning. This method introduces a federated learning framework in multi-agent systems, enabling non-private data sharing between drones. The simulation results show that compared to traditional multi-agent deep reinforcement learning algorithms without information interaction, this method can significantly improve network utility [12]. Many IoT devices are used in various application scenarios such as precision agriculture, soil management, automated operations, information collection, and local processing. In view of the limitations of computing power and energy of IoT devices, Akbari et al. proposed mobile edge computing (MEC) assisted by unmanned aerial vehicles (UAVs), which can undertake some computing tasks of IoT nodes and provide additional resources for these devices, thus making applications such as smart agriculture more feasible [13].
In addition, reference [14] explores effective communication and computing solutions for constructing a digital twin model of the ocean Internet of Things. In this scheme, the non-orthogonal multiple access model trained by unmanned ground vehicles is uploaded to a high-altitude platform composed of drones for global model aggregation. Reference [15] designed an optimized multi-drone-assisted federated learning framework. Within this framework, conventional IoT devices are responsible for executing training tasks, while multiple drones are responsible for local and global model aggregation. This study proposes an online resource allocation algorithm that minimizes training latency by jointly deciding on client selection and global aggregation server selection.

2.2. Model Compression and Quantization

Current neural network model compression methods mainly include sparsification and quantization, among others. Traditional model compression processes typically use a single compression method to obtain a lightweight model that meets hardware deployment and computational requirements. However, as the model inference process shifts towards the front end, unilateral model compression is no longer sufficient to meet the new demands and challenges posed by edge-end hardware platforms with extremely limited storage and computing power for the scale of deployed models [16]. Therefore, combining or integrating multiple model compression methods to simultaneously compress the model structure and parameters from different perspectives has become a new paradigm for further reducing the model size, improving inference performance, and adapting to the deployment requirements of light and small hardware models.
Researchers have proposed staged, combined optimization approaches that address model structure compression and parameter bitwidth compression separately, leading to higher compression ratios through the integration of sparsification and quantization methods [17]. In prior studies, the compression of connection structures primarily involves the elimination of unimportant model weights to achieve model sparsification, resulting in a more compact architectural design [18]. Regarding weight parameters, the strategy entails approximating high-bitwidth continuous data with low-bitwidth discrete representations, thereby reducing the computational complexity and storage requirements of the model [19]. The combined application of sparsification and quantization yields significantly greater compression rates compared to employing either method alone, effectively mitigating the hardware storage resource demands during model deployment [20].
Giannopoulos et al. proposed FedShip, a communication-efficient and privacy-aware federated aerial learning framework for intelligent transportation [21]. The framework uses a layered architecture that focuses on reducing data traffic loads and optimizing wireless resources such as transmitter power and spectrum utilization. FedShip also considers improvements in data privacy, model-sharing capabilities, and energy efficiency derived from federated learning and airborne computing methods.

2.3. Federated Learning with Differential Privacy

Within the framework of federated learning (FL), the exchange of model parameters or gradients between servers and clients inherently poses a risk of privacy leakage [22]. To address these privacy threats, the research community has widely adopted techniques such as homomorphic encryption, secret sharing, and differential privacy (DP) to reinforce FL’s privacy-preserving mechanisms [23]. While homomorphic encryption allows operations directly on encrypted data, its high computational cost and limitations in handling complex computations hinder practicality. Secret sharing, though capable of fragmenting information to distribute risk, incurs substantial communication overhead and reduced computational efficiency. In contrast, DP, by introducing noise into data, safeguards individual information, facilitating a reasonable level of anonymity.
Currently, numerous studies are focusing on FL schemes based on Local Differential Privacy (LDP), aiming to mitigate the privacy leakage challenges posed by “honest but curious” participants (semi-honest attackers who follow the protocol but attempt to analyze data or infer data ownership through reverse analysis of model parameters [24]). Most existing studies adopt a fixed strategy to adjust the pruning threshold, sensitivity, and noise intensity, ensuring that model parameters are uniformly protected during each round of global updates, but this may overlook the differences in privacy protection needs for different layers or parameters [25]. The LDP-FedSGD algorithm aims to integrate the LDP principle with the FL framework, especially in collaborative model training between cloud servers and mobile devices such as vehicles, achieving a win-win for privacy protection and communication efficiency, and is particularly suitable for machine learning tasks in a crowdsourcing environment [26].
Zhang et al. proposed LSFL (Lightweight and Secure Federated Learning), a lightweight crowdsourced federated learning scheme designed to help edge node vendors improve quality of service [27]. LSFL not only protects consumers’ privacy but also allows them to opt-out at any time during the training phase. In addition, they also proposed a lightweight Byzantine robust two-server secure aggregation protocol, which can achieve secure aggregation and Byzantine robustness.

2.4. Federated Learning Combined with Knowledge Distillation

Knowledge distillation is a common technique for model compression, leveraging the supervision from a larger, higher-performing model to train a smaller model with the aim of achieving comparable performance and accuracy. Its application in federated learning for efficient communication has effectively reduced the communication costs of federated learning; however, the overall training process necessitates the involvement of a central server, making it prone to single-point failure issues. Mo et al. [28] proposed a federated distillation algorithm, FedDQ, which leverages federated distillation to minimize communication overhead in federated learning, yet it still requires central server aggregation and is thus susceptible to single-point failures. Wu et al. [29] introduced Federated ICT (FedICT), a multi-access edge computing-based federated multi-task distillation method that directly separates local/global knowledge during the bidirectional distillation process between clients and servers, intending to accommodate multi-task clients while mitigating client drift caused by divergent local model optimization directions. Li et al. [30] proposed HBMD-FL, which replaces the federated learning central server with blockchain and employs model distillation to tackle model heterogeneity; however, it does not account for the impact of low-quality model labels during the aggregation process. Reference [31] designed a three-tier federated reinforcement learning framework with an end-edge-cloud structure. This scheme aims to support the optimization of client node selection and global aggregation frequency during the FL process through collaborative decision-making strategies [31]. Based on the idea of utilizing unlabeled open datasets, a semi-supervised FL algorithm based on distillation (DS-FL) was proposed, which exchanges the outputs of local models between mobile devices. Without compromising model performance, it overcomes the significant communication costs caused by the model size [32].

3. Problem Statement and Proposed Scheme

3.1. Threat Modeling and Designing Program Goals

In this section, we begin by presenting our system model and the threat model that motivates our design goals. These objectives reflect our consideration of three challenges in the design of UAV-assisted model building: privacy leakage, high communication overhead, and the trade-off between security and efficiency.

3.1.1. System Architecture

The multi-UAV-assisted federated learning system consists of a lead UAV, multiple following UAVs, and ground terminals, forming a three-tier federated learning architecture (shown in Figure 2). In this architecture, individual edge intelligent devices (i.e., ground terminals) perform local data collection and transmit these data to the following UAVs. The following UAV then collects data, processes and trains the model in its assigned area, generates local model parameters, and sends these parameters to the pilot UAV. The primary responsibility of the pilot UAV is to receive model update parameters from the follower UAV and perform global model aggregation. The core concept of this design is to ensure the overall performance of the system while maximizing data privacy. The layered structure of the entire system and the clear division of tasks aim to fully utilize the privacy-preserving benefits brought by federated learning and enhance the overall performance through distributed computing. This not only ensures data security but also realizes efficient data processing and model training, providing a feasible solution for large-scale data collaboration in complex environments.

3.1.2. Threat Model

In the model setting, we construct a trust model that prevails in the real world: the server is seen as “honest but curious”, which means that the server performs all steps in strict compliance with the predefined protocol and does not tamper with or disrupt the process; however, due to its curiosity, it may try to obtain additional information about the client’s data from the protocol interactions. At the same time, some clients are set to be “malicious”, indicating that they will violate protocol rules and may attempt malicious operations, while others are considered “honest but curious”.
Although the practice of uploading only gradient information instead of raw data has been adopted under the federated learning framework to minimize the risk of directly exposing sensitive information, it is important to note that both servers and honest-but-curious clients (with the help of advanced gradient reverse engineering techniques) may still be able to restore some or even all of the target client’s raw training data from the received gradient information. Therefore, when designing and implementing a federated learning system, it is important to fully consider and take effective measures to counteract this potential risk of privacy leakage.

3.1.3. Design Objectives

The CP-FL system aims to realize a lightweight, privacy-preserving machine learning framework with the following design objectives:
Security: Ensure that all transmitted data enjoy a high degree of privacy during model construction. During the joint training process of multiple edge intelligence devices in collaboration with the pilot UAV, implement strict measures to safeguard the privacy and security of data during the interaction and transmission sessions and joint computation process to prevent any potential risk of information leakage.
Efficiency: The CP-FL system should ensure high efficiency during the joint training process between edge intelligent devices and pilot UAVs. It is capable of realizing lower communication overhead between multiple edge intelligent devices and the pilot UAV, effectively reducing the communication load.
Accuracy: The CP-FL system should uphold dependable and precise model training, maintaining a high level of model expressiveness to deliver accurate prediction results for each edge server.

3.2. Framework Design

In this section, we detail our proposed multi-UAV-assisted federated learning system. The CP-FL scheme is proposed on the basis of FL aggregation rules to construct an efficient, privacy-preserving and accurate FL framework.
In this system, the perception area is divided into a set $\mathcal{M} = \{1, 2, \ldots, M\}$ of sub-areas, and an intelligent device is deployed at the center of each sub-area to collect and transmit data from that area. Due to the limited coverage range of edge servers (hereinafter referred to as base stations, BSs) on ground base stations and the RF power limitations of user devices, these smart devices cannot directly communicate with base stations. To overcome this limitation, the system introduces multiple UAVs as mobile edge computing nodes. These drones are equipped with necessary hardware facilities, including data transmission and reception modules, storage units, and processing units (such as embedded CPUs), as well as other basic components such as body structure, batteries, power control systems, and flight control devices. The highly integrated design of drones endows them with excellent data storage, processing capabilities, and maneuverability, surpassing traditional fixed-edge servers. Within the research framework of this article, drones serve as airborne edge servers, supporting long-distance and short-range wireless communication technologies, and can effectively serve areas beyond the coverage range of base stations, thereby providing necessary computing services.
In a target sensing region, a group of UAVs $\mathcal{N} = \{1, 2, \ldots, N\}$ forming an intelligent swarm flies at a fixed altitude $H$. At the end of each time slot $t \in \mathcal{T} = \{1, 2, \ldots, T\}$, UAV $i$ flies to the next sensing sub-region in direction $\theta_i^t \in [0, 2\pi)$ over a distance $d_i^t \in [0, l_{\max}]$, where $l_{\max}$ is the maximum distance the UAV can fly in a single time slot.
To ensure that drones collect data only within the area assigned to them, a combination of a binary variable $o_{i,k}^t$ and geo-fencing techniques can be used to monitor and control the behavior of drones. The binary variable $o_{i,k}^t \in \{0, 1\}$ is employed to denote the position of drone $i$ in time slot $t$, with $o_{i,k}^t = 1$ if and only if drone $i \in \mathcal{N}$ is over sub-area $k \in \mathcal{M}$. Specifically, each drone is pre-set with an operating range, i.e., the specific geographic boundary within which it is permitted to perform data collection. Once a UAV’s position information indicates that it is out of this pre-set working range, the system automatically sets the corresponding binary variable for that UAV to 0, indicating that it is currently not within the legal data collection area.
Assuming that the sensing capability of a UAV is defined by its maximum communication radius $R_i^{\max}$, any smart device within this maximum communication range is considered to be within sensing range and its data can be collected. Drone $i$ collects data from the set $\mathcal{M}_i \subseteq \mathcal{M}$ of covered sub-areas when it satisfies the constraint
$$b_{i,k}^t R_{i,k}^t \le R_i^{\max}$$
where $b_{i,k}^t \in \{0, 1\}$ is a communication decision binary variable that indicates whether UAV $i$ communicates with the smart device (hereinafter collectively referred to as smart device $k$) in sub-region $k$ at time slot $t$, with $b_{i,k}^t = 1$ indicating communication; otherwise, $b_{i,k}^t = 0$. In addition, there is a constraint on the communication of UAVs:
$$\sum_{i=1}^{N} b_{i,k}^t \in \{0, 1\}$$
It states that when the smart device k is in the coverage area of two or more drones at the same time, it will select at most one drone to communicate with.
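To make the association rule concrete, the following is a minimal sketch (the position arrays, radius vector, and nearest-UAV tie-breaking rule are illustrative assumptions, not prescribed by this paper) of how the variables $b_{i,k}^t$ could be assigned for one time slot so that each smart device communicates with at most one covering UAV:

```python
import numpy as np

def associate_devices(uav_pos, dev_pos, r_max):
    """Set the binary association variables b[i, k] for one time slot.

    uav_pos: (N, 2) array of UAV positions projected onto the ground plane
    dev_pos: (M, 2) array of smart-device positions
    r_max:   (N,) array of maximum communication radii R_i^max
    Returns b of shape (N, M) with at most one 1 per device (column).
    """
    N, M = len(uav_pos), len(dev_pos)
    b = np.zeros((N, M), dtype=int)
    # Pairwise ground distances between UAVs and devices
    dist = np.linalg.norm(uav_pos[:, None, :] - dev_pos[None, :, :], axis=-1)
    for k in range(M):
        covering = np.where(dist[:, k] <= r_max)[0]      # UAVs whose range covers device k
        if covering.size > 0:
            i = covering[np.argmin(dist[covering, k])]   # pick the closest covering UAV
            b[i, k] = 1                                  # sum_i b[i, k] <= 1 holds by construction
    return b
```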
The task of the system is to use multiple follower UAVs as local training nodes and utilize the data of each region to collaboratively train global models for various intelligent data analysis applications. In this paper, we use the federated learning framework for collaborative learning among multiple devices, taking a multiclassification prediction model as an example, as illustrated in Figure 3. Federated learning is a process that iterates repeatedly until the global model converges, and each round of its global iteration includes the following steps:
(1)
Download global model: the following UAVs download the latest global model $w^{t-1}$ from the pilot UAV and use it as the initial local model: $w_i^{t,0} \leftarrow w^{t-1}$;
(2)
Local model training: the following UAVs receive training data from smart devices in the coverage area and execute the stochastic gradient descent method for local model training: $w_i^{t,n} = w_i^{t,n-1} - \eta \nabla L_i(w_i^{t,n-1})$, $n \ge 1$, where $\eta$ is the learning rate of the local model and $L_i(\cdot)$ is the loss function. When local training stops, the local model at this point is denoted as $w_i^t \leftarrow w_i^{t,n}$;
(3)
Upload local model: the UAV node processes the local model $w_i^t$ and uploads the model to the pilot UAV for model aggregation;
(4)
Global model aggregation: the pilot UAV weights and aggregates the received local models through the federated averaging algorithm to obtain a new global model $w^t$, and then the model is processed by knowledge distillation. A minimal sketch of the local update in Step (2) is given below Figure 3.
Figure 3. Algorithm flow diagram.
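As referenced above, the following is a minimal sketch of the local update in Step (2); the gradient function and the client's data batches are hypothetical placeholders, not the authors' implementation:

```python
import numpy as np

def local_training(w_global, grad_fn, data_batches, eta=0.01):
    """Step (2): run local SGD starting from the downloaded global model.

    w_global:     flattened global model parameters w^{t-1} (1-D array)
    grad_fn:      callable (w, batch) -> gradient of the local loss L_i
    data_batches: iterable of training batches collected from covered devices
    eta:          local learning rate
    Returns the local model w_i^t to be uploaded to the pilot UAV.
    """
    w = w_global.copy()                  # w_i^{t,0} <- w^{t-1}
    for batch in data_batches:
        w = w - eta * grad_fn(w, batch)  # w_i^{t,n} = w_i^{t,n-1} - eta * grad L_i
    return w                             # w_i^t, sent for aggregation (Step 3)
```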

3.3. Model Pruning and Quantization

Deep neural network pruning techniques are based on the widespread redundancy within neural networks, aiming to build a more streamlined model structure by eliminating non-critical parameters. This strategy seeks to achieve the same or comparable performance level as the original network with the fewest number of parameters, thereby greatly reducing the model’s demand for storage space and computational resources. Such optimization not only facilitates the efficient portability and deployment of the model but also greatly enhances its adaptability and practicality in a variety of application scenarios. Given a specified dataset $Data$ and a sparsity level $s$, deep neural network pruning can be transformed into the following constrained optimization problem.
$$\min_{w} \mathcal{L}(w, Data) = \min_{w} \frac{1}{n} \sum_{i=1}^{n} L\big(w; (x_i, y_i)\big), \quad \text{s.t. } w \in \mathbb{R}^n,\ \|w\|_0 \le s$$
In the formula, $L(\cdot)$ is the model loss function (this article uses the cross-entropy loss function to evaluate the classification error of the neural network model), $w$ denotes the network parameters, $n$ is the total number of network parameters, and $\|w\|_0$ is the standard $\ell_0$ norm.
In this work, the dataset $Data = \{(x_i, y_i)\}_{i=1}^{n}$ is employed for experimentation, with $x_i$ and $y_i$ representing the training samples and their labels, respectively. The importance of each connection’s weight is evaluated to selectively retain or discard it, introducing an auxiliary mask variable $c \in \{0, 1\}$ to denote the connectivity of individual weights. Assuming random initialization of the mask matrix $M$ and weight matrix $W$, the weights of the neural network, represented by matrix $W$, are learned through iterative training on the dataset’s samples.
$$\widetilde{W} = M \odot W$$
In the formula, $W$ is the initialized network weight matrix, $M$ is the initial mask matrix of the network, and $\widetilde{W}$ denotes the resulting masked weights. When $M_{i,j} = 1$, the corresponding weight parameter in the network has not been removed; otherwise, a pruning operation has been performed. In the process of neural network pruning, the random mask used in the initial stage introduces a certain degree of randomness, which requires us to continuously evaluate the impact of these randomly selected parameters on the overall objective function.
Since each mask entry $m \in [0, 1]$ ultimately converges to a value in $\{0, 1\}$, the value of a single mask can be regarded as a binary random variable, and the mask selection problem is transformed into a loss minimization problem over the weight and probability spaces. By selectively performing pruning operations on the mask matrices composed of zeros and ones, along with their associated weight matrices, the significance of parameters between layers in the network model is assessed. Treating the choice of the mask matrix as a random variable transforms the discrete problem into a continuous one, ultimately formulating it as a minimization problem for the network loss. Assume the probability of convergence to 0 is $p_i$ and the probability of convergence to 1 is $1 - p_i$, where $p_i \in [0, 1]$. Given that the selections of the mask variables $m_i$ are independent and do not affect each other, they follow a Bernoulli distribution.
$$P(m \mid p) = \prod_{i=1}^{n} p_i^{m_i} (1 - p_i)^{1 - m_i}$$
The size of the neural network model architecture is to some extent determined by the sum of the mask probabilities, i.e., $\mathbb{E}_{m \sim P(m \mid p)}\left[\|m\|_0\right] = \sum_{j=1}^{n} p_j$. We transform the mask problem with the aforementioned discrete constraints into the following continuous-space loss minimization problem
$$\min_{p} \ \mathbb{E}_{m \sim P(m \mid p)}\left[\mathcal{L}_w(w, m)\right], \quad \text{s.t. } w \in \mathbb{R}^n,\ p \in [0, 1]^n,\ \|p\|_0 \le s$$
Due to the presence of a large number of redundant parameters in the randomly initialized network model, and according to the conditional limitations of the above formula, a large number of the $m_i$ converge to 0 or 1 with high probability, causing the matrix $M$ to converge to a deterministic mask matrix and the loss of the pruned mask to continuously approach the expected loss. The pruning ratio of neural networks directly affects the consumption of computing resources and memory. As the degree of pruning deepens, although the computational cost of the model significantly decreases, it may also lead to a decline in accuracy.
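For illustration, channel importance can be turned into a concrete pruning mask as in the sketch below; it uses the L1 norm of each output channel as the importance score, which is one common choice and stands in for the paper's constrained optimization rather than reproducing it:

```python
import numpy as np

def channel_mask(conv_weight, keep_ratio):
    """Build a binary mask over the output channels of a conv layer.

    conv_weight: array of shape (out_channels, in_channels, kH, kW)
    keep_ratio:  fraction of channels to keep, e.g. 0.2 for an 80% pruning rate
    Returns (mask, pruned_weight), where mask has shape (out_channels,).
    """
    out_channels = conv_weight.shape[0]
    # Channel importance: L1 norm of each output filter
    importance = np.abs(conv_weight).reshape(out_channels, -1).sum(axis=1)
    n_keep = max(1, int(round(keep_ratio * out_channels)))
    keep_idx = np.argsort(importance)[-n_keep:]          # most important channels
    mask = np.zeros(out_channels, dtype=conv_weight.dtype)
    mask[keep_idx] = 1.0
    # Apply the mask (M applied elementwise to W): pruned channels are zeroed out
    pruned_weight = conv_weight * mask[:, None, None, None]
    return mask, pruned_weight
```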
Due to the different sensitivities of the neural network layers to data representation, quantization using a uniform data bitwidth may lead to different degrees of information redundancy. Therefore, in order to improve the adaptability between the accuracy and efficiency of feature representation, it is necessary to assign appropriate quantization bitwidths to different layers. In this paper, we draw on the idea of neural network architecture search (NAS) to design a mixed-precision quantization method with a multi-branch structure and achieve automatic optimization of quantization bitwidths through quantization-aware training (QAT) techniques. In the process, we introduce a quantization bitwidth selection factor $\alpha_{l,i}$ to evaluate and characterize the importance of choosing a specific quantization bitwidth among different branches, which guides the selection of the optimal quantization strategy.
$$\hat{W}_l = \sum_{i} \sigma(\alpha_{l,i}) \times Q(W_l, bit_i)$$
In the formula, $\hat{W}_l \in \{\hat{W}_1, \hat{W}_2, \ldots, \hat{W}_L\}$, with values restricted to $[Q_{n_i}, Q_{p_i}]$, is the quantized weight. To limit the spatial range of feasible solutions searched by the quantization factors, $\alpha_{l,i}$ is normalized using the function $\sigma(\cdot)$:
$$\sigma(\alpha_{l,i}) = \frac{e^{\alpha_{l,i}}}{\sum_{c=1}^{C} e^{\alpha_{l,c}}}$$
where $C$ denotes the number of quantization branch nodes; the function $\sigma(\cdot)$ transforms the vector $(\alpha_{l,1}, \ldots, \alpha_{l,C})$ into a probability distribution over the interval $[0, 1]$ with a sum of 1, representing the importance probabilities of each quantization branch. As illustrated in Equation (7), the quantized values are summed after being weighted by their respective importance probabilities, and through gradient descent optimization, the branch with the highest importance probability is identified.
To ensure the stability of model convergence during the search process, quantization-aware training employs a pseudo-quantization approach. This involves quantizing floating-point weights to integer values and then mapping them back to floating-point format. This methodology allows for the integration of quantization errors into the training process while preserving the precision of both forward inference and backward propagation during training, thereby optimizing the balance between quantization efficiency and model accuracy. The pseudo-quantization operation formula is
$$Q_{Fake}(W_l, bit) = \frac{R_{\max} - R_{\min}}{2^{bit - 1} - 1} \times Q(W_l, bit)$$
Therefore, Formula (5) can be rewritten as:
$$\hat{W}_l = \sum_{i} \sigma(\alpha_{l,i}) \times Q_{Fake}(W_l, bit_i)$$
During backpropagation, to address the discontinuity of the quantization mapping function, the mainstream straight-through estimator (STE) method can be employed to approximate the gradients of the quantization process. Then, according to the chain rule of differentiation, the parameter update formula for the pseudo-quantized weights in Equation (14) of the objective function can be defined as follows.
$$w_{l,i} = w_{l,i} - lr \times \sigma(\alpha_{l,i}) \times \frac{\partial E_l}{\partial \widetilde{W}_l}$$
Here, $w_{l,i}$ represents the weight parameters updated through iteration. During the training process, the continuous transfer of quantization error between layers can be ensured by using continuous floating-point weight gradients to approximate the discrete weight gradients after quantization. For the convergence process of the whole network, the truncation error and approximation error introduced by quantization are the key factors affecting the convergence effect of the model. By passing these errors during training, the model’s ability to perceive the quantization error can be enhanced. In this way, with the iteration of training, the model can gradually correct the impact of these errors on the final prediction results so that the quantized model is as close as possible to the convergence state of the original unquantized model.
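A minimal sketch of the multi-branch pseudo-quantization described above is given below; symmetric uniform quantization is assumed for $Q(\cdot)$, the candidate bitwidths are illustrative, and a full implementation would additionally use the straight-through estimator for gradients:

```python
import numpy as np

def fake_quantize(w, bits):
    """Symmetric uniform pseudo-quantization: round to integers, map back to floats."""
    q_max = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / q_max if max_abs > 0 else 1.0
    return np.clip(np.round(w / scale), -q_max, q_max) * scale

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def mixed_precision_weights(w_layer, alphas, candidate_bits=(2, 4, 8)):
    """Weighted sum of pseudo-quantized branches: sum_i sigma(alpha_i) * Q_fake(W, bit_i).

    alphas: learnable branch-selection factors alpha_{l,i}; after training, the
            branch with the largest sigma(alpha) gives the chosen bitwidth.
    """
    probs = softmax(np.asarray(alphas, dtype=float))
    branches = [fake_quantize(w_layer, b) for b in candidate_bits]
    w_hat = sum(p * q for p, q in zip(probs, branches))
    chosen_bits = candidate_bits[int(np.argmax(probs))]
    return w_hat, chosen_bits
```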

3.4. Protection of Model Parameters

In federated learning, participating parties contribute to generating a global model by sharing updates of their local models instead of raw data. However, local model parameters can inherently reflect characteristics of the training data, leading to indirect leakage of data privacy if shared directly. To address this issue, this paper employs differential privacy mechanisms to perturb the local model parameters, thereby ensuring the protection of participants’ data privacy.
Based on DP, we can use a random algorithm to ensure that the impact of changing a record on the output is always below a certain threshold, and under a series of strict mathematical proofs, we ensure that attackers cannot distinguish the real input through the output.
Definition 1.
If $P(M(D) \in S) \le \exp(\varepsilon) \cdot P(M(D') \in S) + \delta$ holds for all adjacent datasets $D, D'$, as well as for all subsets $S$ of the range of the algorithm $M$, then the random algorithm $M$ satisfies $(\varepsilon, \delta)$-DP. In this definition, $\varepsilon$ represents the privacy budget for differential privacy, with smaller values indicating stronger privacy, and $0 \le \delta \le 1$ represents the tolerance for probabilistic failure of differential privacy; the smaller the tolerance, the stronger the privacy. In this article, we implement DP based on RDP (Rényi Differential Privacy).
Definition 2.
If $D_\alpha\left[f(D) \,\|\, f(D')\right] \le \varepsilon$ holds for all adjacent datasets $D, D'$ and all subsets $S$ within the range of the algorithm $M$, then the random algorithm $M$ is said to satisfy $(\alpha, \varepsilon)$-RDP, where $D_\alpha[\cdot \,\|\, \cdot]$ denotes the Rényi divergence of order $\alpha$.
In this article, the following Gaussian mechanism $M(d)$, which satisfies $(\alpha, \varepsilon)$-RDP, is adopted:
$$M(d) = f(d) + N, \quad N \sim \mathcal{N}(0, \sigma^2), \quad \sigma^2 = \frac{\alpha \Delta f_2^2}{2\varepsilon}, \quad \Delta f_2 = \max_{d, d'} \|f(d) - f(d')\|_2$$
where $\Delta f_2$ represents the $\ell_2$ sensitivity. This sensitivity is the maximum $\ell_2$ distance between the gradients calculated by any two users. Since gradient values are not inherently bounded, we can limit each gradient to a fixed range by applying a clipping constraint:
$$\nabla_\theta \hat{L}_i = \nabla_\theta L_i \times \min\!\left(1, \frac{C}{\|\nabla_\theta L_i\|_2}\right)$$
As per the formula, whenever the magnitude of the gradient exceeds a threshold $C$, its value is scaled down without altering its direction to ensure it remains bounded, thereby ensuring that the output of the querying function is always within a defined limit. This implies that the sensitivity of each user’s data can be computed according to the following formula.
$$\Delta f_2 = C$$
Substituting Equation (14) into Equation (12), the resulting $M(d)$ constitutes the algorithm employed for enforcing differential privacy in the federated learning framework.
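The per-user protection step, gradient clipping to norm $C$ followed by Gaussian noise with the RDP-calibrated variance, can be sketched as follows (function names and defaults are illustrative, not the authors' API):

```python
import numpy as np

def clip_gradient(grad, C):
    """Scale the gradient so its L2 norm never exceeds the threshold C."""
    norm = np.linalg.norm(grad)
    return grad * min(1.0, C / norm) if norm > 0 else grad

def gaussian_mechanism(grad, C, alpha, epsilon, rng=None):
    """Add Gaussian noise calibrated for (alpha, epsilon)-RDP with sensitivity C."""
    rng = np.random.default_rng() if rng is None else rng
    clipped = clip_gradient(grad, C)
    sigma2 = alpha * C ** 2 / (2.0 * epsilon)   # sigma^2 = alpha * (Delta f_2)^2 / (2 * epsilon)
    noise = rng.normal(0.0, np.sqrt(sigma2), size=clipped.shape)
    return clipped + noise
```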

3.5. Model Aggregation and Knowledge Distillation on Pilot UAVs

The pilot UAV receives the parameters uploaded by the follower UAVs and relies on these parameters to update the global model using a weighted average.
$$\bar{w}^{t+1} = \sum_{k=1}^{K} \frac{n_k}{n} w_k^{t+1}$$
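A minimal sketch of this sample-size-weighted aggregation on the pilot UAV (operating on flattened parameter arrays; not the authors' implementation):

```python
import numpy as np

def aggregate(local_models, sample_counts):
    """FedAvg-style weighted average: w_bar = sum_k (n_k / n) * w_k.

    local_models:  list of flattened parameter arrays uploaded by follower UAVs
    sample_counts: list of n_k, the number of training samples behind each model
    """
    n_total = float(sum(sample_counts))
    weights = [n_k / n_total for n_k in sample_counts]
    return sum(w * model for w, model in zip(weights, local_models))
```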
Upon completion of the global model update, it serves as the teacher model, and a student model is trained leveraging a common dataset. This process, while ensuring the accuracy of model training, compresses the model to reduce the volume of downstream data.
Knowledge distillation is a model compression technique that can compress more complex models into a simpler model while retaining the predictive power of the original model.
$$q_i = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)}$$
In the formula, $T$ is a temperature parameter that controls the degree of softening of the output probability. When $T = 1$, the output is the standard softmax class probability; as $T$ approaches infinity, distillation becomes equivalent to directly matching the logits output by the network. Here, $z_i$ is the logit output by the network for class $i$, and $q_i$ is the soft target output of the function. Knowledge distillation can be divided into soft distillation and hard distillation. Soft distillation utilizes the probability distribution information of teacher models to provide more detailed guidance, helping student models learn more complex features and decision boundaries. This article adopts soft distillation, and its framework process is shown in Figure 4. The purpose of soft distillation is to minimize the KL divergence between the softmax output of the teacher model and the softmax output of the student model. Its loss function is as follows:
$$L_{total} = (1 - \lambda) L_{CE}\left[\Psi(z_s), y\right] + \lambda T^2 L_{KL}\left[\Psi(z_s, T), \Psi(z_t, T)\right]$$
In the formula, $L_{total}$ is the total loss, $L_{CE}(\cdot)$ is the cross-entropy loss function, $L_{KL}(\cdot)$ is the KL divergence loss function, $\Psi(\cdot)$ is the soft target (temperature-scaled softmax) function, $z_s$ and $z_t$ are the class outputs of the student model and the teacher model, $\lambda$ is the distillation coefficient, and $T$ is the temperature coefficient.
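A minimal NumPy sketch of this soft-distillation loss is shown below; it illustrates the formula (cross-entropy on hard labels plus temperature-scaled KL divergence against the teacher) rather than reproducing the authors' training code, and the default $T = 5$ follows the temperature study in Section 4.2:

```python
import numpy as np

def softmax(z, T=1.0):
    e = np.exp(z / T - np.max(z / T, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(z_student, z_teacher, labels, T=5.0, lam=0.5):
    """L_total = (1 - lam) * CE(softmax(z_s), y) + lam * T^2 * KL(teacher_T || student_T)."""
    p_student = softmax(z_student)            # hard-label branch
    ce = -np.mean(np.log(p_student[np.arange(len(labels)), labels] + 1e-12))

    q_teacher = softmax(z_teacher, T)         # softened teacher targets (soft labels)
    q_student = softmax(z_student, T)
    kl = np.mean(np.sum(q_teacher * (np.log(q_teacher + 1e-12) - np.log(q_student + 1e-12)), axis=-1))

    return (1.0 - lam) * ce + lam * T ** 2 * kl
```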
Figure 4. Schematic of knowledge distillation and federated learning processes. The teacher model generates soft labels for training the student model. Local updates are performed on the client, and the updated parameters are aggregated globally by the server.
In Figure 4, the teacher model (a large, complex model) generates soft labels for the training data. These soft labels capture subtle knowledge of the teacher model, which is then used to train the student model (a smaller, simpler model). Using soft labels can help the student model learn more effectively than using hard labels alone. The temperature coefficient is used to soften the probability distribution of the teacher model output. This makes the probability distribution smoother, allowing the student model to better capture subtle differences between classes. The optimal temperature value is determined by experimental verification.
Each client updates its data locally. The updated local model parameters are sent to the central server, where they are aggregated to update the global model. The aggregation process involves averaging the parameters of all clients to create a new global model. The local model is trained using soft labels generated by the teacher model. This ensures that the local model can benefit from the knowledge transferred by the teacher model. The federated learning process is iterative. After each round of local updates and global aggregation, the updated global model is sent back to the client for the next round of local updates. This process continues until the global model converges to an optimal state.

3.6. Security Analysis

In this section, we demonstrate the security of the proposed privacy protection scheme. First, we need to provide the composition lemma of RDP, as shown below.
Lemma 1.
If mechanism $f(x)$ satisfies $(\alpha, \varepsilon_1)$-RDP and mechanism $g(x)$ satisfies $(\alpha, \varepsilon_2)$-RDP, then the composed mechanism $f(g(x))$ satisfies $(\alpha, \varepsilon_1 + \varepsilon_2)$-RDP.
Lemma 2.
If mechanism $f(x)$ satisfies $(\alpha, \varepsilon)$-RDP, then $f(x)$ also satisfies $\left(\varepsilon + \frac{\log(1/\delta)}{\alpha - 1}, \delta\right)$-DP.
Based on these two lemmas, the privacy protection method based on RDP can be converted to traditional DP to demonstrate the privacy protection performance of our scheme.
Theorem 1.
For the $i$-th user in the $t$-th iteration, the federated learning algorithm based on CP-FL satisfies $(\varepsilon, \delta)$-DP, where
$$\varepsilon = \frac{tC^2}{2\sigma^2} + 2\sqrt{\frac{tC^2}{2\sigma^2} \log\frac{1}{\delta}}$$
According to Definition 1 and Lemma 1, it can be inferred that during the $t$-th iteration, the CP-FL-based federated learning algorithm satisfies $(\alpha, \varepsilon)$-RDP. Combining Formula (12) and Lemma 2, it can be concluded that
$$\varepsilon = \frac{tC^2}{2\sigma^2}\alpha + \frac{\log(1/\delta)}{\alpha - 1}$$
Since both $\sigma$ and $\delta$ are pre-set parameters, $\varepsilon$ is a function of $\alpha$ alone, and the optimal $\varepsilon$ can be found by choosing an appropriate value of $\alpha$. Setting the partial derivative $\partial\varepsilon / \partial\alpha$ to 0, the point at which $\varepsilon$ attains its extremum is
$$\alpha_0 = 1 + \sqrt{\frac{2\sigma^2 \log(1/\delta)}{tC^2}}$$
Substituting Equation (20) into Equation (19) yields Theorem 1.
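For concreteness, the bound of Theorem 1 can be evaluated directly; the helper below and the example parameter values are illustrative only:

```python
import math

def privacy_budget(t, C, sigma, delta):
    """Theorem 1: epsilon = t*C^2/(2*sigma^2) + 2*sqrt( t*C^2/(2*sigma^2) * log(1/delta) )."""
    base = t * C ** 2 / (2.0 * sigma ** 2)
    return base + 2.0 * math.sqrt(base * math.log(1.0 / delta))

# Example (hypothetical numbers): 100 rounds, clipping threshold 1.0,
# noise scale 4.0, failure tolerance 1e-5
print(privacy_budget(t=100, C=1.0, sigma=4.0, delta=1e-5))
```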
Next, we discuss the contribution of the CP-FL mechanism to privacy preservation. Since the CP-FL mechanism inevitably reduces the number of gradient dimensions uploaded by the client in each round, we define the dropout rate $\rho \in (0, 1)$; the data that the user needs to submit in each round are then reduced to $(1 - \rho)$ times the original. In order to show that the CP-FL mechanism can reduce the noise added under the same budget, we first introduce the following lemma.
Lemma 3.
Assuming that the values of each dimension of the gradient obey the same distribution in FL, the true sensitivity will drop from $\Delta f_2$ to $\Delta f_2 \sqrt{1 - \rho}$ after the CP-FL mechanism is applied, where $\rho \in (0, 1)$ denotes the dropout rate.
Assume $f(d) \ne f(d')$ and $\sup f(d) = (y_1, y_2, \ldots, y_m)$, $\inf f(d') = (y_1', y_2', \ldots, y_m')$, where $m$ denotes the dimension of the gradient. At this point, Equation (12) can be rewritten in the following form.
$$\Delta f_2 = \sqrt{(y_1 - y_1')^2 + (y_2 - y_2')^2 + \cdots + (y_m - y_m')^2}$$
Again, since the values taken in each dimension of the gradient obey the same distribution, one can set
$$f_0 = y_1 - y_1' = y_2 - y_2' = \cdots = y_m - y_m'$$
Substituting this into Equation (21) yields
$$\Delta f_2 = \sqrt{m f_0^2}$$
When the CP-FL mechanism is applied, the dimension is reduced from $m$ to $(1 - \rho)m$, at which point the true sensitivity of the gradient becomes
$$\Delta f_2' = \sqrt{(1 - \rho) m f_0^2} = \Delta f_2 \sqrt{1 - \rho}$$
Therefore, the above reasoning shows that Lemma 3 holds.
Based on Lemma 3, we can state the following theorem.
Theorem 2.
Suppose that in traditional FL, adding noise with variance $\sigma^2$ in each round consumes a per-round privacy budget of $\varepsilon$. Then, if the privacy budget allocated in each round is still $\varepsilon$, FL based on the CP-FL mechanism only needs to add noise with variance $(1 - \rho)\sigma^2$, where $\rho \in (0, 1)$ denotes the dropout rate.
Proof. 
Due to Lemma 3, the sensitivity decreases from $\Delta f_2$ to $\Delta f_2 \sqrt{1 - \rho}$. Setting $\Delta f_2' = \Delta f_2 \sqrt{1 - \rho}$ and substituting it into Equation (9) yields
$$\sigma'^2 = \frac{\alpha \Delta f_2'^2}{2\varepsilon} = \frac{\alpha \left(\Delta f_2 \sqrt{1 - \rho}\right)^2}{2\varepsilon} = (1 - \rho) \frac{\alpha \Delta f_2^2}{2\varepsilon} = (1 - \rho)\sigma^2$$
Therefore, Theorem 2 is proved. □

4. Experiments

4.1. Experimental Environment and Setup

The experimental environment was developed using Python 3.7.11 and ran on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00 GHz equipped with an NVIDIA Tesla V100 SXM2 GPU, under the Ubuntu 18.04.2 operating system (OS). The memory capacity was 32 GB × 4. The specific parameters of the UAVs are shown in Table 1.
This study provides an in-depth examination of the performance of the proposed joint compression method by employing two classic and widely recognized image recognition datasets in the field of machine learning, MNIST and CIFAR-10, as well as the UAV visual recognition dataset DTB70. The MNIST dataset consists of 60,000 training samples and 10,000 test samples and focuses on handwritten digit recognition, while CIFAR-10 consists of 60,000 color 32 × 32-pixel images covering 10 categories, with 6000 images per category, for image classification tasks of higher complexity. The DTB70 dataset is a resource designed specifically for visual target tracking with UAVs, aimed at evaluating and comparing the performance of different tracking algorithms in UAV-specific tracking scenarios. The experiments rely on automated data loading scripts to ensure efficient utilization of the datasets during training. Global updating of the models is achieved through the FedAvg algorithm, which facilitates the aggregation of model parameters across clients and ensures continuous optimization of the global model.
In order to highlight the performance of the proposed algorithm, we have selected three different datasets and conducted comparative experiments on these datasets for the VGG16, ResNet (including 18-, 34-, 50-, and 101-layer versions), and ConvNeXt network models, as illustrated in Figure 5. By verifying the performance of different network architectures when dealing with specific types of data or tasks, we can comprehensively evaluate and demonstrate the advantages of the proposed algorithms. This multi-model, multi-dataset comparison helps to reveal the performance differences of various network architectures, thus more clearly demonstrating the effectiveness and robustness of the new algorithm.
In terms of training hyperparameters, the Adam optimizer is used to optimize the model parameters, the momentum parameter is set to 0.009, the decay coefficient is $10^{-4}$, and the batch size is 128. In order to ensure the stability of the network’s learning process and to improve the training efficiency, this paper adopts a learning rate decay strategy. Specifically, two key nodes are set for learning rate adjustment throughout the training process: the initial learning rate of the network is 0.01, and when the total training progress of the network reaches 50% and 75%, the current learning rate is reduced to 10% of its current value. This staged approach of decreasing the learning rate helps the model to adopt different optimization strategies at different training stages. A cross-entropy loss function was used to calculate the loss value of the model. The methods compared with CP-FL are briefly summarized below, after a short sketch of this learning-rate schedule.
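As a concrete illustration, the staged decay described above can be expressed as follows (a minimal sketch, not the training code used in the experiments):

```python
def staged_learning_rate(progress, base_lr=0.01):
    """Return the learning rate for a given training progress in [0, 1].

    The rate starts at 0.01 and is reduced to 10% of its current value when
    total training progress reaches 50%, and again at 75%.
    """
    lr = base_lr
    if progress >= 0.5:
        lr *= 0.1
    if progress >= 0.75:
        lr *= 0.1
    return lr

# e.g. staged_learning_rate(0.6) -> 0.001, staged_learning_rate(0.8) -> 0.0001
```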
(a)
Federated Averaging (FedAvg) [33]: The federated averaging algorithm is a classic benchmark algorithm in federated learning. Edge servers update the model using local data based on the model issued and upload it back to the central server. The central server aggregates the collected models from the edge servers using a weighted average based on the number of samples from each party, to obtain the model for the next round.
(b)
FedProx [34]: When the training data are heterogeneous, FedProx has demonstrated stronger convergence than FedAvg on a set of real federated datasets by allowing each participant to perform a variable amount of work in compliance with device-level system constraints.
(c)
FedDrop [35]: Based on the classic dropout scheme, a federated dropout scheme for stochastic model pruning is proposed. In each iteration of the FL algorithm, dropout is used to independently generate several subnets from the global model of the server. Each subnet is adapted to the assigned channel state and downloaded to the associated device for updating.
(d)
UDP [36]: This is a localized differential privacy federated learning method based on the Gaussian mechanism, which includes an adaptive pruning threshold strategy. UDP ensures the privacy of local data from each participant in each round of global communication by fixing the privacy budget, and the noise in each round of global communication is adaptively variable.
(e)
DP-SCAFFOLD [37]: This is a representative method in the field of federated learning based on the Gaussian mechanism, which takes the median of the gradient norm in each local training as the pruning threshold. The noise added to the model gradients uploaded by all participants in each round of global communication is fixed.
The experiments also cite recent state-of-the-art methods in the field of model compression to compare with the methods proposed in this paper, including APRS [38] and Hrank [39] pruning methods; MXQN [40], DSQ [41], SQ [42], APOT [43], HAWQ [44], and Unified INT8 [45] quantization methods; and DPP [17] and HFPQ [46] joint compression methods. In order to ensure the fairness of the experimental results, this paper analyzes the performance of the proposed methods in this paper by using the following three metrics with reference to the evaluation metrics in the above comparison methods.
Accuracy [47] loss is used to evaluate the loss of model accuracy before and after compression, i.e., the compressed model accuracy is subtracted from the original model accuracy, which is expressed as “Acc.↓”. If the accuracy loss is positive, it means that the model accuracy decreases after compression; conversely, it means that the model accuracy increases.
Average weight bits, which are used to evaluate the quantization bitwidth of the model weights, i.e., the ratio of the quantization bitwidth of all the model weights to the total number of weights after compression, are expressed as “Ave. bits”. The smaller the average weight bits, the higher the quantization degree of the model and the greater the compression rate.
The compression ratio is used to evaluate the degree of compression of the storage resources occupied by the model weights, that is, the ratio of the number of bytes of the original model weights to the number of bytes of the compressed model weights, which is expressed as “Comp. ratio”. The higher the compression ratio, the smaller the storage space consumed by the compressed model.
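These three metrics can be computed directly from the models before and after compression; the helper functions below are an illustrative sketch with assumed variable names:

```python
def accuracy_drop(acc_original, acc_compressed):
    """Acc.down: positive values mean the compressed model lost accuracy."""
    return acc_original - acc_compressed

def average_weight_bits(layer_bits, layer_weight_counts):
    """Ave. bits: total quantized bits divided by the total number of weights."""
    total_bits = sum(b * n for b, n in zip(layer_bits, layer_weight_counts))
    return total_bits / float(sum(layer_weight_counts))

def compression_ratio(original_bytes, compressed_bytes):
    """Comp. ratio: bytes of the original weights over bytes of the compressed weights."""
    return original_bytes / float(compressed_bytes)
```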

4.2. Knowledge Distillation Temperature Experiment

In deep learning, knowledge distillation is a model compression technique that trains a smaller student model to mimic a large pre-trained teacher model. In this context, the “distillation temperature” is a key parameter that controls the smoothness of the soft target distribution. To find the best distillation temperature, we first refer to the relevant literature to understand the common value range of the distillation temperature, and then perform a series of targeted experiments to determine the optimal temperature value. We choose different temperature values $T$ and train the student model using the same dataset and training process for each selected temperature value. During each training process, key performance indicators such as training loss and validation set accuracy are recorded. For each temperature value, we evaluate the performance of the student model on the validation set.
The analysis of the experimental results shows that when T = 1, the softmax output is close to a hard label, i.e., only one category has a significant probability, and the probability of the other categories is close to zero. In this case, the student model is unable to learn the relational information between different categories, which leads to a decrease in generalization ability. Softmax outputs at T = 3, T = 5, and T = 10 are smoother, providing information about the relationship between the true category of the sample and the other classes. This allows the student model to learn more about the relationships between categories, which improves its generalization ability. At T = 5, the student model exhibits the best performance, indicating that the information provided by the softmax at this temperature value is the most beneficial to the student model’s learning. At T = 50, the distribution of the outputs over the categories is too uniform, and the differentiation between the correct category and the other categories is over-reduced, making it difficult for the student model to learn useful knowledge effectively. Through experiments, it was found that when the distillation temperature T was set to 5, the student model was able to effectively learn useful information from the teacher model and maintain good generalization ability. This temperature value provides sufficient inter-category information without making the inter-category variability too low, which is the optimal choice in this experiment.

4.3. Model Performance Testing

Figure 6 and Figure 7 show the variation in accuracy of different federated learning algorithms with the number of training rounds on the MNIST and CIFAR-10 datasets. Four federated learning algorithms are compared: FedAvg, FedProx, FedDrop, and CP-FL. As the figures show, the accuracy of all algorithms gradually improves as the number of training rounds increases. However, the CP-FL algorithm reaches high accuracy within fewer training rounds and maintains it for most of the training; its accuracy is already close to 0.8 by the 20th round, which indicates that it learns and adapts to the data faster. In addition, the accuracy of CP-FL fluctuates less, suggesting that its training is more stable and less susceptible to outliers or noise. Because CP-FL achieves accuracy similar to or higher than the other algorithms in fewer training rounds, it is likely to be more efficient in real-world applications, since it reduces communication and computation overhead. Overall, these figures show that CP-FL outperforms, or at least does not underperform, the other federated learning algorithms on both datasets, especially in terms of fast convergence and stability.
As can be seen from the experimental results in Figure 6 and Figure 7, the same model behaves differently on the MNIST and CIFAR-10 datasets. MNIST consists of grayscale images of handwritten digits, each 28 × 28 pixels; the backgrounds are fairly uniform and the target objects (digits) are simple and salient. CIFAR-10, by contrast, consists of color images of 10 categories of natural objects, each 32 × 32 pixels; the images are rich in color, the distinction between background and foreground is less clear than on MNIST, and the objects vary more in shape and texture. The category content (e.g., airplanes, automobiles, birds) is more complex and diverse than the digits in MNIST, so with the same model architecture it is difficult to reach MNIST-level performance on CIFAR-10 unless more detailed tuning is performed or the model’s complexity is increased.

4.4. Performance Testing Under Different Pruning Rates

During pruning, the dimensions of the neural network (that is, the number of hidden layers and the number of neurons per layer) change. Pruning usually removes unimportant connections or neurons, thereby reducing model size and computational complexity. The pruned model typically offers higher inference efficiency and lower memory consumption, but its accuracy may suffer. Therefore, performance must be evaluated before and after pruning to ensure that the model still meets the performance requirements while achieving the expected efficiency gains.
To maximize resource utilization in terms of time, space, and efficiency, this paper investigates a network model optimization method that drastically reduces redundant parameters by setting different network pruning rates (70%, 80%, 90%). The approach aims to reduce both model complexity and computation time, thereby meeting the demand for high performance in real-world application scenarios. On the CIFAR-10 dataset, experiments with different pruning strategies were conducted on the ResNet architecture, and the corresponding results are reported in Table 2. The experiments show that the proposed method achieves recognition accuracies of 82.01%, 84.69%, and 76.11% at pruning rates of 70%, 80%, and 90%, respectively. Particularly noteworthy is that at a pruning rate of 80%, the method improves recognition accuracy by about 4% and 6% over the existing FedProx and FedDrop methods, which demonstrates that the module proposed in this study effectively supplies information useful to the pruning process and thus significantly improves model performance. These results indicate that even at higher pruning rates, a well-designed pruning strategy can maintain or even enhance performance on a given task, providing strong support for building more compact and efficient neural networks.
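For illustration, the sketch below shows one simple way a target pruning rate can be turned into a channel mask, using the L1 norm of each filter as an importance score; the channel-importance mechanism and constrained-optimization training used in CP-FL are more elaborate, so this should be read only as a conceptual example.

```python
import torch
import torch.nn as nn

def channel_mask_by_importance(conv: nn.Conv2d, prune_rate: float) -> torch.Tensor:
    """Score each output channel by the L1 norm of its filter and zero out
    the least important fraction given by prune_rate."""
    importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one score per output channel
    n_prune = int(prune_rate * importance.numel())
    mask = torch.ones_like(importance)
    if n_prune > 0:
        _, drop_idx = torch.topk(importance, n_prune, largest=False)
        mask[drop_idx] = 0.0
    return mask  # apply with conv.weight.data *= mask.view(-1, 1, 1, 1)

# Example: an 80% pruning rate keeps 13 of 64 output channels.
conv = nn.Conv2d(32, 64, kernel_size=3)
mask = channel_mask_by_importance(conv, prune_rate=0.8)
print(int(mask.sum().item()))  # 13
```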
Figure 8 and Figure 9 show the experimental comparison of different pruning methods on the ResNet architecture for the MNIST and CIFAR-10 datasets. To visualize and compare the effects of the different pruning methods, the data collected from our experiments are organized by pruning method and pruning rate and exported in CSV format via TensorBoard. These CSV files are read with Python scripts to extract the required accuracy data and other relevant information; this step includes any necessary cleaning and preprocessing to ensure the accuracy of the subsequent analysis. The graphs are created with the matplotlib library: each pruning method and pruning rate is assigned its own color or line style for easy differentiation, model accuracy curves are plotted as training progresses, and titles, axis labels, and other annotations are added, with the layout adjusted so that all important details are visible. From the resulting graphs, one can observe how different pruning methods affect the model’s learning dynamics. Some methods converge faster initially but reach a lower final accuracy, while others converge more slowly yet achieve a higher stable accuracy. The differences between methods at specific pruning rates can also be observed, which helps in selecting the pruning strategy best suited to the application scenario at hand.
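The plotting workflow described above can be reproduced with a few lines of pandas and matplotlib; the file names and run labels below are hypothetical placeholders, while the “Step”/“Value” columns match TensorBoard’s standard CSV export.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical layout: one TensorBoard CSV export per method at a given pruning rate.
runs = {
    "CP-FL (80%)": "cpfl_prune80.csv",
    "FedProx (80%)": "fedprox_prune80.csv",
    "FedDrop (80%)": "feddrop_prune80.csv",
}

plt.figure(figsize=(6, 4))
for label, path in runs.items():
    df = pd.read_csv(path)                    # columns: Wall time, Step, Value
    plt.plot(df["Step"], df["Value"], label=label)

plt.xlabel("Rounds")
plt.ylabel("Accuracy")
plt.title("Accuracy vs. rounds at an 80% pruning rate")
plt.legend()
plt.tight_layout()
plt.savefig("pruning_comparison.png", dpi=300)
```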

4.5. Compression Effect Comparison

Comparison experiments were conducted on the CIFAR-10 dataset for the VGG16, ResNet, and ConvNeXt models, with four ResNet variants (18, 20, 56, and 110) selected. The results of the VGG16 experiments are shown in Table 3.
According to the data in Table 3, the CP-FL method decreases accuracy by only 1.3% while compressing the storage occupied by the weight parameters of the original model by 99.3 times. Compared with the MXQN method, the compression rate of CP-FL is higher by about 95.7×. In addition, CP-FL improves the compression rate by about 53.6× compared to the HFPQ joint compression method at a similar accuracy loss.
The experimental results of joint compression of VGG16 with the CP-FL method on the CIFAR-10 dataset are shown in Figure 10 and Figure 11, which report, layer by layer, the quantization bitwidths and compression rates of the convolutional (Conv) and fully connected (FC) layers, respectively.
As can be seen from Figure 10, a relatively high quantization bitwidth is adaptively chosen for the first layer of the model to retain richer low-dimensional information, while the deeper layers of the network require a lower data bitwidth. For the shallow layers (those close to the input), we tend to use a higher quantization bitwidth, because these layers extract the underlying features from the raw data, which contain a great deal of detail and information. Higher quantization precision helps preserve this low-dimensional information and ensures that subsequent layers compute on sufficiently accurate data. In contrast, lower quantization bitwidths are acceptable in the deeper layers (those closer to the output), which mainly combine the high-level abstract features extracted by earlier layers to accomplish the final task, such as classification or regression. At that point, the specific details of the raw data are no longer needed, so a moderate reduction in the precision of the numerical representation has little effect on the overall result.
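The effect of assigning different bitwidths to different layers can be illustrated with a plain uniform fake quantizer, shown below; unlike the quantization-aware training in CP-FL, the bitwidths here are fixed by hand, and the layer names are hypothetical.

```python
import torch

def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric fake quantization of a weight tensor to the given bitwidth."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

# Illustrative assignment: more bits for shallow layers, fewer for deep ones.
layer_bits = {"conv1": 8, "conv7": 4, "fc2": 3}
w = torch.randn(64, 3, 3, 3)
for name, bits in layer_bits.items():
    err = (w - fake_quantize(w, bits)).abs().mean().item()
    print(f"{name}: {bits}-bit, mean quantization error {err:.4f}")
```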
Figure 11 provides a visual comparison, in percentage form, of how much the storage consumed by each layer of the model is compressed. As the figure shows, the compression applied to each layer generally increases as the network deepens. Shallow layers are mainly responsible for extracting low-level features from the input data; these features tend to be more specific and detailed and thus require higher precision to retain enough information. Deeper layers, by contrast, deal with high-level features obtained after multiple layers of abstraction; these are more general and tolerate lower precision, so greater compression can be applied without significantly affecting model performance. In addition, as network depth increases, the connections between layers may become more redundant: certain parameters in the deeper layers have little impact on the final output and can be effectively compressed by reducing their number or their precision. The shallow layers, which process the original input directly and whose parameters are more strongly interdependent, leave relatively little room for compression. Overall, as the number of network layers increases, the model can be compressed at a higher rate, saving storage resources.
To better validate the effectiveness of the CP-FL compression method on different network models, in this paper, we conduct comparative experiments on ResNet18/20/56/110, and the experimental results are shown in Table 4.
According to the data in Table 4, using the CP-FL compression method can significantly improve the compression rate of the model with no more than 2.5% loss in accuracy, which shows a clear advantage over several other methods. Specifically, on ResNet18, the compression rate of CP-FL is 98.6 times higher than that of DDP, while the accuracy only decreases by 1.8%. For ResNet20, CP-FL improves the compression rate by 79.9× and 58.4× compared to DSQ and HAWQ, respectively, with an accuracy loss of at most 2.6%. Especially on ResNet56, the advantage of CP-FL is even more prominent: it improves the compression rate by 117.5 times compared to Hrank, with almost the same accuracy. These results show that CP-FL is not only able to achieve substantial model compression while maintaining high accuracy but also outperforms other compared methods on different network structures.

4.6. Privacy and Usability Testing

In this section, we evaluate how the CP-FL mechanism balances privacy and utility by introducing a differential privacy mechanism. When federated learning is combined with differential privacy, choosing an appropriate privacy budget ε is a key issue, since ε quantifies the strength of data privacy protection. A small value of ε provides stronger privacy protection but at the expense of model accuracy; a large value of ε weakens privacy protection but may improve model performance. In this paper, we adjust the value of ε through extensive experiments and observe how model performance changes under the different settings.
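To make the role of ε concrete, the sketch below applies the generic Gaussian mechanism to a flattened model update: the update is clipped to a norm bound and then perturbed with noise whose scale grows as ε shrinks. The clipping bound and δ are illustrative choices, the closed-form calibration σ = Δ·sqrt(2 ln(1.25/δ))/ε is strictly valid only for small ε (practical systems use tighter accountants), and this is not presented as the exact perturbation procedure of CP-FL.

```python
import numpy as np

def dp_perturb_update(update, clip_norm: float, epsilon: float, delta: float = 1e-5):
    """Clip the update to clip_norm in L2, then add Gaussian noise calibrated to (epsilon, delta)."""
    flat = np.asarray(update, dtype=np.float64)
    scale = min(1.0, clip_norm / (np.linalg.norm(flat) + 1e-12))
    clipped = flat * scale
    sigma = clip_norm * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped + np.random.normal(0.0, sigma, size=flat.shape)

# A smaller privacy budget adds more noise to the same update.
update = np.random.randn(1000) * 0.01
for eps in (1.0, 4.0, 8.0):
    noisy = dp_perturb_update(update, clip_norm=1.0, epsilon=eps)
    print(eps, float(np.linalg.norm(noisy - update)))
```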
Figure 12 and Figure 13 show the classification accuracies of the participant models in the CP-FL framework on the two datasets under different differential privacy budgets. For the MNIST dataset, ε is set to 1, 2, 4, 6, and 8; the DTB70 dataset is harder to train, so a larger privacy budget is used, with ε ranging from 2 to 10. The vertical axis represents classification accuracy, and the horizontal axis represents the privacy budget, i.e., the level of privacy protection.
After continuous iterative training, each participant model achieves good classification accuracy. For example, on the MNIST dataset the participant models reach accuracies ranging from 66.53% to 86.62%, and on the DTB70 dataset from 65.53% to 79.62%. Typically, the smaller the privacy budget, the more noise is added and the greater the negative impact on accuracy. Through the uploading and downloading of model parameters, each participant model benefits from the parameters of the other participants without any data sharing; that is, even users with small privacy budgets gain accuracy through the interactive training of federated learning.
We use the change in accuracy and in the loss function to reflect the training behavior of the model; the experimental results on the CIFAR-10 dataset are shown in Figure 14 and Figure 15. With an approximately uniform privacy budget of ε = 4 assigned to each user, the CP-FL mechanism shows an advantage over FedDrop and FedAvg along the rounds axis. Notably, at a very high dropout rate of 80%, the CP-FL mechanism achieves almost optimal training results, because the high dropout rate saves a large share of the privacy budget, yielding superior model utility while keeping the privacy level unchanged. We also examined how different privacy-protection levels affect the utility of these methods by approximating the privacy budget as ε = 2, 4, 8, 10 and expressing utility as model accuracy. The experiments show that even at the strong privacy-protection level of ε = 2, the CP-FL mechanism can still be trained to a sizable model accuracy, exceeding the other federated learning architectures at the same privacy-protection level.
In addition, the CP-FL algorithm proposed in this paper offers higher utility under the same privacy budget. As the privacy budget increases, the added noise becomes smaller and model accuracy gradually improves. Across the different datasets, the model accuracy of the CP-FL algorithm remains significantly higher than that of the other algorithms as the privacy budget grows, so it improves model accuracy more markedly.

4.7. Ablation Experiment

To deeply explore the effect of joint compression techniques on model effectiveness, this study chooses the ResNet model architecture and conducts a set of rigorous ablation experiments based on the MNIST and CIFAR-10 datasets. Under the premise of ensuring the consistency of the model architecture and the same training protocol, the experiments specifically compare the effects of sparsification, quantization, and the joint application of the two (i.e., the joint compression method) on the training loss of the model and provide detailed statistics on the loss errors under different strategies.
Within this framework, the training results of the federated averaging algorithm without any compression serve as the baseline. Single-compression variants (sparsification only and quantization only) are then generated within the CP-FL framework by disabling the quantization or sparsification step individually, so that their efficacy can be compared directly with the full CP-FL compression scheme. Throughout training, the loss values of all models decrease gradually with the number of rounds. In particular, when training reaches the preset 50 rounds, the CP-FL method exhibits a loss level very close to that of the original uncompressed model, demonstrating its significant advantage in preserving prediction accuracy. By contrast, sparsification or quantization applied alone also reduces the loss but ends at a relatively higher level, revealing the limitations of either single approach in maintaining the model’s original accuracy. As seen in Figure 16 and Figure 17, the joint compression approach can effectively preserve, and even improve, the prediction accuracy of the model while increasing its efficiency, providing strong support for efficient model deployment in resource-constrained environments.

5. Discussion

In this paper, we propose an innovative federated learning optimization strategy aimed at jointly constructing lightweight models to enhance the efficiency and accuracy of the models while protecting data privacy. Focusing on efficient communication and data transmission, as well as enhanced security and privacy protection mechanisms, the scheme aims to minimize the communication burden between UAVs and ground devices, ensure the security of data and model transmission and storage, and comply with relevant privacy regulations.
UAV-assisted federated learning is an important development direction in the field of federated learning. Through UAV networks, the challenges that ground terminals face in data collection and communication can be effectively addressed. Nevertheless, some challenges remain, especially in UAV energy management. To extend flight endurance and improve energy efficiency, an intelligent energy management system is needed; this can be achieved, for instance, by optimizing flight path planning and introducing solar charging technology. Beyond flight path planning and solar charging, future research could explore how machine learning techniques can further optimize UAV energy consumption patterns. For example, predictive algorithms could dynamically adjust a UAV’s flight and operation strategies based on mission demands and real-time weather conditions in order to maximize energy efficiency.
Drones may also be misused for illegal surveillance and invasion of personal privacy, and data leaks during the collection process could compromise personal information security. Clear laws and regulations should therefore be formulated stipulating that drone use must not violate personal privacy and prohibiting unauthorized flights over, and filming of, private property. Data protection measures should be strengthened, requiring drone operators to take the technical steps necessary to secure data and prevent leaks. A registration system covering all commercial and personal drones should also be established to ensure traceability.
The motion coordination of unmanned aerial vehicles (UAVs) has received extensive attention in industry and academia in recent years owing to its broad prospects [48]. Research in this area covers a variety of application scenarios, including patrol, coordinated rescue, and continuous surveillance. Depending on the control strategy, UAV motion coordination can be divided into formation control, cluster control, containment control, and cooperative target encircling. However, existing encirclement strategies can only make the UAV move around the target along a single-circle or multi-circle path with the same radius. Because the long circumference of such paths leads to excessive energy consumption and reduced efficiency in multi-target encirclement tasks, targeted studies to optimize existing encirclement strategies are worthwhile.
Through these comprehensive measures, not only can the collaborative ability of drones in the network be enhanced, but their operational efficiency and durability in various complex environments can also be effectively improved. These research findings will open up new application prospects for drone-assisted federated learning and promote its widespread application in more practical scenarios.

6. Conclusions

In practical applications, drones equipped with lightweight models can capture high-definition images or video streams of equipment surfaces in real time during flight and immediately analyze and process them. Whether testing the structural integrity of a nuclear power plant or monitoring pipeline corrosion in an oil refinery, the lightweight model can quickly identify potential safety hazards or failure points. This instantaneous data processing capability enables UAV inspections to not only cover vast geographic areas but also penetrate into places that are difficult or dangerous to reach manually, greatly improving the coverage and efficiency of inspection work. We propose an innovative federated learning optimization strategy to improve model efficiency and accuracy while protecting data privacy during joint construction of lightweight models. First, by introducing a channel importance assessment mechanism, we convert model pruning into a constrained optimization problem, whereby we identify and remove network channels that contribute less to the model performance, effectively streamlining the model structure and reducing unnecessary communication overhead. Next, a quantization-aware training method is used, which is able to adaptively learn the optimal quantization bitwidth to ensure the best balance between feature representation accuracy and data complexity. In terms of privacy protection, differential privacy techniques are implemented to add noise to the model updates transmitted upstream to effectively counteract the risk of potential information leakage. After the model parameters are aggregated at the central node, the model is refined using the knowledge distillation technique to ensure that the amount of downlink data is reduced while the predictive ability and generalization performance of the model are not compromised.

Author Contributions

Conceptualization, Z.J. and R.W.; Methodology, Z.J. and R.W.; Software, Z.J.; Validation, Z.J.; Formal analysis, Z.J.; Investigation, Z.J.; Resources, R.W.; Data curation, Z.J. and R.W.; Writing—original draft, Z.J.; Writing—review & editing, Z.J.; Visualization, Z.J.; Supervision, R.W.; Funding acquisition, R.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62271096 and Grant U20A20157; in part by the Natural Science Foundation of Chongqing of China under Grant CSTB2023NSCQLZX0134; and in part by the Doctoral Innovation Talent Program of Chongqing University of Posts and Telecommunications under Grant BYJS202203.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors would like to thank the anonymous reviewers for their thorough and constructive comments that have helped improve the quality of this paper.

Conflicts of Interest

All authors declare that there are no conflicts of interest regarding the publication of this article.

References

  1. Klaib, A.F.; Alsrehin, N.O.; Melhem, W.Y.; Bashtawi, H.O.; Magableh, A.A. Eye tracking algorithms, techniques, tools, and applications with an emphasis on machine learning and Internet of Things technologies. Expert Syst. Appl. 2021, 166, 114037. [Google Scholar] [CrossRef]
  2. Zhu, J.; Cao, J.; Saxena, D.; Jiang, S.; Ferradi, H. Blockchain-empowered federated learning: Challenges, solutions, and future directions. ACM Comput. Surv. 2023, 55, 1–31. [Google Scholar] [CrossRef]
  3. Liu, Y.; Wang, J.; Li, J.; Niu, S.; Song, H. Machine learning for the detection and identification of Internet of Things devices: A survey. IEEE Internet Things J. 2021, 9, 298–320. [Google Scholar] [CrossRef]
  4. Ma, Z.; Ma, J.; Miao, Y.; Li, Y.; Deng, R.H. ShieldFL: Mitigating Model Poisoning Attacks in Privacy-Preserving Federated Learning. IEEE Trans. Inf. Forensics Secur. 2022, 17, 1639–1654. [Google Scholar] [CrossRef]
  5. Ghimire, B.; Rawat, D.B. Recent advances on federated learning for cybersecurity and cybersecurity for federated learning for internet of things. IEEE Internet Things J. 2022, 9, 8229–8249. [Google Scholar] [CrossRef]
  6. Jiang, M.; Wang, Z.; Dou, Q. Harmofl: Harmonizing local and global drifts in federated learning on heterogeneous medical images. Proc. AAAI Conf. Artif. Intell. 2022, 36, 1087–1095. [Google Scholar] [CrossRef]
  7. Li, H.; Zhang, J.; Li, Z.; Liu, J.; Wang, Y. Improvement of min-entropy evaluation based on pruning and quantized deep neural network. IEEE Trans. Inf. Forensics Secur. 2023, 18, 1410–1420. [Google Scholar] [CrossRef]
  8. Yi, M.K.; Lee, W.K.; Hwang, S.O. A human activity recognition method based on lightweight feature extraction combined with pruned and quantized CNN for wearable device. IEEE Trans. Consum. Electron. 2023, 69, 657–670. [Google Scholar] [CrossRef]
  9. Nguyen, D.C.; Ding, M.; Pathirana, P.N.; Seneviratne, A.; Li, J.; Poor, H.V. Federated learning for internet of things: A comprehensive survey. IEEE Commun. Surv. Tutor. 2021, 23, 1622–1658. [Google Scholar] [CrossRef]
  10. Wibawa, F.; Catak, F.O.; Kuzlu, M.; Sarp, S.; Cali, U. Homomorphic encryption and federated learning based privacy-preserving cnn training: Covid-19 detection use-case. In Proceedings of the 2022 European Interdisciplinary Cybersecurity Conference, Barcelona, Spain, 15–16 June 2022; pp. 85–90. [Google Scholar]
  11. AbdulRahman, S.; Ould-Slimane, H.; Chowdhury, R.; Mourad, A.; Talhi, C.; Guizani, M. Adaptive upgrade of client resources for improving the quality of federated learning model. IEEE Internet Things J. 2022, 10, 4677–4687. [Google Scholar] [CrossRef]
  12. Dai, Z.; Zhang, Y.; Zhang, W.; Luo, X.; He, Z. A multi-agent collaborative environment learning method for UAV deployment and resource allocation. IEEE Trans. Signal Inf. Process. Over Netw. 2022, 8, 120–130. [Google Scholar] [CrossRef]
  13. Akbari, M.; Syed, A.; Kennedy, W.S.; Erol-Kantarci, M. Constrained federated learning for AoI-limited SFC in UAV-Aided MEC for smart agriculture. IEEE Trans. Mach. Learn. Commun. Netw. 2023, 1, 277–295. [Google Scholar] [CrossRef]
  14. Qian, L.P.; Li, M.; Ye, P.; Wang, Q.; Lin, B.; Wu, Y.; Yang, X. Secrecy-driven energy minimization in federated learning-assisted marine digital twin networks. IEEE Internet Things J. 2023, 11, 5155–5168. [Google Scholar] [CrossRef]
  15. Tang, J.; Nie, J.; Zhang, Y.; Xiong, Z.; Jiang, W.; Guizani, M. Multi-UAV-assisted federated learning for energy-aware distributed edge training. IEEE Trans. Netw. Serv. Manag. 2023, 21, 280–294. [Google Scholar] [CrossRef]
  16. Yang, S.; He, S.; Duan, H.; Chen, W.; Zhang, X.; Wu, T.; Yin, Y. APQ: Automated DNN Pruning and Quantization for ReRAM-based Accelerators. IEEE Trans. Parallel Distrib. Syst. 2023, 34, 2498–2511. [Google Scholar] [CrossRef]
  17. Gonzalez-Carabarin, L.; Huijben, I.A.; Veeling, B.; Schmid, A.; van Sloun, R.J. Dynamic probabilistic pruning: A general framework for hardware-constrained pruning at different granularities. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 733–744. [Google Scholar] [CrossRef]
  18. Wiedemann, S.; Kirchhoffer, H.; Matlage, S.; Haase, P.; Marban, A.; Marinc, T.; Neumann, D.; Nguyen, T.; Schwarz, H.; Wiegand, T.; et al. Deepcabac: A universal compression algorithm for deep neural networks. IEEE J. Sel. Top. Signal Process. 2020, 14, 700–714. [Google Scholar] [CrossRef]
  19. Marinó, G.C.; Petrini, A.; Malchiodi, D.; Frasca, M. Deep neural networks compression: A comparative survey and choice recommendations. Neurocomputing 2023, 520, 152–170. [Google Scholar] [CrossRef]
  20. Kirchhoffer, H.; Haase, P.; Samek, W.; Muller, K.; Rezazadegan-Tavakoli, H.; Cricri, F.; Aksu, E.B.; Hannuksela, M.M.; Jiang, W.; Wang, W.; et al. Overview of the neural network compression and representation (NNR) standard. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 3203–3216. [Google Scholar] [CrossRef]
  21. Giannopoulos, A.E.; Spantideas, S.T.; Zetas, M.; Nomikos, N.; Trakadas, P. FedShip: Federated Over-the-Air Learning for Communication-Efficient and Privacy-Aware Smart Shipping in 6G Communications. IEEE Trans. Intell. Transp. Syst. 2024, 99, 1–16. [Google Scholar] [CrossRef]
  22. Khan, L.U.; Saad, W.; Han, Z.; Hossain, E.; Hong, C.S. Federated learning for internet of things: Recent advances, taxonomy, and open challenges. IEEE Commun. Surv. Tutor. 2021, 23, 1759–1799. [Google Scholar] [CrossRef]
  23. Yin, X.; Zhu, Y.; Hu, J. A comprehensive survey of privacy-preserving federated learning: A taxonomy, review, and future directions. ACM Comput. Surv. 2021, 54, 1–36. [Google Scholar] [CrossRef]
  24. Yu, R.; Li, P. Toward resource-efficient federated learning in mobile edge computing. IEEE Netw. 2021, 35, 148–155. [Google Scholar] [CrossRef]
  25. Song, M.; Wang, Z.; Zhang, Z.; Song, Y.; Wang, Q.; Ren, J.; Qi, H. Analyzing user-level privacy attack against federated learning. IEEE J. Sel. Areas Commun. 2020, 38, 2430–2444. [Google Scholar] [CrossRef]
  26. Zhao, Y.; Zhao, J.; Yang, M.; Wang, T.; Wang, N.; Lyu, L.; Niyato, D.; Lam, K.Y. Local differential privacy based federated learning for Internet of Things. IEEE Internet Things J. 2020, 8, 8836–8853. [Google Scholar] [CrossRef]
  27. Zhang, Z.; Wu, L.; Ma, C.; Li, J.; Wang, J.; Wang, Q.; Yu, S. LSFL: A lightweight and secure federated learning scheme for edge computing. IEEE Trans. Inf. Forensics Secur. 2022, 18, 365–379. [Google Scholar] [CrossRef]
  28. Qu, L.; Song, S.; Tsui, C.Y. Feddq: Communication-efficient federated learning with descending quantization. In Proceedings of the GLOBECOM 2022—2022 IEEE Global Communications Conference, Rio de Janeiro, Brazil, 4–8 December 2022; pp. 281–286. [Google Scholar]
  29. Wu, Z.; Sun, S.; Wang, Y.; Liu, M.; Pan, Q.; Jiang, X.; Gao, B. Fedict: Federated multi-task distillation for multi-access edge computing. IEEE Trans. Parallel Distrib. Syst. 2023, 35, 1107–1121. [Google Scholar] [CrossRef]
  30. Li, Y.; Zhang, J.; Zhu, J.; Li, W. HBMD-FL: Heterogeneous federated learning algorithm based on blockchain and model distillation. In International Symposium on Emerging Information Security and Applications; Springer Nature: Cham, Switzerland, 2022; pp. 145–159. [Google Scholar]
  31. Zhou, X.; Zheng, X.; Cui, X.; Shi, J.; Liang, W.; Yan, Z.; Yang, L.T.; Shimizu, S.; Wang, K.I.-K. Digital twin enhanced federated reinforcement learning with lightweight knowledge distillation in mobile networks. IEEE J. Sel. Areas Commun. 2023, 41, 3191–3211. [Google Scholar] [CrossRef]
  32. Itahara, S.; Nishio, T.; Koda, Y.; Morikura, M.; Yamamoto, K. Distillation-based semi-supervised federated learning for communication-efficient collaborative training with non-iid private data. IEEE Trans. Mob. Comput. 2021, 22, 191–205. [Google Scholar] [CrossRef]
  33. McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A.Y. Communication-efficient learning of deep networks from decentralized data. arXiv 2016, arXiv:1602.05629. [Google Scholar]
  34. Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450. [Google Scholar]
  35. Caldas, S.; Konečny, J.; McMahan, H.B.; Talwalkar, A. Expanding the Reach of Federated Learning by Reducing Client Resource Requirements. arXiv 2018, arXiv:1812.07210. [Google Scholar]
  36. Wei, K.; Li, J.; Ding, M.; Ma, C.; Su, H.; Zhang, B.; Poor, H.V. User-level privacy-preserving federated learning: Analysis and performance optimization. IEEE Trans. Mob. Comput. 2021, 21, 3388–3401. [Google Scholar] [CrossRef]
  37. Noble, M.; Bellet, A.; Dieuleveut, A. Differentially private federated learning on heterogeneous data. In Proceedings of the International Conference on Artificial Intelligence and Statistics, PMLR, Virtual Conference, 28–30 March 2022; pp. 10110–10145. [Google Scholar]
  38. Sun, Q.; Cao, S.; Chen, Z. Filter pruning via automatic pruning rate search. In Proceedings of the Asian Conference on Computer Vision, Macao, China, 4–8 December 2022; pp. 4293–4309. [Google Scholar]
  39. Lin, M.; Ji, R.; Wang, Y.; Zhang, Y.; Zhang, B.; Tian, Y.; Shao, L. Hrank: Filter pruning using high-rank feature map. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1529–1538. [Google Scholar]
  40. Huang, C.; Liu, P.; Fang, L. MXQN: Mixed quantization for reducing bit-width of weights and activations in deep convolutional neural networks. Appl. Intell. 2021, 51, 4561–4574. [Google Scholar] [CrossRef]
  41. Gong, R.; Liu, X.; Jiang, S.; Li, T.; Hu, P.; Lin, J.; Yu, F.; Yan, J. Differentiable soft quantization: Bridging full-precision and low-bit neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4852–4861. [Google Scholar]
  42. Razani, R.; Morin, G.; Sari, E.; Nia, V.P. Adaptive binary-ternary quantization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual Conference, 19–25 June 2021; pp. 4613–4618. [Google Scholar]
  43. Li, Y.; Dong, X.; Wang, W. Additive powers-of-two quantization: An efficient non-uniform discretization for neural networks. arXiv 2019, arXiv:1909.13144. [Google Scholar]
  44. Dong, Z.; Yao, Z.; Gholami, A.; Mahoney, M.W.; Keutzer, K. Hawq: Hessian aware quantization of neural networks with mixed-precision. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 293–302. [Google Scholar]
  45. Zhu, F.; Gong, R.; Yu, F.; Liu, X.; Wang, Y.; Li, Z.; Yang, X.; Yan, J. Towards unified int8 training for convolutional neural network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1969–1979. [Google Scholar]
  46. Fan, Y.; Pang, W.; Lu, S. HFPQ: Deep neural network compression by hardware-friendly pruning-quantization. Appl. Intell. 2021, 51, 7016–7028. [Google Scholar] [CrossRef]
  47. Yang, H.; Gui, S.; Zhu, Y.; Liu, J. Automatic neural network compression by sparsity-quantization joint learning: A constrained optimization-based approach. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2178–2188. [Google Scholar]
  48. Mei, Z.; Shao, X.; Xia, Y.; Liu, J. Enhanced Fixed-time Collision-free Elliptical Circumnavigation Coordination for UAVs. IEEE Trans. Aerosp. Electron. Syst. 2024, 60, 4257–4270. [Google Scholar] [CrossRef]
Figure 1. Example of a workplace for UAV-assisted industrial quality inspection.
Figure 2. The architecture of CP-FL.
Figure 5. Diagram of the ConvNeXt network model (using depth-separable convolution to decouple the fusion of spatial information and the fusion of channel information, expanding the overall width of the model).
Figure 6. Impact of different schemes on the performance of ResNet on MNIST.
Figure 7. Impact of different schemes on the performance of ResNet on CIFAR-10.
Figure 8. Performance test of models with different pruning rates based on the MNIST dataset.
Figure 9. Performance test of models with different pruning rates based on the CIFAR-10 dataset.
Figure 10. Layer-wise quantization bit statistics for VGG16.
Figure 11. Layer-wise compression ratio statistics for VGG16.
Figure 12. Classification accuracy of the CP-FL framework for participant models on the MNIST dataset.
Figure 13. Classification accuracy of the CP-FL framework for participant models on the DTB70 dataset.
Figure 14. Variation in accuracy on the CIFAR-10 dataset.
Figure 15. Variation in the loss function on the CIFAR-10 dataset.
Figure 16. Loss function vs. rounds for compression methods on MNIST.
Figure 17. Loss function vs. rounds for compression methods on CIFAR-10.
Table 1. UAV-related model parameters.
Parameter | Definition | Value
β | Coefficient of weight | 0.8
e_max | Iteration cycle number | 100
R_max / m | Maximum coverage | 25
R_m / m | Minimum distance between UAVs | 10
H / m | Altitude of flight | 15
B / MHz | Bandwidth | 1
P_n / W | Transmit power | 0.1
h_0 / dB | Gain of channel | −40
σ² / dBm | Power of noise | −80
Table 2. Comparison of results with different pruning rates on the CIFAR-10 dataset.
Scheme | 70% Pruning Rate | 80% Pruning Rate | 90% Pruning Rate
FedAvg | 80 ± 0.25 | 82 ± 0.64 | 70 ± 0.57
FedProx | 78 ± 0.43 | 80 ± 0.34 | 71 ± 0.25
FedDrop | 77 ± 0.62 | 78 ± 0.33 | 73 ± 0.82
CP-FL | 82 ± 0.01 | 84 ± 0.69 | 76 ± 0.11
Table 3. Comparison of compression results of VGG16 on the CIFAR-10 dataset.
Compression Method | Acc.↓ % | Ave. Bits | Comp. Ratio
APRS | 0.9 | 32.0 | 5.1×
Hrank | −2.7 | 32.0 | 12.5×
MXQN | 0.4 | 9.0 | 3.6×
DSQ | −0.1 | 1.0 | 32.0×
SQ | 0.2 | 5.7 | 25.1×
DPP | 0.4 | 8.0 | 25.6×
HFPQ | 1.1 | 5.0 | 45.7×
CP-FL | 1.3 | 4.1 | 99.3×
Table 4. Comparison of compression results of ResNet and ConvNeXt on the CIFAR-10 dataset.
Model | Compression Method | Acc.↓ % | Ave. Bits | Comp. Ratio
ResNet 18 | DDP | −0.2 | 32.0 | 3.7×
ResNet 18 | CP-FL | 1.6 | 3.9 | 102.3×
ResNet 20 | DSQ | 0.5 | 1.0 | 3.6×
ResNet 20 | APOT | 0.6 | 2.0 | 32.0×
ResNet 20 | HAWQ | 0.2 | 2.1 | 25.1×
ResNet 20 | CP-FL | 2.1 | 4.3 | 83.5×
ResNet 56 | Hrank | 2.5 | 5.0 | 5.7×
ResNet 56 | CP-FL | 2.3 | 4.4 | 123.2×
ResNet 110 | APRS | −0.4 | 32.0 | 3.2×
ResNet 110 | Hrank | 0.9 | 32.0 | 3.2×
ResNet 110 | CP-FL | 1.1 | 4.0 | 110.5×
ConvNeXt | U.INT8 | −1.2 | 8.0 |
ConvNeXt | CP-FL | 0.9 | 4.2 | 28.7×
