Open Access Article

Toward 6G: Latency-Optimized MEC Systems with UAV and RIS Integration

Abdullah Alshahrani

Department of Computer Science and Artificial Intelligence, College of Computer Science and Engineering, University of Jeddah, Jeddah 23218, Saudi Arabia

Mathematics 2025, 13(5), 871; https://doi.org/10.3390/math13050871

Submission received: 12 February 2025 / Revised: 1 March 2025 / Accepted: 4 March 2025 / Published: 5 March 2025


Figure 1. Framework of the proposed algorithm.
Figure 2. Average rewards vs. no. of episodes.
Figure 3. The total time delay according to different schemes vs. F_{m, k}, with K = 100 and ζ_{max} = 30 Giga cycles/s.
Figure 4. The total time delay according to different schemes vs. ζ_{max}, with K = 100 and F_{m, k} = 600 cycles/bit.
Figure 5. The total time delay vs. no. of UEs.
Figure 6. Task completion ratio vs. no. of UEs.


Abstract

Multi-access edge computing (MEC) has emerged as a cornerstone technology for deploying 6G network services, offering efficient computation and ultra-low-latency communication. The integration of unmanned aerial vehicles (UAVs) and reconfigurable intelligent surfaces (RISs) further enhances wireless propagation, capacity, and coverage, presenting a transformative paradigm for next-generation networks. This paper addresses the critical challenge of task offloading and resource allocation in an MEC-based system, where a massive MIMO base station, serving multiple macro-cells, hosts the MEC server with support from a UAV-equipped RIS. We propose an optimization framework to minimize task execution latency for user equipment (UE) by jointly optimizing task offloading and communication resource allocation within this UAV-assisted, RIS-aided network. By modeling this problem as a Markov decision process (MDP) with a discrete-continuous hybrid action space, we develop a deep reinforcement learning (DRL) algorithm leveraging a hybrid space representation to solve it effectively. Extensive simulations validate the superiority of the proposed method, demonstrating significant latency reductions compared to state-of-the-art approaches, thereby advancing the feasibility of MEC in 6G networks.

Keywords:

network protocols; artificial neural networks and deep learning; deep reinforcement learning; task offloading; mobile edge computing; reconfigurable intelligent surfaces; unmanned aerial vehicle; problem-solving in the context of artificial intelligence

MSC:

68M12; 68T07; 68T20

1. Introduction

The rapid advancement of wireless technologies has driven the proliferation of portable intelligent devices, marking the dawn of the Internet of Things (IoT) era. According to the International Telecommunication Union (ITU), global mobile subscribers and data traffic are projected to grow substantially in the coming years, with mobile data traffic expected to reach approximately 5 zettabytes per month by 2030 [1]. This explosive growth, combined with the rising demand for data-intensive applications such as extended reality (XR), autonomous vehicles, and industrial automation, poses significant challenges to the capacity and sustainability of current 5G networks. To address the requirements of these emerging applications, future networks must achieve ultra-low latency and unparalleled reliability. Consequently, the development of 6G systems has become imperative to ensure robust, seamless, and efficient connectivity for the next generation of communication technologies (Ahmed et al. [2]).

The evolution toward 6G networks introduces significant advancements in communication efficiency and intelligent data processing, enabling seamless support for emerging technologies and applications. A crucial component of this vision is multi-access edge computing (MEC), which offloads computationally intensive tasks from resource-constrained Internet of Things (IoT) devices to edge servers. By processing data closer to the source, MEC effectively minimizes latency and reduces energy consumption, addressing two critical challenges faced by modern IoT ecosystems. As next-generation networks aim to deliver ultra-reliable, low-latency, and energy-efficient communication, MEC stands out as a foundational technology (Tariq et al. [3]). Its integration with advanced 6G capabilities ensures robust performance for diverse applications, ranging from real-time analytics to autonomous operations.

The deployment of 6G networks demands innovations to improve wireless channel propagation, particularly in overcoming high penetration losses associated with THz frequencies in complex environments. To address these challenges, solutions such as massive multiple-input multiple-output (mMIMO) and hybrid analog-digital beamforming have been proposed, ensuring efficient and reliable communication (Ahmed et al. [4]). However, designing efficient multi-antenna transceivers for THz bandwidth introduces significant challenges due to the complexity of beamforming in such high-frequency domains. To tackle these issues, two complementary approaches have been proposed: UAV-based communications, where unmanned aerial vehicles act as flying base stations to establish line-of-sight (LoS) links, and reconfigurable intelligent surface (RIS)-assisted environments, which dynamically control signal propagation to enhance physical layer communication. These innovations aim to overcome the limitations of THz communication while enhancing network performance. Moreover, UAVs improve signal strength and network efficiency by providing flexible and dynamic communication links, while RIS, with its programmable metasurfaces, mitigates path loss and channel sparsity by redirecting signals to strengthen connections between base stations and users (Mehrabian and Wong [5]). By combining these technologies, RIS-assisted UAV communications leverage the strengths of both approaches, offering a powerful solution for addressing key challenges in next-generation networks and garnering significant research attention (Xu et al. [6]).

However, despite these advantages, the practical deployment of RIS and UAVs introduces several challenges. UAV-based communication is constrained by high power consumption, limited battery life, and stringent airspace regulations, which impact operational feasibility (Banafaa et al. [7]). Additionally, RIS requires real-time reconfigurability and advanced control mechanisms to dynamically adjust signal propagation, posing implementation difficulties. Efficient energy management strategies, regulatory compliance frameworks, and intelligent control algorithms are essential to overcoming these obstacles and fully realizing the potential of RIS-UAV-assisted networks.

The integration of MEC solutions with RIS-assisted UAV communications holds immense potential for next-generation wireless networks. This approach combines the computational efficiency of MEC with the enhanced signal propagation and flexibility of RIS and UAV technologies, paving the way for innovative solutions to meet the stringent demands of future wireless systems.

1.1. Contributions

The implementation of MEC systems assisted by both UAV and RIS technologies has garnered significant attention from the research community due to their potential to enhance network performance. Recent studies (Xu et al. [8]; Hu et al. [9]) have demonstrated that the complete integration of these technologies offers substantial advantages compared to the exclusive use of either UAV-based MEC systems or RIS-assisted MEC systems (Wang et al. [10]). However, most existing works that integrate UAVs and RIS have primarily considered RIS mounted on the facades of buildings. This limitation underscores the need for exploring alternative RIS deployment strategies, such as UAV-assisted RIS, to fully realize the potential of these combined technologies.

Under these perspectives, this paper addresses the optimization problem of task offloading and resource allocation in MEC systems supported by a RIS-equipped UAV. The proposed optimization problem considers the following practical constraints and assumptions: perfect channel state information (CSI) at the MIMO base station (MBS), UAV energy limitations affecting its movement and service time, and configurable RIS phase shifts without additional delay overhead. These constraints ensure feasibility in real-world deployments.

The key contributions of this work to the current state of the art are as follows:

We propose a novel optimization framework for a MEC system hosted within a massive MBS and assisted by a RIS-equipped UAV capable of flying within the coverage area. For this system, we formulate an optimization problem aimed at minimizing system latency by jointly optimizing the power allocation for each user, user association, phase shift configuration of RIS reflecting elements, and computing resource allocation at the MBS, all subject to the MBS’s computing resource constraints and QoS requirements. Moreover, this optimization problem is formulated into a Markov decision process (MDP).
Addressing the challenge of discrete-continuous hybrid action spaces, the paper proposes a unique solution that integrates a mechanism to represent discrete actions using an embedding table and leverages a conditional variational autoencoder to handle continuous actions. By integrating these into a unified hybrid action space, the framework leverages the twin delayed deep deterministic policy gradient (TD3) algorithm to solve the joint optimization problem effectively.
Extensive simulation results reveal that the proposed algorithm, when compared with benchmark approaches, effectively minimizes the total latency for executing tasks across all users.

1.2. Paper Organization

The remainder of this paper is structured as follows: Section 2 provides a review of related work. Section 3 describes the system model under consideration, followed by the formulation of the optimization problem for latency minimization in Section 4. The proposed methodology is detailed in Section 5, while Section 6 evaluates the effectiveness of the proposed algorithm in minimizing total system latency. Finally, Section 7 presents the conclusion and discusses potential future research directions.

2. Related Work

The integration of line-of-sight (LoS) transmissions enabled by UAVs with the smart and controllable propagation environment created by reconfigurable intelligent surfaces (RIS) is a growing research focus for next-generation wireless networks. UAV-assisted MEC systems have demonstrated significant potential to enhance network computing capacity dynamically and support emergency scenarios. Task offloading in such systems has become a critical research area, addressing challenges in resource allocation optimization and ensuring network efficiency.

Recent advancements have leveraged UAVs in 5G networks to enhance task offloading, addressing critical challenges like resource constraints and latency. Ning et al. [11] proposed a 5G-enabled task offloading network that clusters users geographically and models the offloading problem as a mixed-integer non-linear programming (MINLP) problem to maximize throughput. Similarly, Song et al. [12] introduced an evolutionary multi-objective RL algorithm to optimize UAV trajectory and task offloading, with objectives to minimize task delay, reduce UAV energy consumption, and maximize task collection efficiency. These works highlight the growing focus on integrating UAVs for efficient task management in next-generation networks.

Task dependencies have gained significant attention in UAV-assisted MEC systems due to their impact on resource allocation and system efficiency. Xu et al. [13] addressed this by formulating a joint optimization problem for resource allocation and UAV trajectory design, aiming to minimize system energy consumption while meeting delay and dependency constraints. Their approach combined dynamic programming and optimization algorithms to provide efficient solutions. However, their work was limited to a single UAV with constrained power and computational resources, underscoring the need for scalable and resource-efficient alternatives to support more complex and demanding scenarios.

Multi-UAV MEC systems address these limitations by enabling resource sharing and improving task offloading efficiency. For example, Goudarzi et al. [14] proposed a two-layer framework incorporating a queue-based algorithm to minimize delay, while Bai et al. [15] tackled non-convex load balancing challenges using Lyapunov optimization for online decision-making. Furthermore, Zhao et al. [16] introduced a multi-agent DRL approach based on the twin delayed deep deterministic policy gradient (TD3) algorithm to optimize UAV trajectories, power allocation, and task offloading, achieving significant cost reductions and scalability.

RIS technology has been explored to enhance MEC systems. Wang et al. [10] proposed a RIS-aided MEC system for heterogeneous networks (HetNet), focusing on the joint optimization of caching, task offloading, computing resources, and RIS/base station resource allocation. To address the NP-hard nature of this problem, they developed a two-stage optimization algorithm. Their findings indicate that integrating RIS into MEC HetNet systems can significantly reduce task computing delays, highlighting the potential of RIS to improve system performance.

The integration of RIS and UAV technologies has proven to significantly enhance MEC system performance through joint optimization. Xu et al. [8] proposed a RIS-enhanced, UAV-assisted MEC framework utilizing NOMA communication, focusing on optimizing RIS phase shifts, resource allocation, decoding order, and UAV deployment. Building on this, Hu et al. [9] incorporated dynamic UAV trajectories into the framework, further optimizing computation/offloading bits, RIS phase shifts, bandwidth allocation, and UAV trajectories. These advancements have demonstrated improved computation capacity, underscoring the effectiveness of combining RIS and UAV technologies in MEC systems for superior performance and resource efficiency.

3. System Model

We consider a mMIMO communication system that extends coverage between a MIMO base station (MBS) and distributed users, such as mobile users, vehicles, and IoT sensors, with the assistance of a RIS-equipped UAV. This integration significantly enhances network performance and quality of service (QoS), especially in areas affected by propagation blockages, improving reliability for UEs beyond direct MBS coverage (Zhai et al. [17]). In this system, the MBS is equipped with a large antenna array of L elements to serve K single-antenna users, which are grouped into M clusters represented by K_{U} = \{K_{1}, K_{2}, \dots, K_{M}\}. Each cluster has a varying number of users. The RIS-equipped UAV functions as a small-cell base station, with an RIS panel of N discrete elements reflecting signals from UEs to the MBS. The (m, k)-th UE denotes the k-th UE in the m-th cluster, and u_{m, k} is the user association indicator, which determines whether the UE offloads its computing task to the MEC server (u_{m, k} = 1 for offloading, u_{m, k} = 0 otherwise). User associations are represented by the vector u_{m} = {[u_{m, k}]}_{k = 1}^{K_{m}} for cluster m and the matrix u = {[u_{m}]}_{m = 1}^{M} for all clusters. The indicator is expressed as follows:

u_{m, k} = \begin{cases} 1, & \text{if computing task offloading is needed}, \\ 0, & \text{otherwise}. \end{cases}

(1)

A time division multiple access (TDMA) scheme is employed when the UAV serves the m-th cluster, allowing users to offload tasks to the MBS. The RIS plays a critical role by steering beams to reflect only the desired signal toward the MBS, minimizing interference from other users and ensuring efficient communication (Di Renzo et al. [18]).

3.1. Channel Model

We consider a 3D Cartesian coordinate system to represent the spatial positions of the MBS, UAV-RIS, and users. The MBS is located at (x_{0}, y_{0}, H_{0}), where H_{0} denotes the antenna height, while the UAV-RIS is positioned at (x_{U}, y_{U}, H_{U}), with H_{U} being the UAV’s altitude. The users (UEs) are distributed at ground level, with their positions represented as (x_{k}, y_{k}, 0) for k = 1, 2, \dots, K. The positions of all entities, including the MBS, UAV-RIS, and UEs, are assumed to be accurately tracked using GPS and stored locally at the MBS. Communication between the MBS and the UAV-RIS is modeled under a LoS assumption, ensuring high-quality signal propagation. For the channel between the MBS and UAV-RIS, we adopt a free-space path loss model to describe signal attenuation under the LoS condition, expressed as follows:

β_{m, 0} = \frac{β_{0}}{d_{m, 0}^{2} + {(H_{0} - H_{m})}^{2}}, m = 1, \dots, M,

(2)

where d_{m, 0} = \sqrt{{(x_{0} - x_{m})}^{2} + {(y_{0} - y_{m})}^{2}} is the horizontal distance between the MBS and the UAV-RIS when it hovers at (x_{m}, y_{m}, H_{m}) above the m-th cluster, and β_{0} signifies the power gain of the wireless channel at the reference distance d_{0}. Additionally, H_{m} refers to the altitude of the UAV as it hovers above the m-th cluster.
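As a quick numerical illustration of Eq. (2), the following sketch evaluates the MBS-to-UAV-RIS channel gain from the two positions. The function name, the coordinates, and the value β_{0} = 10^{-3} are illustrative assumptions and are not taken from the simulation setup of this paper.

```python
import math

def mbs_uav_gain(beta0, mbs_pos, uav_pos):
    """Free-space channel gain of Eq. (2): beta0 / (d_{m,0}^2 + (H_0 - H_m)^2)."""
    x0, y0, h0 = mbs_pos
    xm, ym, hm = uav_pos
    d_sq = (x0 - xm) ** 2 + (y0 - ym) ** 2  # squared horizontal distance d_{m,0}^2
    return beta0 / (d_sq + (h0 - hm) ** 2)

# Hypothetical geometry: MBS antenna at 25 m, UAV hovering at 100 m altitude.
gain = mbs_uav_gain(beta0=1e-3, mbs_pos=(0.0, 0.0, 25.0), uav_pos=(300.0, 400.0, 100.0))
gain_db = 10 * math.log10(gain)  # same gain expressed in dB
```

Because the gain decays with the squared 3D distance, doubling the horizontal offset of the UAV in this example reduces the gain by roughly a factor of four.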

The communication channel between the UAV-RIS and the (m, k)-th UE, classified as an air-to-ground channel, is more complex due to the effects of propagation blockages. To address this, the path-loss formulation incorporates both the LoS and NLoS components of the air-to-ground link, represented as follows:

β_{m, k} = PL_{m, k} + η_{LoS} P_{m, k}^{LoS} + η_{NLoS} P_{m, k}^{NLoS},

(3)

where η_{LoS} and η_{NLoS} denote the average additional losses for the LoS and NLoS paths, respectively, and P_{m, k}^{LoS} and P_{m, k}^{NLoS} are the corresponding link probabilities. The distance-based path loss can be expressed as follows:

PL_{m, k} = 10 \log {\left(\frac{4 π f_{c} D_{m, k}}{c}\right)}^{α},

(4)

Here, α is the path loss exponent with α \geq 2, while c represents the speed of light (in meters per second) and f_{c} denotes the carrier frequency (in Hertz). The horizontal distance from the UAV-RIS to the (m, k)-th UE is represented as d_{m, k}, and the total Euclidean distance, considering the UAV height H_{m}, is given by D_{m, k} = \sqrt{d_{m, k}^{2} + H_{m}^{2}}. The probability of LoS is given as follows (Al-Hourani et al. [19]):

P_{m, k}^{LoS} = \frac{1}{1 + a \exp [- b (\arctan (\frac{H_{m}}{d_{m, k}}) - a)]},

(5)

where the values of the constants a and b are environment-dependent. Consequently, the probability of an NLoS connection can be expressed as P_{m, k}^{NLoS} = 1 - P_{m, k}^{LoS}. The free-space path loss model is chosen for the MBS-to-UAV-RIS link due to the highly probable LoS between the two, given the UAV-RIS’s elevated positioning and the absence of significant obstacles. This assumption aligns with established aerial communication models and ensures accurate signal attenuation estimation. In contrast, the UAV-to-UE link is more susceptible to environmental factors such as urban obstructions and terrain variations. To account for these uncertainties, a probabilistic LoS/NLoS model is employed, effectively capturing the dynamic nature of air-to-ground signal propagation in UAV-assisted networks.
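The air-to-ground model of Eqs. (3)-(5) can be sketched numerically as follows. Two points are assumptions of this sketch rather than statements of the text: the elevation angle is converted to degrees, and the logarithm in Eq. (4) is taken to base 10 so that the result is in dB. The parameter values (a = 9.61, b = 0.16, η_{LoS} = 1 dB, η_{NLoS} = 20 dB) are illustrative urban-type settings and are not taken from this paper.

```python
import math

def los_probability(h_uav, d_horiz, a, b):
    """Eq. (5); the elevation angle is expressed in degrees (assumption of this sketch)."""
    theta_deg = math.degrees(math.atan2(h_uav, d_horiz))
    return 1.0 / (1.0 + a * math.exp(-b * (theta_deg - a)))

def air_to_ground_loss_db(h_uav, d_horiz, f_c, alpha, a, b, eta_los, eta_nlos):
    """Eqs. (3)-(4): distance-based loss plus probability-weighted excess losses, in dB."""
    c = 299_792_458.0                      # speed of light in m/s
    dist = math.hypot(d_horiz, h_uav)      # D_{m,k} = sqrt(d_{m,k}^2 + H_m^2)
    # 10*log10(x^alpha) is evaluated as 10*alpha*log10(x)
    pl = 10.0 * alpha * math.log10(4.0 * math.pi * f_c * dist / c)
    p_los = los_probability(h_uav, d_horiz, a, b)
    return pl + eta_los * p_los + eta_nlos * (1.0 - p_los)

# Hypothetical link: UAV at 100 m altitude, UE 200 m away horizontally, 2 GHz carrier.
loss_db = air_to_ground_loss_db(100.0, 200.0, 2e9, 2.0, 9.61, 0.16, 1.0, 20.0)
```

With these settings, a UE almost directly below the UAV sees a LoS probability close to one, whereas a UE at a low elevation angle is dominated by the NLoS excess loss.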

For UEs requiring assistance from the RIS to communicate with the MBS, the small-scale fading coefficients for the channels are represented as h_{m, k} \in C^{N \times 1} for the link from the (m, k)-th UE to the UAV-RIS, and h_{m, 0}^{H} \in C^{L \times N} for the link from the UAV-RIS to the MBS, where m \in M and k \in K_{m}. These coefficients are assumed to be independent and identically distributed (i.i.d.) random variables with zero mean and unit variance. The superscript H denotes the conjugate transpose operation. Furthermore, we define H_{m, k} \in C^{N \times 1} and H_{m, 0}^{H} \in C^{L \times N} as the channel matrices for the (m, k)-th UE to UAV-RIS link and the UAV-RIS to MBS link, respectively. These matrices combine the large-scale path loss with the small-scale fading over the respective channels as follows:

H_{m, k} = \sqrt{β_{m, k}} h_{m, k}, \quad \text{and} \quad H_{m, 0}^{H} = \sqrt{β_{m, 0}} h_{m, 0}^{H}.

(6)

Hence, the cascaded channel of the link from the (m, k)-th UE to the MBS via the UAV-RIS, G_{m, k} \in C^{L \times 1}, is given by (Wu and Zhang [20]):

G_{m, k} = H_{m, 0}^{H} Φ_{m} H_{m, k}, m \in M,

(7)

where Φ_{m} = diag [ϕ_{1 m}, ϕ_{2 m}, \dots, ϕ_{N m}] represents the phase shift matrix at the UAV-RIS. Each element ϕ_{n m} = α_{n m} e^{j θ_{n m}} characterizes the reflection properties of the n-th reflecting element, where α_{n m} \in [0, 1] denotes the reflection amplitude and θ_{n m} \in [0, 2 π] represents the phase shift, for n = 1, 2, \dots, N and m \in M. Assuming that the RIS reflecting elements modify only the phase of the reflected signals, we set α_{n m} = 1, ensuring maximum signal reflection without amplitude attenuation.
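To make the role of the phase shift matrix in Eq. (7) concrete, the sketch below computes the cascaded channel for a toy configuration with L = 1 MBS antenna and N = 3 RIS elements. The channel values are arbitrary, and the co-phasing rule used in the second call is a textbook illustration for a single-antenna receiver; it is not the phase optimization method proposed in this paper.

```python
import cmath
import math

def cascaded_channel(H0_H, theta, h_mk):
    """Eq. (7): G = H_{m,0}^H diag(e^{j theta}) H_{m,k}, with unit reflection amplitude."""
    phi = [cmath.exp(1j * t) for t in theta]           # diagonal entries of Phi_m
    reflected = [p * h for p, h in zip(phi, h_mk)]     # Phi_m applied to the UE-to-RIS channel
    return [sum(row[n] * reflected[n] for n in range(len(theta))) for row in H0_H]

# Toy channels (hypothetical values): one row per MBS antenna, one column per RIS element.
H0_H = [[0.5 + 0.5j, -0.3j, 0.8 + 0j]]
h_mk = [1j, 0.6 - 0.2j, -0.4 + 0.1j]

G_zero = cascaded_channel(H0_H, [0.0, 0.0, 0.0], h_mk)

# Co-phasing: cancel the phase of each two-hop path so that all N paths add coherently.
aligned = [(-cmath.phase(H0_H[0][n] * h_mk[n])) % (2 * math.pi) for n in range(3)]
G_aligned = cascaded_channel(H0_H, aligned, h_mk)
```

In this example the aligned configuration attains the sum of the per-path magnitudes, which is the largest gain any unit-modulus phase setting can reach for a single receive antenna.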

3.2. Transmission Scheme

Given that the (m, k)-th UE in the m-th group lacks a direct link to the MBS due to propagation blockages like large buildings, it offloads its computing task to the MBS by transmitting its signal via the RIS-equipped UAV. Consequently, the signal received at the MBS from the (m, k)-th UE is represented as follows:

y_{m, k} = \sqrt{P_{m, k}} G_{m, k}^{H} f_{m, k} s_{m, k} + \sum_{l = 1, l \neq k}^{K_{m}} \sqrt{P_{m, l}} G_{m, l}^{H} f_{m, l} s_{m, l} + n_{0},

(8)

where P_{m, k} represents the transmit power of the (m, k)-th UE, bounded by its maximum transmit power P_{m, k}^{\max}, f_{m, k} \in C^{L \times 1} is the MBS beamforming vector for the (m, k)-th UE, and s_{m, k} is the offloading information transmitted by the UE with | s_{m, k} |^{2} \leq 1. The AWGN at the MBS is denoted by n_{0} \sim CN (0, σ_{0}^{2}). The first term in (8) represents the signal transmitted by the (m, k)-th UE via the RIS panel, while the second term captures intra-cell interference from other UEs in the m-th group transmitting simultaneously. To mitigate this interference, zero-forcing (ZF) beamforming is applied to the stacked cascaded channels of the m-th group, defined as follows:

G_{m} = [G_{m, 1}, \dots, G_{m, K_{m}}] \in C^{L \times K_{m}} (m = 0, 1, \dots, M) .

(9)

As the eigenvalue distribution of the square matrix G_{m}^{H} G_{m} \in C^{K_{m} \times K_{m}} becomes increasingly deterministic with larger L, we leverage this property of the mMIMO system to design the beamforming vector f_{m, k} using the ZF technique, expressed as follows:

{\bar{f}}_{m} = G_{m} {(G_{m}^{H} G_{m})}^{- 1},

(10)

where {\bar{f}}_{m} = [{\bar{f}}_{m, 1}, \dots, {\bar{f}}_{m, K_{m}}] \in C^{L \times K_{m}} represents the precoding matrix computed from the pseudo-inverse of the channel matrix G_{m}, with columns {\bar{f}}_{m, k} \in C^{L \times 1} for m = 0, 1, \dots, M and k \in K_{m}. The normalized matrix {\tilde{f}}_{m} = [{\tilde{f}}_{m, 1}, \dots, {\tilde{f}}_{m, K_{m}}] \in C^{L \times K_{m}} is obtained from the ZF solution in (10), where each column {\tilde{f}}_{m, k} \in C^{L \times 1} corresponds to the beamforming vector for user k within cluster m. To ensure unit-norm beamforming vectors, we normalize {\tilde{f}}_{m, k} = {\bar{f}}_{m, k} / ∥ {\bar{f}}_{m, k} ∥ and then compute the final beamforming vector f_{m, k} as follows:

f_{m, k} = \sqrt{p_{m, k}} {\tilde{f}}_{m, k}, m = 0, 1, \dots, M, k \in K_{m},

(11)

where p_{m, k} represents the power control coefficient of the (m, k)-th UE. Because ZF beamforming cancels the intra-cell interference, Equation (8) can be rewritten as follows:

y_{m, k} = \sqrt{P_{m, k}} \sqrt{p_{m, k}} G_{m, k}^{H} {\tilde{f}}_{m, k} s_{m, k} + n_{0}.

(12)

Let p_{m} = {[p_{m, k}]}_{k = 1}^{K_{m}} represent the power control coefficients of the m-th cluster, and let p = {[p_{m}]}_{m = 0}^{M} aggregate these coefficients across all clusters. Similarly, let Φ = {[Φ_{m}]}_{m = 1}^{M} denote the phase shift configuration of the RIS panels. The achievable throughput for the transmission of the (m, k)-th UE to the MBS can be expressed as follows:

R_{m, k} (p_{m, k}, Φ_{m}) = W \log_{2} \left(1 + \frac{P_{m, k} p_{m, k} | G_{m, k}^{H} {\tilde{f}}_{m, k} |^{2}}{σ_{0}^{2}}\right),

(13)

where W is the bandwidth allocated to the (m, k)-th UE.
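The ZF construction of Eqs. (9)-(11) and the rate of Eq. (13) can be checked on a small example. The sketch below is a minimal pure-Python illustration for L = 3 antennas and K_{m} = 2 UEs; the helper functions, channel entries, and power and noise values are all hypothetical and chosen only so that the computation is easy to follow.

```python
import math

def conj_transpose(A):
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (assumes A is non-singular)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def zf_beamformers(G):
    """Eq. (10) followed by column normalization: returns unit-norm ZF vectors."""
    F_bar = matmul(G, inverse(matmul(conj_transpose(G), G)))   # L x K_m
    L, K = len(F_bar), len(F_bar[0])
    norms = [math.sqrt(sum(abs(F_bar[i][k]) ** 2 for i in range(L))) for k in range(K)]
    return [[F_bar[i][k] / norms[k] for k in range(K)] for i in range(L)]

def rate(W, P, p, g_eff, sigma2):
    """Eq. (13) with effective gain g_eff = G_{m,k}^H f~_{m,k}."""
    return W * math.log2(1.0 + P * p * abs(g_eff) ** 2 / sigma2)

# Hypothetical stacked channel G_m (Eq. (9)): one column per UE.
G = [[1 + 1j, 0.5 + 0j], [0.2j, 1 - 0.5j], [0.3 + 0j, 0.7 + 0.2j]]
F = zf_beamformers(G)
E = matmul(conj_transpose(G), F)          # off-diagonal entries are the residual interference
rates = [rate(1e6, 0.1, 1.0, E[k][k], 1e-3) for k in range(2)]
```

The off-diagonal entries of E vanish up to rounding error, which is the interference cancellation that allows Eq. (8) to be simplified to Eq. (12).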

3.3. Offloading Model

In the context of computation modeling, it is assumed that a specific task of size I_{m, k} can be processed either locally by the (m, k)-th UE or offloaded for execution to the MEC server housed at the MBS. Accordingly, two models are introduced to characterize the computation latency, as outlined below.

3.3.1. Local Computing

Let F_{m, k} denote the number of CPU cycles required to process each bit of the task for the (m, k)-th UE. Consequently, the time taken to execute the task locally can be calculated as follows (Raza et al. [21]):

T_{m, k}^{l} = \frac{I_{m, k} F_{m, k}}{c_{m, k}}, m = 0, 1, \dots, M, k \in K_{m},

(14)

where c_{m, k} denotes the upper limit of computational resources (in CPU cycles per second) available to the (m, k)-th UE.

3.3.2. Offloading to MBS

Alternatively, when the task is offloaded from the (m, k)-th UE to the MBS, the transmission time for offloading must be considered, which is given as follows (Raza et al. [21]):

T_{m, k}^{tx} (p_{m, k}, Φ_{m}) = \frac{I_{m, k}}{R_{m, k} (p_{m, k}, Φ_{m})},

(15)

where R_{m, k} (p_{m, k}, Φ_{m}) represents the communication rate defined in (13). After the task is successfully transmitted to the MBS, the computation time required to process the offloaded task at the MBS can be expressed as follows:

T_{m, k}^{com} (ζ_{m, k}^{bs}) = \frac{I_{m, k} F_{m, k}}{ζ_{m, k}^{bs}}, m = 0, 1, \dots, M, k \in K_{m},

(16)

where ζ_{m, k}^{bs} represents the computing capacity of the MBS allocated to process the task for the (m, k)-th UE. For simplicity, let ζ_{m} = {[ζ_{m, k}^{bs}]}_{k = 1}^{K_{m}} and ζ = {[ζ_{m}]}_{m = 0}^{M} represent the overall MBS computing capacity allocation. The total latency for executing the task of the (m, k)-th UE can therefore be expressed as follows:

T_{Total} = T_{m, k}^{tot} (p_{m, k}, u_{m, k}, Φ_{m}, ζ_{m, k}^{bs}) = (1 - u_{m, k}) T_{m, k}^{l} + u_{m, k} \left(T_{m, k}^{tx} (p_{m, k}, Φ_{m}) + T_{m, k}^{com} (ζ_{m, k}^{bs})\right).

(17)

The time required to transmit the computation results from the MBS back to the UEs can be neglected, as this latency is significantly smaller than the total latency incurred during task execution (Raza et al. [22,23]).
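The latency model of Eqs. (14)-(17) reduces to a few lines of arithmetic. The sketch below uses hypothetical function names and example values (a 1 Mbit task, 600 cycles/bit, a 1 GHz UE, a 10 Mbit/s uplink, and 30 Giga cycles/s at the MEC server) to compare the two options for a single UE.

```python
def local_latency(task_bits, cycles_per_bit, ue_cpu_hz):
    """Eq. (14): local execution time."""
    return task_bits * cycles_per_bit / ue_cpu_hz

def offload_latency(task_bits, cycles_per_bit, rate_bps, mec_cpu_hz):
    """Eqs. (15)-(16): uplink transmission time plus MEC computation time."""
    return task_bits / rate_bps + task_bits * cycles_per_bit / mec_cpu_hz

def total_latency(u, task_bits, cycles_per_bit, ue_cpu_hz, rate_bps, mec_cpu_hz):
    """Eq. (17) for a binary association indicator u (1 = offload, 0 = local)."""
    if u == 1:
        return offload_latency(task_bits, cycles_per_bit, rate_bps, mec_cpu_hz)
    return local_latency(task_bits, cycles_per_bit, ue_cpu_hz)

# Hypothetical UE: 1 Mbit task, 600 cycles/bit, 1 GHz local CPU,
# 10 Mbit/s uplink rate, and 30 Giga cycles/s allocated at the MEC server.
t_local = total_latency(0, 1e6, 600, 1e9, 1e7, 3e10)
t_offload = total_latency(1, 1e6, 600, 1e9, 1e7, 3e10)
```

For these values the local option takes about 0.6 s, whereas offloading takes about 0.12 s (0.1 s of transmission plus 0.02 s of MEC computation), so offloading is preferable as long as the uplink rate stays high enough.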

4. Problem Formulation

The UAV-RIS-assisted MEC model optimization aims to minimize total latency for task execution by jointly optimizing power allocation (p), user association (u), RIS phase shift matrix (Φ), and computing resource allocation (ζ). Constraints ensure the MBS computing capacity is not exceeded and QoS requirements for all users are met. The problem is formulated to balance resources and reduce latency effectively. The detailed formulation is as follows:

\begin{aligned}
\min_{p, u, Φ, ζ} \quad & \sum_{m = 0}^{M} \sum_{k = 1}^{K_{m}} T_{Total} & (18) \\
\text{s.t.} \quad & 0 \leq p_{m, k} \leq 1, & (18a) \\
& R_{m, k} (p_{m, k}, Φ_{m}) \geq {\bar{r}}_{0}, \; m = 0, 1, \dots, M, \; k \in K_{m}, & (18b) \\
& 0 \leq θ_{n m} \leq 2 π, \; \forall n = 1, 2, \dots, N, \; m \in M, & (18c) \\
& \sum_{k = 1}^{K_{m}} u_{m, k} ζ_{m, k}^{bs} \leq ζ_{max}. & (18d)
\end{aligned}

The constraints are defined as follows: u_{m, k} represents the user association coefficient as per (1). Constraint (18a) specifies the allowable power range for each user, while (18b) ensures that users meet the QoS requirement in terms of a minimum achievable uplink rate ({\bar{r}}_{0}). Constraint (18c) defines the valid range for each RIS phase-shift coefficient. Finally, (18d) reflects the MEC server’s computing resource limit: the total capacity allocated to the offloading UEs must not exceed ζ_{max}.
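A candidate solution can be screened against constraints (18a)-(18d) before its latency is evaluated. The following sketch performs this check for a single cluster; the function name, argument layout, and numerical values are assumptions made for illustration, and the achievable rates are assumed to have been computed beforehand from Eq. (13).

```python
import math

def is_feasible(p, u, theta, zeta, rates, r_min, zeta_max, tol=1e-9):
    """Check (18a)-(18d) for one cluster.

    p, u, zeta, rates: per-UE lists; theta: per-RIS-element phase shifts in radians.
    """
    power_ok = all(0.0 <= pk <= 1.0 for pk in p)                         # (18a)
    rate_ok = all(r >= r_min for r in rates)                             # (18b)
    phase_ok = all(0.0 <= t <= 2.0 * math.pi for t in theta)             # (18c)
    cpu_ok = sum(uk * zk for uk, zk in zip(u, zeta)) <= zeta_max + tol   # (18d)
    return power_ok and rate_ok and phase_ok and cpu_ok

# Hypothetical cluster with three UEs, two of which offload their tasks.
ok = is_feasible(p=[0.8, 0.5, 1.0], u=[1, 0, 1], theta=[0.0, math.pi, 1.5],
                 zeta=[1.0e10, 2.5e10, 2.0e10], rates=[2e6, 3e6, 5e6],
                 r_min=1e6, zeta_max=3e10)
```

Note that the capacity assigned to the second UE does not count toward (18d) in this example because its association indicator is zero.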

5. Proposed Scheme

The proposed scheme aims to enhance task offloading efficiency in a UAV-RIS-assisted MEC model by systematically addressing decision-making complexities. By reformulating the problem as an MDP (Ahmed et al. [24]), we obtain a structured framework to optimize offloading decisions, power allocation, user associations, and resource distribution. The following sections outline this MDP formulation in detail.

5.1. MDP Formulation

In the UAV-RIS-assisted MEC model, we optimize the offloading decision, the power allocated by each user (p), user association (u), the phase shift matrix of the RIS (Φ), and computing resource allocation at the MBS (ζ) subject to the MBS computing resource constraints and QoS requirements. The task offloading problem is modeled as an MDP, where the system state at the next time slot is determined solely by the current state and the action taken at the present time. In each time slot, the system observes the current state, selects an action based on this observation, and generates a reward that reflects the effectiveness of the action. The primary objective is to minimize system latency by utilizing an optimization scheme that effectively maps system states to latency-reducing actions.

5.1.1. State Space

The state captures essential information required for decision-making while avoiding redundancy. It includes three main factors: the task size (I_{m, k}), CPU cycles per bit (F_{m, k}), and the minimum required uplink rate ({\bar{r}}_{0}). The task size (I_{m, k}) represents the amount of data that needs to be processed, measured in bits, and directly impacts the computation and transmission latency. CPU cycles per bit (F_{m, k}) denote the computational effort required to process one bit of data, influencing the processing time both locally and at the MEC server. The minimum required uplink rate ({\bar{r}}_{0}) serves as a QoS constraint to ensure adequate communication performance. The state is mathematically expressed as follows:

s (t) = \{I_{m, k}, F_{m, k}, {\bar{r}}_{0}\}.

(19)

These components are sufficient to derive or calculate other interrelated variables, such as path loss, LoS probability, and computational capacity.

5.1.2. Action Space

The action space includes the control variables necessary for optimizing system performance. The first component is the transmission power allocation (p_{m, k}), which specifies the transmission power for each user and is bounded by the user’s maximum allowable power:

0 \leq P_{m, k} \leq P_{m, k}^{\max}.

(20)

The second component, user association (u_{m, k}), determines whether the user’s task is offloaded to the MEC server or processed locally. The third component involves optimizing the phase shifts (Φ_{m}) of the RIS panel to improve signal quality, as expressed in (7). Lastly, the MEC resource allocation (ζ_{m, k}^{bs}) determines the computational capacity assigned to each user, constrained by the MEC server’s total capacity as expressed in Equation (18d).

Thus, the action space is represented as follows:

a (t) = \{p_{m, k}, u_{m, k}, Φ_{m}, ζ_{m, k}^{bs}\}.

(21)

5.1.3. Reward Function

The reward function is structured to align with the objective of minimizing the total latency for all users. The total latency for a user

(m, k)

is defined as the sum of local computation, transmission, and MEC processing times as expressed in (17). To ensure the agent minimizes this latency, the reward function includes a latency term:

r (t) = - \sum_{m = 0}^{M} \sum_{k = 1}^{K_{m}} T_{m, k}^{tot} .

(22)
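The reward in Equation (22) can be sketched as follows. Equation (17) is not reproduced in this section, so the latency model below is an assumption: a task is either computed locally or transmitted and then processed at the MEC server. The function and argument names are hypothetical.

```python
import numpy as np


def total_latency(bits, cycles_per_bit, offload, rate, ue_cps, mec_cps):
    """Per-UE latency in seconds under an assumed binary offloading model."""
    workload = bits * cycles_per_bit            # CPU cycles per task
    t_local = workload / ue_cps                 # local computation time
    # Placeholder values avoid division by zero for UEs that do not offload.
    safe_rate = np.where(offload == 1, rate, 1.0)
    safe_mec = np.where(offload == 1, mec_cps, 1.0)
    t_offload = bits / safe_rate + workload / safe_mec  # uplink + MEC time
    return np.where(offload == 1, t_offload, t_local)


def reward(latencies):
    """Negative sum of all user latencies, as in Equation (22)."""
    return -float(np.sum(latencies))
```

Because the reward is the negated total latency, maximizing the return is equivalent to minimizing the accumulated latency over all users.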

5.2. DRL-Based Algorithm

The proposed DRL-based algorithm addresses the challenges of the hybrid discrete-continuous action space in the MDP, which conventional DRL methods cannot handle effectively. Converting to a discrete space causes scalability issues, while using a continuous space increases approximation complexity, both of which degrade performance. To resolve this, a novel algorithm is designed specifically for hybrid action spaces, ensuring efficient and robust decision-making.

5.2.1. Hybrid Space Modeling in UAV-RIS-Assisted MEC

In the UAV-RIS-assisted MEC model, converting the MDP into a hybrid space representation facilitates efficient optimization of the hybrid discrete-continuous action space. This approach simplifies the complexity inherent in optimizing offloading decisions, user associations, power allocation, RIS phase shifts, and MEC resource allocations while ensuring scalability and effective policy learning.

The hybrid space is constructed by first encoding the discrete components of the action space, such as user association (

u_{m, k}

) and RIS phase shifts (

Φ_{m}

). An embedding table

G_{ω} \in R^{N_{u} \times l_{1}}

is utilized to map each discrete action to a continuous vector representation, ensuring that the discrete variables are represented compactly and consistently. Each embedding

g_{ω, u} = G_{ω} (u)

captures the essential features of the discrete actions, reducing the dimensionality and enabling effective integration with continuous variables.

For the continuous components of the action space, including transmission power (

p_{m, k}

) and MEC resource allocation (

ζ_{m, k}^{bs}

), a conditional variational autoencoder (VAE) is employed. The VAE encoder

q_{ϕ} (z | p, ı, s, g_{ω, u})

maps these variables into a hybrid representation

z \in R^{l_{2}}

, conditioned on the current state

s (t)

and the embeddings of discrete actions

g_{ω, u}

. This hybrid representation encapsulates the interdependencies between discrete and continuous variables, maintaining a unified and compact action representation.

The resulting hybrid representation combines the discrete embeddings and the continuous hybrid variables into a single hybrid vector:

h = [g_{ω, u}, z] \in R^{M l_{1} + l_{2}} .

(23)

This vector serves as the input to the reinforcement learning algorithm. During execution, the hybrid vector is decoded back into the original action space. Discrete actions (

u_{m, k}, Φ_{m}

) are retrieved through a nearest neighbor search in the embedding table:

u_{m, k} = \arg\min_{u} \| g_{ω, u} - g \|_{2},

(24)

while continuous actions (

p_{m, k}, ζ_{m, k}^{bs}

) are reconstructed using the VAE decoder

q_{ψ}:

p, ı = q_{ψ} (z, s, g_{ω, u}) .

(25)
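The embedding lookup and the nearest-neighbor decoding of Equation (24) can be sketched in a few lines. The table `G` below is a small hand-written stand-in for the learned embedding table, and the function names are hypothetical.

```python
import numpy as np

# Stand-in for the learned table G_omega: one row (embedding) per discrete action.
G = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
])


def embed(u):
    """Map a discrete action index u to its continuous embedding g_{omega,u}."""
    return G[u]


def decode_discrete(g):
    """Return the index whose embedding is closest to g in Euclidean norm."""
    dists = np.linalg.norm(G - g, axis=1)
    return int(np.argmin(dists))
```

The lookup makes the actor's continuous output tolerant to small perturbations: any vector that stays closer to one row than to the others decodes to the same discrete action.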

The hybrid space is trained to accurately reconstruct the original action space through a combined loss function. The reconstruction loss ensures the decoded actions match the originals:

L_{recon} = \| p - \hat{p} \|_{2}^{2} + \| ı - \hat{ı} \|_{2}^{2},

(26)

and the Kullback–Leibler (KL) divergence term regularizes the hybrid variables for smooth and meaningful representations:

L_{KL} = D_{KL} (q_{ϕ} (z | p, ı, s, g_{ω, u}) \, \| \, N (0, I)) .

(27)

The final training loss for the hybrid space is a combination of these terms:

L = L_{recon} + β L_{KL},

(28)

where

β

is a weighting parameter to balance reconstruction and regularization.
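A minimal numerical sketch of the combined loss in Equations (26)–(28) is given below. It assumes the encoder outputs a diagonal Gaussian with mean `mu` and log-variance `log_var`, for which the KL divergence to a standard normal has a closed form; the paper does not state the encoder's parameterization, and the function name is hypothetical. The vector `x` stands for the concatenated continuous actions.

```python
import numpy as np


def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Return (total, reconstruction, KL) for one sample."""
    # Squared-error reconstruction term, as in Equation (26).
    recon = np.sum((x - x_hat) ** 2)
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), as in Equation (27).
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    # Weighted combination, as in Equation (28).
    return recon + beta * kl, recon, kl
```

Larger values of `beta` pull the encoded distribution toward the standard normal prior at the expense of reconstruction accuracy.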

Integrating the hybrid space with reinforcement learning algorithms like TD3 ensures that the compact representation is directly used for policy and value function approximations. The actor network outputs the hybrid representation

h = [g_{ω, u}, z]

, while the critic evaluates the Q-value for the hybrid actions. This hybrid representation enables efficient optimization of the hybrid action space while preserving the relationships between discrete and continuous components.

The hybrid space significantly reduces the dimensionality of the action space, enhancing computational efficiency and convergence stability. Furthermore, it captures the relationships between discrete and continuous actions, improving the overall performance in dynamic and large-scale UAV-RIS-assisted MEC environments. This methodology not only simplifies hybrid action optimization but also ensures scalability and adaptability in complex, resource-constrained scenarios, making it ideal for minimizing latency and optimizing system performance in UAV-RIS MEC systems.

5.2.2. Long-Term Latency Minimization Algorithm for UAV-RIS-Assisted MEC Systems

The long-term latency minimization algorithm for UAV-RIS-assisted MEC systems integrates the hybrid action representation space with the twin delayed deep deterministic policy gradient (TD3) algorithm to address the offloading problem. TD3 learns a deterministic policy and is well suited to high-dimensional continuous action spaces. Its key components are an actor network, which maps system states to actions, and critic networks, which estimate the expected return (Q-value) of state-action pairs and thereby guide the actor's updates. This integration targets effective task offloading and resource allocation with the aim of minimizing long-term system latency.

The proposed approach employs a hybrid action space representation to enhance decision-making in UAV-RIS-assisted MEC systems. The actor network utilizes the system state s as input and outputs a hybrid action vector

(g, z)

, where

g \in R^{M l_{1}}

and

z \in R^{l_{2}}

. A decoder processes

(g, z)

, mapping it to a hybrid action

a = \{i_{m, k}, p_{m, k}\}, \forall m, k .

The hybrid action value is approximated using twin critic networks,

Q_{θ_{1}}

and

Q_{θ_{2}}

, which utilize the hybrid action a as input and evaluate its value. The training process leverages experience tuples

(s, a, r, s^{'})

stored in a replay buffer D. The critic networks are optimized through clipped double Q-learning, which uses the smaller of the two target critic estimates to curb overestimation and stabilize value approximation.

The loss function for critic networks is defined as follows:

L_{CDQ} (θ_{j}) = E [(ζ - Q_{θ_{j}} (s, g, z))^{2}], \forall j = 1, 2,

(29)

where

ζ = r + γ \min_{j = 1, 2} Q_{{\hat{θ}}_{j}} (s^{'}, π_{ζ} (s^{'}))

, and

{\hat{θ}}_{j}

denotes the parameters of the j-th target critic network.
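The target value and critic loss of Equation (29) can be sketched numerically as follows. The arrays stand in for batched network outputs, and the function names and the default discount factor are hypothetical. Standard TD3 additionally adds clipped noise to the target action, which is omitted here.

```python
import numpy as np


def td_target(r, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: reward plus the discounted minimum of the
    two target-critic estimates for the next state-action pair."""
    return r + gamma * np.minimum(q1_next, q2_next)


def critic_loss(target, q_pred):
    """Mean squared error between the target and a critic's prediction."""
    return float(np.mean((target - q_pred) ** 2))
```

Each of the two critics is regressed toward the same target, so an overestimate in one critic cannot inflate the target on its own.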

The actor network, employing a hybrid policy, is updated using the deterministic policy gradient method, expressed as follows:

\nabla_{ζ} J (ζ) = E [\nabla_{π_{ζ} (s)} Q_{θ_{1}} (s, π_{ζ} (s)) \nabla_{ζ} π_{ζ} (s)] .

(30)

By integrating the hybrid representation space with the TD3, we propose a novel approach to address the optimization problem in UAV-RIS-assisted MEC systems. This approach effectively balances exploration and exploitation in hybrid action spaces, ensuring robust performance in dynamic network environments.

In Algorithm 1, the actor and critic networks, along with the discrete action embedding table and VAE parameters, are initialized to manage the hybrid action space (lines 1–2). The initial system state and replay buffer are then prepared (lines 3–4). A warm-up phase follows, where the representation parameters are trained using the reconstruction and regularization losses (lines 5–7). During each environment step, the current state is observed, and hybrid actions are generated by the actor network with added exploration noise (lines 8–11). These hybrid actions are decoded into discrete actions (e.g., user association and RIS phase shifts) and continuous actions (e.g., power and resource allocation) and executed, producing a reward and the next state (lines 12–13). Transition tuples are stored in the replay buffer for training (line 14). The critic networks are updated by minimizing the loss in Equation (29), while the actor network is refined via the policy gradient in Equation (30) (lines 15–18). The representation parameters are periodically updated for efficient action decoding (lines 19–21). The algorithm iterates until convergence or until the maximum number of steps is reached (lines 8–22). The framework of the proposed algorithm is illustrated in Figure 1.

Algorithm 1 Proposed algorithm for the UAV-RIS-assisted MEC optimization.

1: Initialize the actor network $π_{ζ}$ and critic networks $Q_{θ_{1}}, Q_{θ_{2}}$ with parameters $ζ, θ_{1}, θ_{2}$;
2: Initialize embedding table $G_{ω}$ and parameters $ϕ, ψ, ω$;
3: Initialize system state $s_{1}$ as in Equation (19);
4: Initialize a replay buffer, denoted as $D$;
5: while warm-up iterations not complete do
6:     Update $ω, ϕ, ψ$ with samples in $D$ as per Equation (28);
7: end while
8: while environment steps not complete do
9:     Observe the current system state $s$;
10:     Choose hybrid actions by the actor network:
11:     $g, z = π_{ζ} (s) + ϵ_{g}, ϵ_{g} \sim N (0, σ)$;
12:     Decode $u_{m, k}, Φ_{m} = f_{D} (g), p_{m, k}, ζ_{m, k}^{bs} = q_{ψ} (z, s, g)$ as per (20), (21), (24), and (25);
13:     Execute $\{u_{m, k}, Φ_{m}, p_{m, k}, ζ_{m, k}^{bs}\}$, obtain reward $r$ as per Equation (22) and next state $s^{'}$;
14:     Store $(s, g, z, r, s^{'})$ in $D$;
15:     Evaluate hybrid actions using the critic networks;
16:     Sample mini-batch from $D$;
17:     Update $Q_{θ_{1}}, Q_{θ_{2}}$ as per Equation (29);
18:     Update $π_{ζ}$ according to Equation (30);
19:     while training iterations not complete do
20:         Update $ω, ϕ, ψ$ using $D$ as per Equation (28);
21:     end while
22: end while
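The storage and sampling steps of Algorithm 1 (lines 14 and 16) rely on a replay buffer. A minimal sketch is shown below, with a hypothetical `ReplayBuffer` class; the capacity and uniform sampling are assumptions, since the paper does not describe the buffer's implementation.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (s, g, z, r, s_next) transitions."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest transition when full.
        self.data = deque(maxlen=capacity)

    def store(self, s, g, z, r, s_next):
        self.data.append((s, g, z, r, s_next))

    def sample(self, batch_size):
        """Uniformly sample a mini-batch and regroup it field by field."""
        k = min(batch_size, len(self.data))
        batch = random.sample(list(self.data), k)
        return tuple(zip(*batch))

    def __len__(self):
        return len(self.data)
```

Storing the latent pair (g, z) rather than the decoded action matches line 14 of Algorithm 1, so the critics can be trained directly in the representation space.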

5.2.3. Computation Complexity

The computational complexity of the proposed algorithm is analyzed across two key phases: the warm-up phase and the environment interaction phase. The warm-up phase involves updating the parameters

ω, ϕ, ψ

using stored samples from the replay buffer

D

via stochastic gradient descent (SGD). Let

N_{warm}

denote the number of warm-up iterations, and assume each neural network has L layers of width

O (F)

, so that each layer contributes a cost quadratic in F. The complexity of this phase is approximately

O (N_{warm} \cdot L \cdot F^{2}) .

In the environment interaction phase, the actor network

π_{ζ}

selects hybrid actions with a complexity of

O (F)

. Decoding and retrieving discrete actions (e.g., user association

u_{m, k}

, RIS phase shifts

Φ_{m}

) and reconstructing continuous actions (e.g., power allocation

p_{m, k}

, computing resource allocation

ζ_{m, k}^{b s}

) require matrix operations and nonlinear activations, leading to

O (F^{2})

. The reward computation in (22) incurs a negligible

O (1)

cost.

Once the transition

(s, g, z, r, s^{'})

is stored in

D

, the critic networks

Q_{θ_{1}}, Q_{θ_{2}}

update using mini-batches of size B, following (29), with a complexity of

O (B \cdot L \cdot F^{2})

. The actor network update, as per (30), incurs the same cost, while additional training iterations contribute

O (N_{train} \cdot L \cdot F^{2}) .

Summing all contributions, the overall complexity for T environment steps can be expressed as follows:

O (N_{warm} \cdot L \cdot F^{2}) + O (T \cdot (F + F^{2} + B \cdot L \cdot F^{2} + N_{train} \cdot L \cdot F^{2})) .

(31)

Since the dominant term is

O (T \cdot B \cdot L \cdot F^{2})

, scalability depends on batch size B, network depth L, and feature count F. Additionally, the number of UEs K and the number of RIS elements N directly influence the size of the action space, increasing computational overhead in action decoding and network updates.
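To see how the terms of Equation (31) compare for a given configuration, they can be evaluated directly. The helper below is a hypothetical illustration that drops constant factors and simply returns each term's operation count.

```python
def complexity_terms(T, B, L, F, n_warm, n_train):
    """Operation counts for the terms of Equation (31), constants omitted."""
    return {
        "warm_up": n_warm * L * F ** 2,
        "actor_forward": T * F,
        "decode": T * F ** 2,
        "actor_critic_update": T * B * L * F ** 2,
        "representation_update": T * n_train * L * F ** 2,
    }


def dominant_term(terms):
    """Name of the term with the largest operation count."""
    return max(terms, key=terms.get)
```

Under this accounting, the actor-critic update term dominates only while the batch size B exceeds the number of representation-training iterations per step; if that number is larger than B, the representation update becomes the larger term.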

Moreover, in large-scale mMIMO systems, ZF beamforming introduces an additional computational burden due to the required matrix inversion, resulting in a complexity of

O (K^{3})

, where K is the number of UEs. This cubic complexity can significantly impact real-time processing, especially as the number of UEs grows. Despite this, the proposed framework remains computationally feasible due to its structured hybrid action space and efficient optimization strategy, making it suitable for large-scale deployments in MEC-assisted networks.
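The source of the cubic cost can be seen in a minimal zero-forcing sketch. The precoder form below is the textbook right pseudo-inverse of the channel matrix; whether the paper applies ZF as a precoder or as a receive combiner is not stated in this section, so this is an illustration under that assumption, with a hypothetical function name.

```python
import numpy as np


def zf_precoder(H):
    """Zero-forcing precoder W = H^H (H H^H)^{-1}.

    H has shape (K users, Nt antennas) with Nt >= K and full row rank.
    Inverting the K x K Gram matrix is the O(K^3) step.
    """
    gram = H @ H.conj().T                # K x K Gram matrix
    return H.conj().T @ np.linalg.inv(gram)
```

With this choice the effective channel `H @ W` is the identity, which is what removes inter-user interference.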

6. Numerical Results

This section provides numerical results and a discussion of our proposed algorithm in terms of the latency minimization when compared with other state-of-the-art schemes.

The simulation scenario is designed to evaluate the performance of the proposed MEC system. A massive MIMO base station (MBS) is positioned at coordinates

(0, 0, 30)

meters, providing direct service to users within a 500 m radius. To extend coverage for users located up to 2000 m away, a RIS-equipped UAV operates within the area at an altitude ranging from 50 to 150 m, with user equipment (UE) randomly distributed across the extended region (Nguyen et al. [25]). The transmission parameters include an uplink transmission power of 30 dBm, a central frequency of 2.4 GHz, a bandwidth of 1 MHz, and a noise spectral density of

- 130

dBm/Hz. To ensure quality of service (QoS), the minimum achievable uplink rate per UE is set at 1 Mbps. The computational tasks involve a task size of 100 kB with a computational complexity of 600 cycles per bit. The MEC server, hosted at the MBS, has a maximum processing capacity of 30 Giga cycles per second, while each UE is limited to a maximum of 0.5 Giga cycles per second. This setup provides a realistic and challenging environment for evaluating task offloading and resource allocation in the RIS-assisted UAV-enabled MEC framework (Liu et al. [26]). The detailed configuration of the simulation parameters is presented in Table 1.
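A few derived quantities follow directly from these parameters and can be checked with simple arithmetic. The sketch below assumes 1 kB = 1000 bytes, which the paper does not specify; the function name is hypothetical.

```python
import math


def derived_quantities():
    """Quantities implied by the stated simulation parameters."""
    task_bits = 100 * 1000 * 8                  # 100 kB task, assuming 1 kB = 1000 bytes
    task_cycles = task_bits * 600               # 600 CPU cycles per bit
    # Noise power over the 1 MHz band: density (dBm/Hz) + 10*log10(bandwidth in Hz).
    noise_dbm = -130 + 10 * math.log10(1e6)
    # 30 dBm transmit power converted to watts.
    tx_power_w = 10 ** ((30 - 30) / 10)
    return task_bits, task_cycles, noise_dbm, tx_power_w
```

Under these assumptions each task amounts to 800,000 bits and 4.8 × 10^8 CPU cycles, the in-band noise power is −70 dBm, and the maximum transmit power is 1 W.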

The proposed scheme is compared with the following baseline schemes.

Deep Q-network (DQN) (Khan et al. [27]): The DQN algorithm is designed for discrete action spaces, where a neural network approximates the Q-value function to derive optimal policies.
Soft actor–critic (SAC) (Heidarpour et al. [28]): SAC is a model-free, off-policy algorithm optimized for continuous action spaces. It employs a stochastic policy with an entropy-based objective to balance exploration and exploitation. SAC performs well in purely continuous environments, but it is not designed for hybrid discrete-continuous action spaces.
Multi-agent deep deterministic policy gradient (MADDPG) (Tariq et al. [3]): MADDPG extends deterministic policy gradient methods to multi-agent systems, facilitating coordinated decision-making among RIS-equipped UAVs and MEC servers.
Proposed scheme: The proposed approach employs a hybrid discrete-continuous optimization framework with hybrid space representation to optimize task offloading, power allocation, and RIS phase configurations. By integrating a conditional variational autoencoder and the TD3 algorithm, it addresses hybrid action spaces.

Figure 2 illustrates the average reward progression of the compared DRL algorithms, namely the proposed scheme, DQN, SAC, and MADDPG, over 2000 training episodes. The average reward reflects how well each method addresses the task offloading and resource allocation problem in a UAV-RIS-assisted MEC environment. The proposed scheme achieves the highest average reward by the end of training, which is attributed to its hybrid discrete-continuous optimization framework and its use of hybrid representations for policy learning in complex decision spaces. Among the baselines, MADDPG is competitive owing to its multi-agent coordination capabilities, but it is hindered by inefficiencies in handling hybrid action spaces. SAC achieves moderate results: it is optimized for continuous actions but constrained by its lack of inter-agent coordination. DQN, restricted to a discrete action space, performs worst, highlighting its limitations in handling the complexity and scale of the optimization problem.

For clarity, Figure 2 includes shaded confidence intervals that represent the variability in rewards across multiple training trials. The proposed scheme exhibits a narrower range of fluctuations, indicating more consistent performance. MADDPG and SAC show moderate variance, while DQN not only achieves the lowest average reward but also displays the widest confidence interval, reinforcing its instability in dynamic UAV-RIS-MEC environments. Despite these differences, all algorithms show steady learning progress, with rewards increasing as training advances. However, the faster convergence, higher average reward, and reduced variability of the proposed scheme underscore its robustness for latency-critical, resource-intensive applications. These findings emphasize the significance of hybrid optimization in dynamic MEC environments, where decision efficiency and stability are crucial.

Figure 3 shows that the proposed method outperforms the state-of-the-art schemes in terms of total network latency. For instance, at

F_{m, k} = 600

cycles/bit, the total latency is approximately 28.8 ms, 32.2 ms, 40.8 ms, and 49.3 ms for the proposed scheme, MADDPG, SAC, and DQN, respectively. Moreover, as

F_{m, k}

increases, the total network latency grows for all schemes, yet the proposed scheme consistently achieves the lowest latency. This gap highlights the efficiency of the proposed approach in minimizing latency. These results hold within the considered range of CPU cycles per bit (

F_{m, k}

), indicating that the proposed method is robust across varying computational demands.
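The relative gains implied by the latency values quoted for 600 cycles/bit can be worked out directly. The helper name below is hypothetical, and because the quoted latencies are approximate, the resulting percentages may differ in the last digits from figures computed on unrounded data.

```python
# Approximate total latencies (ms) quoted in the text for F_{m,k} = 600 cycles/bit.
latency_ms = {"Proposed": 28.8, "MADDPG": 32.2, "SAC": 40.8, "DQN": 49.3}


def reduction_pct(baseline, proposed):
    """Relative latency reduction of the proposed scheme, in percent."""
    return 100.0 * (baseline - proposed) / baseline


reductions = {
    name: round(reduction_pct(value, latency_ms["Proposed"]), 2)
    for name, value in latency_ms.items()
    if name != "Proposed"
}
print(reductions)  # {'MADDPG': 10.56, 'SAC': 29.41, 'DQN': 41.58}
```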

In Figure 4, the total network latency is evaluated for the proposed scheme in comparison with other benchmark schemes across varying computing capacities of the MBS (

ζ_{m a x}

) measured in giga cycles per second. The evaluation considers a fixed number of UEs (

K = 100

) and CPU cycles per bit (

F_{m, k} = 600

cycles/bit). Figure 4 illustrates that increasing the computing capacity of the MBS (

ζ_{m a x}

) leads to a noticeable reduction in total network latency for all schemes. This trend underscores the critical role of computational resources at the MBS in minimizing latency. Among the evaluated schemes, the proposed method consistently outperforms the benchmark schemes (DQN, SAC, and MADDPG), achieving the lowest latency across all considered values of

ζ_{m a x}

. Moreover, the latency reduction with the proposed scheme is more pronounced compared to the other schemes, highlighting its efficiency in leveraging enhanced computational resources.

Figure 5 demonstrates the impact of varying the number of UEs on total task delay, with user counts ranging up to a maximum of 300 and the number of RIS elements set to

N = 100

. The results reveal a clear trend: as the number of UEs increases, the total time delay rises significantly. Among the evaluated algorithms, the proposed scheme consistently achieves the lowest task delay across all scenarios, highlighting its superior ability to handle high user densities efficiently. In contrast, DQN exhibits the highest delay, reflecting its limited scalability for large-scale environments. SAC and MADDPG perform moderately, but their delays grow more sharply with increasing UEs compared to the proposed scheme. This analysis underscores the proposed scheme’s robustness and scalability in managing resource allocation and task offloading in densely populated MEC systems.

Figure 6 illustrates the task completion ratio as a function of the number of UEs in the system. The task completion ratio is defined as the percentage of tasks completed within the predefined delay threshold relative to the total number of tasks. As shown in Figure 6, the completion ratio decreases with an increasing number of UEs due to resource limitations within the UAV-RIS-assisted MEC framework. Specifically, the proposed scheme demonstrates the highest task completion ratio across all user densities, showcasing its robust capability to prioritize high-priority tasks and manage resource allocation effectively. In contrast, the benchmark algorithms—DQN, SAC, and MADDPG—exhibit significantly lower completion ratios as the number of UEs grows, highlighting their limitations in handling resource scarcity and task prioritization. The proposed scheme’s superior performance stems from its hybrid action space optimization and reward mechanism, prioritizing high-priority tasks. This scheme improves overall completion ratios by allocating more resources to critical tasks, ensuring effectiveness across varying UE densities.
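The task completion ratio defined above reduces to a one-line computation. In the sketch below the function name and the threshold used in the example are hypothetical, since the paper does not state the delay threshold.

```python
import numpy as np


def completion_ratio(delays, threshold):
    """Percentage of tasks whose delay does not exceed the threshold."""
    delays = np.asarray(delays, dtype=float)
    return 100.0 * float(np.mean(delays <= threshold))


# Four tasks with delays in ms and an illustrative 25 ms threshold.
print(completion_ratio([10, 20, 30, 40], threshold=25))  # 50.0
```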

7. Conclusions

In this paper, we proposed an MEC system hosted within a massive MIMO base station, serving M groups of users with the assistance of a RIS-equipped UAV to enhance system coverage and performance. To optimize this communication scenario, we formulated the problem of minimizing total task execution latency across all UEs as an MDP and developed a novel DRL algorithm, leveraging a hybrid space representation. This approach effectively reduced latency while ensuring efficient resource utilization. Extensive simulations demonstrated the effectiveness of our approach in improving system performance. Specifically, our framework achieved a latency reduction of 10.56%, 29.41%, and 41.62% compared to MADDPG, SAC, and DQN, respectively. Additionally, it improved the task completion ratio by 12.16%, 40.68%, and 56.60% over MADDPG, SAC, and DQN, respectively. These results validate the superiority of the proposed method over state-of-the-art schemes, demonstrating its potential for next-generation communication networks. Moreover, the findings highlight the scalability of the proposed solution. As the number of UEs and RIS elements increases, our algorithm maintains a practical computational complexity, as analyzed in Section 5.2.3. Despite these promising outcomes, certain limitations remain. The model assumes idealized channel conditions, which may not fully capture real-world deployments where environmental uncertainties affect performance. In future work, we aim to develop adaptive solutions that incorporate real-time feedback to handle dynamic environments and fluctuating system states. Furthermore, the framework will be extended to consider energy efficiency and security as key performance metrics, enhancing its applicability to diverse 6G use cases.

Funding

The author extends his appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number MoE-IF-UJ-R2-22-04100399-1.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

DRL	deep reinforcement learning
IoT	Internet of Things
ITU	International Telecommunication Union
LoS	line-of-sight
MEC	multi-access edge computing
MDP	Markov decision process
MBS	MIMO base station
QoS	quality of service
RISs	reconfigurable intelligent surfaces
TDMA	time division multiple access
UAVs	unmanned aerial vehicles
UE	user equipment

References

1. Measuring Digital Development—Facts and Figures 2023. Available online: https://www.itu.int/hub/publication/d-ind-ict_mdd-2023-1/ (accessed on 20 January 2025).
2. Ahmed, M.; Raza, S.; Soofi, A.A.; Khan, F.; Khan, W.U.; Xu, F.; Chatzinotas, S.; Dobre, O.A.; Han, Z. A survey on reconfigurable intelligent surfaces assisted multi-access edge computing networks: State of the art and future challenges. Comput. Sci. Rev. 2024, 54, 100668.
3. Tariq, M.N.; Wang, J.; Raza, S.; Siraj, M.; Altamimi, M.; Memon, S. Towards Optimal Resource Allocation: A Multi-agent DRL based Task Offloading Approach in Multi-UAV-Assisted MEC Networks. IEEE Access 2024, 12, 81428–81440.
4. Ahmed, M.; Raza, S.; Soofi, A.A.; Khan, F.; Khan, W.U.; Abideen, S.Z.U.; Xu, F.; Han, Z. Active reconfigurable intelligent surfaces: Expanding the frontiers of wireless communication—A survey. IEEE Commun. Surv. Tutor. 2024.
5. Mehrabian, A.; Wong, V.W. Joint Spectrum, Precoding, and Phase Shifts Design for RIS-Aided Multiuser MIMO THz Systems. IEEE Trans. Commun. 2024, 72, 5087–5101.
6. Xu, F.; Ahmad, S.; Ahmed, M.; Raza, S.; Khan, F.; Ma, Y.; Khan, W.U. Beyond encryption: Exploring the potential of physical layer security in UAV networks. J. King Saud Univ.-Comput. Inf. Sci. 2023, 35, 101717.
7. Banafaa, M.K.; Pepeoğlu, Ö.; Shayea, I.; Alhammadi, A.; Shamsan, Z.A.; Razaz, M.A.; Alsagabi, M.; Al-Sowayan, S. A comprehensive survey on 5G-and-beyond networks with UAVs: Applications, emerging technologies, regulatory aspects, research trends and challenges. IEEE Access 2024, 12, 7786–7826.
8. Xu, Y.; Zhang, T.; Zou, Y.; Liu, Y. Reconfigurable intelligence surface aided UAV-MEC systems with NOMA. IEEE Commun. Lett. 2022, 26, 2121–2125.
9. Hu, H.; Sheng, Z.; Nasir, A.A.; Yu, H.; Fang, Y. Computation Capacity Maximization for UAV and RIS Cooperative MEC System with NOMA. IEEE Commun. Lett. 2024, 28, 592–596.
10. Wang, Y.; Niu, J.; Chen, G.; Zhou, X.; Li, Y.; Liu, S. RIS-Aided Latency-Efficient MEC HetNet with Wireless Backhaul. IEEE Trans. Veh. Technol. 2024, 73, 8705–8719.
11. Ning, Z.; Dong, P.; Wen, M.; Wang, X.; Guo, L.; Kwok, R.Y.; Poor, H.V. 5G-enabled UAV-to-community offloading: Joint trajectory design and task scheduling. IEEE J. Sel. Areas Commun. 2021, 39, 3306–3320.
12. Song, F.; Xing, H.; Wang, X.; Luo, S.; Dai, P.; Xiao, Z.; Zhao, B. Evolutionary multi-objective reinforcement learning based trajectory control and task offloading in UAV-assisted mobile edge computing. IEEE Trans. Mob. Comput. 2022, 22, 7387–7405.
13. Xu, B.; Kuang, Z.; Gao, J.; Zhao, L.; Wu, C. Joint offloading decision and trajectory design for UAV-enabled edge computing with task dependency. IEEE Trans. Wirel. Commun. 2022, 22, 5043–5055.
14. Goudarzi, S.; Soleymani, S.A.; Wang, W.; Xiao, P. UAV-enabled mobile edge computing for resource allocation using cooperative evolutionary computation. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 5134–5147.
15. Bai, Z.; Lin, Y.; Cao, Y.; Wang, W. Delay-Aware Cooperative Task Offloading for Multi-UAV Enabled Edge-Cloud Computing. IEEE Trans. Mob. Comput. 2024, 23, 1034–1049.
16. Zhao, N.; Ye, Z.; Pei, Y.; Liang, Y.C.; Niyato, D. Multi-Agent Deep Reinforcement Learning for Task Offloading in UAV-Assisted Mobile Edge Computing. IEEE Trans. Wirel. Commun. 2022, 21, 6949–6960.
17. Zhai, Z.; Dai, X.; Duo, B.; Wang, X.; Yuan, X. Energy-efficient UAV-mounted RIS assisted mobile edge computing. IEEE Wirel. Commun. Lett. 2022, 11, 2507–2511.
18. Di Renzo, M.; Zappone, A.; Debbah, M.; Alouini, M.S.; Yuen, C.; De Rosny, J.; Tretyakov, S. Smart radio environments empowered by reconfigurable intelligent surfaces: How it works, state of research, and the road ahead. IEEE J. Sel. Areas Commun. 2020, 38, 2450–2525.
19. Al-Hourani, A.; Kandeepan, S.; Lardner, S. Optimal LAP altitude for maximum coverage. IEEE Wirel. Commun. Lett. 2014, 3, 569–572.
20. Wu, Q.; Zhang, R. Beamforming optimization for wireless network aided by intelligent reflecting surface with discrete phase shifts. IEEE Trans. Commun. 2019, 68, 1838–1851.
21. Raza, S.; Liu, W.; Ahmed, M.; Anwar, M.R.; Mirza, M.A.; Sun, Q.; Wang, S. An efficient task offloading scheme in vehicular edge computing. J. Cloud Comput. 2020, 9, 1–14.
22. Raza, S.; Wang, S.; Ahmed, M.; Anwar, M.R.; Mirza, M.A.; Khan, W.U. Task offloading and resource allocation for IoV using 5G NR-V2X communication. IEEE Internet Things J. 2021, 9, 10397–10410.
23. Raza, S.; Ahmed, M.; Ahmad, H.; Mirza, M.A.; Habib, M.A.; Wang, S. Task offloading in mmWave based 5G vehicular cloud computing. J. Ambient Intell. Humaniz. Comput. 2023, 14, 12595–12607.
24. Ahmed, M.; Raza, S.; Ahmad, H.; Khan, W.U.; Xu, F.; Rabie, K. Deep reinforcement learning approach for multi-hop task offloading in vehicular edge computing. Eng. Sci. Technol. Int. J. 2024, 59, 101854.
25. Nguyen, L.D.; Tuan, H.D.; Duong, T.Q.; Dobre, O.A.; Poor, H.V. Downlink beamforming for energy-efficient heterogeneous networks with massive MIMO and small cells. IEEE Trans. Wirel. Commun. 2018, 17, 3386–3400.
26. Liu, T.; Tang, L.; Wang, W.; Chen, Q.; Zeng, X. Digital-twin-assisted task offloading based on edge collaboration in the digital twin edge network. IEEE Internet Things J. 2021, 9, 1427–1444.
27. Khan, I.; Raza, S.; Rehman, W.u.; Khan, R.; Nahida, K.; Tao, X. A Deep Learning-Based Algorithm for Energy and Performance Optimization of Computational Offloading in Mobile Edge Computing. Wirel. Commun. Mob. Comput. 2023, 2023, 1357343.
28. Heidarpour, A.R.; Heidarpour, M.R.; Ardakani, M.; Tellambura, C.; Uysal, M. Soft actor–critic-based computation offloading in multiuser MEC-enabled IoT—A lifetime maximization perspective. IEEE Internet Things J. 2023, 10, 17571–17584.

Figure 1. Framework of the proposed algorithm.

Figure 2. Average rewards vs. no. of episodes.

Figure 3. The total time delay according to different schemes vs. F_{m, k}, with K = 100 and ζ_{max} = 30 Giga cycles/s.

Figure 4. The total time delay according to different schemes vs. ζ_{max}, with K = 100 and F_{m, k} = 600 cycles/bit.

Figure 5. The total time delay vs. no. of UEs.

Figure 6. Task completion ratio vs. no. of UEs.

Table 1. Simulation parameters.

Parameter	Value
MBS coverage radius for direct connection	500 m
Maximum distance of users from MBS	2000 m
3D Cartesian coordinates of the MBS	$(0, 0, 30)$ m
Altitude range of RIS-equipped UAV $(H^{\min}, H^{\max})$	$(50, 150)$ m
Maximum transmission power ( $P_{m, k}^{\max}$ )	30 dBm
Central frequency ( $f_{c}$ )	2.4 GHz
Bandwidth (W)	1 MHz
White noise spectral density ( $σ_{0}^{2}$ )	$- 130$ dBm/Hz
Minimum achievable uplink rate ( ${\tilde{r}}_{0}$ )	1 Mbps
Task size ( $I_{m, k}$ )	100 kB
Task computation complexity ( $F_{m, k}$ )	600 cycles/bit
Maximum MEC server computing resource ( $ζ_{\max}$ )	30 Giga cycles/s
Maximum UE computing resource ( $c_{m, k}$ )	0.5 Giga cycles/s

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Alshahrani, A. Toward 6G: Latency-Optimized MEC Systems with UAV and RIS Integration. Mathematics 2025, 13, 871. https://doi.org/10.3390/math13050871
