CN118012601B - Resource allocation method of vehicle edge computing system considering influence of comment information - Google Patents
Resource allocation method of vehicle edge computing system considering influence of comment information
- Publication number
- CN118012601B (application CN202311607887.6A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- service
- edge
- edge server
- edge servers
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5072—Grid computing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0499—Feedforward networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/098—Distributed learning, e.g. federated learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0283—Price estimation or determination
Abstract
The invention discloses a resource allocation method for a vehicle edge computing system that considers the influence of review information. The method comprises the following steps: a market consisting of two competing edge servers is constructed, providing computation offloading service over two service phases. At the beginning of each service phase, both edge servers announce the optimal pricing strategies obtained from their pricing game, and the second-phase pricing strategies are influenced by the rating information generated by the vehicles at the end of the first phase. Vehicles select their edge servers based on service information, pricing, their own preferences, and the rating information, and then determine their own request strategies from the resource-request information of the other vehicles that selected the same edge server. In situations where vehicles are unwilling to disclose their resource-request information, a deep reinforcement learning framework is used to maximize vehicle utility. The invention achieves a Nash equilibrium among vehicles in any phase or server scenario and maximizes the overall benefit of all vehicles.
Description
Technical Field
The invention belongs to the field of vehicle edge computing, and in particular relates to a resource allocation method for a vehicle edge computing system that considers the influence of review information.
Background
Topics related to the resource allocation problem in vehicle edge computing (VEC) include resource allocation, vehicle edge computing, and deep reinforcement learning.
A. Resource allocation
In recent years, the rapid development of wireless networks has increased their complexity and heterogeneity, making it necessary to move beyond traditional resource allocation mechanisms. Guo Shengjie et al. propose a cascaded Hungarian channel allocation algorithm that simplifies the resource allocation problem with reliability requirements, converting it into a power allocation problem with chance constraints. Chen Xing et al. propose an adaptive resource allocation method built around a feedback loop, incorporating an iterative QoS prediction model and a particle swarm optimization (PSO) based runtime decision algorithm. Wu Dapeng et al. propose a novel heuristic radio resource allocation scheme that considers slice characteristics, analyzes them, and converts them into a network profit model of resource utilization. Ghanbari et al. survey resource allocation techniques comprehensively, organizing them into cost-aware, environment-aware, efficiency-aware, load-balancing-aware, power-aware, QoS-aware, SLA-based, and utilization-aware mechanisms. Battula et al. likewise review recent resource allocation algorithms in fog computing for compensating resource providers and estimating the cost of fog resources. None of the above studies applies game theory to the resource allocation solution.
Game theory has become a popular approach to the resource allocation problem in wireless networks. Zhong Xudong et al. formulate the resource allocation problem as a cooperative game, emphasizing the potential of game theory in this context. Wu Ducheng et al. explore the application of game theory to co-tier interference mitigation in 5G small cell networks, underscoring the relevance of game theory and distributed learning in this setting. Another work proposes a distributed game-theoretic resource allocation algorithm to mitigate cross-tier interference between device-to-device (D2D) communications and cellular users, as well as co-tier interference among D2D communications. Yang Lixia et al. use game theory to optimize the selection of emergency distribution paths for groundwater resources, taking actual congestion and transit time into account. Li Feixiang et al. focus on computation offloading and pricing problems in the industrial Internet of Things (IIoT), developing a two-stage Stackelberg game model to characterize the interactions between edge clouds and devices. However, none of these works brings the resource management solution into the context of edge computing.
B. Mobile edge computing
Recently, the literature on edge computing resource allocation has explored various schemes. Lin Fuhong et al. describe a generic edge computing intrusion detection system (IDS) architecture, which forms the basis of their resource allocation model. Zamzam et al. apply game theory to analyze user behavior, obtaining solutions that satisfy all users and reach equilibrium. Bahreini et al. address the resource allocation and pricing problems in two-level edge computing systems, while Ma Shi et al. propose a three-way circular game (3CG) involving users, edge nodes, and service providers; in this model, users select their preferred services, and service providers select cost-effective edge nodes that prioritize high-value users. Baek et al. study three dynamic pricing schemes for resource allocation in IoT edge computing environments; their numerical results verify the proposed theorems and compare the three mechanisms. Sun Yuhu et al. address the resource allocation problem in edge computing with a double-auction scheme named DPODA. Lin Zifan et al. outline edge computing and edge resource allocation techniques across a range of research and application scenarios. However, the resource allocation process in VEC typically involves dynamic, random, and continuous interactions, and few of the above papers take this into account.
C. Deep reinforcement learning
The literature on resource allocation and computation offloading in edge computing has presented various deep reinforcement learning (DRL) based architectures to address the demands of mobile devices and VEC environments. Cheng et al. describe a space-air-ground integrated network (SAGIN) edge/cloud computing architecture that uses DRL to offload computation-intensive applications under energy and computing constraints. Wang Jiadai et al. propose a DRL-based resource allocation (DRLRA) scheme that adaptively allocates resources, reduces average service time, and balances resource usage across different VEC environments. Alfakih et al. aim to make optimal offloading decisions that minimize system cost, including power consumption and computation delay, using the SARSA algorithm.
Huang et al. address the need for proper resource allocation in computation offloading by proposing a deep Q-network (DQN) based task offloading and resource allocation algorithm. Chen Jienan et al. describe an intelligent resource allocation framework (iRAF) in which a DRL algorithm solves the complex resource allocation problem in collaborative mobile edge computing (CoMEC) networks. Zhou Huan et al. study DRL-based joint optimization of computation offloading and resource allocation in dynamic multi-user MEC systems. Feng Jie et al. develop a collaborative computation offloading and resource allocation framework for blockchain-enabled MEC systems, solved with an asynchronous advantage actor-critic (A3C) algorithm. Ning Zhaolong et al. construct a DRL-based intelligent offloading system for vehicle edge computing. Tan et al. propose a joint communication, caching, and computation strategy for cost-efficient, DRL-based vehicular networks. However, none of the above studies considers the impact of reviews on the allocation problem in vehicle edge computing (VEC).
Disclosure of Invention
In view of the above, the present invention provides a resource allocation method for a vehicle edge computing system that considers the influence of review information.
In the resource allocation method of a vehicle edge computing system considering the influence of review information, a market consisting of two competing edge servers is constructed, and computation offloading service is provided over two service phases. At the beginning of each service phase, both edge servers announce the optimal pricing strategies obtained from their pricing game, and the second-phase pricing strategies are influenced by the rating information generated by the vehicles at the end of the first phase. Vehicles select their edge servers based on service information, pricing, their own preferences, and the rating information, and then determine their own request strategies from the resource-request information of the other vehicles that selected the same edge server. Where vehicles are unwilling to disclose their resource-request information, a deep reinforcement learning framework is used to maximize vehicle utility. The method comprises the following steps:
First, a duopoly market model with two competing edge servers is built over the two phases of the offloading service.
Vehicles generate service requests and offload them to an edge server for processing; the edge servers dominate the market, provide services to the vehicles, and set corresponding pricing strategies to earn revenue from the vehicles.
A. Utility of edge servers
Assume the two competing edge servers in the market are H and L, with edge-unit (EU) capacities $M_H$ and $M_L$, respectively.
First stage: before service begins, edge servers H and L publish their respective first-stage prices, denoted $p_H^1$ and $p_L^1$. Assume that in the first stage the service-request arrivals at H and L follow Poisson distributions with mean arrival rates $\lambda_H^1$ and $\lambda_L^1$, respectively, and that each vehicle holds a preference parameter toward each of the edge servers H and L. The quality of the edge computing service provided to a vehicle has two components: the objective quality of service, i.e., the general impression a vehicle forms from the brand, reputation, and service-product information of an edge server; and the quality of experience, i.e., what the vehicle experiences after being served, which is unknown before service ends. Let $q_i$, $i = H, L$, be the objective quality of edge server $i$, and assume H has high objective quality while L has low objective quality: $q_H = q$, $q_L = \theta q$ with $\theta \in (0, 1)$; the higher $\theta$, the smaller the objective-quality gap between the two servers. Let $\eta_i$, $i = H, L$, be the quality of experience after being served by provider $i$; to capture the variability of the experienced service, set $\eta_i \sim U(-1, 1)$. In addition, a vehicle compares the prices of the two servers and perceives a corresponding sense of loss or gain; for edge server H the price difference is $p_H^1 - p_L^1$. The net utility of a vehicle toward H and L in the first phase follows from these components: when the net utility is positive, the vehicle is satisfied and gives a positive evaluation; otherwise, it gives a negative evaluation. The first-stage revenues of edge servers H and L then follow from their prices and realized arrival rates.
Second stage: before the second service phase begins, a vehicle that prefers edge server $i$ observes the evaluations posted at the end of the first phase and forms an expected value of the quality of experience of the other edge server's service. This expectation combines the probability that a vehicle gives a positive evaluation after experiencing edge server $i$'s service with the expected quality-of-experience value conditional on a positive evaluation, weighted by $\beta$, the vehicle's sensitivity to positive evaluations.
In the second phase, a vehicle decides whether to continue purchasing service from its original edge server or to switch to the other one, based on the second-phase prices $p_H^2$ and $p_L^2$ published by the servers, the price difference, and the expectation of the other server's quality of experience formed from the first-phase vehicle reviews. The second-phase vehicle arrival rates at H and L, $\lambda_H^2$ and $\lambda_L^2$, depend on the probability that a vehicle served by edge server H still purchases from H in the second phase, and on the probability that a vehicle served by edge server L still purchases from L in the second phase. Combining with equation (2) yields these arrival rates, and the second-phase profits of edge servers H and L then follow from their second-phase prices and arrival rates.
B. Utility of vehicles
First stage: the first-stage vehicle arrival rates at edge servers H and L are $\lambda_H^1$ and $\lambda_L^1$, respectively. Each edge server provides K service levels for offloading requests, indexed $i \in \{1, \dots, K\}$; the EU counts of the level-$i$ service at H and L are $c_{Hi}$ and $c_{Li}$, respectively, and the numbers of vehicles executing the level-$i$ service at H and L are likewise tracked. On this basis, the utility obtained by selecting service level $i$ at H or L in the first phase is defined, whose leading term represents the satisfaction obtained by a vehicle selecting service level $i$ of edge server H in the first stage.
Second stage: the vehicle arrival rates may change in the second stage under the influence of the review information generated by the vehicles; denote the second-stage arrival rates at H and L by $\lambda_H^2$ and $\lambda_L^2$. Assume the total number of service levels offered by H and L for offloading requests is still K; during the second-stage service, the numbers of vehicles executing level $i$ at H and L are recorded, and the EU counts they occupy are $c_{Hi}$ and $c_{Li}$, respectively. The utility of selecting service level $i$ at H or L in the second stage is defined accordingly.
C. Problem description
In the established model, the edge servers compete across the two phases, setting optimal prices in each phase to attract more vehicle offloading requests and thereby earn more profit. Denote the two-phase total revenues of edge servers H and L by $\pi_H$ and $\pi_L$; from the servers' perspective, the optimization problem is to choose the two-phase prices that maximize these revenues.
A vehicle selects an edge server and requests service according to the prices set by the servers in the two phases, the vehicle's preference between the servers, and the vehicles' evaluation information. The optimal prices set by edge server H in the first and second phases are denoted $p_H^{1*}$ and $p_H^{2*}$; likewise, the optimal prices of edge server L in the two phases are $p_L^{1*}$ and $p_L^{2*}$. The optimization problem of a vehicle selecting edge server H in the first stage is to choose the offload-request strategy that maximizes its utility.
Second, the equilibrium of the game in the two service phases is analyzed.
In the two service phases, the pricing policies of edge servers H and L are $(p_H^{1*}, p_H^{2*})$ and $(p_L^{1*}, p_L^{2*})$, respectively. If neither server can increase its total revenue by unilaterally deviating from this pricing policy, a Nash equilibrium point is reached between the two edge servers.
The vehicles select an edge server to which to submit service requests and formulate their respective offload-request policies in the two stages.
The optimal second-phase prices of edge servers H and L are obtained in closed form (Theorem 1 below), and the optimal first-phase prices likewise (Theorem 2 below).
Finally, the non-cooperative game among the vehicles is discussed in two cases: complete information disclosure and incomplete information disclosure.
A. Resource-request management under information sharing
Each vehicle first shares its selected service level, and then decides its optimal EU-request strategy according to the request states of the other vehicles.
Accounting for the variation of the service level selected by each vehicle, and given the number of vehicles in the system selecting service level $i$ and the optimal first-stage price of edge server H, the optimal offload-resource-request strategy of a vehicle requesting service level $i$ is obtained in closed form, with auxiliary terms $\Delta_1$ and $\Delta_2$ defined in the proof of Theorem 3 below.
B. Resource-request management under incomplete information sharing
Considering the selfishness and randomness of each vehicle in the system, each vehicle is set as an agent (actor) that only executes the policy and collects experience samples, while the selected edge server H or L acts as the learner that makes centralized decisions. Initially, each vehicle interacts with the system of its selected edge server; after taking an action, it forms an experience replay tuple from its observations and transmits it to the learner for centralized processing. Each actor uses an ε-greedy algorithm when executing the learned policy.
As edge servers, H and L act as learners. First, they announce their EU service prices before the first and second phases begin, then learn policies from the experience replay information each actor sends to the edge server, and finally pass the learned policies back to each actor. The vehicle resource-request policies are processed with the DDQN algorithm, whose state, action, and reward are as follows:
State: the global system state is $S = (S_1, \dots, S_N)$, where $S_n$ is the local state of vehicle $n$.
Action: the action of each vehicle is the number of EUs (i.e., the service level) it requests.
Reward: the reward of a vehicle selecting the level-$i$ service from server H in the first phase is its resulting utility.
In the DDQN framework, the edge server maintains two neural networks: an evaluation network and a target network. The parameters of the evaluation network are denoted $\theta$ and those of the target network $\theta^-$; both networks take the current state as input and output the Q-value of each vehicle. The evaluation network's parameters are copied to the target network every fixed number of steps. The DDQN target value is $y_t = r_t + \gamma\, Q(s_{t+1}, \arg\max_a Q(s_{t+1}, a; \theta_t); \theta_t^-)$, where the argmax determines the action that maximizes Q in state $s_{t+1}$ under the evaluation-network parameters $\theta_t$; i.e., the arg function is applied to the action and the max function to the possible values of the Q function.
Edge server resources are allocated with the DDQN algorithm: first, given the states, rewards, and actions of the system, establish an experience replay pool D of size N, an action-value function Q with random weight parameters, and a target network. For each episode, initialize the state sequence S; then, in each step, feed the state $s_t$ into the evaluation network and select an action $a_t$ by the ε-greedy algorithm. Thereafter, obtain the current reward $r_t$ and the next state $s_{t+1}$ according to the preset criteria, and store $(s_t, a_t, r_t, s_{t+1})$ in the experience replay pool to update the parameters of the evaluation network.
The beneficial technical effects of the invention are as follows:
1. The invention formulates the resource management problem in vehicle edge computing (VEC) in an environment with two-stage service and two competing edge servers. It further discusses the vehicles' switching between edge servers in the second phase, which is affected by the review information, and casts the two-stage resource pricing process of the edge servers as a duopoly pricing game.
2. A theoretical analysis proves that, under the non-cooperative game principle, a Nash equilibrium exists based on the interactions among vehicles. After establishing this equilibrium, a dynamic iterative algorithm is proposed that, with complete information, reaches the Nash equilibrium among vehicles in any phase or server scenario.
3. The invention provides a distributed deep-reinforcement-learning-based resource offloading (DRLRO) framework to solve the vehicles' offload-request strategy problem under incomplete information. The DRLRO framework uses a Double Deep Q-Network (DDQN) to generate an offload-request strategy for each vehicle that maximizes the overall benefit of all vehicles. Simulation results confirm the effectiveness of the proposed DRLRO framework.
Drawings
Fig. 1 shows the structure of the edge-assisted driving system of the invention.
FIG. 2 is a schematic diagram of the DDQN-based vehicle resource-request strategy algorithm of the invention.
FIG. 3 shows the unit prices of the VEC servers as the objective quality of service varies, in an embodiment.
FIG. 4 shows the unit prices of the VEC servers as the vehicles' sensitivity to negative evaluations varies, in an embodiment.
FIG. 5 shows the vehicle utility at the VEC servers over the two stages, in an embodiment.
FIG. 6 shows the vehicle utility at different arrival rates, in an embodiment.
FIG. 7 shows the relationship between vehicle utility and the total EU count, in an embodiment.
FIG. 8 shows the vehicle utility under different algorithms, in an embodiment.
Detailed Description
The invention will be described in further detail with reference to the drawings and the specific embodiments.
In the resource allocation method of a vehicle edge computing system considering the influence of review information, a market consisting of two competing edge servers is constructed, and computation offloading service is provided over two service phases. At the beginning of each service phase, both edge servers announce the optimal pricing strategies obtained from their pricing game, and the second-phase pricing strategies are influenced by the rating information generated by the vehicles at the end of the first phase. Vehicles select their edge servers based on service information, pricing, their own preferences, and the rating information, and then determine their own request strategies from the resource-request information of the other vehicles that selected the same edge server. Where vehicles are unwilling to disclose their resource-request information, a deep reinforcement learning framework is used to maximize vehicle utility. The method comprises the following steps:
First, a duopoly market model with two competing edge servers is built over the two phases of the offloading service.
Fig. 1 is a schematic diagram of edge servers in a duopoly market providing computation offloading services for vehicles; the system comprises base stations, edge servers, and vehicles. Before each service phase begins, both edge servers publish their respective service price information. A vehicle selects the appropriate edge server based on factors such as quality of service, pricing, personal preference, and review information. We define the minimum unit of resource used by a vehicle in the system as the "edge unit" (EU), which bundles the computation, storage, and communication resources required to process a vehicle request. Because the total amount of EU resources available to each edge server is limited, each vehicle aims to maximize its own benefit by requesting as many resources as it can, which creates a non-cooperative game among the vehicles selecting the same edge server. During this game, each vehicle determines its optimal resource-request strategy according to its own requirements and the number of EUs requested by the other users. A vehicle that does not adopt an optimal strategy may be denied service and forced to process more tasks locally, possibly incurring higher cost and larger delays.
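To make the system model concrete, the following is a minimal illustrative sketch in Python (all class and field names are ours, not the patent's) of the market entities just described: two EU-limited edge servers and vehicles requesting leveled service from them.

```python
from dataclasses import dataclass

@dataclass
class EdgeServer:
    name: str                 # "H" or "L"
    total_eu: int             # EU capacity, M_H or M_L
    objective_quality: float  # q_H = q, or q_L = theta * q
    price: float = 0.0        # per-EU price published for the current phase
    allocated_eu: int = 0     # EUs already granted in this phase

    def admit(self, requested_eu: int) -> bool:
        """Admit a request only if enough EUs remain; otherwise the
        vehicle is denied and must process the task locally."""
        if self.allocated_eu + requested_eu > self.total_eu:
            return False
        self.allocated_eu += requested_eu
        return True

@dataclass
class Vehicle:
    vid: int
    service_level: int  # i in {1, ..., K}
    requested_eu: int   # EU count of the chosen level (c_Hi or c_Li)

# Example: server H with 100 EUs admits a level-3 request of 3 EUs.
h = EdgeServer(name="H", total_eu=100, objective_quality=1.0, price=2.5)
assert h.admit(requested_eu=3)
```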
Vehicles generate service requests and offload them to an edge server for processing; the edge servers dominate the market, provide services to the vehicles, and set corresponding pricing strategies to earn revenue from the vehicles.
A. Utility of edge servers
Assume the two competing edge servers in the market are H and L, with edge-unit (EU) capacities $M_H$ and $M_L$, respectively.
First stage: before service begins, edge servers H and L publish their respective first-stage prices, denoted $p_H^1$ and $p_L^1$. Assume that in the first stage the service-request arrivals at H and L follow Poisson distributions with mean arrival rates $\lambda_H^1$ and $\lambda_L^1$, respectively, and that each vehicle holds a preference parameter toward each of the edge servers H and L. The quality of the edge computing service provided to a vehicle has two components: the objective quality of service, i.e., the general impression a vehicle forms from the brand, reputation, and service-product information of an edge server; and the quality of experience, i.e., what the vehicle experiences after being served, which is unknown before service ends. Let $q_i$, $i = H, L$, be the objective quality of edge server $i$, and assume H has high objective quality while L has low objective quality: $q_H = q$, $q_L = \theta q$ with $\theta \in (0, 1)$; the higher $\theta$, the smaller the objective-quality gap between the two servers. Let $\eta_i$, $i = H, L$, be the quality of experience after being served by provider $i$; to capture the variability of the experienced service, set $\eta_i \sim U(-1, 1)$. In addition, a vehicle compares the prices of the two servers and perceives a corresponding sense of loss or gain; for edge server H the price difference is $p_H^1 - p_L^1$. The net utility of a vehicle toward H and L in the first phase follows from these components: when the net utility is positive, the vehicle is satisfied and gives a positive evaluation; otherwise, it gives a negative evaluation. The first-stage revenues of edge servers H and L then follow from their prices and realized arrival rates.
Second stage: before the second service phase begins, a vehicle that prefers edge server $i$ observes the evaluations posted at the end of the first phase and forms an expected value of the quality of experience of the other edge server's service. This expectation combines the probability that a vehicle gives a positive evaluation after experiencing edge server $i$'s service with the expected quality-of-experience value conditional on a positive evaluation, weighted by $\beta$, the vehicle's sensitivity to positive evaluations.
In the second phase, a vehicle decides whether to continue purchasing service from its original edge server or to switch to the other one, based on the second-phase prices $p_H^2$ and $p_L^2$ published by the servers, the price difference, and the expectation of the other server's quality of experience formed from the first-phase vehicle reviews. The second-phase vehicle arrival rates at H and L, $\lambda_H^2$ and $\lambda_L^2$, depend on the probability that a vehicle served by edge server H still purchases from H in the second phase, and on the probability that a vehicle served by edge server L still purchases from L in the second phase. Combining with equation (2) yields these arrival rates, and the second-phase profits of edge servers H and L then follow from their second-phase prices and arrival rates.
B. Utility of vehicles
First stage: the first-stage vehicle arrival rates at edge servers H and L are $\lambda_H^1$ and $\lambda_L^1$, respectively. Each edge server provides K service levels for offloading requests, indexed $i \in \{1, \dots, K\}$; the EU counts of the level-$i$ service at H and L are $c_{Hi}$ and $c_{Li}$, respectively, and the numbers of vehicles executing the level-$i$ service at H and L are likewise tracked. On this basis, the utility obtained by selecting service level $i$ at H or L in the first phase is defined, whose leading term represents the satisfaction obtained by a vehicle selecting service level $i$ of edge server H in the first stage.
Second stage: the vehicle arrival rates may change in the second stage under the influence of the review information generated by the vehicles; denote the second-stage arrival rates at H and L by $\lambda_H^2$ and $\lambda_L^2$. Assume the total number of service levels offered by H and L for offloading requests is still K; during the second-stage service, the numbers of vehicles executing level $i$ at H and L are recorded, and the EU counts they occupy are $c_{Hi}$ and $c_{Li}$, respectively. The utility of selecting service level $i$ at H or L in the second stage is defined accordingly.
C. Problem description
In the established model, the edge servers compete across the two phases, setting optimal prices in each phase to attract more vehicle offloading requests and thereby earn more profit. Denote the two-phase total revenues of edge servers H and L by $\pi_H$ and $\pi_L$; from the servers' perspective, the optimization problem is to choose the two-phase prices that maximize these revenues.
A vehicle selects an edge server and requests service according to the prices set by the servers in the two phases, the vehicle's preference between the servers, and the vehicles' evaluation information. The optimal prices set by edge server H in the first and second phases are denoted $p_H^{1*}$ and $p_H^{2*}$; likewise, the optimal prices of edge server L in the two phases are $p_L^{1*}$ and $p_L^{2*}$. The optimization problem of a vehicle selecting edge server H in the first stage is to choose the offload-request strategy that maximizes its utility.
Second, the equilibrium of the game in the two service phases is analyzed.
(1) In the two service phases, the pricing policies of edge servers H and L are $(p_H^{1*}, p_H^{2*})$ and $(p_L^{1*}, p_L^{2*})$, respectively. If neither server can increase its total revenue by unilaterally deviating from this pricing policy, a Nash equilibrium point is reached between the two edge servers.
(2) The vehicles select an edge server to which to submit service requests and formulate their respective offload-request policies in the two stages. Taking the vehicles that select edge server H in the first stage as an example, when no such vehicle can increase its utility by unilaterally changing its request, the vehicles selecting edge server H have reached an equilibrium point. When edge servers H and L reach the equilibrium point, neither is willing to adjust its pricing policy further, since no additional profit can be gained. When the vehicles reach the equilibrium point, no vehicle is willing to adjust its offload-request (EU) strategy, since it would gain no additional utility, and an excessive request would cause its service request to be denied.
Theorem 1: in the second phase, the optimal prices of edge servers H and L are obtained in closed form by jointly solving the first-order conditions of the second-phase revenue functions.
Proof: let the revenue functions of edge servers H and L in the second phase be given by equations (16) and (17). Taking the partial derivative of each revenue function with respect to its own second-phase price and setting the derivative to zero yields two first-order conditions; solving these equations simultaneously gives the optimal second-phase prices. The general shape of this computation is sketched below.
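Since equations (16) and (17) are not reproduced in this text, the following shows only the generic structure of the argument, written with unspecified second-phase revenue functions:

```latex
% First-order conditions for the second-phase pricing game
% (generic form; the concrete revenue functions of Eqs. (16)-(17)
% are not reproduced in this text):
\frac{\partial \pi_H^2(p_H^2, p_L^2)}{\partial p_H^2} = 0,
\qquad
\frac{\partial \pi_L^2(p_H^2, p_L^2)}{\partial p_L^2} = 0 .
% Solving the two conditions simultaneously yields the equilibrium
% prices p_H^{2*} and p_L^{2*} of Theorem 1.
```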
Theorem 2: in the first phase, the optimal prices of edge servers H and L are likewise obtained in closed form.
Proof: denote the total two-phase revenue functions of edge servers H and L by $\pi_H$ and $\pi_L$. Based on the foregoing and since $\eta_H, \eta_L \sim U(-1, 1)$, the first-phase revenue is monotonically increasing in the first-phase price over its feasible range; it therefore attains its maximum at the upper limit of that range, which gives the optimal first-phase prices.
Finally, the non-cooperative game among the vehicles is discussed in two cases: complete information disclosure and incomplete information disclosure.
A. Resource-request management under information sharing
First, it is shown that a Nash equilibrium exists in the stochastic non-cooperative game among the vehicles that select edge server H in the first stage. A dynamic iterative algorithm is then introduced to solve for this Nash equilibrium.
Theorem 3: accounting for the variation of the service level selected by each vehicle, and given the number of vehicles in the system selecting service level $i$ and the optimal first-stage price of edge server H, the optimal offload-resource-request strategy of a vehicle requesting service level $i$ is obtained in closed form, where the auxiliary terms $\Delta_1$ and $\Delta_2$ are defined in the proof.
Proof: for any vehicle that selects service level $i$ of edge server H in the first phase, the derivative of its corresponding utility with respect to its request should be zero. The utility attains its maximum at the resulting request level, and from the boundary conditions the auxiliary terms $\Delta_1$ and $\Delta_2$ are obtained.
Corollary 1: when the objective quality of the first-stage edge server H increases, the number of EUs requested from edge server H increases.
Proof: from the expression in Theorem 1 and the preceding discussion, taking the partial derivative with respect to $q_H$ shows that it is positive, since $q_H > 0$ and $q_L > 0$. Hence, as $q_H$ continues to increase, the amount of resources requested by a vehicle for service level $i$ also continues to increase.
Algorithm 1 computes the Nash equilibrium reached under complete information. In Algorithm 1, each vehicle first shares its selected service level; each vehicle then decides its optimal EU-request strategy according to the request states of the other vehicles. The strategy for the optimal number of EUs follows from Theorem 3, which provides the mathematical model for this calculation; a sketch of the resulting iteration is given after the algorithm:
Algorithm 1: optimal request strategy under complete information sharing.
1: for each vehicle do
2:   publish private information: the vehicle's request information and the number of vehicles requesting each service level
3:   collect the information shared by the other vehicles
4:   compute the optimal request strategy via Theorem 3
5: end for
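The following is a minimal sketch (ours, not the patent's code) of the iteration Algorithm 1 describes: repeated best responses that stop when no vehicle changes its request, i.e., at the Nash equilibrium. The callable `optimal_request` stands in for the closed-form strategy of Theorem 3, whose exact expression is not reproduced in this text.

```python
def best_response_iteration(vehicles, server, optimal_request, max_rounds=100):
    """Algorithm 1 under complete information sharing: every vehicle
    publishes its request, observes the others, and re-computes its
    best response until a fixed point (Nash equilibrium) is reached."""
    requests = {v.vid: 1 for v in vehicles}  # start from minimal requests
    for _ in range(max_rounds):
        changed = False
        for v in vehicles:
            others = {k: r for k, r in requests.items() if k != v.vid}
            best = optimal_request(v, others, server)  # Theorem 3 (assumed helper)
            if best != requests[v.vid]:
                requests[v.vid] = best
                changed = True
        if not changed:  # no vehicle wants to deviate
            return requests
    return requests
```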
B. Resource-request management under incomplete information sharing
Under complete information sharing, each vehicle selecting edge server H or L in each service period knows the number of EUs the other vehicles request from the edge servers, the number of vehicles served by each edge server, and the number of EUs remaining in the system. In real scenarios, however, edge servers and vehicles may refuse to share or provide such information in order to preserve privacy. Moreover, this information changes constantly over time, making it difficult for any single vehicle to accurately assess the request states of the other vehicles and the overall situation of the edge-server system. Finding a solution to this non-convex problem is therefore NP-hard and difficult for conventional optimization algorithms.
The invention proposes an ALRM (actor-learner resource management) framework to solve this problem. The framework provides each vehicle with an optimal strategy for requesting EU resources from edge server H or L under the current conditions. Unlike prior DRL (deep reinforcement learning) algorithms, and considering the selfishness and randomness of each vehicle in the system, each vehicle is set as an agent (actor) that only executes the policy and collects experience samples, while the selected edge server H or L acts as the learner that makes centralized decisions. Initially, each vehicle interacts with the system of its selected edge server; after taking an action, it forms an experience replay tuple from its observations and transmits it to the learner for centralized processing. Each actor uses an ε-greedy algorithm when executing the learned policy, as sketched below.
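As a minimal illustration of the actor-side selection rule (our sketch, not the patent's code), the ε-greedy step looks as follows:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the highest Q-value (exploit).
    q_values is the evaluation network's output for the current state."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```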
As edge servers, H and L act as learners. First, they announce their EU service prices before the first and second phases begin, then learn policies from the experience replay information each actor sends to the edge server, and finally pass the learned policies back to each actor. The vehicle resource-request policies are processed with the DDQN algorithm, whose state, action, and reward are as follows:
State: the global system state is $S = (S_1, \dots, S_N)$, where $S_n$ is the local state of vehicle $n$.
Action: the action of each vehicle is the number of EUs (i.e., the service level) it requests.
Reward: the reward of a vehicle selecting the level-$i$ service from server H in the first phase is its resulting utility.
The DDQN-based vehicle resource-request policy algorithm is shown as Algorithm 2. In this DDQN (Double Deep Q-Network) framework, the edge server maintains two neural networks: an evaluation network and a target network. The parameters of the evaluation network are denoted $\theta$ and those of the target network $\theta^-$; both networks take the current state as input and output the Q-value of each vehicle. The evaluation network's parameters are copied to the target network every fixed number of steps. The DDQN target value is $y_t = r_t + \gamma\, Q(s_{t+1}, \arg\max_a Q(s_{t+1}, a; \theta_t); \theta_t^-)$, which means the arg function is applied to the action that maximizes Q in state $s_{t+1}$ under the evaluation-network parameters $\theta_t$, the max function is applied to the possible values of the Q function, and the selected action is then evaluated by the target network.
The learner sends policy updates to the actors and receives their experience samples. The updates contain new neural-network weights, while the samples are tuples describing the network state, the action, the reward, and the new state resulting from the action. Policies and samples are exchanged over the backhaul network without consuming air-interface resources. New policies may be issued periodically or based on dynamic criteria, such as the amount of data collected or the learner's policy-generation time. During periods of low demand, experience samples can be generated and shared quickly, making the information exchange insensitive to delay.
The constructed DDQN framework is shown in Fig. 2. It approximates the Q-function with the designed deep neural network and outputs a series of action values. The whole network is initialized with state s. In time slot t = 1, the edge server generates the initial observation state s_1 from the driving-behavior requests transmitted by the vehicles. Based on the current policy and the resource-capacity constraints, the edge server selects an action to accept or reject each request, and the action is then executed by the edge server. The edge server subsequently receives reward r_1, the state transitions to s_{t+1}, and finally an action value is obtained. Edge-server resources are allocated with the DDQN algorithm as follows: given the states, rewards, and actions of the system, establish an experience replay pool D of size N, an action-value function Q with random weight parameters, and a target network. For each episode, initialize the state sequence S; then, in each step, feed the state s_t into the evaluation network and select an action a_t by the ε-greedy algorithm. Thereafter, obtain the current reward r_t and the next state s_{t+1} according to the preset criteria, and store (s_t, a_t, r_t, s_{t+1}) in the experience replay pool to update the parameters of the evaluation network. A minimal sketch of this training loop is given below.
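The following sketch assumes simple `env`, `q_eval`, and `q_target` interfaces of our own (the patent's implementation is in Matlab and is not reproduced here); the double-Q target matches the formula given above.

```python
import random
from collections import deque

def train_ddqn(env, q_eval, q_target, episodes=1000, steps=500, gamma=0.95,
               epsilon=0.1, batch_size=32, pool_size=5000, copy_every=200):
    """Learner-side DDQN loop. q_eval/q_target are assumed to expose
    predict(state) -> list of Q-values, fit(batch of (s, a, y) targets),
    and get_weights()/set_weights()."""
    replay = deque(maxlen=pool_size)  # experience replay pool D
    step_count = 0
    for _ in range(episodes):
        s = env.reset()  # initialize the state sequence
        for _ in range(steps):
            q = q_eval.predict(s)
            a = (random.randrange(len(q)) if random.random() < epsilon
                 else max(range(len(q)), key=lambda i: q[i]))  # epsilon-greedy
            r, s_next = env.step(a)  # reward and next state by preset criteria
            replay.append((s, a, r, s_next))  # store the transition
            if len(replay) >= batch_size:
                batch = random.sample(replay, batch_size)
                targets = []
                for bs, ba, br, bs1 in batch:
                    q_next = q_eval.predict(bs1)
                    a_star = max(range(len(q_next)), key=lambda i: q_next[i])
                    # Double-Q target: the evaluation net selects the action,
                    # the target net evaluates it.
                    y = br + gamma * q_target.predict(bs1)[a_star]
                    targets.append((bs, ba, y))
                q_eval.fit(targets)  # update the evaluation network
            step_count += 1
            if step_count % copy_every == 0:  # periodic parameter copy
                q_target.set_weights(q_eval.get_weights())
            s = s_next
```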
Examples:
The DDQN used in the invention is a fully connected neural network with hidden layers of 400, 200, and 120 neurons. To ensure convergence, the discount factor is set to 0.95; the ReLU activation function is used, and the initial learning rate is set to 0.015. Matlab is used to build and evaluate the proposed resource-allocation problem, and a vehicle selects either the H or the L server in any given service period. The number of service levels provided by edge servers H and L is set to 6, with service levels corresponding to EU counts of 1 to 6. All simulation experiments were run on a machine equipped with an Intel i7-7700K CPU, 32 GB RAM, and an NVIDIA RTX 3060 GPU. The specific simulation parameters are shown in Table 1.
Table 1. Overview of simulation parameters

Parameter | Value
--- | ---
Discount factor | 0.95
Learning rate | 0.015
Number of episodes | 1000
Number of steps per episode | 500
Experience replay pool size | 5000
Experience samples per update (mini-batch size) | 32
DDQN action-space size | 2
Optimizer | Adam
Activation function | ReLU
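For illustration, the following PyTorch sketch builds a Q-network with the layer sizes and hyperparameters stated above (the patent's experiments use Matlab, so this is our translation under assumed input/output dimensions):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Fully connected Q-network: hidden layers of 400, 200, and 120
    neurons with ReLU activations, as stated in the embodiment."""
    def __init__(self, state_dim: int, n_actions: int = 2):  # action space 2 (Table 1)
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 400), nn.ReLU(),
            nn.Linear(400, 200), nn.ReLU(),
            nn.Linear(200, 120), nn.ReLU(),
            nn.Linear(120, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

q_eval = QNetwork(state_dim=16)  # state_dim = 16 is illustrative only
optimizer = torch.optim.Adam(q_eval.parameters(), lr=0.015)  # Adam, lr = 0.015
gamma = 0.95  # discount factor from Table 1
```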
As shown in FIGS. 3 and 4, the optimal prices of edge servers H and L over the two service phases vary with the objective-quality ratio θ of the two servers and with the vehicles' sensitivity β to negative evaluations; here the objective service quality of edge server H is held fixed. As FIG. 3 shows, as the quality gap between H and L shrinks, the optimal first-stage prices of both servers gradually increase and coincide when the gap reaches zero. In the second phase, the optimal price of H decreases as the quality gap shrinks, while L behaves in exactly the opposite way; this is caused by the evaluation information generated by the vehicles, which most strongly affects the server with the better quality of service. Note also that the second-phase prices are significantly lower than the first-phase prices because of the rating information: a server affected by the ratings must cut its price in the second phase to capture a larger market share. With θ = 0.8, FIG. 4 shows that as the vehicles' sensitivity to negative evaluations decreases, the optimal second-stage price of edge server H gradually falls while edge server L gradually gains an advantage, because vehicles become more willing to try the other server's service. Since no rating information intervenes in the first stage, the first-stage optimal prices do not change with this sensitivity.
FIG. 5 depicts the evolution of the reward values of the vehicles selecting edge servers H and L over time, considering the total vehicle arrival rate under different conditions; the convergence of the two separate service phases is clearly shown. We compare the vehicle rewards obtained by deep reinforcement learning under incomplete information with the Nash-equilibrium rewards under complete information. The total utility of the vehicles selecting edge servers H and L in both service phases converges to near the Nash equilibrium in about 5000 iterations. Meanwhile, because edge server H has an advantage in service quality, it attracts more vehicle users, and the utility of the vehicles selecting H is greater than that of L in both service phases. Furthermore, the evaluation information generated by the vehicles in the second stage gives edge server H a greater advantage and higher vehicle utility, while edge server L is negatively affected.
As shown in FIG. 6, taking the total utility of the vehicles at the second-stage edge server H as an example, the total utility keeps rising as the total vehicle arrival rate gradually increases, because a larger number of vehicles raises the edge server's resource utilization. Note, however, that as the arrival rate continues to rise, the growth rate of the utility gradually declines: the total amount of resources is ultimately limited, and an excess of vehicle users creates more competition, so the utility gain slows down.
FIG. 7 shows the total utility of the vehicles selecting the two edge servers as the number of iterations increases; the curves are plotted for different total resource amounts, taking the second service phase as an example. As the total resources (EUs) owned by the edge servers increase, so does the total utility of the vehicles selecting both servers: with more total resources, the computing resources available to each vehicle grow correspondingly, and the probability of being rejected gradually falls. Note also that the gap between the total vehicle utilities of edge servers H and L is small when the total amount of resources is limited, because the number of vehicles that edge server H can serve is then limited.
As shown in FIG. 8, the utilities of the vehicles selecting edge servers H and L under the DDQN (Double Deep Q-Network) and DQN (Deep Q-Network) algorithms are compared in the first service phase. The simulation results show that the DDQN algorithm converges faster, is more stable, and achieves higher utility. In the DQN algorithm, the same neural network both selects and evaluates actions when computing the Q-value target, which easily leads to over-optimistic estimates. DDQN addresses this by using different networks for selection and evaluation, which effectively mitigates the overestimation problem and yields a more stable training model and faster convergence. The contrast between the two targets is sketched below.
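In our notation (evaluation-network parameters $\theta$, target-network parameters $\theta^-$), the two targets differ as follows:

```latex
% DQN: the target network both selects and evaluates the next action,
% which tends to overestimate Q:
y_t^{\mathrm{DQN}} = r_t + \gamma \max_a Q\big(s_{t+1}, a;\, \theta^-\big)
% DDQN: the evaluation network selects the action, the target network
% evaluates it, mitigating the overestimation:
y_t^{\mathrm{DDQN}} = r_t + \gamma\, Q\big(s_{t+1},\, \arg\max_a Q(s_{t+1}, a;\, \theta);\, \theta^-\big)
```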
Claims (1)
1. A resource allocation method of a vehicle edge computing system considering the influence of review information, characterized in that:
a market consisting of two competing edge servers is constructed, and computation offloading service is provided over two service phases; at the beginning of each service phase, both edge servers announce the optimal pricing strategies obtained from their pricing game, and the second-phase pricing strategies are influenced by the rating information generated by the vehicles at the end of the first phase; vehicles select their edge servers based on service information, pricing, their own preferences, and the rating information, and then determine their own request strategies from the resource-request information of the other vehicles that selected the same edge server; where vehicles are unwilling to disclose their resource-request information, a deep reinforcement learning framework is used to maximize vehicle utility; specifically:
firstly, a duopoly market model with two competing edge servers is built over the two phases of the offloading service;
vehicles generate service requests and offload them to an edge server for processing; the edge servers dominate the market, provide services to the vehicles, and set corresponding pricing strategies to earn revenue from the vehicles;
A. Utility of edge servers
assume the two competing edge servers in the market are H and L, with edge-unit (EU) capacities $M_H$ and $M_L$, respectively;
first stage: before service begins, edge servers H and L publish their respective first-stage prices, denoted $p_H^1$ and $p_L^1$; assume that in the first stage the service-request arrivals at H and L follow Poisson distributions with mean arrival rates $\lambda_H^1$ and $\lambda_L^1$, respectively, and that each vehicle holds a preference parameter toward each of the edge servers H and L; the quality of the edge computing service provided to a vehicle has two components: the objective quality of service, i.e., the general impression a vehicle forms from the brand, reputation, and service-product information of an edge server; and the quality of experience, i.e., what the vehicle experiences after being served, which is unknown before service ends; let $q_i$, $i = H, L$, be the objective quality of edge server $i$, and assume H has high objective quality while L has low objective quality: $q_H = q$, $q_L = \theta q$ with $\theta \in (0, 1)$; the higher $\theta$, the smaller the objective-quality gap between the two servers; let $\eta_i$, $i = H, L$, be the quality of experience after being served by provider $i$; to capture the variability of the experienced service, set $\eta_i \sim U(-1, 1)$; in addition, a vehicle compares the prices of the two servers and perceives a corresponding sense of loss or gain, the price difference for edge server H being $p_H^1 - p_L^1$; the net utility of a vehicle toward H and L in the first phase follows from these components: when the net utility is positive, the vehicle is satisfied and gives a positive evaluation; otherwise, it gives a negative evaluation; the first-stage revenues of edge servers H and L then follow from their prices and realized arrival rates;
The second stage: before the service period of the second phase starts, a vehicle that prefers edge server $i$ forms an expected value of the quality of experience of the other edge server's service after seeing the evaluations posted at the end of the first phase. This expected value is determined by $P_i^+$, the probability that a vehicle gives a positive evaluation after experiencing the service of edge server $i$; by $\eta^+$, the expected quality of experience conditional on a positive evaluation; and by $\beta$, the sensitivity of the vehicle to positive evaluations;
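The expression itself is not reproduced in the text above; one plausible rendering, consistent with the quantities just defined but assumed rather than taken from the source, is:

```latex
% Expected quality of experience that a vehicle preferring server i
% forms for the other server j from the first-phase evaluations.
% P_j^+ (probability of a positive evaluation of j), eta^+ (expected
% experience conditional on a positive evaluation) and beta are the
% quantities defined above; their combination here is an assumption.
\tilde{\eta}_j \;=\; \beta \, P_j^{+} \, \eta^{+}, \qquad j \neq i .
```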
In the second phase, a vehicle decides whether to continue purchasing service from its original edge server or to switch to the other edge server, based on the second-phase prices $p_H^2$ and $p_L^2$ published by the edge servers, the resulting price difference, and the expectation of the other edge server's quality of experience formed from the first-stage vehicles' comments. The second-stage vehicle arrival rates at edge servers H and L, $\lambda_H^2$ and $\lambda_L^2$, depend on the probability that a vehicle served by edge server H in the first phase still purchases from H in the second phase and on the probability that a vehicle served by edge server L still purchases from L in the second phase; combining these retention probabilities with formula (2) yields the second-stage arrival rates. Accordingly, the profits of edge servers H and L in the second phase are the revenues collected at the second-phase prices from their respective second-stage arrivals;
B. Utility of the vehicle
The first stage: the vehicle arrival rates at edge servers H and L in the first stage are $\lambda_H^1$ and $\lambda_L^1$, respectively. Consider that each edge server provides K levels of service for offload requests, denoted $i \in \{1, \dots, K\}$; the numbers of EUs used by the i-th level service at edge servers H and L are $c_{Hi}$ and $c_{Li}$, respectively, and the numbers of vehicles executing the i-th level service at edge servers H and L are recorded for each server. The utility obtained by selecting service level i of H or L in the first phase is then defined, where the satisfaction term represents the level of satisfaction obtained by the vehicle selecting service level i of edge server H in the first phase;
The second stage: the vehicle arrival rates in the second stage may change owing to the influence of the comment information generated by the vehicles; the second-stage arrival rates at edge servers H and L are recorded as $\lambda_H^2$ and $\lambda_L^2$, respectively. Assume that the total number of service levels provided by edge servers H and L for offload requests is still K; during the second-stage service, the numbers of vehicles executing level i at edge servers H and L are recorded accordingly, and the numbers of EUs they occupy are $c_{Hi}$ and $c_{Li}$, respectively. The utility of selecting service level i of H or L in the second stage is then defined analogously to the first stage;
C. Description of the problem
In the established model, the edge servers first compete over the two phases, formulating optimal prices for both phases to attract more vehicle offload requests and thereby obtain more profit. The total two-phase revenues of edge servers H and L are recorded as $R_H = R_H^1 + R_H^2$ and $R_L = R_L^1 + R_L^2$, respectively. Thus, from the edge servers' perspective, the optimization problem is summarized as maximizing each server's total two-phase revenue over its pricing strategy;
The vehicles select an edge server to request service according to the prices set by the edge servers in the two stages, the vehicles' preferences for the edge servers, and the vehicles' evaluation information. The optimal prices set by edge server H in the first and second stages are $p_H^{1*}$ and $p_H^{2*}$; likewise, the optimal prices of edge server L in the first and second phases are $p_L^{1*}$ and $p_L^{2*}$, respectively. The optimization problem for a vehicle selecting edge server H in the first stage is summarized as maximizing its first-phase utility over its choice of service level;
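Neither optimization problem is reproduced in closed form above; a hedged rendering in the notation reconstructed here, with nonnegative prices and the EU budget $M_H$ as assumed feasibility constraints, is:

```latex
% Server side: each edge server chooses its two-phase prices to
% maximize total two-phase revenue (the rival's prices enter through
% the arrival rates). Vehicle side: a vehicle at server H picks the
% service level that maximizes its first-phase utility U_{Hi}^1.
\max_{p_H^1,\, p_H^2 \ge 0} \; R_H = R_H^1 + R_H^2,
\qquad
\max_{p_L^1,\, p_L^2 \ge 0} \; R_L = R_L^1 + R_L^2,
\qquad
\max_{i \in \{1, \dots, K\}} \; U_{Hi}^1
\quad \text{s.t.} \quad \sum_{i=1}^{K} c_{Hi} \le M_H .
```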
Secondly, the game equilibrium in the two service phases;
In the two service phases, the pricing policies of edge servers H and L are given as $(p_H^{1*}, p_H^{2*})$ and $(p_L^{1*}, p_L^{2*})$, respectively. If this pricing policy satisfies the conditions that neither edge server can increase its total revenue by unilaterally deviating from its own two-phase prices while the other server's prices are held fixed, a Nash equilibrium point is considered to be reached between the two edge servers;
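A minimal numerical sketch of locating such an equilibrium by best-response iteration is given below; the linear demand model is purely illustrative (in the patent's model the revenues follow from the arrival rates derived above), and convergence of this scheme is assumed rather than guaranteed in general.

```python
import numpy as np

# Illustrative revenue: demand falls in the server's own price and
# rises in the rival's price; a, b, c are assumed constants.
def revenue(p_own, p_rival, a=1.0, b=1.2, c=0.5):
    demand = max(a - b * p_own + c * p_rival, 0.0)
    return p_own * demand

grid = np.linspace(0.0, 1.0, 201)        # candidate prices

def best_response(p_rival):
    # Best reply: the own price maximizing revenue against p_rival.
    return grid[int(np.argmax([revenue(p, p_rival) for p in grid]))]

p_H, p_L = 0.5, 0.5
for _ in range(100):
    new_H, new_L = best_response(p_L), best_response(p_H)
    if abs(new_H - p_H) < 1e-6 and abs(new_L - p_L) < 1e-6:
        break                            # neither server gains by deviating
    p_H, p_L = new_H, new_L

print(f"approximate Nash prices: p_H*={p_H:.3f}, p_L*={p_L:.3f}")
```

At the fixed point, each price is a best response to the other, which is exactly the unilateral-deviation condition stated above.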
The vehicles select an edge server to make service requests and formulate their respective offload request policies in the two phases:
In the second phase, the optimal prices of edge servers H and L are $p_H^{2*}$ and $p_L^{2*}$, respectively; in the first phase, the optimal prices of edge servers H and L are $p_H^{1*}$ and $p_L^{1*}$, respectively;
Finally, the non-cooperative game among the vehicles is discussed in two cases: complete information sharing and incomplete information sharing;
A. Resource request management problem under information sharing
Each vehicle first shares its selected service level, and then decides its optimal EU request strategy according to the request states of the other vehicles;
Considering the variation of the service level selected by each vehicle, given the number of vehicles in the system that select service level i and the first-stage optimal price $p_H^{1*}$ of edge server H, the optimal offload resource request policy of a vehicle requesting service level i is obtained in closed form from these quantities;
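As one hedged illustration of such a policy under complete information sharing, the sketch below has a vehicle pick its EU request given the observed number of co-selecting vehicles and the announced optimal price; the concave satisfaction term, the proportional sharing of the level-i EU pool, and the linear payment are assumed stand-ins for the closed-form policy, which the text does not reproduce.

```python
import numpy as np

def optimal_request(n_i, c_i, p_star, x_others=1.0,
                    x_grid=np.linspace(0.1, 5.0, 100)):
    # n_i: vehicles that chose level i; c_i: EUs of the level-i pool;
    # p_star: announced optimal price; x_others: assumed average
    # request of the other n_i - 1 vehicles.
    def utility(x):
        share = c_i * x / (x + (n_i - 1) * x_others)  # proportional EU share
        return np.log1p(share) - p_star * x           # satisfaction - payment
    return x_grid[int(np.argmax([utility(x) for x in x_grid]))]

# Example: 10 vehicles share 20 EUs of level i at unit price 0.05.
print(optimal_request(n_i=10, c_i=20.0, p_star=0.05))
```

The grid search stands in for the analytical maximizer; with the true utility of the model, the same best-response structure applies.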
B. Resource request management algorithm under incomplete information sharing
Considering the selfishness and randomness of each vehicle in the system, each vehicle is set as an agent (actor) that only executes its policy and collects experience samples, while the selected edge server H or L acts as the learner making centralized decisions. Initially, each vehicle interacts with the system of its selected edge server; after taking an action, it forms an experience replay tuple $(s_t, a_t, r_t, s_{t+1})$ based on the observed conditions and transmits it to the learner for centralized processing. Each actor employs an ε-greedy algorithm when executing its policy;
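A minimal sketch of the actor side follows; the environment hook env_step and all names are illustrative assumptions, while the ε-greedy rule and the $(s_t, a_t, r_t, s_{t+1})$ replay tuple mirror the text.

```python
import random
import numpy as np

def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))

def actor_step(state, q_values, env_step, epsilon=0.1):
    # The actor only executes the policy and records what it observes;
    # env_step(state, action) -> (reward, next_state) is an assumed hook.
    action = epsilon_greedy(q_values, epsilon)
    reward, next_state = env_step(state, action)
    experience = (state, action, reward, next_state)
    return experience            # shipped to the learner (edge server)
```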
As the edge servers, H and L act as the learners: first, they announce their EU service prices before the first and second phases begin; they then learn policies from the experience replay information sent by each actor and pass the learned policies back to the actors. Here, the vehicle resource request policy is computed using the DDQN algorithm, with the state, action, and reward information defined as follows:
State: the joint state of all vehicles in the system is $s = (S_1, \dots, S_N)$, where $S_n$ is the state of vehicle $n$;
Action: each vehicle's action is its offload resource request decision for the selected service level;
Reward: the reward of a vehicle selecting the level-i service from server H in the first phase is the corresponding first-phase service utility;
In the DDQN framework, the edge server maintains two neural networks, an evaluation (main) network and a target network; the parameters of the main network are denoted $\theta$ and those of the target network $\theta^-$. Both networks take the current state as input and output the Q value of each vehicle. The parameters $\theta$ of the main network are copied to the target network $\theta^-$ after a certain number of steps. The target value used by DDQN is then $y_t = r_t + \gamma\, Q\!\left(s_{t+1}, \arg\max_{a} Q(s_{t+1}, a; \theta_t); \theta^-\right)$, where the argmax function determines the action that maximizes Q in the given state $s_{t+1}$ under the network parameters $\theta_t$, i.e., the arg function is applied to the action $a$ and the max function to the possible values of the Q function;
Edge server resources are allocated using the DDQN algorithm: first, given the state, reward, and action definitions of the system, an experience replay pool D of size N is established, together with an action-value function Q with random weight parameters $\theta$ and a target network with parameters $\theta^-$. For each episode, a state sequence S is initialized; then, in each step, the state $s_t$ is fed to the evaluation network as input and an action $a_t$ is selected according to the ε-greedy algorithm. Thereafter, the current reward $r_t$ and the next state $s_{t+1}$ are obtained by the preset criteria, and the tuple $(s_t, a_t, r_t, s_{t+1})$ is stored in the experience replay pool to update the parameters of the evaluation network.
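The following compact sketch assembles the steps just described into a learner-side training loop; the environment interface, the linear Q-function stand-in, and all hyperparameters are illustrative assumptions, while the replay pool, the ε-greedy exploration, the double-Q target, and the periodic θ → θ⁻ copy follow the text.

```python
import random
from collections import deque
import numpy as np

class QNet:
    """Tiny tabular Q-function stand-in for the evaluation/target networks."""
    def __init__(self, n_states, n_actions, rng):
        self.w = rng.normal(scale=0.1, size=(n_states, n_actions))
    def q(self, s):
        return self.w[s]

def train(env, n_states, n_actions, episodes=200, gamma=0.9,
          eps=0.1, lr=0.05, copy_every=50, batch=32, seed=0):
    rng = np.random.default_rng(seed)
    D = deque(maxlen=10_000)                  # experience replay pool
    q_eval = QNet(n_states, n_actions, rng)   # parameters theta
    q_tgt = QNet(n_states, n_actions, rng)    # parameters theta^-
    step = 0
    for _ in range(episodes):
        s = env.reset()                       # assumed env interface
        done = False
        while not done:
            # epsilon-greedy action selection on the evaluation network
            a = (int(rng.integers(n_actions)) if rng.random() < eps
                 else int(np.argmax(q_eval.q(s))))
            r, s_next, done = env.step(a)     # reward and next state
            D.append((s, a, r, s_next, done))
            if len(D) >= batch:
                for s0, a0, r0, s1, d in random.sample(list(D), batch):
                    a_star = int(np.argmax(q_eval.q(s1)))   # select: theta
                    y = r0 + (0.0 if d else gamma * q_tgt.q(s1)[a_star])  # evaluate: theta^-
                    q_eval.w[s0, a0] += lr * (y - q_eval.q(s0)[a0])
            step += 1
            if step % copy_every == 0:
                q_tgt.w = q_eval.w.copy()     # theta -> theta^-
            s = s_next
    return q_eval
```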
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311607887.6A CN118012601B (en) | 2023-11-29 | 2023-11-29 | Resource allocation method of vehicle edge computing system considering influence of comment information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118012601A (en) | 2024-05-10
CN118012601B (en) | 2024-09-06
Family
ID=90946121
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311607887.6A Active CN118012601B (en) | 2023-11-29 | 2023-11-29 | Resource allocation method of vehicle edge computing system considering influence of comment information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118012601B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112888002A (en) * | 2021-01-26 | 2021-06-01 | 重庆邮电大学 | Game theory-based mobile edge computing task unloading and resource allocation method |
CN114466023A (en) * | 2022-03-07 | 2022-05-10 | 中南大学 | Computing service dynamic pricing method and system for large-scale edge computing system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021194583A1 (en) * | 2020-03-23 | 2021-09-30 | Apple Inc. | Dynamic service discovery and offloading framework for edge computing based cellular network systems |
CN115169800A (en) * | 2022-06-06 | 2022-10-11 | 湖北工业大学 | Game theory-based vehicle edge computing resource allocation excitation method and system |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB02 | Change of applicant information | Country or region after: China; Address after: No. 999, Xi'an Road, Pidu District, Chengdu, Sichuan, 611756; Applicant after: SOUTHWEST JIAOTONG University; Address before: 610031, No. two, section 111, North Ring Road, Jinniu District, Sichuan, Chengdu; Applicant before: SOUTHWEST JIAOTONG University; Country or region before: China |
| GR01 | Patent grant | |