CN118012601B - Resource allocation method of vehicle edge computing system considering influence of comment information - Google Patents
Resource allocation method of vehicle edge computing system considering influence of comment information
- Publication number
- CN118012601B (application CN202311607887.6A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- service
- edge
- edge server
- edge servers
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5072—Grid computing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0499—Feedforward networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/098—Distributed learning, e.g. federated learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0283—Price estimation or determination
Abstract
The invention discloses a resource allocation method for a vehicle edge computing system that considers the influence of review information. The method comprises the following steps: a market consisting of two competing edge servers is constructed, providing computation offloading service over two service phases. At the beginning of each service phase, both edge servers announce the optimal pricing strategies obtained from their pricing game, and the second-phase pricing strategies are influenced by the rating information generated by the vehicles at the end of the first phase. Vehicles select their edge servers based on service information, pricing, their own preferences, and the rating information, and then determine their own request strategies from the resource-request information of the other vehicles that selected the same edge server. In situations where vehicles are unwilling to disclose their resource-request information, a deep reinforcement learning framework is used to maximize vehicle utility. The invention achieves a Nash equilibrium among vehicles in any phase or server scenario and maximizes the overall benefit of all vehicles.
Description
Technical Field
The invention belongs to the field of vehicle edge computing, and in particular relates to a resource allocation method for a vehicle edge computing system that considers the influence of review information.
Background
Topics related to the resource allocation problem in vehicle edge computing (VEC) include resource allocation, vehicle edge computing, and deep reinforcement learning.
A. Resource allocation
In recent years, the rapid development of wireless networks has increased their complexity and heterogeneity, making it necessary to move beyond traditional resource allocation mechanisms. Guo Shengjie et al. propose a cascaded Hungarian channel allocation algorithm that simplifies the resource allocation problem with reliability requirements, converting it into a power allocation problem with chance constraints. Chen Xing et al. propose an adaptive resource allocation method built around a feedback loop, incorporating an iterative QoS prediction model and a particle swarm optimization (PSO) based runtime decision algorithm. Wu Dapeng et al. propose a novel heuristic radio resource allocation scheme that considers slice characteristics, analyzes them, and converts them into a network profit model of resource utilization. Ghanbari et al. survey resource allocation techniques comprehensively, organizing them into cost-aware, environment-aware, efficiency-aware, load-balancing-aware, power-aware, QoS-aware, SLA-based, and utilization-aware mechanisms. Battula et al. likewise review recent resource allocation algorithms in fog computing for compensating resource providers and estimating the cost of fog resources. None of the above studies applies game theory to the resource allocation solution.
Game theory has become a popular approach to the resource allocation problem in wireless networks. Zhong Xudong et al. formulate the resource allocation problem as a cooperative game, emphasizing the potential of game theory in this context. Wu Ducheng et al. explore the application of game theory to co-tier interference mitigation in 5G small cell networks, underscoring the relevance of game theory and distributed learning in this setting. Another work proposes a distributed game-theoretic resource allocation algorithm to mitigate cross-tier interference between device-to-device (D2D) communications and cellular users, as well as co-tier interference among D2D communications. Yang Lixia et al. use game theory to optimize the selection of emergency distribution paths for groundwater resources, taking actual congestion and transit time into account. Li Feixiang et al. focus on computation offloading and pricing problems in the industrial Internet of Things (IIoT), developing a two-stage Stackelberg game model to characterize the interactions between edge clouds and devices. However, none of these works brings the resource management solution into the context of edge computing.
B. Mobile edge computing
Recently, the literature on edge computing resource allocation has explored various schemes. Lin Fuhong et al. describe a generic edge computing intrusion detection system (IDS) architecture, which forms the basis of their resource allocation model. Zamzam et al. apply game theory to analyze user behavior, obtaining solutions that satisfy all users and reach equilibrium. Bahreini et al. address the resource allocation and pricing problems in two-level edge computing systems, while Ma Shi et al. propose a three-way circular game (3CG) involving users, edge nodes, and service providers; in this model, users select their preferred services, and service providers select cost-effective edge nodes that prioritize high-value users. Baek et al. study three dynamic pricing schemes for resource allocation in IoT edge computing environments; their numerical results verify the proposed theorems and compare the three mechanisms. Sun Yuhu et al. address the resource allocation problem in edge computing with a double-auction scheme named DPODA. Lin Zifan et al. outline edge computing and edge resource allocation techniques across a range of research and application scenarios. However, the resource allocation process in VEC typically involves dynamic, random, and continuous interactions, and few of the above papers take this into account.
C. Deep reinforcement learning
The literature on resource allocation and computation offloading in edge computing has presented various deep reinforcement learning (DRL) based architectures to address the demands of mobile devices and VEC environments. Cheng et al. describe a space-air-ground integrated network (SAGIN) edge/cloud computing architecture that uses DRL to offload computation-intensive applications under energy and computing constraints. Wang Jiadai et al. propose a DRL-based resource allocation (DRLRA) scheme that adaptively allocates resources, reduces average service time, and balances resource usage across different VEC environments. Alfakih et al. aim to make optimal offloading decisions that minimize system cost, including power consumption and computation delay, using the SARSA algorithm.
Huang et al. address the need for proper resource allocation in computation offloading by proposing a deep Q-network (DQN) based task offloading and resource allocation algorithm. Chen Jienan et al. describe an intelligent resource allocation framework (iRAF) in which a DRL algorithm solves the complex resource allocation problem in collaborative mobile edge computing (CoMEC) networks. Zhou Huan et al. study DRL-based joint optimization of computation offloading and resource allocation in dynamic multi-user MEC systems. Feng Jie et al. develop a collaborative computation offloading and resource allocation framework for blockchain-enabled MEC systems, solved with an asynchronous advantage actor-critic (A3C) algorithm. Ning Zhaolong et al. construct a DRL-based intelligent offloading system for vehicle edge computing. Tan et al. propose a joint communication, caching, and computation strategy for cost-efficient, DRL-based vehicular networks. However, none of the above studies considers the impact of reviews on the allocation problem in vehicle edge computing (VEC).
Disclosure of Invention
In view of the above, the present invention provides a resource allocation method for a vehicle edge computing system that considers the influence of review information.
In the resource allocation method of a vehicle edge computing system considering the influence of review information, a market consisting of two competing edge servers is constructed, and computation offloading service is provided over two service phases. At the beginning of each service phase, both edge servers announce the optimal pricing strategies obtained from their pricing game, and the second-phase pricing strategies are influenced by the rating information generated by the vehicles at the end of the first phase. Vehicles select their edge servers based on service information, pricing, their own preferences, and the rating information, and then determine their own request strategies from the resource-request information of the other vehicles that selected the same edge server. Where vehicles are unwilling to disclose their resource-request information, a deep reinforcement learning framework is used to maximize vehicle utility. The method comprises the following steps:
First, a duopoly market model with two competing edge servers is built over the two phases of the offloading service.
Vehicles generate service requests and offload them to an edge server for processing; the edge servers dominate the market, provide services to the vehicles, and set corresponding pricing strategies to earn revenue from the vehicles.
A. Utility of edge servers
Assume the two competing edge servers in the market are H and L, with edge-unit (EU) capacities $M_H$ and $M_L$, respectively.
First stage: before service begins, edge servers H and L publish their respective first-stage prices, denoted $p_H^1$ and $p_L^1$. Assume that in the first stage the service-request arrivals at H and L follow Poisson distributions with mean arrival rates $\lambda_H^1$ and $\lambda_L^1$, respectively, and that each vehicle holds a preference parameter toward each of the edge servers H and L. The quality of the edge computing service provided to a vehicle has two components: the objective quality of service, i.e., the general impression a vehicle forms from the brand, reputation, and service-product information of an edge server; and the quality of experience, i.e., what the vehicle experiences after being served, which is unknown before service ends. Let $q_i$, $i = H, L$, be the objective quality of edge server $i$, and assume H has high objective quality while L has low objective quality: $q_H = q$, $q_L = \theta q$ with $\theta \in (0, 1)$; the higher $\theta$, the smaller the objective-quality gap between the two servers. Let $\eta_i$, $i = H, L$, be the quality of experience after being served by provider $i$; to capture the variability of the experienced service, set $\eta_i \sim U(-1, 1)$. In addition, a vehicle compares the prices of the two servers and perceives a corresponding sense of loss or gain; for edge server H the price difference is $p_H^1 - p_L^1$. The net utility of a vehicle toward H and L in the first phase follows from these components: when the net utility is positive, the vehicle is satisfied and gives a positive evaluation; otherwise, it gives a negative evaluation. The first-stage revenues of edge servers H and L then follow from their prices and realized arrival rates.
Second stage: before the second service phase begins, a vehicle that prefers edge server $i$ observes the evaluations posted at the end of the first phase and forms an expected value of the quality of experience of the other edge server's service. This expectation combines the probability that a vehicle gives a positive evaluation after experiencing edge server $i$'s service with the expected quality-of-experience value conditional on a positive evaluation, weighted by $\beta$, the vehicle's sensitivity to positive evaluations.
In the second phase, a vehicle decides whether to continue purchasing service from its original edge server or to switch to the other one, based on the second-phase prices $p_H^2$ and $p_L^2$ published by the servers, the price difference, and the expectation of the other server's quality of experience formed from the first-phase vehicle reviews. The second-phase vehicle arrival rates at H and L, $\lambda_H^2$ and $\lambda_L^2$, depend on the probability that a vehicle served by edge server H still purchases from H in the second phase, and on the probability that a vehicle served by edge server L still purchases from L in the second phase. Combining with equation (2) yields these arrival rates, and the second-phase profits of edge servers H and L then follow from their second-phase prices and arrival rates.
B. Utility of vehicles
First stage: the first-stage vehicle arrival rates at edge servers H and L are $\lambda_H^1$ and $\lambda_L^1$, respectively. Each edge server provides K service levels for offloading requests, indexed $i \in \{1, \dots, K\}$; the EU counts of the level-$i$ service at H and L are $c_{Hi}$ and $c_{Li}$, respectively, and the numbers of vehicles executing the level-$i$ service at H and L are likewise tracked. On this basis, the utility obtained by selecting service level $i$ at H or L in the first phase is defined, whose leading term represents the satisfaction obtained by a vehicle selecting service level $i$ of edge server H in the first stage.
Second stage: the vehicle arrival rates may change in the second stage under the influence of the review information generated by the vehicles; denote the second-stage arrival rates at H and L by $\lambda_H^2$ and $\lambda_L^2$. Assume the total number of service levels offered by H and L for offloading requests is still K; during the second-stage service, the numbers of vehicles executing level $i$ at H and L are recorded, and the EU counts they occupy are $c_{Hi}$ and $c_{Li}$, respectively. The utility of selecting service level $i$ at H or L in the second stage is defined accordingly.
C. Problem description
In the established model, the edge servers compete across the two phases, setting optimal prices in each phase to attract more vehicle offloading requests and thereby earn more profit. Denote the two-phase total revenues of edge servers H and L by $\pi_H$ and $\pi_L$; from the servers' perspective, the optimization problem is to choose the two-phase prices that maximize these revenues.
A vehicle selects an edge server and requests service according to the prices set by the servers in the two phases, the vehicle's preference between the servers, and the vehicles' evaluation information. The optimal prices set by edge server H in the first and second phases are denoted $p_H^{1*}$ and $p_H^{2*}$; likewise, the optimal prices of edge server L in the two phases are $p_L^{1*}$ and $p_L^{2*}$. The optimization problem of a vehicle selecting edge server H in the first stage is to choose the offload-request strategy that maximizes its utility.
Second, the equilibrium of the game in the two service phases is analyzed.
In the two service phases, the pricing policies of edge servers H and L are $(p_H^{1*}, p_H^{2*})$ and $(p_L^{1*}, p_L^{2*})$, respectively. If neither server can increase its total revenue by unilaterally deviating from this pricing policy, a Nash equilibrium point is reached between the two edge servers.
The vehicles select an edge server to which to submit service requests and formulate their respective offload-request policies in the two stages.
The optimal second-phase prices of edge servers H and L are obtained in closed form (Theorem 1 below), and the optimal first-phase prices likewise (Theorem 2 below).
Finally, the non-cooperative game among the vehicles is discussed in two cases: complete information disclosure and incomplete information disclosure.
A. Resource-request management under information sharing
Each vehicle first shares its selected service level, and then decides its optimal EU-request strategy according to the request states of the other vehicles.
Accounting for the variation of the service level selected by each vehicle, and given the number of vehicles in the system selecting service level $i$ and the optimal first-stage price of edge server H, the optimal offload-resource-request strategy of a vehicle requesting service level $i$ is obtained in closed form, with auxiliary terms $\Delta_1$ and $\Delta_2$ defined in the proof of Theorem 3 below.
B. Resource-request management under incomplete information sharing
Considering the selfishness and randomness of each vehicle in the system, each vehicle is set as an agent (actor) that only executes the policy and collects experience samples, while the selected edge server H or L acts as the learner that makes centralized decisions. Initially, each vehicle interacts with the system of its selected edge server; after taking an action, it forms an experience replay tuple from its observations and transmits it to the learner for centralized processing. Each actor uses an ε-greedy algorithm when executing the learned policy.
As edge servers, H and L act as learners. First, they announce their EU service prices before the first and second phases begin, then learn policies from the experience replay information each actor sends to the edge server, and finally pass the learned policies back to each actor. The vehicle resource-request policies are processed with the DDQN algorithm, whose state, action, and reward are as follows:
State: the global system state is $S = (S_1, \dots, S_N)$, where $S_n$ is the local state of vehicle $n$.
Action: the action of each vehicle is the number of EUs (i.e., the service level) it requests.
Reward: the reward of a vehicle selecting the level-$i$ service from server H in the first phase is its resulting utility.
In the DDQN framework, the edge server maintains two neural networks: an evaluation network and a target network. The parameters of the evaluation network are denoted $\theta$ and those of the target network $\theta^-$; both networks take the current state as input and output the Q-value of each vehicle. The evaluation network's parameters are copied to the target network every fixed number of steps. The DDQN target value is $y_t = r_t + \gamma\, Q(s_{t+1}, \arg\max_a Q(s_{t+1}, a; \theta_t); \theta_t^-)$, where the argmax determines the action that maximizes Q in state $s_{t+1}$ under the evaluation-network parameters $\theta_t$; i.e., the arg function is applied to the action and the max function to the possible values of the Q function.
Edge server resources are allocated with the DDQN algorithm: first, given the states, rewards, and actions of the system, establish an experience replay pool D of size N, an action-value function Q with random weight parameters, and a target network. For each episode, initialize the state sequence S; then, in each step, feed the state $s_t$ into the evaluation network and select an action $a_t$ by the ε-greedy algorithm. Thereafter, obtain the current reward $r_t$ and the next state $s_{t+1}$ according to the preset criteria, and store $(s_t, a_t, r_t, s_{t+1})$ in the experience replay pool to update the parameters of the evaluation network.
The beneficial technical effects of the invention are as follows:
1. The invention formulates the resource management problem in vehicle edge computing (VEC) in an environment with two-stage service and two competing edge servers. It further discusses the vehicles' switching between edge servers in the second phase, which is affected by the review information, and casts the two-stage resource pricing process of the edge servers as a duopoly pricing game.
2. A theoretical analysis proves that, under the non-cooperative game principle, a Nash equilibrium exists based on the interactions among vehicles. After establishing this equilibrium, a dynamic iterative algorithm is proposed that, with complete information, reaches the Nash equilibrium among vehicles in any phase or server scenario.
3. The invention provides a distributed deep-reinforcement-learning-based resource offloading (DRLRO) framework to solve the vehicles' offload-request strategy problem under incomplete information. The DRLRO framework uses a Double Deep Q-Network (DDQN) to generate an offload-request strategy for each vehicle that maximizes the overall benefit of all vehicles. Simulation results confirm the effectiveness of the proposed DRLRO framework.
Drawings
Fig. 1 shows the structure of the edge-assisted driving system of the invention.
FIG. 2 is a schematic diagram of the DDQN-based vehicle resource-request strategy algorithm of the invention.
FIG. 3 shows the unit prices of the VEC servers as the objective quality of service varies, in an embodiment.
FIG. 4 shows the unit prices of the VEC servers as the vehicles' sensitivity to negative evaluations varies, in an embodiment.
FIG. 5 shows the vehicle utility at the VEC servers over the two stages, in an embodiment.
FIG. 6 shows the vehicle utility at different arrival rates, in an embodiment.
FIG. 7 shows the relationship between vehicle utility and the total EU count, in an embodiment.
FIG. 8 shows the vehicle utility under different algorithms, in an embodiment.
Detailed Description
The invention will be described in further detail with reference to the drawings and the specific embodiments.
In the resource allocation method of a vehicle edge computing system considering the influence of review information, a market consisting of two competing edge servers is constructed, and computation offloading service is provided over two service phases. At the beginning of each service phase, both edge servers announce the optimal pricing strategies obtained from their pricing game, and the second-phase pricing strategies are influenced by the rating information generated by the vehicles at the end of the first phase. Vehicles select their edge servers based on service information, pricing, their own preferences, and the rating information, and then determine their own request strategies from the resource-request information of the other vehicles that selected the same edge server. Where vehicles are unwilling to disclose their resource-request information, a deep reinforcement learning framework is used to maximize vehicle utility. The method comprises the following steps:
First, a duopoly market model with two competing edge servers is built over the two phases of the offloading service.
Fig. 1 is a schematic diagram of edge servers in a duopoly market providing computation offloading services for vehicles; the system comprises base stations, edge servers, and vehicles. Before each service phase begins, both edge servers publish their respective service price information. A vehicle selects the appropriate edge server based on factors such as quality of service, pricing, personal preference, and review information. We define the minimum unit of resource used by a vehicle in the system as the "edge unit" (EU), which bundles the computation, storage, and communication resources required to process a vehicle request. Because the total amount of EU resources available to each edge server is limited, each vehicle aims to maximize its own benefit by requesting as many resources as it can, which creates a non-cooperative game among the vehicles selecting the same edge server. During this game, each vehicle determines its optimal resource-request strategy according to its own requirements and the number of EUs requested by the other users. A vehicle that does not adopt an optimal strategy may be denied service and forced to process more tasks locally, possibly incurring higher cost and larger delays.
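To make the system model concrete, the following is a minimal illustrative sketch in Python (all class and field names are ours, not the patent's) of the market entities just described: two EU-limited edge servers and vehicles requesting leveled service from them.

```python
from dataclasses import dataclass

@dataclass
class EdgeServer:
    name: str                 # "H" or "L"
    total_eu: int             # EU capacity, M_H or M_L
    objective_quality: float  # q_H = q, or q_L = theta * q
    price: float = 0.0        # per-EU price published for the current phase
    allocated_eu: int = 0     # EUs already granted in this phase

    def admit(self, requested_eu: int) -> bool:
        """Admit a request only if enough EUs remain; otherwise the
        vehicle is denied and must process the task locally."""
        if self.allocated_eu + requested_eu > self.total_eu:
            return False
        self.allocated_eu += requested_eu
        return True

@dataclass
class Vehicle:
    vid: int
    service_level: int  # i in {1, ..., K}
    requested_eu: int   # EU count of the chosen level (c_Hi or c_Li)

# Example: server H with 100 EUs admits a level-3 request of 3 EUs.
h = EdgeServer(name="H", total_eu=100, objective_quality=1.0, price=2.5)
assert h.admit(requested_eu=3)
```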
Vehicles generate service requests and offload them to an edge server for processing; the edge servers dominate the market, provide services to the vehicles, and set corresponding pricing strategies to earn revenue from the vehicles.
A. Utility of edge servers
Assume the two competing edge servers in the market are H and L, with edge-unit (EU) capacities $M_H$ and $M_L$, respectively.
First stage: before service begins, edge servers H and L publish their respective first-stage prices, denoted $p_H^1$ and $p_L^1$. Assume that in the first stage the service-request arrivals at H and L follow Poisson distributions with mean arrival rates $\lambda_H^1$ and $\lambda_L^1$, respectively, and that each vehicle holds a preference parameter toward each of the edge servers H and L. The quality of the edge computing service provided to a vehicle has two components: the objective quality of service, i.e., the general impression a vehicle forms from the brand, reputation, and service-product information of an edge server; and the quality of experience, i.e., what the vehicle experiences after being served, which is unknown before service ends. Let $q_i$, $i = H, L$, be the objective quality of edge server $i$, and assume H has high objective quality while L has low objective quality: $q_H = q$, $q_L = \theta q$ with $\theta \in (0, 1)$; the higher $\theta$, the smaller the objective-quality gap between the two servers. Let $\eta_i$, $i = H, L$, be the quality of experience after being served by provider $i$; to capture the variability of the experienced service, set $\eta_i \sim U(-1, 1)$. In addition, a vehicle compares the prices of the two servers and perceives a corresponding sense of loss or gain; for edge server H the price difference is $p_H^1 - p_L^1$. The net utility of a vehicle toward H and L in the first phase follows from these components: when the net utility is positive, the vehicle is satisfied and gives a positive evaluation; otherwise, it gives a negative evaluation. The first-stage revenues of edge servers H and L then follow from their prices and realized arrival rates.
Second stage: before the second service phase begins, a vehicle that prefers edge server $i$ observes the evaluations posted at the end of the first phase and forms an expected value of the quality of experience of the other edge server's service. This expectation combines the probability that a vehicle gives a positive evaluation after experiencing edge server $i$'s service with the expected quality-of-experience value conditional on a positive evaluation, weighted by $\beta$, the vehicle's sensitivity to positive evaluations.
In the second phase, a vehicle decides whether to continue purchasing service from its original edge server or to switch to the other one, based on the second-phase prices $p_H^2$ and $p_L^2$ published by the servers, the price difference, and the expectation of the other server's quality of experience formed from the first-phase vehicle reviews. The second-phase vehicle arrival rates at H and L, $\lambda_H^2$ and $\lambda_L^2$, depend on the probability that a vehicle served by edge server H still purchases from H in the second phase, and on the probability that a vehicle served by edge server L still purchases from L in the second phase. Combining with equation (2) yields these arrival rates, and the second-phase profits of edge servers H and L then follow from their second-phase prices and arrival rates.
B. Utility of vehicles
First stage: the first-stage vehicle arrival rates at edge servers H and L are $\lambda_H^1$ and $\lambda_L^1$, respectively. Each edge server provides K service levels for offloading requests, indexed $i \in \{1, \dots, K\}$; the EU counts of the level-$i$ service at H and L are $c_{Hi}$ and $c_{Li}$, respectively, and the numbers of vehicles executing the level-$i$ service at H and L are likewise tracked. On this basis, the utility obtained by selecting service level $i$ at H or L in the first phase is defined, whose leading term represents the satisfaction obtained by a vehicle selecting service level $i$ of edge server H in the first stage.
Second stage: the vehicle arrival rates may change in the second stage under the influence of the review information generated by the vehicles; denote the second-stage arrival rates at H and L by $\lambda_H^2$ and $\lambda_L^2$. Assume the total number of service levels offered by H and L for offloading requests is still K; during the second-stage service, the numbers of vehicles executing level $i$ at H and L are recorded, and the EU counts they occupy are $c_{Hi}$ and $c_{Li}$, respectively. The utility of selecting service level $i$ at H or L in the second stage is defined accordingly.
C. Problem description
In the established model, the edge servers compete across the two phases, setting optimal prices in each phase to attract more vehicle offloading requests and thereby earn more profit. Denote the two-phase total revenues of edge servers H and L by $\pi_H$ and $\pi_L$; from the servers' perspective, the optimization problem is to choose the two-phase prices that maximize these revenues.
A vehicle selects an edge server and requests service according to the prices set by the servers in the two phases, the vehicle's preference between the servers, and the vehicles' evaluation information. The optimal prices set by edge server H in the first and second phases are denoted $p_H^{1*}$ and $p_H^{2*}$; likewise, the optimal prices of edge server L in the two phases are $p_L^{1*}$ and $p_L^{2*}$. The optimization problem of a vehicle selecting edge server H in the first stage is to choose the offload-request strategy that maximizes its utility.
Second, the equilibrium of the game in the two service phases is analyzed.
(1) In the two service phases, the pricing policies of edge servers H and L are $(p_H^{1*}, p_H^{2*})$ and $(p_L^{1*}, p_L^{2*})$, respectively. If neither server can increase its total revenue by unilaterally deviating from this pricing policy, a Nash equilibrium point is reached between the two edge servers.
(2) The vehicles select an edge server to which to submit service requests and formulate their respective offload-request policies in the two stages. Taking the vehicles that select edge server H in the first stage as an example, when no such vehicle can increase its utility by unilaterally changing its request, the vehicles selecting edge server H have reached an equilibrium point. When edge servers H and L reach the equilibrium point, neither is willing to adjust its pricing policy further, since no additional profit can be gained. When the vehicles reach the equilibrium point, no vehicle is willing to adjust its offload-request (EU) strategy, since it would gain no additional utility, and an excessive request would cause its service request to be denied.
Theorem 1: in the second phase, the optimal prices of edge servers H and L are obtained in closed form by jointly solving the first-order conditions of the second-phase revenue functions.
Proof: let the revenue functions of edge servers H and L in the second phase be given by equations (16) and (17). Taking the partial derivative of each revenue function with respect to its own second-phase price and setting the derivative to zero yields two first-order conditions; solving these equations simultaneously gives the optimal second-phase prices. The general shape of this computation is sketched below.
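Since equations (16) and (17) are not reproduced in this text, the following shows only the generic structure of the argument, written with unspecified second-phase revenue functions:

```latex
% First-order conditions for the second-phase pricing game
% (generic form; the concrete revenue functions of Eqs. (16)-(17)
% are not reproduced in this text):
\frac{\partial \pi_H^2(p_H^2, p_L^2)}{\partial p_H^2} = 0,
\qquad
\frac{\partial \pi_L^2(p_H^2, p_L^2)}{\partial p_L^2} = 0 .
% Solving the two conditions simultaneously yields the equilibrium
% prices p_H^{2*} and p_L^{2*} of Theorem 1.
```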
Theorem 2: in the first phase, the optimal prices of edge servers H and L are likewise obtained in closed form.
Proof: denote the total two-phase revenue functions of edge servers H and L by $\pi_H$ and $\pi_L$. Based on the foregoing and since $\eta_H, \eta_L \sim U(-1, 1)$, the first-phase revenue is monotonically increasing in the first-phase price over its feasible range; it therefore attains its maximum at the upper limit of that range, which gives the optimal first-phase prices.
Finally, the non-cooperative game among the vehicles is discussed in two cases: complete information disclosure and incomplete information disclosure.
A. Resource-request management under information sharing
First, it is shown that a Nash equilibrium exists in the stochastic non-cooperative game among the vehicles that select edge server H in the first stage. A dynamic iterative algorithm is then introduced to solve for this Nash equilibrium.
Theorem 3: accounting for the variation of the service level selected by each vehicle, and given the number of vehicles in the system selecting service level $i$ and the optimal first-stage price of edge server H, the optimal offload-resource-request strategy of a vehicle requesting service level $i$ is obtained in closed form, where the auxiliary terms $\Delta_1$ and $\Delta_2$ are defined in the proof.
Proof: for any vehicle that selects service level $i$ of edge server H in the first phase, the derivative of its corresponding utility with respect to its request should be zero. The utility attains its maximum at the resulting request level, and from the boundary conditions the auxiliary terms $\Delta_1$ and $\Delta_2$ are obtained.
Corollary 1: when the objective quality of the first-stage edge server H increases, the number of EUs requested from edge server H increases.
Proof: from the expression in Theorem 1 and the preceding discussion, taking the partial derivative with respect to $q_H$ shows that it is positive, since $q_H > 0$ and $q_L > 0$. Hence, as $q_H$ continues to increase, the amount of resources requested by a vehicle for service level $i$ also continues to increase.
Algorithm 1 computes the Nash equilibrium reached under complete information. In Algorithm 1, each vehicle first shares its selected service level; each vehicle then decides its optimal EU-request strategy according to the request states of the other vehicles. The strategy for the optimal number of EUs follows from Theorem 3, which provides the mathematical model for this calculation; a sketch of the resulting iteration is given after the algorithm:
Algorithm 1: optimal request strategy under complete information sharing.
1: for each vehicle do
2:   publish private information: the vehicle's request information and the number of vehicles requesting each service level
3:   collect the information shared by the other vehicles
4:   compute the optimal request strategy via Theorem 3
5: end for
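The following is a minimal sketch (ours, not the patent's code) of the iteration Algorithm 1 describes: repeated best responses that stop when no vehicle changes its request, i.e., at the Nash equilibrium. The callable `optimal_request` stands in for the closed-form strategy of Theorem 3, whose exact expression is not reproduced in this text.

```python
def best_response_iteration(vehicles, server, optimal_request, max_rounds=100):
    """Algorithm 1 under complete information sharing: every vehicle
    publishes its request, observes the others, and re-computes its
    best response until a fixed point (Nash equilibrium) is reached."""
    requests = {v.vid: 1 for v in vehicles}  # start from minimal requests
    for _ in range(max_rounds):
        changed = False
        for v in vehicles:
            others = {k: r for k, r in requests.items() if k != v.vid}
            best = optimal_request(v, others, server)  # Theorem 3 (assumed helper)
            if best != requests[v.vid]:
                requests[v.vid] = best
                changed = True
        if not changed:  # no vehicle wants to deviate
            return requests
    return requests
```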
B. Resource-request management under incomplete information sharing
Under complete information sharing, each vehicle selecting edge server H or L in each service period knows the number of EUs the other vehicles request from the edge servers, the number of vehicles served by each edge server, and the number of EUs remaining in the system. In real scenarios, however, edge servers and vehicles may refuse to share or provide such information in order to preserve privacy. Moreover, this information changes constantly over time, making it difficult for any single vehicle to accurately assess the request states of the other vehicles and the overall situation of the edge-server system. Finding a solution to this non-convex problem is therefore NP-hard and difficult for conventional optimization algorithms.
The invention proposes an ALRM (actor-learner resource management) framework to solve this problem. The framework provides each vehicle with an optimal strategy for requesting EU resources from edge server H or L under the current conditions. Unlike prior DRL (deep reinforcement learning) algorithms, and considering the selfishness and randomness of each vehicle in the system, each vehicle is set as an agent (actor) that only executes the policy and collects experience samples, while the selected edge server H or L acts as the learner that makes centralized decisions. Initially, each vehicle interacts with the system of its selected edge server; after taking an action, it forms an experience replay tuple from its observations and transmits it to the learner for centralized processing. Each actor uses an ε-greedy algorithm when executing the learned policy, as sketched below.
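As a minimal illustration of the actor-side selection rule (our sketch, not the patent's code), the ε-greedy step looks as follows:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the highest Q-value (exploit).
    q_values is the evaluation network's output for the current state."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```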
As edge servers, H and L act as learners. First, they announce their EU service prices before the first and second phases begin, then learn policies from the experience replay information each actor sends to the edge server, and finally pass the learned policies back to each actor. The vehicle resource-request policies are processed with the DDQN algorithm, whose state, action, and reward are as follows:
State: the global system state is $S = (S_1, \dots, S_N)$, where $S_n$ is the local state of vehicle $n$.
Action: the action of each vehicle is the number of EUs (i.e., the service level) it requests.
Reward: the reward of a vehicle selecting the level-$i$ service from server H in the first phase is its resulting utility.
The DDQN-based vehicle resource-request policy algorithm is shown as Algorithm 2. In this DDQN (Double Deep Q-Network) framework, the edge server maintains two neural networks: an evaluation network and a target network. The parameters of the evaluation network are denoted $\theta$ and those of the target network $\theta^-$; both networks take the current state as input and output the Q-value of each vehicle. The evaluation network's parameters are copied to the target network every fixed number of steps. The DDQN target value is $y_t = r_t + \gamma\, Q(s_{t+1}, \arg\max_a Q(s_{t+1}, a; \theta_t); \theta_t^-)$, which means the arg function is applied to the action that maximizes Q in state $s_{t+1}$ under the evaluation-network parameters $\theta_t$, the max function is applied to the possible values of the Q function, and the selected action is then evaluated by the target network.
The learner sends policy updates to the actors and receives their experience samples. The updates contain new neural-network weights, while the samples are tuples describing the network state, the action, the reward, and the new state resulting from the action. Policies and samples are exchanged over the backhaul network without consuming air-interface resources. New policies may be issued periodically or based on dynamic criteria, such as the amount of data collected or the learner's policy-generation time. During periods of low demand, experience samples can be generated and shared quickly, making the information exchange insensitive to delay.
The constructed DDQN framework is shown in Fig. 2. It approximates the Q-function with the designed deep neural network and outputs a series of action values. The whole network is initialized with state s. In time slot t = 1, the edge server generates the initial observation state s_1 from the driving-behavior requests transmitted by the vehicles. Based on the current policy and the resource-capacity constraints, the edge server selects an action to accept or reject each request, and the action is then executed by the edge server. The edge server subsequently receives reward r_1, the state transitions to s_{t+1}, and finally an action value is obtained. Edge-server resources are allocated with the DDQN algorithm as follows: given the states, rewards, and actions of the system, establish an experience replay pool D of size N, an action-value function Q with random weight parameters, and a target network. For each episode, initialize the state sequence S; then, in each step, feed the state s_t into the evaluation network and select an action a_t by the ε-greedy algorithm. Thereafter, obtain the current reward r_t and the next state s_{t+1} according to the preset criteria, and store (s_t, a_t, r_t, s_{t+1}) in the experience replay pool to update the parameters of the evaluation network. A minimal sketch of this training loop is given below.
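The following sketch assumes simple `env`, `q_eval`, and `q_target` interfaces of our own (the patent's implementation is in Matlab and is not reproduced here); the double-Q target matches the formula given above.

```python
import random
from collections import deque

def train_ddqn(env, q_eval, q_target, episodes=1000, steps=500, gamma=0.95,
               epsilon=0.1, batch_size=32, pool_size=5000, copy_every=200):
    """Learner-side DDQN loop. q_eval/q_target are assumed to expose
    predict(state) -> list of Q-values, fit(batch of (s, a, y) targets),
    and get_weights()/set_weights()."""
    replay = deque(maxlen=pool_size)  # experience replay pool D
    step_count = 0
    for _ in range(episodes):
        s = env.reset()  # initialize the state sequence
        for _ in range(steps):
            q = q_eval.predict(s)
            a = (random.randrange(len(q)) if random.random() < epsilon
                 else max(range(len(q)), key=lambda i: q[i]))  # epsilon-greedy
            r, s_next = env.step(a)  # reward and next state by preset criteria
            replay.append((s, a, r, s_next))  # store the transition
            if len(replay) >= batch_size:
                batch = random.sample(replay, batch_size)
                targets = []
                for bs, ba, br, bs1 in batch:
                    q_next = q_eval.predict(bs1)
                    a_star = max(range(len(q_next)), key=lambda i: q_next[i])
                    # Double-Q target: the evaluation net selects the action,
                    # the target net evaluates it.
                    y = br + gamma * q_target.predict(bs1)[a_star]
                    targets.append((bs, ba, y))
                q_eval.fit(targets)  # update the evaluation network
            step_count += 1
            if step_count % copy_every == 0:  # periodic parameter copy
                q_target.set_weights(q_eval.get_weights())
            s = s_next
```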
Examples:
The DDQN used in the invention is a fully connected neural network with hidden layers of 400, 200, and 120 neurons. To ensure convergence, the discount factor is set to 0.95; the ReLU activation function is used, and the initial learning rate is set to 0.015. Matlab is used to build and evaluate the proposed resource-allocation problem, and a vehicle selects either the H or the L server in any given service period. The number of service levels provided by edge servers H and L is set to 6, with service levels corresponding to EU counts of 1 to 6. All simulation experiments were run on a machine equipped with an Intel i7-7700K CPU, 32 GB RAM, and an NVIDIA RTX 3060 GPU. The specific simulation parameters are shown in Table 1.
Table 1. Overview of simulation parameters

Parameter | Value
--- | ---
Discount factor | 0.95
Learning rate | 0.015
Number of episodes | 1000
Number of steps per episode | 500
Experience replay pool size | 5000
Experience samples per update (mini-batch size) | 32
DDQN action-space size | 2
Optimizer | Adam
Activation function | ReLU
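For illustration, the following PyTorch sketch builds a Q-network with the layer sizes and hyperparameters stated above (the patent's experiments use Matlab, so this is our translation under assumed input/output dimensions):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Fully connected Q-network: hidden layers of 400, 200, and 120
    neurons with ReLU activations, as stated in the embodiment."""
    def __init__(self, state_dim: int, n_actions: int = 2):  # action space 2 (Table 1)
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 400), nn.ReLU(),
            nn.Linear(400, 200), nn.ReLU(),
            nn.Linear(200, 120), nn.ReLU(),
            nn.Linear(120, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

q_eval = QNetwork(state_dim=16)  # state_dim = 16 is illustrative only
optimizer = torch.optim.Adam(q_eval.parameters(), lr=0.015)  # Adam, lr = 0.015
gamma = 0.95  # discount factor from Table 1
```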
As shown in FIGS. 3 and 4, the optimal prices of edge servers H and L over the two service phases vary with the objective-quality ratio θ of the two servers and with the vehicles' sensitivity β to negative evaluations; here the objective service quality of edge server H is held fixed. As FIG. 3 shows, as the quality gap between H and L shrinks, the optimal first-stage prices of both servers gradually increase and coincide when the gap reaches zero. In the second phase, the optimal price of H decreases as the quality gap shrinks, while L behaves in exactly the opposite way; this is caused by the evaluation information generated by the vehicles, which most strongly affects the server with the better quality of service. Note also that the second-phase prices are significantly lower than the first-phase prices because of the rating information: a server affected by the ratings must cut its price in the second phase to capture a larger market share. With θ = 0.8, FIG. 4 shows that as the vehicles' sensitivity to negative evaluations decreases, the optimal second-stage price of edge server H gradually falls while edge server L gradually gains an advantage, because vehicles become more willing to try the other server's service. Since no rating information intervenes in the first stage, the first-stage optimal prices do not change with this sensitivity.
FIG. 5 depicts the evolution of the reward values of the vehicles selecting edge servers H and L over time, considering the total vehicle arrival rate under different conditions; the convergence of the two separate service phases is clearly shown. We compare the vehicle rewards obtained by deep reinforcement learning under incomplete information with the Nash-equilibrium rewards under complete information. The total utility of the vehicles selecting edge servers H and L in both service phases converges to near the Nash equilibrium in about 5000 iterations. Meanwhile, because edge server H has an advantage in service quality, it attracts more vehicle users, and the utility of the vehicles selecting H is greater than that of L in both service phases. Furthermore, the evaluation information generated by the vehicles in the second stage gives edge server H a greater advantage and higher vehicle utility, while edge server L is negatively affected.
As shown in FIG. 6, taking the total utility of the vehicles at the second-stage edge server H as an example, the total utility keeps rising as the total vehicle arrival rate gradually increases, because a larger number of vehicles raises the edge server's resource utilization. Note, however, that as the arrival rate continues to rise, the growth rate of the utility gradually declines: the total amount of resources is ultimately limited, and an excess of vehicle users creates more competition, so the utility gain slows down.
FIG. 7 shows the total utility of the vehicles selecting the two edge servers as the number of iterations increases; the curves are plotted for different total resource amounts, taking the second service phase as an example. As the total resources (EUs) owned by the edge servers increase, so does the total utility of the vehicles selecting both servers: with more total resources, the computing resources available to each vehicle grow correspondingly, and the probability of being rejected gradually falls. Note also that the gap between the total vehicle utilities of edge servers H and L is small when the total amount of resources is limited, because the number of vehicles that edge server H can serve is then limited.
As shown in FIG. 8, the utilities of the vehicles selecting edge servers H and L under the DDQN (Double Deep Q-Network) and DQN (Deep Q-Network) algorithms are compared in the first service phase. The simulation results show that the DDQN algorithm converges faster, is more stable, and achieves higher utility. In the DQN algorithm, the same neural network both selects and evaluates actions when computing the Q-value target, which easily leads to over-optimistic estimates. DDQN addresses this by using different networks for selection and evaluation, which effectively mitigates the overestimation problem and yields a more stable training model and faster convergence. The contrast between the two targets is sketched below.
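In our notation (evaluation-network parameters $\theta$, target-network parameters $\theta^-$), the two targets differ as follows:

```latex
% DQN: the target network both selects and evaluates the next action,
% which tends to overestimate Q:
y_t^{\mathrm{DQN}} = r_t + \gamma \max_a Q\big(s_{t+1}, a;\, \theta^-\big)
% DDQN: the evaluation network selects the action, the target network
% evaluates it, mitigating the overestimation:
y_t^{\mathrm{DDQN}} = r_t + \gamma\, Q\big(s_{t+1},\, \arg\max_a Q(s_{t+1}, a;\, \theta);\, \theta^-\big)
```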
Claims (1)
1. A resource allocation method of a vehicle edge computing system considering the influence of review information, characterized in that:
a market consisting of two competing edge servers is constructed, and computation offloading service is provided over two service phases; at the beginning of each service phase, both edge servers announce the optimal pricing strategies obtained from their pricing game, and the second-phase pricing strategies are influenced by the rating information generated by the vehicles at the end of the first phase; vehicles select their edge servers based on service information, pricing, their own preferences, and the rating information, and then determine their own request strategies from the resource-request information of the other vehicles that selected the same edge server; where vehicles are unwilling to disclose their resource-request information, a deep reinforcement learning framework is used to maximize vehicle utility; specifically:
firstly, a duopoly market model with two competing edge servers is built over the two phases of the offloading service;
vehicles generate service requests and offload them to an edge server for processing; the edge servers dominate the market, provide services to the vehicles, and set corresponding pricing strategies to earn revenue from the vehicles;
A. Utility of edge servers
assume the two competing edge servers in the market are H and L, with edge-unit (EU) capacities $M_H$ and $M_L$, respectively;
first stage: before service begins, edge servers H and L publish their respective first-stage prices, denoted $p_H^1$ and $p_L^1$; assume that in the first stage the service-request arrivals at H and L follow Poisson distributions with mean arrival rates $\lambda_H^1$ and $\lambda_L^1$, respectively, and that each vehicle holds a preference parameter toward each of the edge servers H and L; the quality of the edge computing service provided to a vehicle has two components: the objective quality of service, i.e., the general impression a vehicle forms from the brand, reputation, and service-product information of an edge server; and the quality of experience, i.e., what the vehicle experiences after being served, which is unknown before service ends; let $q_i$, $i = H, L$, be the objective quality of edge server $i$, and assume H has high objective quality while L has low objective quality: $q_H = q$, $q_L = \theta q$ with $\theta \in (0, 1)$; the higher $\theta$, the smaller the objective-quality gap between the two servers; let $\eta_i$, $i = H, L$, be the quality of experience after being served by provider $i$; to capture the variability of the experienced service, set $\eta_i \sim U(-1, 1)$; in addition, a vehicle compares the prices of the two servers and perceives a corresponding sense of loss or gain, the price difference for edge server H being $p_H^1 - p_L^1$; the net utility of a vehicle toward H and L in the first phase follows from these components: when the net utility is positive, the vehicle is satisfied and gives a positive evaluation; otherwise, it gives a negative evaluation; the first-stage revenues of edge servers H and L then follow from their prices and realized arrival rates;
The second stage: before the service period of the second phase starts, a vehicle that prefers edge server $i$ forms an expected value of the quality of experience of the other edge server's service after seeing the evaluations posted at the end of the first phase. This expected value is determined by $P_i^+$, the probability that a vehicle gives a positive evaluation after experiencing the service of edge server $i$; by $\eta^+$, the expected quality of experience conditional on a positive evaluation; and by $\beta$, the sensitivity of the vehicle to positive evaluations;
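The expression itself is not reproduced in the text above; one plausible rendering, consistent with the quantities just defined but assumed rather than taken from the source, is:

```latex
% Expected quality of experience that a vehicle preferring server i
% forms for the other server j from the first-phase evaluations.
% P_j^+ (probability of a positive evaluation of j), eta^+ (expected
% experience conditional on a positive evaluation) and beta are the
% quantities defined above; their combination here is an assumption.
\tilde{\eta}_j \;=\; \beta \, P_j^{+} \, \eta^{+}, \qquad j \neq i .
```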
In the second phase, a vehicle decides whether to continue purchasing service from its original edge server or to switch to the other edge server, based on the second-phase prices $p_H^2$ and $p_L^2$ published by the edge servers, the resulting price difference, and the expectation of the other edge server's quality of experience formed from the first-stage vehicles' comments. The second-stage vehicle arrival rates at edge servers H and L, $\lambda_H^2$ and $\lambda_L^2$, depend on the probability that a vehicle served by edge server H in the first phase still purchases from H in the second phase and on the probability that a vehicle served by edge server L still purchases from L in the second phase; combining these retention probabilities with formula (2) yields the second-stage arrival rates. Accordingly, the profits of edge servers H and L in the second phase are the revenues collected at the second-phase prices from their respective second-stage arrivals;
B. Utility of the vehicle
The first stage: the vehicle arrival rates at edge servers H and L in the first stage are $\lambda_H^1$ and $\lambda_L^1$, respectively. Consider that each edge server provides K levels of service for offload requests, denoted $i \in \{1, \dots, K\}$; the numbers of EUs used by the i-th level service at edge servers H and L are $c_{Hi}$ and $c_{Li}$, respectively, and the numbers of vehicles executing the i-th level service at edge servers H and L are recorded for each server. The utility obtained by selecting service level i of H or L in the first phase is then defined, where the satisfaction term represents the level of satisfaction obtained by the vehicle selecting service level i of edge server H in the first phase;
The second stage: the vehicle arrival rates in the second stage may change owing to the influence of the comment information generated by the vehicles; the second-stage arrival rates at edge servers H and L are recorded as $\lambda_H^2$ and $\lambda_L^2$, respectively. Assume that the total number of service levels provided by edge servers H and L for offload requests is still K; during the second-stage service, the numbers of vehicles executing level i at edge servers H and L are recorded accordingly, and the numbers of EUs they occupy are $c_{Hi}$ and $c_{Li}$, respectively. The utility of selecting service level i of H or L in the second stage is then defined analogously to the first stage;
C. Description of the problem
In the established model, the edge servers first compete over the two phases, formulating optimal prices for both phases to attract more vehicle offload requests and thereby obtain more profit. The total two-phase revenues of edge servers H and L are recorded as $R_H = R_H^1 + R_H^2$ and $R_L = R_L^1 + R_L^2$, respectively. Thus, from the edge servers' perspective, the optimization problem is summarized as maximizing each server's total two-phase revenue over its pricing strategy;
The vehicles select an edge server to request service according to the prices set by the edge servers in the two stages, the vehicles' preferences for the edge servers, and the vehicles' evaluation information. The optimal prices set by edge server H in the first and second stages are $p_H^{1*}$ and $p_H^{2*}$; likewise, the optimal prices of edge server L in the first and second phases are $p_L^{1*}$ and $p_L^{2*}$, respectively. The optimization problem for a vehicle selecting edge server H in the first stage is summarized as maximizing its first-phase utility over its choice of service level;
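Neither optimization problem is reproduced in closed form above; a hedged rendering in the notation reconstructed here, with nonnegative prices and the EU budget $M_H$ as assumed feasibility constraints, is:

```latex
% Server side: each edge server chooses its two-phase prices to
% maximize total two-phase revenue (the rival's prices enter through
% the arrival rates). Vehicle side: a vehicle at server H picks the
% service level that maximizes its first-phase utility U_{Hi}^1.
\max_{p_H^1,\, p_H^2 \ge 0} \; R_H = R_H^1 + R_H^2,
\qquad
\max_{p_L^1,\, p_L^2 \ge 0} \; R_L = R_L^1 + R_L^2,
\qquad
\max_{i \in \{1, \dots, K\}} \; U_{Hi}^1
\quad \text{s.t.} \quad \sum_{i=1}^{K} c_{Hi} \le M_H .
```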
Secondly, the game equilibrium in the two service phases;
In the two service phases, the pricing policies of edge servers H and L are given as $(p_H^{1*}, p_H^{2*})$ and $(p_L^{1*}, p_L^{2*})$, respectively. If this pricing policy satisfies the conditions that neither edge server can increase its total revenue by unilaterally deviating from its own two-phase prices while the other server's prices are held fixed, a Nash equilibrium point is considered to be reached between the two edge servers;
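A minimal numerical sketch of locating such an equilibrium by best-response iteration is given below; the linear demand model is purely illustrative (in the patent's model the revenues follow from the arrival rates derived above), and convergence of this scheme is assumed rather than guaranteed in general.

```python
import numpy as np

# Illustrative revenue: demand falls in the server's own price and
# rises in the rival's price; a, b, c are assumed constants.
def revenue(p_own, p_rival, a=1.0, b=1.2, c=0.5):
    demand = max(a - b * p_own + c * p_rival, 0.0)
    return p_own * demand

grid = np.linspace(0.0, 1.0, 201)        # candidate prices

def best_response(p_rival):
    # Best reply: the own price maximizing revenue against p_rival.
    return grid[int(np.argmax([revenue(p, p_rival) for p in grid]))]

p_H, p_L = 0.5, 0.5
for _ in range(100):
    new_H, new_L = best_response(p_L), best_response(p_H)
    if abs(new_H - p_H) < 1e-6 and abs(new_L - p_L) < 1e-6:
        break                            # neither server gains by deviating
    p_H, p_L = new_H, new_L

print(f"approximate Nash prices: p_H*={p_H:.3f}, p_L*={p_L:.3f}")
```

At the fixed point, each price is a best response to the other, which is exactly the unilateral-deviation condition stated above.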
The vehicles select an edge server to make service requests and formulate their respective offload request policies in the two phases:
In the second phase, the optimal prices of edge servers H and L are $p_H^{2*}$ and $p_L^{2*}$, respectively; in the first phase, the optimal prices of edge servers H and L are $p_H^{1*}$ and $p_L^{1*}$, respectively;
Finally, the non-cooperative game among the vehicles is discussed in two cases: complete information sharing and incomplete information sharing;
A. Resource request management problem under information sharing
Each vehicle first shares its selected service level, and then decides its optimal EU request strategy according to the request states of the other vehicles;
Considering the variation of the service level selected by each vehicle, given the number of vehicles in the system that select service level i and the first-stage optimal price $p_H^{1*}$ of edge server H, the optimal offload resource request policy of a vehicle requesting service level i is obtained in closed form from these quantities;
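As one hedged illustration of such a policy under complete information sharing, the sketch below has a vehicle pick its EU request given the observed number of co-selecting vehicles and the announced optimal price; the concave satisfaction term, the proportional sharing of the level-i EU pool, and the linear payment are assumed stand-ins for the closed-form policy, which the text does not reproduce.

```python
import numpy as np

def optimal_request(n_i, c_i, p_star, x_others=1.0,
                    x_grid=np.linspace(0.1, 5.0, 100)):
    # n_i: vehicles that chose level i; c_i: EUs of the level-i pool;
    # p_star: announced optimal price; x_others: assumed average
    # request of the other n_i - 1 vehicles.
    def utility(x):
        share = c_i * x / (x + (n_i - 1) * x_others)  # proportional EU share
        return np.log1p(share) - p_star * x           # satisfaction - payment
    return x_grid[int(np.argmax([utility(x) for x in x_grid]))]

# Example: 10 vehicles share 20 EUs of level i at unit price 0.05.
print(optimal_request(n_i=10, c_i=20.0, p_star=0.05))
```

The grid search stands in for the analytical maximizer; with the true utility of the model, the same best-response structure applies.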
B. Resource request management algorithm under incomplete information sharing
Considering the selfishness and randomness of each vehicle in the system, each vehicle is set as an agent (actor) that only executes its policy and collects experience samples, while the selected edge server H or L acts as the learner making centralized decisions. Initially, each vehicle interacts with the system of its selected edge server; after taking an action, it forms an experience replay tuple $(s_t, a_t, r_t, s_{t+1})$ based on the observed conditions and transmits it to the learner for centralized processing. Each actor employs an ε-greedy algorithm when executing its policy;
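A minimal sketch of the actor side follows; the environment hook env_step and all names are illustrative assumptions, while the ε-greedy rule and the $(s_t, a_t, r_t, s_{t+1})$ replay tuple mirror the text.

```python
import random
import numpy as np

def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))

def actor_step(state, q_values, env_step, epsilon=0.1):
    # The actor only executes the policy and records what it observes;
    # env_step(state, action) -> (reward, next_state) is an assumed hook.
    action = epsilon_greedy(q_values, epsilon)
    reward, next_state = env_step(state, action)
    experience = (state, action, reward, next_state)
    return experience            # shipped to the learner (edge server)
```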
As the edge servers, H and L act as the learners: first, they announce their EU service prices before the first and second phases begin; they then learn policies from the experience replay information sent by each actor and pass the learned policies back to the actors. Here, the vehicle resource request policy is computed using the DDQN algorithm, with the state, action, and reward information defined as follows:
State: the joint state of all vehicles in the system is $s = (S_1, \dots, S_N)$, where $S_n$ is the state of vehicle $n$;
Action: each vehicle's action is its offload resource request decision for the selected service level;
Reward: the reward of a vehicle selecting the level-i service from server H in the first phase is the corresponding first-phase service utility;
In the DDQN framework, the edge server maintains two neural networks, an evaluation (main) network and a target network; the parameters of the main network are denoted $\theta$ and those of the target network $\theta^-$. Both networks take the current state as input and output the Q value of each vehicle. The parameters $\theta$ of the main network are copied to the target network $\theta^-$ after a certain number of steps. The target value used by DDQN is then $y_t = r_t + \gamma\, Q\!\left(s_{t+1}, \arg\max_{a} Q(s_{t+1}, a; \theta_t); \theta^-\right)$, where the argmax function determines the action that maximizes Q in the given state $s_{t+1}$ under the network parameters $\theta_t$, i.e., the arg function is applied to the action $a$ and the max function to the possible values of the Q function;
Edge server resources are allocated using the DDQN algorithm: first, given the state, reward, and action definitions of the system, an experience replay pool D of size N is established, together with an action-value function Q with random weight parameters $\theta$ and a target network with parameters $\theta^-$. For each episode, a state sequence S is initialized; then, in each step, the state $s_t$ is fed to the evaluation network as input and an action $a_t$ is selected according to the ε-greedy algorithm. Thereafter, the current reward $r_t$ and the next state $s_{t+1}$ are obtained by the preset criteria, and the tuple $(s_t, a_t, r_t, s_{t+1})$ is stored in the experience replay pool to update the parameters of the evaluation network.
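The following compact sketch assembles the steps just described into a learner-side training loop; the environment interface, the linear Q-function stand-in, and all hyperparameters are illustrative assumptions, while the replay pool, the ε-greedy exploration, the double-Q target, and the periodic θ → θ⁻ copy follow the text.

```python
import random
from collections import deque
import numpy as np

class QNet:
    """Tiny tabular Q-function stand-in for the evaluation/target networks."""
    def __init__(self, n_states, n_actions, rng):
        self.w = rng.normal(scale=0.1, size=(n_states, n_actions))
    def q(self, s):
        return self.w[s]

def train(env, n_states, n_actions, episodes=200, gamma=0.9,
          eps=0.1, lr=0.05, copy_every=50, batch=32, seed=0):
    rng = np.random.default_rng(seed)
    D = deque(maxlen=10_000)                  # experience replay pool
    q_eval = QNet(n_states, n_actions, rng)   # parameters theta
    q_tgt = QNet(n_states, n_actions, rng)    # parameters theta^-
    step = 0
    for _ in range(episodes):
        s = env.reset()                       # assumed env interface
        done = False
        while not done:
            # epsilon-greedy action selection on the evaluation network
            a = (int(rng.integers(n_actions)) if rng.random() < eps
                 else int(np.argmax(q_eval.q(s))))
            r, s_next, done = env.step(a)     # reward and next state
            D.append((s, a, r, s_next, done))
            if len(D) >= batch:
                for s0, a0, r0, s1, d in random.sample(list(D), batch):
                    a_star = int(np.argmax(q_eval.q(s1)))   # select: theta
                    y = r0 + (0.0 if d else gamma * q_tgt.q(s1)[a_star])  # evaluate: theta^-
                    q_eval.w[s0, a0] += lr * (y - q_eval.q(s0)[a0])
            step += 1
            if step % copy_every == 0:
                q_tgt.w = q_eval.w.copy()     # theta -> theta^-
            s = s_next
    return q_eval
```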
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311607887.6A CN118012601B (en) | 2023-11-29 | 2023-11-29 | Resource allocation method of vehicle edge computing system considering influence of comment information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118012601A (en) | 2024-05-10
CN118012601B (en) | 2024-09-06
Family
ID=90946121
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311607887.6A Active CN118012601B (en) | 2023-11-29 | 2023-11-29 | Resource allocation method of vehicle edge computing system considering influence of comment information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118012601B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112888002A (en) * | 2021-01-26 | 2021-06-01 | 重庆邮电大学 | Game theory-based mobile edge computing task unloading and resource allocation method |
CN114466023A (en) * | 2022-03-07 | 2022-05-10 | 中南大学 | Computing service dynamic pricing method and system for large-scale edge computing system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021194583A1 (en) * | 2020-03-23 | 2021-09-30 | Apple Inc. | Dynamic service discovery and offloading framework for edge computing based cellular network systems |
CN115169800A (en) * | 2022-06-06 | 2022-10-11 | 湖北工业大学 | Game theory-based vehicle edge computing resource allocation excitation method and system |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB02 | Change of applicant information | Country or region after: China; Address after: No. 999, Xi'an Road, Pidu District, Chengdu, Sichuan, 611756; Applicant after: SOUTHWEST JIAOTONG University; Address before: 610031, No. two, section 111, North Ring Road, Jinniu District, Sichuan, Chengdu; Applicant before: SOUTHWEST JIAOTONG University; Country or region before: China |
| GR01 | Patent grant | |