Downlink Non-Orthogonal Multiple Access Power Allocation Algorithm Based on Double Deep Q Network for Ensuring User’s Quality of Service
Figure 1: Schematic diagram of the system model.
Figure 2: DDQN-based NOMA power allocation algorithm.
Figure 3: Relationship between the reward function, the loss function, and the number of iterations.
Figure 4: Relationship between the number of iterations and a user's SINR in the downlink two-user NOMA scenario.
Figure 5: Comparison of the DDQN algorithm with the Q-table algorithm.
Figure 6: Relationship between the number of iterations and SINR in the six-user NOMA system.
Figure 7: Variation of channel rate with power for the NOMA and OMA systems.
Figure 8: Channel capacity comparison of different algorithms.
Figure 9: Convergence of the algorithm at different learning rates.
Abstract
1. Introduction
- The gain difference and similarity of each user's channel are considered in the grouping process. We construct a gain-difference matrix and a similarity matrix, and a new matrix carrying both similarity and gain-difference information is obtained by adding the two matrices and normalizing the result. User matching is performed on this new matrix, which reduces receiver complexity and the interference between users (a code sketch of this step is given after this list).
- We use the signal-to-interference-plus-noise ratio (SINR) of the poorest user as the optimization objective. The base station's transmit power allocation factor is used as the action, an improvement in SINR is used as the reward, and a violation of the communication model is defined as a penalty. Power allocation is then performed with a DDQN-based method.
- The algorithm in this paper ensures both user fairness and efficiency. The simulation results show that the channel rates of the users gradually approach each other over the iterations, so the algorithm can balance communication quality and guarantee users' basic communication in an emergency.
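As an illustration of the grouping step in the first contribution, the Python sketch below builds a gain-difference matrix and a similarity matrix from the users' channel gains, normalizes and adds them, and pairs users greedily on the combined score. The channel data, the cosine-similarity measure, and the greedy matching rule are assumptions made for this sketch; the paper's exact matching procedure may differ.

```python
import numpy as np

def pair_users(channel_vectors):
    """Pair users from a combined gain-difference / similarity score (sketch)."""
    h = np.asarray(channel_vectors, dtype=float)   # (n_users, n_subchannels)
    gains = np.linalg.norm(h, axis=1)              # overall gain per user
    n = len(gains)

    # Gain-difference matrix: pairing a strong user with a weak user eases SIC.
    diff = np.abs(gains[:, None] - gains[None, :])

    # Similarity matrix: cosine similarity between channel vectors (assumed measure).
    unit = h / (np.linalg.norm(h, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T

    # Min-max normalize both matrices and add them into a single score matrix.
    normalize = lambda m: (m - m.min()) / (m.max() - m.min() + 1e-12)
    score = normalize(diff) + normalize(sim)

    # Greedy matching: repeatedly take the highest-scoring unpaired pair.
    pairs, used = [], set()
    for i, j in sorted(((i, j) for i in range(n) for j in range(i + 1, n)),
                       key=lambda ij: score[ij], reverse=True):
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update({i, j})
    return pairs

# Example: 6 users on 3 subchannels with Rayleigh-distributed gains.
rng = np.random.default_rng(0)
print(pair_users(rng.rayleigh(size=(6, 3))))
```

Pairing users with a large gain difference keeps the SIC decoding order unambiguous within each group, which is the receiver-complexity benefit mentioned above.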
2. System Model
3. DDQN and Power Allocation
3.1. DDQN Algorithm
3.2. Power Allocation Algorithm
- Agent: In this paper, the BS is treated as the agent. In downlink NOMA, the BS is responsible for acquiring the channel conditions; it allocates power to each user and transmits the signal with the allocated power;
- State space: The optimization objective of this paper can be restated as improving the SINR of the user with the worst channel conditions in the downlink NOMA system. Because the worst user is not fixed, this paper takes the SINRs of all users as the state. By exploiting the symmetry of users' SINRs under similar conditions, the algorithm can reduce the size of the state space and simplify its complexity. At the beginning of the algorithm, the users' SINRs are calculated from the users' initial random states and used as the input to the neural network. Executing an action changes the SINRs, and the system then moves to the next state;
- Action space: The action of the DDQN is to change the value of the power allocation factor. The continuous problem can be quantized into a discrete one by fixing the amount by which this factor changes, which reduces the complexity and the convergence time of the algorithm. The power in this paper is continuous from 0 to the total transmit power, and the change step acts as a regulatory factor. When the power allocation factor of one user increases by one step, the power allocation factors of the remaining users must decrease accordingly. This can be seen as a symmetrical operation, because adjusting one factor affects the other factors, and it guarantees constraint (10b), i.e., that the sum of the power allocation factors of all users does not exceed 1.
- Reward function: In reinforcement learning, the reward function plays an important role: it drives the agent toward better actions. Better actions help achieve the objective function, while poor actions make it harder to achieve. The following points therefore need to be considered. First, the channel gap between users needs to be decreased; this gap appears in two forms, the gap between user rates within a subchannel and the gap between the maximum and minimum user rates in the whole system. Closing this gap lets all users in the NOMA communication system achieve similar performance, so the power allocation factors of the strong users must be reduced to compensate the weak users. The agent therefore takes the variance of the users' SINRs as the reward. The action space prevents the agent from violating constraint (10b) during exploration, the communication model ensures that constraint (10c) is not violated, and the reward function penalizes actions that violate constraint (10d). A minimal code sketch of this state/action/reward design is given after this list.
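To make the state, action, and reward definitions above concrete, the minimal sketch below models a two-user, single-channel downlink NOMA environment. The channel gains, noise power, rate threshold, and penalty value are illustrative assumptions; the rate check stands in for constraint (10d), and the SIC decoding order follows the usual NOMA convention (the strong user cancels the weak user's signal, while the weak user treats the strong user's signal as interference).

```python
import numpy as np

class NomaEnv:
    """Two-user downlink NOMA environment (illustrative sketch).

    State  : SINR of each user.
    Action : shift user 0's power-allocation factor by +/-delta; user 1
             takes the remainder, so the factors always sum to 1 (constraint (10b)).
    Reward : negative variance of the users' SINRs, plus a penalty when a
             user's rate falls below an assumed QoS threshold.
    """

    def __init__(self, p_total=3.0, gains=(0.9, 0.2), noise=1e-2,
                 delta=1e-4, r_min=0.5, penalty=-3.0):
        self.p_total, self.gains, self.noise = p_total, gains, noise
        self.delta, self.r_min, self.penalty = delta, r_min, penalty
        self.reset()

    def reset(self):
        self.alpha = 0.5                      # power-allocation factor of user 0
        return self._sinr()

    def _sinr(self):
        g0, g1 = self.gains                   # user 0 strong, user 1 weak
        p0 = self.alpha * self.p_total
        p1 = (1.0 - self.alpha) * self.p_total
        sinr0 = p0 * g0 / self.noise                 # strong user: after SIC
        sinr1 = p1 * g1 / (p0 * g1 + self.noise)     # weak user: sees user 0 as interference
        return np.array([sinr0, sinr1])

    def step(self, action):
        # Action 0: decrease alpha by delta; action 1: increase it.
        shift = self.delta if action == 1 else -self.delta
        self.alpha = float(np.clip(self.alpha + shift, 0.0, 1.0))
        sinr = self._sinr()
        rates = np.log2(1.0 + sinr)
        reward = -float(np.var(sinr))         # push the users' SINRs together
        if rates.min() < self.r_min:          # assumed stand-in for constraint (10d)
            reward += self.penalty
        return sinr, reward
```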
Algorithm 1: NOMA power allocation algorithm based on DDQN
Input: learning rate; discount factor; greedy factor; neural network parameters; batch size k; amount of change per action; state space S. Output: trained network parameters (after training). Initialization: initialize the experience pool D with capacity N; set the total number of rounds; initialize the action space.
- Initialize the value network, including the number of layers, the type of each layer, the activation functions, and the network's weights and biases. Initialize the target network in the same way. Initialize the experience replay pool by defining the capacity of the replay buffer, which determines how many experiences can be stored, and by specifying how many experiences are sampled from the buffer at a time.
- The agent begins by observing and analyzing the current state of the environment. Using this information, it then uses its neural network to predict the Q value of each possible action. By evaluating these Q values, the agent can determine which action is likely to maximize its reward, which guides its decision-making.
- The agent executes the action with the highest Q value. Usually, an ε-greedy strategy is used in this process: with probability ε, the agent selects a random action instead of the action that currently looks best according to its Q-value estimates. This random selection ensures that the agent explores the environment effectively and can discover better strategies that it would miss if it always followed the action with the highest estimated Q value.
- The agent performs the selected action and observes the next state. It stores the current state, the selected action, the reward, and the next state in the experience replay pool and randomly samples a batch of experiences to update the network.
- The parameters of the value network are updated. The value network is periodically copied to the target network to maintain the stability of the target. This copying helps to stabilize the training process, reducing the oscillations and divergence that can occur when both the value and target networks are updated simultaneously. By periodically syncing the target network with the value network, the agent can more effectively learn from its experiences and improve its policy over time.
- Repeat the above process until the algorithm converges. A condensed sketch of this training loop is given below.
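The steps above condense into a generic DDQN training loop. The PyTorch sketch below is not the authors' code: the network width, replay-buffer capacity, batch size, and synchronization period are placeholder values, and `env` can be any object exposing `reset()`/`step()`, such as the `NomaEnv` sketch from Section 3.2.

```python
import random, collections
import numpy as np
import torch
import torch.nn as nn

def train_ddqn(env, n_states=2, n_actions=2, episodes=100, steps=200,
               gamma=0.9, lr=1e-2, eps=0.1, batch=32, sync_every=50):
    """Generic DDQN loop (sketch); all hyperparameters are placeholders."""
    q_net = nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(), nn.Linear(64, n_actions))
    target = nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(), nn.Linear(64, n_actions))
    target.load_state_dict(q_net.state_dict())          # start from identical weights
    opt = torch.optim.Adam(q_net.parameters(), lr=lr)
    buffer = collections.deque(maxlen=10_000)            # experience replay pool
    step_count = 0

    for _ in range(episodes):
        state = env.reset()
        for _ in range(steps):
            # Epsilon-greedy action selection.
            if random.random() < eps:
                action = random.randrange(n_actions)
            else:
                with torch.no_grad():
                    action = int(q_net(torch.as_tensor(state, dtype=torch.float32)).argmax())

            next_state, reward = env.step(action)
            buffer.append((state, action, reward, next_state))
            state = next_state
            step_count += 1

            if len(buffer) >= batch:
                s, a, r, s2 = zip(*random.sample(buffer, batch))
                s = torch.tensor(np.array(s), dtype=torch.float32)
                a = torch.tensor(a).unsqueeze(1)
                r = torch.tensor(np.array(r), dtype=torch.float32)
                s2 = torch.tensor(np.array(s2), dtype=torch.float32)

                # Double-Q target: the value net picks the action, the target net scores it.
                with torch.no_grad():
                    best = q_net(s2).argmax(dim=1, keepdim=True)
                    y = r + gamma * target(s2).gather(1, best).squeeze(1)

                q = q_net(s).gather(1, a).squeeze(1)
                loss = nn.functional.mse_loss(q, y)
                opt.zero_grad(); loss.backward(); opt.step()

            # Periodically copy the value network into the target network.
            if step_count % sync_every == 0:
                target.load_state_dict(q_net.state_dict())
    return q_net
```

Separating action selection (value network) from action evaluation (target network) in the target computation is what distinguishes DDQN from DQN and is the mechanism that reduces Q-value overestimation.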
4. Simulation Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
NOMA | Non-Orthogonal Multiple Access |
OMA | Orthogonal Multiple Access |
DDQN | Double Deep Q Network |
DRL | Deep Reinforcement Learning |
QoS | Quality of Service |
SINR | Signal-to-Interference-plus-Noise Ratio |
5G | The Fifth Generation |
IoT | Internet of Things |
SIC | Successive Interference Cancellation |
RL | Reinforcement Learning |
ML | Machine Learning |
BS | Base Station |
QoE | Quality Of Experience |
DQN | Deep Q Network |
References
| Parameter | Meaning | Value |
|---|---|---|
| N | number of users | 2, 6 |
| M | number of channels | 1, 3 |
| | total power | 3 W |
| | number of rounds | 100 |
| | punishing factor | −3 |
| | quantization interval | 0.0001 |
| | greedy factor | 0.1 |
| | learning rate | 0.01 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).