Search Results (7)

Search Parameters:
Keywords = multi-player multi-armed bandit

19 pages, 773 KiB  
Article
Distributed Data-Driven Learning-Based Optimal Dynamic Resource Allocation for Multi-RIS-Assisted Multi-User Ad-Hoc Network
by Yuzhu Zhang and Hao Xu
Algorithms 2024, 17(1), 45; https://doi.org/10.3390/a17010045 - 19 Jan 2024
Cited by 3 | Viewed by 2724
Abstract
This study investigates the problem of decentralized dynamic resource allocation optimization for ad-hoc network communication with the support of reconfigurable intelligent surfaces (RIS), leveraging a reinforcement learning framework. In the present context of cellular networks, device-to-device (D2D) communication stands out as a promising technique to enhance spectrum efficiency. Simultaneously, RIS have gained considerable attention for their ability to enhance the quality of dynamic wireless networks by maximizing spectrum efficiency without increasing power consumption. However, prevalent centralized D2D transmission schemes require global information, leading to significant signaling overhead. Conversely, existing distributed schemes, while avoiding the need for global information, often demand frequent information exchange among D2D users and fall short of achieving global optimization. This paper introduces a framework comprising an outer loop and an inner loop. In the outer loop, a decentralized dynamic resource allocation optimization is developed for self-organizing network communication aided by RIS. This is accomplished through a multi-player multi-armed bandit approach that yields strategies for RIS and resource block selection; notably, these strategies operate without requiring signal interaction during execution. In the inner loop, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is adopted for cooperative learning with neural networks (NNs) to obtain optimal transmit power control and RIS phase shift control for multiple users, given the RIS and resource block selection policy from the outer loop. Through the utilization of optimization theory, distributed optimal resource allocation can be attained as the outer and inner reinforcement learning algorithms converge over time. Finally, a series of numerical simulations is presented to validate and illustrate the effectiveness of the proposed scheme.
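The outer-loop idea described above (each D2D agent independently treating candidate RIS/resource-block pairs as bandit arms, with no signaling during execution) can be illustrated with a minimal per-agent UCB sketch. The class, arm layout, and reward model below are illustrative placeholders rather than the paper's implementation; in the paper, the reward for a chosen arm would come from the inner TD3 loop, not the toy Gaussian used here.

```python
import math
import random

class OuterLoopAgent:
    """One D2D agent choosing a (RIS, resource-block) arm with a UCB index.
    Hypothetical sketch: each agent learns independently, with no signaling."""
    def __init__(self, n_ris, n_rb):
        self.arms = [(r, b) for r in range(n_ris) for b in range(n_rb)]
        self.counts = [0] * len(self.arms)
        self.means = [0.0] * len(self.arms)
        self.t = 0

    def select(self):
        self.t += 1
        for i, c in enumerate(self.counts):   # play each arm once first
            if c == 0:
                return i
        ucb = [m + math.sqrt(2 * math.log(self.t) / c)
               for m, c in zip(self.means, self.counts)]
        return max(range(len(self.arms)), key=lambda i: ucb[i])

    def update(self, i, reward):
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]

# Toy environment: the reward stands in for the rate obtained by the inner loop.
random.seed(0)
agent = OuterLoopAgent(n_ris=2, n_rb=3)
for _ in range(500):
    arm = agent.select()
    reward = random.gauss(0.1 * arm, 0.05)    # placeholder channel model
    agent.update(arm, reward)
best = max(range(len(agent.arms)), key=lambda i: agent.counts[i])
print("preferred (RIS, RB) pair:", agent.arms[best])
```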
(This article belongs to the Collection Parallel and Distributed Computing: Algorithms and Applications)
Figures:
Figure 1: Multi-RIS-assisted ad-hoc wireless network.
Figure 2: Overall outer and inner network structure.
Figure 3: TD3 network structure.
Figure 4: (a) RIS selection of agent 1. (b) RIS selection of agent 2. (c) RIS selection of agent 3. (d) RIS selection of agent 4.
Figure 5: (a) Average EE compared with different methods. (b) Average SE compared with different methods.
Figure 6: An illustration of the variation in EE and SE with varying transmit power using various methods. (a) Average EE versus time steps under P_max = 20 dBm, 22 dBm, 24 dBm. (b) Average SE versus time steps under P_max = 20 dBm, 22 dBm, 24 dBm.
18 pages, 1298 KiB  
Article
Budgeted Bandits for Power Allocation and Trajectory Planning in UAV-NOMA Aided Networks
by Ramez Hosny, Sherief Hashima, Ehab Mahmoud Mohamed, Rokaia M. Zaki and Basem M. ElHalawany
Drones 2023, 7(8), 518; https://doi.org/10.3390/drones7080518 - 7 Aug 2023
Cited by 4 | Viewed by 1658
Abstract
On the one hand, combining Unmanned Aerial Vehicles (UAVs) and Non-Orthogonal Multiple Access (NOMA) is a remarkable direction to sustain the exponentially growing traffic requirements of the forthcoming Sixth Generation (6G) networks. In this paper, we investigate an effective Power Allocation (PA) and Trajectory Planning Algorithm (TPA) for UAV-aided NOMA systems to assist multiple survivors in a post-disaster scenario where ground stations have malfunctioned. Here, the UAV maneuvers to collect data from survivors, who are grouped into multiple clusters within the disaster area, to satisfy their traffic demands. On the other hand, while the problem is formulated as Budgeted Multi-Armed Bandits (BMABs) that optimize the UAV trajectory and minimize battery consumption, challenges may arise in real-world scenarios. Herein, the UAV is the bandit player, the disaster-area clusters are the bandit arms, the sum rate of each cluster is the payoff, and the UAV energy consumption is the budget. Hence, to tackle these challenges, two Upper Confidence Bound (UCB) BMAB schemes, namely BUCB1 and BUCB2, are leveraged to handle this issue. Simulation results confirm the superior performance of the proposed BMAB solution against benchmark solutions for UAV-aided NOMA communication. Notably, the BMAB-NOMA solution exhibits remarkable improvements, achieving a 60% enhancement in the total number of assisted survivors, an 80% improvement in convergence speed, and considerable energy savings compared to UAV-OMA.
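As a rough illustration of this budgeted-bandit formulation (clusters as arms, sum rate as payoff, energy as budget), the sketch below runs a generic reward-per-cost UCB index until the budget is exhausted. The index and the toy reward/cost models are assumptions for illustration only; the paper's BUCB1 and BUCB2 indices differ in their exact form.

```python
import math
import random

def budgeted_ucb(reward_fn, cost_fn, n_arms, budget):
    """Generic budgeted-UCB sketch: pull the arm with the best reward-per-cost
    index until the energy budget is spent (illustrative, not BUCB1/BUCB2)."""
    counts = [0] * n_arms
    r_mean = [0.0] * n_arms
    c_mean = [0.0] * n_arms
    spent, t = 0.0, 0
    while spent < budget:
        t += 1
        if t <= n_arms:                      # initial round-robin pulls
            arm = t - 1
        else:
            idx = [(r_mean[i] + math.sqrt(2 * math.log(t) / counts[i]))
                   / max(c_mean[i], 1e-9) for i in range(n_arms)]
            arm = max(range(n_arms), key=lambda i: idx[i])
        r, c = reward_fn(arm), cost_fn(arm)
        counts[arm] += 1
        r_mean[arm] += (r - r_mean[arm]) / counts[arm]
        c_mean[arm] += (c - c_mean[arm]) / counts[arm]
        spent += c
    return counts

# Toy run: arms = survivor clusters, reward = cluster sum rate, cost = flight energy.
random.seed(0)
pulls = budgeted_ucb(reward_fn=lambda a: random.betavariate(2 + a, 2),
                     cost_fn=lambda a: 1.0 + 0.2 * a,
                     n_arms=4, budget=200.0)
print("cluster visit counts:", pulls)
```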
(This article belongs to the Special Issue AI-Powered Energy-Efficient UAV Communications)
Figures:
Figure 1: A multi-cluster emergency UAV-NOMA enabled network.
Figure 2: Number of assisted survivors versus time horizon.
Figure 3: Number of assisted survivors versus transmit power P_t (dBm).
Figure 4: Number of assisted survivors versus the number of clusters N for E = 1000 J.
Figure 5: Number of assisted survivors versus the number of clusters for different NOMA power allocation schemes.
Figure 6: Number of assisted survivors versus time horizon for NOMA power allocation.
Figure 7: Number of assisted survivors versus UAV-NOMA transmit power P_t (dBm).
18 pages, 1072 KiB  
Article
Distribution of Multi MmWave UAV Mounted RIS Using Budget Constraint Multi-Player MAB
by Ehab Mahmoud Mohamed, Mohammad Alnakhli, Sherief Hashima and Mohamed Abdel-Nasser
Electronics 2023, 12(1), 12; https://doi.org/10.3390/electronics12010012 - 20 Dec 2022
Cited by 14 | Viewed by 2251
Abstract
Millimeter wave (mmWave), reconfigurable intelligent surface (RIS), and unmanned aerial vehicle (UAV) technologies are considered vital to future sixth-generation (6G) communication networks. In this paper, multiple UAV-mounted RIS are distributed to support mmWave coverage over several hotspots where numerous users exist in a harsh blockage environment. The UAVs should be spread among the hotspots to maximize their average achievable data rates while minimizing their hovering and flying energy consumption. To efficiently address this NP-hard problem, it is formulated as a centralized budget-constraint multi-player multi-armed bandit (BCMP-MAB) game. In this formulation, the UAVs act as the players, the hotspots as the arms, and the achievable sum rates of the hotspots as the profit of the MAB game. This formulated MAB problem differs from the traditional one due to the added constraints of the limited budget of the UAV batteries as well as collision avoidance among UAVs, i.e., a hotspot should be covered by only one UAV at a time. Numerical analysis of different scenarios confirms the superior performance of the proposed BCMP-MAB algorithm over other benchmark schemes in terms of average sum rate and energy efficiency, with comparable computational complexity and convergence rate.
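A minimal sketch of the collision-avoidance constraint in this kind of formulation (one UAV per hotspot per round) is given below: UCB indices are computed per UAV/hotspot pair and a greedy one-to-one matching assigns UAVs to hotspots until their energy budgets run out. The matching rule, reward model, and budget accounting are illustrative assumptions, not the BCMP-MAB algorithm itself.

```python
import math
import random

def assign_uavs(indices):
    """Greedy collision-free assignment: each hotspot (arm) gets at most one UAV.
    indices[u][h] is UAV u's UCB index for hotspot h."""
    pairs = sorted(((v, u, h) for u, row in enumerate(indices)
                    for h, v in enumerate(row)), reverse=True)
    taken_u, taken_h, assign = set(), set(), {}
    for v, u, h in pairs:
        if u not in taken_u and h not in taken_h:
            assign[u] = h
            taken_u.add(u)
            taken_h.add(h)
    return assign

# Toy centralized loop: each UAV's budget caps how long it can keep playing.
random.seed(0)
n_uav, n_hot = 3, 4
counts = [[0] * n_hot for _ in range(n_uav)]
means = [[0.0] * n_hot for _ in range(n_uav)]
budget = [50.0] * n_uav
t = 0
while any(b > 0 for b in budget):
    t += 1
    idx = [[means[u][h] + math.sqrt(2 * math.log(t + 1) / (counts[u][h] + 1))
            for h in range(n_hot)] for u in range(n_uav)]
    for u, h in assign_uavs(idx).items():
        if budget[u] <= 0:
            continue
        rate = random.random() * (h + 1)      # placeholder hotspot sum rate
        counts[u][h] += 1
        means[u][h] += (rate - means[u][h]) / counts[u][h]
        budget[u] -= 1.0                      # hovering/flying energy per slot
print("learned hotspot per UAV:",
      {u: max(range(n_hot), key=lambda h: counts[u][h]) for u in range(n_uav)})
```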
(This article belongs to the Special Issue Online Learning Aided Solutions for 6G Wireless Networks)
Figures:
Figure 1: Proposed system model of multi mmWave UAV mounted RIS hotspot area coverage.
Figure 2: Schematic diagram of the mmWave BS, UAV mounted RIS, UE communication links.
Figure 3: Average sum rate against the value of ρ.
Figure 4: Average energy consumption against the value of ρ.
Figure 5: Average sum rate against number of UAVs.
Figure 6: Average energy efficiency against number of UAVs.
Figure 7: Average sum rate against P_t.
Figure 8: Average energy efficiency in bps/J against P_t.
Figure 9: Average sum rate convergence against the time horizon using M = 100 and N = 20.
Figure 10: Average sum rate convergence against the time horizon using M = 100 and N = 50.
19 pages, 731 KiB  
Article
Enhanced Dynamic Spectrum Access in UAV Wireless Networks for Post-Disaster Area Surveillance System: A Multi-Player Multi-Armed Bandit Approach
by Amr Amrallah, Ehab Mahmoud Mohamed, Gia Khanh Tran and Kei Sakaguchi
Sensors 2021, 21(23), 7855; https://doi.org/10.3390/s21237855 - 25 Nov 2021
Cited by 13 | Viewed by 2905
Abstract
Modern wireless networks are notorious for being very dense, uncoordinated, and selfish, especially with greedy user needs. This leads to a critical scarcity problem in spectrum resources. The Dynamic Spectrum Access (DSA) system is considered a promising solution for this scarcity problem. With the aid of Unmanned Aerial Vehicles (UAVs), a post-disaster surveillance system is implemented using a Cognitive Radio Network (CRN). UAVs are distributed in the disaster area to capture live images of the damaged area and send them to the disaster management center. The CRN enables the UAVs to utilize a portion of the spectrum of the Electronic Toll Collection (ETC) gates operating in the same area. In this paper, a joint transmission power selection, data-rate maximization, and interference mitigation problem is addressed. Considering all these conflicting parameters, the problem is investigated as a budget-constrained multi-player multi-armed bandit (MAB) problem. The whole process is done in a decentralized manner, where no information is exchanged between UAVs. To achieve this, two power-budget-aware MAB (PBA-MAB) algorithms, namely the upper confidence bound (PBA-UCB) algorithm and the Thompson sampling (PBA-TS) algorithm, are proposed to select the transmission power value efficiently. The proposed PBA-MAB algorithms show outstanding performance over random power value selection in terms of achievable data rate.
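As a hedged illustration of a power-budget-aware bandit (transmit-power values as arms, a decentralized player, and a budget that caps total transmitted power), the sketch below uses plain Thompson sampling with Bernoulli rewards and a budget stopping rule. The success model and constants are placeholders; the actual PBA-UCB and PBA-TS algorithms in the paper are more elaborate.

```python
import random

def pba_ts_sketch(power_levels, budget, horizon):
    """Thompson-sampling sketch for transmit-power selection under a power budget.
    Rewards are modeled as Bernoulli 'transmission useful?' outcomes for
    illustration; the paper's reward model differs."""
    alpha = [1.0] * len(power_levels)   # Beta posterior parameters per power level
    beta = [1.0] * len(power_levels)
    spent = 0.0
    for _ in range(horizon):
        if spent >= budget:             # stop once the power budget is exhausted
            break
        samples = [random.betavariate(a, b) for a, b in zip(alpha, beta)]
        k = max(range(len(power_levels)), key=lambda i: samples[i])
        p = power_levels[k]
        # Placeholder success model: higher power helps, but with diminishing returns.
        success = random.random() < min(0.9, 0.3 + 0.02 * p)
        alpha[k] += success
        beta[k] += 1 - success
        spent += p
    return alpha, beta

random.seed(0)
a, b = pba_ts_sketch(power_levels=[5, 10, 15, 20], budget=3000.0, horizon=1000)
print("posterior success estimate per power level:",
      [round(x / (x + y), 3) for x, y in zip(a, b)])
```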
Figures:
Figure 1: UAV surveillance-system-assisted DSA for a metropolitan post-disaster area.
Figure 2: Distribution of PUs and SUs Tx/Rx pairs.
Figure 3: Normalized average sum rate against number of ETC gates using 10 UAVs.
Figure 4: Normalized average sum rate against number of UAVs using 10 ETC gates.
Figure 5: Convergence of normalized average sum rate using 10 ETC gates and 10 UAVs.
Figure 6: Convergence of normalized average sum rate using 10 ETC gates and 30 UAVs.
19 pages, 3421 KiB  
Article
Wi-Fi Assisted Contextual Multi-Armed Bandit for Neighbor Discovery and Selection in Millimeter Wave Device to Device Communications
by Sherief Hashima, Kohei Hatano, Hany Kasban and Ehab Mahmoud Mohamed
Sensors 2021, 21(8), 2835; https://doi.org/10.3390/s21082835 - 17 Apr 2021
Cited by 20 | Viewed by 3746
Abstract
The unique features of millimeter waves (mmWaves) motivate their use in future, beyond-fifth-generation/sixth-generation (B5G/6G)-based device-to-device (D2D) communications. However, the neighborhood discovery and selection (NDS) problem still needs intelligent solutions due to the trade-off between investigating adjacent devices for the optimum device choice and the crucial beamforming training (BT) overhead. In this paper, by making use of multiband (microwave/mmWave) standard devices, the mmWave NDS problem is addressed using machine-learning-based contextual multi-armed bandit (CMAB) algorithms. This is done by leveraging the context information of Wi-Fi signal characteristics, i.e., the received signal strength (RSS) mean and variance, to further improve the NDS method. In this setup, the transmitting device acts as the player, the arms are the candidate mmWave D2D links between that device and its neighbors, and the reward is the average throughput. We examine the NDS's primary trade-off and the impact of the contextual information on the total performance. Furthermore, modified energy-aware linear upper confidence bound (EA-LinUCB) and contextual Thompson sampling (EA-CTS) algorithms are proposed to handle the problem by reflecting the nearby devices' remaining battery levels, which simulates real scenarios. Simulation results confirm the superior efficiency of the proposed algorithms over the single-band (mmWave) energy-aware noncontextual MAB algorithms (EA-UCB and EA-TS) and traditional schemes regarding energy efficiency and average throughput, with a reasonable convergence rate.
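The contextual-bandit setup (the transmitter as player, candidate mmWave D2D links as arms, Wi-Fi RSS statistics as context, average throughput as reward) can be sketched with a textbook LinUCB loop, as below. The context features, reward model, and dimensions are assumptions for illustration; EA-LinUCB and EA-CTS in the paper additionally account for neighbors' remaining battery levels.

```python
import numpy as np

class LinUCBArm:
    """One candidate D2D link scored with a LinUCB index built from Wi-Fi
    context features (e.g., RSS mean and variance). Illustrative sketch only."""
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)          # ridge-regression Gram matrix
        self.b = np.zeros(dim)
        self.alpha = alpha            # exploration width

    def score(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b        # current estimate of the reward weights
        return float(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

rng = np.random.default_rng(0)
arms = [LinUCBArm(dim=3) for _ in range(4)]      # 4 candidate neighbor devices
true_w = rng.normal(size=(4, 3))                 # hidden throughput weights (toy)
counts = [0] * 4
for _ in range(300):
    x = rng.normal(size=3)                       # Wi-Fi-derived context vector
    k = max(range(4), key=lambda i: arms[i].score(x))
    reward = float(true_w[k] @ x) + rng.normal(scale=0.1)  # placeholder throughput
    arms[k].update(x, reward)
    counts[k] += 1
print("link selection counts:", counts)
```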
(This article belongs to the Section Intelligent Sensors)
Figures:
Figure 1: Multi-band D2D network architecture.
Figure 2: General CMAB protocol.
Figure 3: Average throughput versus the number of devices at no blocking for UCB, TS, LinUCB, and CTS algorithms.
Figure 4: Average throughput versus blocking density λ for UCB, TS, LinUCB, and CTS algorithms using 60 devices.
Figure 5: Convergence rates of LinUCB, CTS, TS, and CTS algorithms.
Figure 6: Average throughput versus the number of devices at no blocking for EA-UCB, EA-TS, EA-LinUCB, and EA-CTS algorithms.
Figure 7: Average throughput versus blocking density λ for EA-UCB, EA-TS, EA-LinUCB, and EA-CTS using 60 devices.
Figure 8: EE versus the number of devices for EA-UCB, EA-TS, EA-LinUCB, and EA-CTS at no blocking.
Figure 9: EE versus blocking density for EA-UCB, EA-TS, EA-LinUCB, and EA-CTS using 60 devices.
Figure 10: Convergence rate of energy-aware algorithms using 60 devices.
22 pages, 4241 KiB  
Article
Gateway Selection in Millimeter Wave UAV Wireless Networks Using Multi-Player Multi-Armed Bandit
by Ehab Mahmoud Mohamed, Sherief Hashima, Abdallah Aldosary, Kohei Hatano and Mahmoud Ahmed Abdelghany
Sensors 2020, 20(14), 3947; https://doi.org/10.3390/s20143947 - 16 Jul 2020
Cited by 28 | Viewed by 3219
Abstract
Recently, unmanned aerial vehicle (UAV)-based communications have gained a lot of attention due to their numerous applications, especially in rescue services in post-disaster areas where the terrestrial network has wholly malfunctioned. Multiple access/gateway UAVs are distributed to fully cover the post-disaster area as flying base stations to provide communication coverage, collect valuable information, disseminate essential instructions, etc. The access UAVs, after gathering/broadcasting the necessary information, should select and fly towards one of the surrounding gateways for relaying their information. In this paper, the gateway UAV selection problem is addressed. The main aim is to maximize the long-term average data rates of the UAV relays while minimizing the flights' battery cost, where millimeter wave links, i.e., using the 30~300 GHz band and employing antenna beamforming, are used for backhauling. A machine learning (ML) tool is exploited to address the problem as a budget-constrained multi-player multi-armed bandit (MAB) problem. In this setup, access UAVs act as the players, the arms are the gateway UAVs, and the rewards are the average data rates of the constructed relays constrained by the battery cost of the access UAV flights. In this decentralized setting, where information is neither available a priori nor exchanged among UAVs, a selfish and concurrent multi-player MAB strategy is suggested. Towards this end, three battery-aware MAB (BA-MAB) algorithms, namely upper confidence bound (UCB), Thompson sampling (TS), and the exponential weight algorithm for exploration and exploitation (EXP3), are proposed to realize gateway selection efficiently. The proposed BA-MAB-based gateway UAV selection algorithms show superior performance over approaches based on near and random selections in terms of total system rate and energy efficiency.
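Among the three BA-MAB algorithms mentioned, EXP3 is the adversarial-bandit variant; a compact, hedged sketch of an EXP3-style gateway selector is given below. The reward shaping (rate minus a battery-cost proxy, clipped to [0, 1]) and all constants are illustrative assumptions, not the paper's battery-aware formulation.

```python
import math
import random

def exp3_gateway_sketch(n_gateways, horizon, gamma=0.1):
    """EXP3-style sketch for decentralized gateway-UAV selection.
    Rewards in [0, 1] combine a toy relay rate with a toy battery cost."""
    weights = [1.0] * n_gateways
    counts = [0] * n_gateways
    for _ in range(horizon):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / n_gateways for w in weights]
        k = random.choices(range(n_gateways), weights=probs)[0]
        rate = random.random() * (k + 1) / n_gateways     # placeholder relay rate
        battery_cost = 0.2 + 0.1 * k                      # placeholder flight cost
        reward = max(0.0, min(1.0, rate - 0.3 * battery_cost))
        weights[k] *= math.exp(gamma * (reward / probs[k]) / n_gateways)
        m = max(weights)                                  # rescale to avoid overflow
        weights = [w / m for w in weights]
        counts[k] += 1
    return counts

random.seed(1)
print("gateway pick counts:", exp3_gateway_sketch(n_gateways=5, horizon=2000))
```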
(This article belongs to the Section Communications)
Figures:
Figure 1: Millimeter wave (mmWave) unmanned aerial vehicle (UAV) network architecture.
Figure 2: Schematic diagram of the mmWave flat-top antenna model.
Figure 3: General multi-player multi-armed bandit (MAB) protocol.
Figure 4: Average system rate against the number of access UAVs using gateway UAVs of 20 and a beam-width of 60°.
Figure 5: Average system rate against the number of gateway UAVs using access UAVs of 20 and a beam-width of 60°.
Figure 6: Average system rate against beam-width using access UAVs of 40 and gateway UAVs of 20.
Figure 7: Average energy efficiency against the number of access UAVs using gateway UAVs of 20 and a beam-width of 60°.
Figure 8: Average energy efficiency against the number of gateway UAVs using access UAVs of 20 and a beam-width of 60°.
Figure 9: Average energy efficiency against beam-width using access UAVs of 40 and gateway UAVs of 20.
Figure 10: The convergence of system rate using access UAVs of 20, gateway UAVs of 20, and a beam-width of 60°.
Figure 11: The convergence of system rate using access UAVs of 30, gateway UAVs of 20, and a beam-width of 60°.
Figure 12: The convergence of system rate using access UAVs of 40, gateway UAVs of 20, and a beam-width of 60°.
22 pages, 670 KiB  
Article
muMAB: A Multi-Armed Bandit Model for Wireless Network Selection
by Stefano Boldrini, Luca De Nardis, Giuseppe Caso, Mai T. P. Le, Jocelyn Fiorina and Maria-Gabriella Di Benedetto
Algorithms 2018, 11(2), 13; https://doi.org/10.3390/a11020013 - 26 Jan 2018
Cited by 21 | Viewed by 7548
Abstract
Multi-armed bandit (MAB) models are a viable approach to describe the problem of best wireless network selection by a multi-Radio Access Technology (multi-RAT) device, with the goal of maximizing the quality perceived by the final user. The classical MAB model does not, however, properly describe the problem of wireless network selection by a multi-RAT device, in which a device typically performs a set of measurements in order to collect information on available networks before a selection takes place. The MAB model in fact foresees only one possible action for the player, namely the selection of one among different arms at each time step; existing arm selection algorithms thus mainly differ in the rule according to which a specific arm is selected. This work proposes a new MAB model, named measure-use-MAB (muMAB), aiming to provide higher flexibility, and thus better accuracy, in describing the network selection problem. The muMAB model extends the classical MAB model in a twofold manner: first, it foresees two different actions, to measure and to use; second, it allows actions to span multiple time steps. Two new algorithms designed to take advantage of the higher flexibility provided by the muMAB model are also introduced. The first one, referred to as measure-use-UCB1 (muUCB1), is derived from the well-known UCB1 algorithm, while the second one, referred to as Measure with Logarithmic Interval (MLI), is purposely designed for the new model so as to take advantage of the new measure action while aggressively using the best arm. The new algorithms are compared against existing ones from the literature in the context of the muMAB model by means of computer simulations using both synthetic and captured data. Results show that the performance of the algorithms heavily depends on the Probability Density Function (PDF) of the reward received on each arm, with different algorithms leading to the best performance depending on the PDF. Results highlight, however, that as the ratio between the time required to use an arm and the time required to measure increases, the proposed algorithms guarantee the best performance, with muUCB1 emerging as the best candidate when the arms are characterized by similar mean rewards, and MLI prevailing when one arm is significantly more rewarding than the others. This calls for the introduction of an adaptive approach capable of adjusting the behavior of the algorithm, or of switching algorithm altogether, depending on the acquired knowledge of the PDF of the reward on each arm.
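A loose sketch of the measure/use distinction introduced by muMAB is given below: measurement actions probe arms and yield only information, use actions exploit the current best arm over several time steps, and measurements are scheduled at roughly logarithmically spaced times in the spirit of MLI. The scheduling rule, durations, and reward model are simplifications assumed for illustration and do not reproduce muUCB1 or MLI exactly.

```python
import random

def mu_mab_sketch(arm_means, horizon, t_measure=1, t_use=5):
    """Rough sketch of the measure/use split: measure actions probe every arm
    and give information only; use actions exploit the current best arm and
    give reward. Probes are spaced at roughly doubling intervals."""
    n = len(arm_means)
    est, counts = [0.0] * n, [0] * n
    t, total_reward, next_measure = 0, 0.0, 0
    while t < horizon:
        if t >= next_measure:                 # measure phase: probe every arm once
            for a in range(n):
                sample = random.gauss(arm_means[a], 0.1)
                counts[a] += 1
                est[a] += (sample - est[a]) / counts[a]
                t += t_measure
            next_measure = max(2 * next_measure, t + 1)   # roughly log-spaced probes
        else:                                 # use phase: exploit the current best
            best = max(range(n), key=lambda a: est[a])
            total_reward += random.gauss(arm_means[best], 0.1) * t_use
            t += t_use
    return total_reward, est

random.seed(2)
reward, estimates = mu_mab_sketch([0.3, 0.5, 0.7], horizon=1000)
print("total reward:", round(reward, 1),
      "arm estimates:", [round(e, 2) for e in estimates])
```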
Figures:
Figure 1: Performance in terms of regret of the six considered algorithms, with a Bernoulli distribution for the reward Probability Density Function (PDF) and T_U/T_M = 1. (a) Hard configuration; (b) Easy configuration.
Figure 2: Performance in terms of regret of the six considered algorithms, with a Bernoulli distribution for the reward PDF and T_U/T_M = 5. (a) Hard configuration; (b) Easy configuration.
Figure 3: Performance in terms of regret of the six considered algorithms, with a Bernoulli distribution for the reward PDF and T_U/T_M = 10. (a) Hard configuration; (b) Easy configuration.
Figure 4: Performance in terms of regret of the six considered algorithms, with a truncated Gaussian distribution for the reward PDF and T_U/T_M = 1. (a) Hard configuration; (b) Easy configuration.
Figure 5: Performance in terms of regret of the six considered algorithms, with a truncated Gaussian distribution for the reward PDF and T_U/T_M = 5. (a) Hard configuration; (b) Easy configuration.
Figure 6: Performance in terms of regret of the six considered algorithms, with a truncated Gaussian distribution for the reward PDF and T_U/T_M = 10. (a) Hard configuration; (b) Easy configuration.
Figure 7: Performance in terms of regret of the six considered algorithms, with an exponential distribution for the reward PDF and T_U/T_M = 1. (a) Hard configuration; (b) Easy configuration.
Figure 8: Performance in terms of regret of the six considered algorithms, with an exponential distribution for the reward PDF and T_U/T_M = 5. (a) Hard configuration; (b) Easy configuration.
Figure 9: Performance in terms of regret of the six considered algorithms, with an exponential distribution for the reward PDF and T_U/T_M = 10. (a) Hard configuration; (b) Easy configuration.
Figure 10: Regret achieved by the six considered algorithms at the time horizon as a function of the run, with a Bernoulli distribution for the reward PDF and T_U/T_M = 5. (a) Hard configuration; (b) Easy configuration.
Figure 11: Performance in terms of regret of the six considered algorithms, with real captured data used as reward and T_U/T_M = 1. (a) Linear conversion; (b) logarithmic conversion.
Figure 12: Performance in terms of regret of the six considered algorithms, with real captured data used as reward and T_U/T_M = 5. (a) Linear conversion; (b) logarithmic conversion.
Figure 13: Performance in terms of regret of the six considered algorithms, with real captured data used as reward and T_U/T_M = 10. (a) Linear conversion; (b) logarithmic conversion.
Figure 14: Execution time of the six considered algorithms normalized with respect to the execution time of the ε-greedy algorithm.