4.2.1 Scenario I.
We evaluate MEDUSA’s performance compared to state-of-the-art approaches according to the metrics introduced earlier. Figures 5–12 show the performance difference of ABR algorithms equipped with MEDUSA with respect to the baseline, i.e., without MEDUSA. Arrows indicate whether higher (\(\uparrow\)) or lower (\(\downarrow\)) values are better. The green triangle reported for each distribution represents its mean value, i.e., the arithmetic average of the values in the distribution.
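The distributions in Figures 5–12 are built from per-run differences between each ABR algorithm with MEDUSA (X-M) and without it (X). The sketch below shows one way such a distribution and its two summary markers (median and mean) could be computed. It assumes relative differences in percent, as reported for metrics such as bitrate and transmitted data; stall counts, stall durations, and VMAF are instead discussed as absolute differences. The function names and sample values are hypothetical and only show how a single negative outlier pulls the mean below the median.

```python
from statistics import mean, median


def relative_difference(with_medusa: float, baseline: float) -> float:
    """Percentage change of a metric for the MEDUSA variant (X-M) w.r.t. baseline X."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero for a relative difference")
    return 100.0 * (with_medusa - baseline) / baseline


def summarize(differences: list[float]) -> dict[str, float]:
    """Median (box line) and arithmetic mean (green triangle) of one distribution."""
    return {"median": median(differences), "mean": mean(differences)}


# Hypothetical per-run bitrates (kbps) as (X-M, X) pairs; the last run is an outlier.
runs = [(95, 100), (98, 100), (97, 100), (60, 100)]
diffs = [relative_difference(m, b) for m, b in runs]
print(summarize(diffs))  # {'median': -4.0, 'mean': -12.5}
```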
Bitrate. Figure 5(a) and (b) illustrate the difference in bitrates requested by MEDUSA compared to the baseline (i.e., without MEDUSA) for 20 s and 40 s buffer capacity, respectively. A positive value means that MEDUSA obtains a bitrate increase with respect to the underlying ABR technique. Although “the higher the better” is reported in the graph, a higher bitrate does not necessarily correspond to a higher perceived quality or to a larger amount of transmitted data, as it refers to the bitrate reported in the manifest, i.e., averaged across all video segments. For AGG-M and SARA-M, we notice a similar trend across network traces and both buffer scenarios. For both network traces, the bitrate distribution remains well below 0, decreasing to roughly -30% for AGG-M and -33% for SARA-M. This implies that MEDUSA behaves more conservatively than AGG and SARA and frequently switches to codec representations with lower bitrates. Note that for AGG-M and 4G-LTE, the mean is lower than the median, which indicates the presence of several negative outliers. BOLA-M exhibits a behavior in line with buffer-based ABR algorithms, increasing the mean bitrate when the buffer capacity is increased. Owing to the larger buffer, the median bitrate change of BOLA-M with respect to BOLA for 4G-LTE moves from -5.5% to 3.2%. The highest bitrate increase is reached by BBA-0-M, whose distribution, benefiting from the buffer increase to 40 s, consistently improves the bitrate by up to 27% compared to BBA-0. The wide distribution range for 20 s buffer capacity stems from the different input video content (analyzed in Section 4.2.2).
VMAF. Figure 6(a) and (b) depict the difference in VMAF of the segments requested by MEDUSA compared to the baseline (i.e., without MEDUSA) for 20 s and 40 s buffer capacity, respectively. Positive values indicate a higher VMAF for MEDUSA. In Figure 6, and in accordance with the bitrate changes discussed above, MEDUSA results, for both 20 s and 40 s buffer capacity, in a VMAF decrease for AGG-M of up to 1 JND, which occurs only in a limited number of cases, and in a VMAF increase for BBA-0-M that exceeds 1 JND for 20 s and reaches 2 JNDs for 40 s. Given the consistent bitrate reduction for BOLA-M and SARA-M, it is interesting to note that MEDUSA can increase the VMAF by up to 1 JND compared to the respective underlying approach, which contradicts the common assumption that a lower bitrate maps to a lower quality. Increasing the buffer capacity leads SARA-M to a general VMAF decrease of up to 1 JND for 4G-LTE, in accordance with the pronounced bitrate reduction presented above. On the other hand, with FCC, SARA-M obtains a consistent VMAF increase over SARA of up to 5 VMAF points, i.e., an improvement of more than 1 JND.
VMAF instability. Figure 7(a) and (b) show the difference in VMAF instability for 20 s and 40 s buffer capacity, respectively. A negative value implies that MEDUSA achieves a lower instability than the underlying ABR technique. In Figure 7(a), we notice higher variability and more peaks than in Figure 7(b). This can be explained by the smaller buffer range, which influences \(\alpha\) and, hence, the bitrate selection. Since the buffer occupancy is modeled as the number of stored segments, a larger buffer yields finer-grained discrete weight values, which leads to a more stable bitrate selection. Furthermore, we observe that the higher the mean VMAF, the lower the mean VMAF instability.
For 20 s buffer capacity, the mean VMAF instability difference lies between -4% and 4%, showing a consistent instability trend with and without MEDUSA. The only exception is BOLA-M with FCC, whose median and mean differ considerably, at roughly 2% and 8%, respectively. This means that the distribution includes many positive outliers whose instability is much higher than that of the rest of the distribution. These two opposite behaviors are caused by the network trace itself.
Unlike 4G-LTE, FCC maintains a stable throughput of 1 Mbps for more than 80 s, after which it suddenly fluctuates between 1 Mbps and 7 Mbps. Since MEDUSA relies on the buffer to balance quality and size, when the buffer is low, BOLA’s selection results in low quality and MEDUSA prioritizes reducing the size, which decreases VMAF and increases instability. When the buffer is high, MEDUSA prioritizes high VMAF regardless of size, which also increases instability.
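The role of the buffer in balancing quality and size can be illustrated with a toy weighting rule. The sketch below is not MEDUSA’s objective function, which is not restated in this section; it hypothetically sets the weight α to the fraction of stored segments, so that a nearly full buffer favors VMAF and a nearly empty one favors smaller segments. It also shows why a larger buffer gives finer-grained weights: with an assumed 4 s segment duration, a 20 s buffer allows only 6 values of α, whereas a 40 s buffer allows 11. The class, function, and codec names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Representation:
    codec: str        # hypothetical codec label
    vmaf: float       # quality of the candidate segment
    size_bytes: int   # actual size of the candidate segment


def alpha_levels(buffer_capacity_s: float, segment_duration_s: float) -> list[float]:
    """All values alpha can take when the buffer is counted in whole segments."""
    n = int(buffer_capacity_s // segment_duration_s)
    return [k / n for k in range(n + 1)]


def select(reps: list[Representation], stored: int, capacity: int) -> Representation:
    """Pick the candidate maximizing a hypothetical alpha-weighted quality/size score."""
    alpha = stored / capacity  # 0 = empty buffer, 1 = full buffer
    max_vmaf = max(r.vmaf for r in reps)
    max_size = max(r.size_bytes for r in reps)

    def score(r: Representation) -> float:
        # Reward normalized quality, penalize normalized size.
        return alpha * r.vmaf / max_vmaf - (1 - alpha) * r.size_bytes / max_size

    return max(reps, key=score)


candidates = [
    Representation("codec-A", vmaf=80.0, size_bytes=2_000_000),
    Representation("codec-B", vmaf=90.0, size_bytes=3_000_000),
]
print(len(alpha_levels(20, 4)), len(alpha_levels(40, 4)))  # 6 11
print(select(candidates, stored=0, capacity=5).codec)      # codec-A (favor size)
print(select(candidates, stored=5, capacity=5).codec)      # codec-B (favor VMAF)
```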
For 40 s buffer capacity, MEDUSA mostly reduces the VMAF instability compared to the baseline algorithms. The largest VMAF instability reduction, 36.8%, is obtained by BOLA-M compared to BOLA. The largest increase in VMAF instability is caused by AGG-M, up to roughly 14% more than AGG.
Transmitted data. Figure 8(a) and (b) compare the transmitted data of MEDUSA and the underlying ABR technique for 20 s and 40 s buffer capacity, respectively. Lower values are preferred, as they indicate a reduction in transmitted data for MEDUSA. Comparing these results with the bitrate results in Figure 5(a) and (b), we notice some similarities. However, the bitrate and data changes do not always match, which motivates the need for the client to know the segment sizes before selecting a segment. Indeed, while AGG-M follows the bitrate results and reduces the transmitted data compared to AGG for both 20 s and 40 s buffer capacity, BBA-0-M has a consistently negative median for all settings while achieving the highest mean bitrate. The largest discrepancy occurs for 40 s buffer capacity, where MEDUSA achieves a bitrate increase of more than 26% over BBA-0 while limiting the size increase to at most 1%. This shows that the bitrate is a superficial metric and that MEDUSA strikes a real tradeoff between size and quality, as confirmed by the VMAF increase discussed above. Unlike in the 20 s buffer capacity case, BOLA-M with 40 s buffer capacity and 4G-LTE consistently reduces the size by up to 10%, despite a bitrate increase of a similar percentage. Following the bitrate trend, SARA-M provides a consistent reduction in transmitted data for both 20 s (up to 19.3%) and 40 s (up to 41.8%) buffer capacity.
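The mismatch between bitrate and data changes can be reproduced with a toy session. The sketch below, using hypothetical values only, compares the change in mean manifest bitrate with the change in bytes actually transmitted: if the chosen representations advertise a higher average bitrate but the selected segments are only slightly larger, the bitrate rises by a third while the transmitted data grows by just 1%.

```python
from statistics import mean


def pct_change(new: float, old: float) -> float:
    return 100.0 * (new - old) / old


def session_changes(base_bitrates, base_sizes, m_bitrates, m_sizes):
    """Return (% change of mean manifest bitrate, % change of transmitted bytes).

    Bitrates are the values advertised in the manifest for each requested
    segment; sizes are the actual bytes downloaded for those segments.
    """
    bitrate_change = pct_change(mean(m_bitrates), mean(base_bitrates))
    data_change = pct_change(sum(m_sizes), sum(base_sizes))
    return bitrate_change, data_change


# Hypothetical two-segment sessions (bitrates in bit/s, sizes in bytes).
rate, data = session_changes(
    base_bitrates=[3e6, 3e6], base_sizes=[1_500_000, 1_500_000],
    m_bitrates=[4e6, 4e6], m_sizes=[1_400_000, 1_630_000],
)
print(f"bitrate {rate:+.1f}%, data {data:+.1f}%")  # bitrate +33.3%, data +1.0%
```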
Number and duration of stalls. According to Figure 9(a) and (b), the reduction in bitrate and size has a positive effect on the number of stalls, which decreases consistently for almost all combinations of ABR algorithms and MEDUSA. A negative value means that MEDUSA reduces the number of stall events with respect to the underlying ABR approach. The same holds for the duration of the stalls, shown in Figure 10(a) and (b). The largest visible reduction in the number of stalls occurs with 20 s buffer capacity for AGG-M, with 1–7 fewer stalls for 4G-LTE and 2–5 fewer stalls for FCC, explained by the bitrate and size reductions presented above. These results confirm the importance of trading off quality and size to reduce the amount of transmitted data while maintaining high video quality. Although the number of stalls decreases, the stall duration remains constant for FCC and slightly increases for 4G-LTE. This means that AGG-M incurs fewer but longer stalls than AGG. Increasing the buffer capacity to 40 s has a positive impact on AGG-M, which can better cope with throughput overestimation. This leads to a reduction in both the number and duration of stalls for both network traces.
For BBA-0, BOLA, and SARA, the buffer plays a key role in the decision process. To a lesser degree than AGG-M, with 20 s buffer capacity, BBA-0-M occasionally reduces the number of stalls by up to 2. As mentioned above, FCC is a more stable trace than 4G-LTE, allowing BBA-0-M to better manage the buffer during turbulent periods. MEDUSA improves the decisions when the buffer occupancy is high but amplifies the consequences of poor decisions. This behavior leads to a similar number of stalls but longer stall durations in 4G-LTE. With the buffer capacity increased to 40 s, BBA-0 and BBA-0-M incur no stalls in most runs. Some outliers push the mean value for BBA-0-M into the negative region, meaning that BBA-0-M occasionally reduces the number of stalls compared to BBA-0. To improve playback smoothness, BOLA acts more conservatively than BBA-0, requesting lower bitrates. This also affects MEDUSA, which improves BOLA’s actions and reduces the number of stalls under unstable network conditions. Indeed, for 4G-LTE, we notice a reduction of up to 2 stalls, similar to BBA-0-M. However, compared to BBA-0-M, the reduction in stall duration for BOLA-M is consistent and reaches 9.3 s. SARA-M encounters fewer stalls than SARA in 4G-LTE with 20 s buffer capacity. The difference in stall duration ranges from -7.1 s to 4.0 s, depending on the video sequence. For FCC, SARA-M reduces the number of stalls; occasionally, there is a slight increase in the number of stalls, but a significant decrease in their duration. With the buffer increased to 40 s, SARA-M is unable to reduce stalls in the FCC case, although its mean transmitted data volume is lower than that of SARA. Therefore, there are no differences in the mean number and duration of stalls. The results of SARA-M for 4G-LTE present a more interesting trend, with a reduction of up to 1 stall and of up to 22.6 s in stall duration.
Codec switches. Figure 11(a) and (b) show the difference in the number of codec switches for MEDUSA compared to the baseline with 20 s and 40 s buffer capacity, respectively, where a switch event is counted whenever a video segment is played with a different codec than the previous one. Lower values are preferred, as they indicate that MEDUSA changes the codec less frequently. Higher values, however, reflect the need for dynamic codec switching over time. Interestingly, AGG-M follows a similar trend for 20 s and 40 s buffer capacity, since the underlying AGG does not consider the buffer when selecting the next segment. The number of codec switches ranges from 14 to 48 for both 4G-LTE and FCC. BBA-0-M is very sensitive to the buffer, which is used first by BBA-0 for the initial selection and then by MEDUSA to choose the codec, resulting in a wide distribution from 4 to 54 for 20 s and from 8 to 48 for 40 s buffer capacity. With 20 s buffer capacity, BOLA-M yields a similar distribution for 4G-LTE and FCC, between 20 and 50, whereas with 40 s buffer capacity, 4G-LTE leads to more switches than FCC. In particular, the inter-quartile range spans 22 to 38 for 4G-LTE and 10 to 21 for FCC. With a behavior similar to BOLA-M for 20 s, SARA-M has a slightly wider distribution for FCC, from 12 to 49, with the median below 30. For 40 s buffer capacity, FCC leads to a tighter distribution, from 19 to 39, than 4G-LTE, which extends from 20 to 52. The high variability in the distributions is explained by the different number of switches for different video sequences.
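The switch definition used for Figure 11 translates directly into a count over the sequence of codecs of the played segments. A minimal sketch with hypothetical codec labels; the per-run value in the figure would then be the difference between this count for X-M and for X.

```python
def count_codec_switches(played_codecs: list[str]) -> int:
    """Count segments played with a different codec than the previous segment."""
    return sum(
        1 for prev, cur in zip(played_codecs, played_codecs[1:]) if cur != prev
    )


playback = ["avc", "avc", "hevc", "av1", "av1", "avc"]  # hypothetical session
print(count_codec_switches(playback))  # 3
```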
QoE. The metrics discussed so far give an overview of MEDUSA’s impact on a video streaming session. Combining them according to [27], we obtain the final QoE measurements, presented in Figure 12(a) and (b) as the differences between the QoE scores achieved with MEDUSA and those achieved by the underlying ABR algorithm without MEDUSA. Therefore, a positive value implies that MEDUSA achieves a better QoE score than the underlying ABR algorithm. Although it reduces the overall volume of requested data compared to AGG, MEDUSA (AGG-M) increases the QoE by up to 30% (for 20 s buffer capacity and 4G-LTE) due to a significant reduction in the number of stalls during the streaming session. A similar trend is observed for FCC. Notably, although the bitrate is consistently reduced (by up to 30%), the mean decrease in VMAF is lower than 1 JND, which means that it should not be noticeable to the user. Since the considered ITU-T P.1203 model maps the bitrate to perceived quality, the real QoE improvement is probably even higher. Increasing the buffer capacity to 40 s shows a similar trend, with MEDUSA (AGG-M) achieving a QoE improvement of up to 35% over AGG. MEDUSA also improves the QoE for BBA-0-M and BOLA-M compared to the respective underlying ABR algorithms, by up to 11% for 20 s buffer capacity, with a similar trend for both network traces. A few samples in the distributions of Gameplay and Rally lie below zero for the 4G-LTE trace, as for BBA-0-M and AGG-M with FCC. Similar results are obtained when increasing the buffer capacity. For 20 s buffer capacity, SARA-M is the ABR technique achieving the highest mean QoE score. For 4G-LTE, SARA-M’s distribution reaches up to 41.7% improvement. For FCC, SARA-M performs better than SARA in half of the cases and worse in the other half. Here, the content is extremely important: while SARA-M improves the QoE compared to SARA for all other considered video sequences, it reduces the QoE by up to 10.5% when streaming Gameplay. With 40 s buffer capacity, MEDUSA (SARA-M) behaves similarly to SARA.
To assess the importance of the content in a video streaming session and the dependence of the ABR techniques’ performance on the content, Section 4.2.2 provides detailed graphs on the performance of each ABR algorithm for all video sequences, albeit for a single network trace and buffer capacity.
4.2.2 Scenario II.
In Scenario II, we highlight how the performance of MEDUSA depends on the streamed video content. To this end, we analyze a specific scenario and present the metrics explained before, excluding the requested bitrates (the transmitted data volume gives a better overview) and the number of codec switches. The chosen scenario uses 20 s buffer capacity and the 4G-LTE network trace. Figure 13 illustrates the different metrics for each video content and ABR algorithm, with and without MEDUSA applied on top.
Transmitted data. In Figure 13(a), we observe that MEDUSA consistently reduces the mean volume of transmitted data for each ABR algorithm and video content. The largest reduction occurs for AGG-M and ToS1, at approximately 24%. Interestingly, for SARA-M, the largest reduction is obtained with ToS2 (15%), suggesting that MEDUSA is particularly effective for complex video sequences (i.e., high SI/TI for ToS2; cf. Figure 4(c)).
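For reference, SI and TI are commonly computed following ITU-T Rec. P.910: SI is the maximum over time of the spatial standard deviation of the Sobel-filtered luma frame, and TI is the maximum over time of the spatial standard deviation of the difference between consecutive luma frames. The NumPy sketch below implements these classic definitions on interior pixels only. It is an illustration, not necessarily the tool used to produce Figure 4(c), and later revisions of P.910 refine some details; the sample frames are hypothetical.

```python
import numpy as np


def _sobel_magnitude(frame: np.ndarray) -> np.ndarray:
    """Sobel gradient magnitude on the interior pixels of a 2-D luma frame."""
    f = frame.astype(np.float64)
    gx = (f[:-2, 2:] + 2 * f[1:-1, 2:] + f[2:, 2:]) - (
        f[:-2, :-2] + 2 * f[1:-1, :-2] + f[2:, :-2]
    )
    gy = (f[2:, :-2] + 2 * f[2:, 1:-1] + f[2:, 2:]) - (
        f[:-2, :-2] + 2 * f[:-2, 1:-1] + f[:-2, 2:]
    )
    return np.hypot(gx, gy)


def spatial_information(frames: list[np.ndarray]) -> float:
    """SI: max over frames of the std of the Sobel-filtered frame."""
    return max(float(np.std(_sobel_magnitude(f))) for f in frames)


def temporal_information(frames: list[np.ndarray]) -> float:
    """TI: max over consecutive pairs of the std of their difference (needs >= 2 frames)."""
    return max(
        float(np.std(cur.astype(np.float64) - prev.astype(np.float64)))
        for prev, cur in zip(frames, frames[1:])
    )


flat = np.zeros((8, 8))
edge = np.zeros((8, 8))
edge[:, 4:] = 255.0  # hypothetical frame with a vertical edge
print(spatial_information([flat]), temporal_information([flat, edge]))  # 0.0 127.5
```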
VMAF. Figure 13(b) depicts the mean VMAF values of the requested segments. Gameplay achieves a significantly lower VMAF than the other video sequences, independently of the chosen ABR strategy, since its representations have lower VMAF scores than those of the other sequences. This affects the final VMAF score but not the general VMAF trend of the considered ABR strategies, which is similar across all sequences. Based on Figure 13(a), with a single-codec strategy, we would expect the data reduction to correspond to a decrease in VMAF. However, MEDUSA consistently improves the achieved VMAF compared to the underlying ABR technique. This improvement reaches a mean of 4.3 VMAF points, i.e., over 1 JND, for BBA-0-M with Gameplay. The exception is AGG-M, which nevertheless keeps the VMAF reduction within 3 VMAF points of AGG for all video sequences.
VMAF instability. Figure 13(c) shows the mean VMAF instability for the ABR strategies and video sequences. MEDUSA and the underlying ABR techniques behave similarly, with all changes within \(\pm 1\) VMAF point. This is expected, since MEDUSA does not consider VMAF instability in its objective function and, hence, does not aim to reduce it.
Number and duration of stalls. Figure 13(d) and (e) illustrate the number and duration of stall events during playback for each ABR technique and video content. MEDUSA consistently reduces the number of stalls and, in part, the duration of stalls for each ABR algorithm and video content. The largest decrease in the number of stalls occurs for AGG-M with ToS2, the most complex test sequence in the evaluation, amounting to 5 stalls on average (-73%). Despite this significant decrease, AGG-M doubles the duration of these stalls from approximately 5 s to 10 s, which means that AGG-M stalls less frequently but notably longer than AGG. Comparing BBA-0 and BBA-0-M, we notice a similar trend in the number of stalls but a different trend in the stall duration for Gameplay and Rally. While BBA-0 achieves a lower stall duration (-54%) than BBA-0-M for Gameplay, BBA-0-M reduces the mean stall duration for Rally (-59%) compared to BBA-0. This can be explained by the behavior of MEDUSA. The stalls in question occur close to a throughput drop. Although neither MEDUSA nor BBA-0 considers the throughput, the buffer is too short to cope with throughput fluctuations, which results in stall events. The duration of such stalls depends on the buffer status before the request is sent. Therefore, the decisions taken by BBA-0 or BBA-0-M for the previous segments are of vital importance. For Rally, which has large segments, the decisions made by BBA-0 keep the buffer occupancy in a risky region, limiting the bitrate for the next segments and, hence, reducing the stall duration if a stall occurs. BBA-0-M, on the other hand, optimizes the tradeoff between quality and size, reducing the transmitted data volume and, therefore, the download time, which increases the buffer occupancy.
When the buffer occupancy is high and BBA-0-M prioritizes quality over size, a throughput drop in the middle of a segment download inevitably causes a longer stall than for BBA-0. The opposite consideration holds for Gameplay, whose segments are on average smaller than those of the other video sequences, which also explains its lower VMAF score.
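The dependence of stall duration on the buffer level before a request can be illustrated with a simplified download-then-play model. The sketch below is a toy under explicit assumptions, not the simulator used in the evaluation: each segment is downloaded at a constant per-segment throughput, playback drains the buffer during the download, a stall lasts for the part of the download the buffer cannot cover, the client idles while the buffer is full, and start-up delay is not modeled (playback is assumed to be running already). All parameter values are hypothetical.

```python
def simulate_playback(
    sizes_bits: list[float],
    throughputs_bps: list[float],
    segment_duration_s: float,
    capacity_s: float,
    initial_buffer_s: float,
) -> tuple[int, float]:
    """Return (number of stalls, total stall time in s) for a toy buffer model."""
    buffer_s = initial_buffer_s
    stalls, stall_time_s = 0, 0.0
    for size, tput in zip(sizes_bits, throughputs_bps):
        # Idle (while playback continues) until the next segment fits into the buffer.
        buffer_s = min(buffer_s, capacity_s - segment_duration_s)
        download_s = size / tput
        if download_s > buffer_s:
            # The buffer runs dry before the download completes: a stall.
            stalls += 1
            stall_time_s += download_s - buffer_s
            buffer_s = 0.0
        else:
            buffer_s -= download_s
        buffer_s += segment_duration_s  # the downloaded segment is now buffered
    return stalls, stall_time_s


# Hypothetical throughput drop to 1 Mbit/s while fetching an 8 Mbit segment.
print(simulate_playback([8e6], [1e6], 4.0, capacity_s=20.0, initial_buffer_s=5.0))
# (1, 3.0)
```

In this model, a smaller segment at the same throughput shortens the stall, while a fuller buffer absorbs more of the drop; which effect dominates depends on the preceding decisions, as argued above.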
QoE. The aforementioned metrics eventually influence the QoE of users, which is shown in Figure 13(f). MEDUSA consistently enhances the QoE compared to the underlying ABR technique. The largest QoE increase comes from SARA-M for ToS1, with more than 37%, due to the lower number and duration of stalls compared to SARA, which strongly affects the QoE. The second ABR algorithm with a large QoE improvement is AGG-M, which enhances the QoE for ToS2 by 22% compared to AGG. Considering different video sequences for the same ABR approach, we observe a mostly constant pattern for BOLA and BOLA-M, whereas SARA and SARA-M behave differently depending on the content. For instance, SARA achieves the highest QoE score for Gameplay (1.61) and the lowest for ToS1 (1.35). SARA-M, instead, achieves the highest QoE for ToS2 (1.88) and the lowest for Gameplay (1.66).
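As a worked example of this content dependence, the per-sequence scores reported above can be turned into relative changes, assuming the reported percentages are relative changes of the mean QoE scores:

```latex
\Delta\mathrm{QoE}^{\text{SARA}}_{\text{Gameplay}}
  = \frac{1.66 - 1.61}{1.61} \approx +3.1\,\%,
\qquad
\frac{\mathrm{QoE}^{\text{SARA-M}}_{\text{ToS1}} - 1.35}{1.35} > 37\,\%
  \;\Rightarrow\; \mathrm{QoE}^{\text{SARA-M}}_{\text{ToS1}} > 1.35 \times 1.37 \approx 1.85
```

Hence, SARA-M’s ToS1 score lies between about 1.85 and its maximum of 1.88 (ToS2), and even its lowest score (Gameplay, 1.66) exceeds SARA’s highest (Gameplay, 1.61).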