
Soft Delivery: Survey on a New Paradigm for Wireless and Mobile Multimedia Streaming

Published: 14 September 2023

Abstract

The increasing demand for video streaming services is the key driver of modern wireless and mobile communications. Although many studies have designed digital-based delivery schemes to send video content over wireless and mobile networks, significant quality degradation, known as the cliff and leveling effects, often occurs owing to fluctuating channel characteristics. In this article, we present a comprehensive summary of soft delivery, a new paradigm for wireless and mobile video streaming, and discuss its future directions. Existing studies have found that introducing multi-dimensional discrete cosine transforms, human visual system models, and graph signal processing can make soft delivery schemes more effective than digital-based delivery schemes for untethered immersive experiences, including virtual reality and volumetric media. In addition, this study finds that soft delivery has the potential to become a new standard for delivering deep neural network models and tactile information over wireless and mobile networks.

1 Introduction

Video streaming over wireless and mobile networks is one of the major applications in wireless environments. According to Ericsson Mobility Reports, approximately 80% of all mobile data traffic will be video traffic by 2028 [1]. The explosive growth of data traffic, especially video traffic, poses a huge challenge to wireless and mobile networks. In recent years, immersive content, such as Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR), has shown strong potential to become the next major class of network applications. Such immersive applications are growing rapidly together with the development of fifth-generation (5G) technology and smart wearable devices, which are enabling technologies for all Extended Reality (XR) applications.
In general, wireless video streaming systems transmit image and video signals to single or multiple users over channels with different characteristics. For high-quality video streaming applications, the main challenge is the difficulty of fully utilizing each user’s channel capacity and providing each user with the best video quality possible under that user’s channel conditions. Solving this challenge will provide users with an improved quality of experience. To address this challenge, conventional streaming systems, which consist of video coding and transmission technologies, have been proposed based on digital-based solutions. For video coding, the H.265/High-Efficiency Video Coding (HEVC) standard [2], which was standardized by the Joint Collaborative Team on Video Coding, can be used to encode VR/360-degree videos. As the successor of HEVC, the future video coding standard, H.266/Versatile Video Coding (VVC), has been developed by the Joint Video Experts Team. The VVC standard takes camera-view video, high dynamic range video, and VR/360-degree video into account. In addition, video- and geometry-based point cloud coding [3, 4] have been standardized by the Moving Picture Experts Group for volumetric video encoding and decoding. For video transmission, the source bits are channel coded with time interleaving to provide robustness against a certain level of channel errors. The channel-coded bits are then mapped onto transmit data symbols according to an arbitrary modulation scheme, such as Binary Phase Shift Keying (BPSK), Quadrature Phase Shift Keying (QPSK), or Quadrature Amplitude Modulation (QAM). To choose an appropriate source and channel coding rate according to the user’s channel condition, the channel statistics generally need to be known at the time of source and channel coding. Once both the source and channel coding processes are completed, the conventional systems work optimally only for a specific channel condition and have performance limitations in noisy time-varying channels [5, 6].
If the observed channel quality (i.e., the channel Signal-to-Noise Ratio (SNR)) falls below a threshold, the decoding process tends to break down completely. This phenomenon is called the cliff effect [7]. In contrast, if the observed channel quality increases beyond the threshold, it does not improve the performance unless an adaptive rate control of the source and channel coding is performed in real time according to the rapid fading channels. This phenomenon is known as the leveling effect.
Thus, accurate channel estimation and real-time rate control of the source and channel coding are desired for conventional streaming systems. However, the channel conditions of wireless and mobile networks may vary drastically and unpredictably, resulting in imperfect channel estimation and rate control. Conventional streaming systems tend to utilize the channel capacity conservatively to prevent cliff and leveling effects, considering the fact that rate control may be inaccurate.
Scalable Video Coding (SVC) [8] and Dynamic Adaptive Streaming over HTTP (DASH) [9, 10] are typical standardized systems that utilize channel capacity without cliff and leveling effects. SVC encodes the video frames into multiple layers to progressively improve the video quality according to the number of received layers. DASH encodes video frames at multiple quality levels and stores all encoded video frames on the server. The main difference between SVC and DASH lies in whether the adaptation is sender-driven or receiver-driven: SVC determines how many layers to send based on the estimated channel quality, whereas DASH retrieves the appropriate quality of video frames from the server based on the estimated channel quality. However, the conservative strategy of SVC and DASH systems under imperfect channel estimation and rate control will cause quality degradation.
A new paradigm of wireless video delivery, namely soft delivery [11, 12, 13], has been proposed to fully utilize the instantaneous channel capacity without cliff and leveling effects.
In contrast to SVC and DASH systems, soft delivery does not require channel estimation to utilize the instantaneous channel capacity.
It is essentially a scheme with “lossless compression and lossy transmission.” The compression stage is solely a transform to decorrelate the image and video signals into frequency-domain coefficients, leaving out the conventional quantization and entropy coding. The transmission stage skips digital-based channel coding. Instead, it scales each transform coefficient individually and modulates it directly to a dense constellation for transmission inspired by the advantage of analog transmission with linear coding [5, 14, 15]. Here, the scaling operation serves the purposes of both power allocation and unequal signal protection against channel noises and fading effects to maximize the reconstruction quality. At the receiver end, the image and video signals are reconstructed by demodulating the received signals and inverting the scaling and transform operations. The soft delivery scheme was shown to not only provide a graceful performance transition in a wide channel SNR range but also achieve competitive performance compared with conventional digital-based delivery schemes.

1.1 Contributions

This work provides a comprehensive survey of soft delivery schemes, including an overview of existing techniques, extensions for immersive experiences, and future research directions. The existing studies on soft delivery schemes are shown in Figure 1, together with a brief description of the related topics and key techniques. Although there are some survey papers [9, 16, 17, 18, 19, 20] related to video delivery over wireless channels, all of them deal with digital-based approaches. To the best of our knowledge, this survey is the first to introduce methodologies and approaches for soft and Hybrid Digital–Analog (HDA) delivery to transmit high-quality image and video signals via unstable and diverse wireless and mobile channel environments. The main contributions of this study are summarized as follows:
Fig. 1. Taxonomy of the studies on soft delivery schemes.
We present an overview of the conventional digital-based and soft delivery schemes, as well as the benefits of the soft delivery schemes.
The existing soft delivery techniques, such as energy compaction, power allocation, bandwidth utilization, overhead reduction, and packet loss resilience, are surveyed. In this context, the abstraction and key contributions of these techniques are reviewed and summarized.
We present the extensions of soft delivery for immersive experiences, which are mainly contributed by our prior research. We summarize the key ideas of the extensions and benefits in contrast to the digital-based delivery schemes.
In addition, we review the future research directions of soft delivery: HDA delivery, AI-empowered soft delivery, soft delivery for AI, and Tactile Internet. We carry out some evaluations to discuss the various directions.
This survey identifies that soft delivery works particularly well in untethered immersive experiences including free viewpoint video, VR, and point cloud. Soft delivery yields better reconstruction quality compared with HEVC-based delivery schemes by integrating the energy compaction, power allocation, and overhead reduction techniques discussed in existing studies. In addition, soft delivery can realize high-quality and adaptive delivery of Deep Neural Network (DNN) models and tactile information over wireless channels. Such delivery will help to realize future services including Federated Learning (FL) and untethered XR applications.

1.2 Survey Structure

The remainder of this article is organized as follows:
Section 2 describes an overview of conventional digital-based delivery schemes and their issues, such as the cliff, leveling, and staircase effects.
Section 3 presents the basic principles of the pioneering work on soft delivery to solve the aforementioned effects in wireless and mobile video streaming applications.
Section 4 presents a review of the existing techniques on soft delivery. We classify these techniques into energy compaction, power allocation, bandwidth utilization, packet loss resilience, overhead reduction, and extension for immersive experiences, and we discuss their implementation as well as their contributions.
Section 5 suggests future directions for the soft delivery approach. In addition to image and video signals, the soft delivery approach has the potential to realize high-quality delivery of DNN models and tactile information over wireless channels.
Section 6 concludes the article.

2 Conventional Digital-Based Delivery

2.1 Overview

One of the major issues in wireless video delivery is sending high-quality videos within the considerably limited capacity of wireless links. For this purpose, standardized digital video compression is carried out on video frames in conventional video delivery schemes [21, 22, 23], as shown in Figure 2(a), to remove redundancy among video frames. In particular, the H.264/Advanced Video Coding (AVC) [24], H.265/HEVC, and H.266/VVC standards are typical video coding standards for generating a compressed bitstream from video frames. In such video encoders, the pixel values in each video frame are divided into blocks and transformed into frequency-domain coefficients using the Discrete Cosine Transform (DCT) or discrete sine transform, and the coefficients are then non-uniformly quantized according to a quantization parameter. A larger quantization parameter indicates a larger quantization step, leading to a smaller bit rate. Finally, the quantized coefficients are compressed by an entropy coder, which removes statistical redundancy in the coefficients. Variable length coding is widely deployed for entropy coding because of its efficiency and simplicity.
Fig. 2. Framework of the conventional digital-based delivery and soft delivery schemes.
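As a rough illustration of the transform-and-quantize stage described above, the following minimal NumPy sketch applies a separable 2D-DCT to a single 8x8 block and a plain uniform quantizer whose step size stands in for the quantization parameter. The actual HEVC/VVC quantizer, scaling matrices, and entropy coder are far more elaborate; this is only a conceptual sketch with an arbitrary random block.

```python
# Minimal sketch of the transform-and-quantize stage of a digital video
# encoder. This is NOT the HEVC/VVC quantizer; it only illustrates how a
# larger quantization step removes more information and lowers the bit rate.
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    mat = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] /= np.sqrt(2)
    return mat * np.sqrt(2.0 / n)

def encode_block(block: np.ndarray, q_step: float) -> np.ndarray:
    """2D-DCT of a pixel block followed by uniform quantization."""
    d = dct_matrix(block.shape[0])
    coeffs = d @ block @ d.T              # separable 2D DCT
    return np.round(coeffs / q_step)      # larger q_step -> coarser levels

def decode_block(levels: np.ndarray, q_step: float) -> np.ndarray:
    d = dct_matrix(levels.shape[0])
    coeffs = levels * q_step              # dequantize
    return d.T @ coeffs @ d               # inverse 2D DCT

rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(float)
for q_step in (4.0, 16.0, 64.0):
    levels = encode_block(block, q_step)
    rec = decode_block(levels, q_step)
    mse = np.mean((block - rec) ** 2)
    print(f"q_step={q_step:5.1f}  nonzero coeffs={np.count_nonzero(levels):2d}  MSE={mse:8.2f}")
```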
After passing through the digital video compression, the bitstream is then passed to the wireless transmission system in sequence. There are typically two ways to deliver the bitstream: Internet Protocol (IP) and non-IP networks. For IP networks, the sender uses DASH [9, 10] or the Real-Time Transport Protocol [25] for the bitstream. Although the IP-based protocols can deliver the content via the deployed IP networks, they incur transmission delays owing to packetizing and de-packetizing the bitstream [26]. For non-IP networks, the sender passes the bitstream to the PHY (physical layer) for transmission. Existing studies on non-IP network schemes reported that such schemes reduce the delay compared with the IP-based schemes, and thus the non-IP schemes are applicable to low-delay applications, including untethered XR experiences and cooperative and competitive gaming.
This survey mainly discusses content delivery over non-IP networks. In such non-IP schemes, channel coding is first applied to the bitstream to protect against channel errors. For example, binary convolutional codes and Low-Density Parity-Check (LDPC) codes are widely used as forward error correction in Wi-Fi systems. The coded bitstream is then mapped onto in-phase and quadrature (I and Q) components using digital modulation formats, such as QPSK and M-ary QAM. In both wireless and mobile networks, a combination of a modulation format and a channel coding rate, for example, 1/2 or 3/4, is defined as the Modulation and Coding Scheme (MCS). According to the measured wireless channel SNRs, the sender adapts its MCS value to maximize the link data rate. At the receiver end, bit errors may occur in the channel-coded bits owing to effective noise and/or fading effects. The receiver then tries to reconstruct the video frames from the received bits using the inverse procedures (i.e., demodulation, channel decoding, and video decoding).
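The link adaptation step can be pictured as a simple lookup over SNR thresholds, as in the sketch below. The entries are purely illustrative: the thresholds, rates, and MCS set are invented for the example and do not reproduce any specific standard's MCS table.

```python
# Illustrative MCS selection based on a measured channel SNR. The
# (threshold, modulation, code rate) entries are invented for the example.
from typing import NamedTuple

class Mcs(NamedTuple):
    min_snr_db: float
    modulation: str
    code_rate: str
    bits_per_symbol: float   # information bits per transmitted symbol

MCS_TABLE = [
    Mcs(2.0,  "BPSK",   "1/2", 0.5),
    Mcs(5.0,  "QPSK",   "1/2", 1.0),
    Mcs(9.0,  "QPSK",   "3/4", 1.5),
    Mcs(12.0, "16-QAM", "1/2", 2.0),
    Mcs(16.0, "16-QAM", "3/4", 3.0),
    Mcs(20.0, "64-QAM", "3/4", 4.5),
]

def select_mcs(measured_snr_db: float) -> Mcs:
    """Pick the highest-rate MCS whose SNR threshold is satisfied."""
    feasible = [m for m in MCS_TABLE if measured_snr_db >= m.min_snr_db]
    if not feasible:
        return MCS_TABLE[0]          # fall back to the most robust MCS
    return max(feasible, key=lambda m: m.bits_per_symbol)

print(select_mcs(3.0))    # -> BPSK, rate 1/2
print(select_mcs(17.5))   # -> 16-QAM, rate 3/4
```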

2.2 Critical Issues on Quality

If the measured wireless channel quality is stable during video transmission, conventional digital-based schemes can provide high-quality video frames for users. However, the channel quality of each user fluctuates over time owing to a combination of user mobility, multipath propagation, and shadowing by obstacles. Table 1 lists three critical issues regarding the video quality of the digital-based schemes caused by these channel quality fluctuations: the cliff, leveling, and staircase effects.
Table 1. Critical Issues Regarding Video Quality in Wireless and Mobile Video Streaming

Phenomenon | Effect on Video Quality | Cause | Solution in Soft Delivery
Cliff effect | Sudden degradation | All-or-nothing behavior in entropy and channel coding | Skip entropy and channel coding to prevent all-or-nothing behavior
Leveling effect | Constant irrespective of channel quality | Unrecoverable quantization error in video coding | Skip quantization and adopt pseudo-analog modulation so that errors are recoverable at the receiver end
Staircase effect | Step function of channel quality | All-or-nothing behavior and quantization error in layered coding | Skip hierarchical operations and adopt pseudo-analog modulation for linear video quality

2.2.1 Cliff Effect.

Digitally encoded bits are known to be susceptible to errors during wireless transmission. Because entropy coding schemes have an all-or-nothing behavior, even a single bit error can cause the loss of the entire data [27]. As mentioned earlier, channel coding schemes are adopted to correct burst and random bit errors. However, they generally exhibit an all-or-nothing behavior for error correction. When the instantaneous channel quality (i.e., the SNR) falls below a certain threshold, errors that occur in the bitstream during wireless communications will disable video decoding.
A collapsed signal reconstruction causes a cliff effect. The cliff effect is a phenomenon whereby the quality of the received information abruptly drops as soon as the channel quality falls below the threshold, as shown in Figure 3(a). For example, the video quality of the BPSK modulation format with 1/2-rate channel coding drops below the wireless channel SNR of 4 dB. In modern network environments (e.g., content delivery, mobile, and wireless networks), the cliff effect becomes a major impediment when video frames are transmitted over diverse channel conditions to heterogeneous users. In this case, users whose channel quality is below the critical point receive unwatchable video frames.
Fig. 3. Video quality of the conventional digital-based schemes and SoftCast scheme via wireless networks [11]. (a) Cliff and leveling effects in H.264/AVC over 802.11 under different MCSs. (b) Staircase effect in two-layered video coding and three-layered video coding shown in red and blue, respectively. For reference, the dashed lines are the three-equivalent single-layer H.264/AVC videos. (c) Performance of SoftCast versus single-layer H.264/AVC.
Some solutions have addressed the cliff effect associated exclusively with channel coding, such as hybrid automatic repeat request and rateless coding schemes [28, 29, 30, 31, 32]. They adapt the number of transmissions to changing channel conditions for error prevention. However, these schemes are not well suited for streaming to multiple users under diverse channel conditions. In addition, they do not reduce the quantization error at the video encoder, and thus the leveling effect still occurs in the video quality.

2.2.2 Leveling Effect.

Once the channel quality surpasses the threshold, the video quality remains constant, as shown in Figure 3(a). As mentioned earlier, the cliff effect occurs when the receiver SNR is below 4 dB in the BPSK modulation format with 1/2-rate channel coding, whereas additional channel gain is not reflected in the video quality above a wireless channel SNR of 4 dB. Digital-based schemes determine the parameters of the video coding and wireless transmission parts based on the channel estimation. If the instantaneous channel quality is better than the estimated one, no additional gain can be obtained because the distortion introduced by the video coding cannot be recovered by each user.

2.2.3 Staircase Effect.

To mitigate the cliff and leveling effects, layered coding schemes, such as SVC [8] combined with Hierarchical Modulation (HM) [33], have been proposed for wireless and mobile video streaming [34, 35]. These layered coding schemes encode video frames into one Base Layer (BL) and several Enhancement Layers (ELs). The BL is used to ensure that all users in the target channel SNR range can receive the baseline quality of video frames, whereas the ELs are used to enhance the video quality of users with high channel SNRs. Each SVC layer is then mapped onto the corresponding HM layer. Notably, HM provides unequal error protection to the transmitted video frames according to their relative importance. However, SVC with HM cannot completely remove the cliff effect; it only divides one big cliff into multiple stairs according to the number of layers, as shown in Figure 3(b). In addition, because the transmission power assigned to each layer is lower than that of single-layer coding schemes, the cliff shifts to higher wireless channel SNRs.

3 SoftCast: A Pioneering Work on Soft Delivery

3.1 Overview

To prevent the cliff, leveling, and staircase effects in wireless video delivery, a pioneering soft delivery scheme, namely SoftCast, was proposed by Jakubczak et al. [11, 12, 13]. The block diagram of SoftCast is illustrated in Figure 2(b). The design of SoftCast is based on a simple principle that ensures the transmitted signal samples are linearly related to the original pixel values. This principle naturally enables a sender to satisfy multiple receivers with diverse channel qualities, as well as a single receiver whose packets experience different channel qualities.
The sender first takes a Group of Pictures (GoP) and uses a full-frame 3D-DCT as the decorrelation transform. The DCT frames are then divided into N small rectangular blocks of transformed coefficients called chunks. The coefficients in each chunk are then scaled to match the transmission power constraints. Specifically, the scaling coefficients are chosen to minimize the reconstruction Mean Square Error (MSE). Walsh–Hadamard transform is then applied to the scaled chunks for power normalization across the chunks to provide packet loss resilience. This process transforms the chunks into slices. Each slice is a linear combination of all scaled chunks. Finally, the coefficients in the slices are directly mapped to the I and Q components in a pseudo-analog manner for transmission. Here, channel coding operations are skipped for the coefficients.
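To make the sender chain concrete, the following NumPy sketch walks one small GoP through a full-frame 3D-DCT, chunk division, Walsh–Hadamard mixing, and pseudo-analog I/Q mapping. The GoP, frame, and chunk sizes are arbitrary toy values, and the per-chunk power allocation (scaling) step is omitted because it is detailed in Section 3.2, so this is only a structural sketch.

```python
# Rough sketch of the SoftCast sender chain for one GoP: full-frame 3D-DCT,
# chunk division, Walsh-Hadamard mixing of the chunks, and I/Q mapping.
# Per-chunk power allocation is omitted here (see Section 3.2).
import numpy as np

def dct_matrix(n):
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2)
    return m * np.sqrt(2.0 / n)

def full_frame_3d_dct(gop):
    """gop: array of shape (T, H, W); orthonormal DCT applied along all three axes."""
    out = gop.astype(float)
    for axis in range(3):
        d = dct_matrix(out.shape[axis])
        out = np.moveaxis(np.tensordot(d, out, axes=([1], [axis])), 0, axis)
    return out

def to_chunks(coeffs, chunk_h, chunk_w):
    """Split every DCT frame into rectangular chunks of transformed coefficients."""
    t, h, w = coeffs.shape
    chunks = [coeffs[f, y:y + chunk_h, x:x + chunk_w].ravel()
              for f in range(t)
              for y in range(0, h, chunk_h)
              for x in range(0, w, chunk_w)]
    return np.stack(chunks)                 # shape: (num_chunks, chunk_size)

def hadamard(n):
    h = np.array([[1.0]])
    while h.shape[0] < n:                   # n must be a power of two here
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

rng = np.random.default_rng(1)
gop = rng.integers(0, 256, size=(8, 32, 32))         # 8 frames of 32x32 pixels
chunks = to_chunks(full_frame_3d_dct(gop), 8, 8)      # 128 chunks of 64 coefficients
slices = hadamard(chunks.shape[0]) @ chunks            # energy spread across slices
symbols = slices.ravel()[0::2] + 1j * slices.ravel()[1::2]   # pseudo-analog I/Q mapping
print(chunks.shape, symbols.shape)
```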
Figure 4(a) and (b) show the conventional digital-based modulation (i.e., 16-QAM) and the pseudo-analog modulation proposed in SoftCast. Conventional modulation maps channel-coded bits to a discrete set of constellation points that are transmitted over the channel. For example, 16-QAM modulation takes sequences of four bits and maps each sequence to a complex I/Q number, as shown in Figure 4(a). After modulation, the wireless PHY of the sender transmits the mapped complex numbers to the receiver. Because of the broadcast nature of the wireless medium, multiple receivers hear the transmitted samples but with different noise levels. A receiver with a low channel SNR can distinguish only the quadrant of the transmitted sample and hence can decode only two bits of the transmitted sample. In this case, these bit errors may cause a collapsed signal reconstruction during digital video decoding.
Fig. 4. Mapping coded video to I/Q components of the transmitted signal. (a) Traditional 16-QAM maps a bit sequence to the complex number corresponding to the point labeled with that sequence. (b) By contrast, the PHY of SoftCast treats pairs of coded values as the real and imaginary parts of a complex number. (c) We find that the modulation of SoftCast follows a Lorentzian distribution.
In contrast to the existing modulation design, SoftCast outputs the real values of the DCT coefficients that are already coded for error protection. The pseudo-analog modulation in Figure 4(b) directly maps pairs of the scaled DCT coefficients to the I and Q components of the digital signal samples. Figure 4(c) shows the distribution of the analog-modulated symbols of the test video sequence “Akiyo” with Common Intermediate Format (CIF) resolution [37]. As shown in Figure 4(c), we find that the pseudo-analog modulation of the DCT coefficients follows a bivariate Lorentzian distribution:
\begin{equation} f(x,y) = a \frac{1}{\pi ^2\left(b^2+y^2+x^2+\frac{x^2y^2}{b^2}\right)}, \end{equation}
(1)
where a and b represent the fitting parameters. From a least-squares fit of the bivariate Lorentzian function to the distribution of the pseudo-analog symbols of the test video sequence, the best-fitting parameters are a = 0.001 and b = 0.24. As mentioned earlier, multiple receivers hear the transmitted samples under different channel SNRs. Although the transmitted samples are distorted according to each receiver’s SNR, the receiver regards the received samples as scaled DCT coefficients. The sender does not need to estimate the channel condition, and the noise level in the received samples faithfully reflects the instantaneous channel condition [38]. Thus, pseudo-analog modulation ensures that the received video quality is proportional to the instantaneous channel quality. Consequently, this process avoids the cliff, leveling, and staircase effects.
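The fitting step itself amounts to a standard two-parameter least-squares fit of Equation (1) to the empirical symbol histogram. The sketch below shows this with scipy.optimize.curve_fit; the symbol samples are synthetic heavy-tailed data standing in for the scaled DCT coefficients of “Akiyo,” so the resulting parameter values are not the ones reported above.

```python
# Least-squares fit of the bivariate Lorentzian model in Equation (1) to an
# empirical I/Q symbol distribution. The symbols here are synthetic
# heavy-tailed random data rather than the "Akiyo" sequence used in the text.
import numpy as np
from scipy.optimize import curve_fit

def bivariate_lorentzian(xy, a, b):
    x, y = xy
    return a / (np.pi ** 2 * (b ** 2 + x ** 2 + y ** 2 + (x ** 2 * y ** 2) / b ** 2))

rng = np.random.default_rng(0)
i_vals = rng.standard_cauchy(200_000) * 0.2       # stand-in for scaled DCT coefficients
q_vals = rng.standard_cauchy(200_000) * 0.2

# 2D histogram of the analog-modulated symbols, normalized to a density.
hist, x_edges, y_edges = np.histogram2d(i_vals, q_vals, bins=101,
                                        range=[[-2, 2], [-2, 2]], density=True)
x_c = 0.5 * (x_edges[:-1] + x_edges[1:])
y_c = 0.5 * (y_edges[:-1] + y_edges[1:])
xx, yy = np.meshgrid(x_c, y_c, indexing="ij")

(a_hat, b_hat), _ = curve_fit(bivariate_lorentzian,
                              (xx.ravel(), yy.ravel()), hist.ravel(),
                              p0=(0.01, 0.5))
print(f"fitted a={a_hat:.4f}, b={b_hat:.3f}")
```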
In parallel, SoftCast sends a small amount of data, referred to as metadata, for signal reconstruction. The metadata consist of the mean and variance of each transmitted chunk as well as a bitmap. The mean of each chunk is used to obtain the chunks’ approximately zero-mean distributions by subtracting the mean of all pixels in each chunk [39]. The variance of each chunk is used to find the per-chunk scaling factors that minimize the reconstruction error. The bitmap indicates the positions of the discarded chunks within the GoP. When the available channel bandwidth for SoftCast is less than the required bandwidth, SoftCast discards the chunks with less energy. Specifically, when the available and required bandwidths for SoftCast are M chunks and \(N (\gt M)\) chunks, respectively, SoftCast discards the \(N-M\) lowest-energy chunks to meet the bandwidth requirement. On the receiver side, these discarded chunks are replaced by null values. The discarded chunks are registered in a bitmap and then compressed using run-length encoding. The metadata are strongly protected and transmitted in a robust way (e.g., the BPSK modulation format with a low-rate channel code) to ensure correct delivery and decoding.
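The bandwidth adaptation and bitmap metadata can be sketched in a few lines. The example below keeps the M highest-energy chunks out of N, builds the discard bitmap, and run-length encodes it; the chunk variances are random stand-ins rather than values from a real GoP.

```python
# Sketch of the bandwidth adaptation and metadata described above: keep the
# M highest-energy chunks out of N, record the discarded positions as a
# bitmap, and run-length encode that bitmap. Chunk energies are random here.
import numpy as np

def select_chunks(chunk_vars: np.ndarray, m_available: int) -> np.ndarray:
    """Keep the m_available chunks with the largest variance (energy)."""
    order = np.argsort(chunk_vars)[::-1]           # highest energy first
    keep = np.zeros(chunk_vars.size, dtype=bool)
    keep[order[:m_available]] = True
    return keep                                    # True = transmitted

def run_length_encode(bitmap: np.ndarray):
    """(value, run length) pairs for the discard bitmap."""
    runs, start = [], 0
    for i in range(1, bitmap.size + 1):
        if i == bitmap.size or bitmap[i] != bitmap[start]:
            runs.append((int(bitmap[start]), i - start))
            start = i
    return runs

rng = np.random.default_rng(2)
chunk_vars = rng.exponential(scale=10.0, size=64)   # N = 64 chunk variances
keep = select_chunks(chunk_vars, m_available=48)    # M = 48 fits the bandwidth
bitmap = ~keep                                      # 1 marks a discarded chunk
print("discarded:", int(bitmap.sum()), "chunks")
print("RLE bitmap:", run_length_encode(bitmap))
```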
At the receiver side, a Minimum Mean Square Error (MMSE) decoder is used to estimate the content of the chunks to counteract channel noise. The MMSE provides a high-quality estimate of the DCT coefficients by leveraging the knowledge of the statistics of the DCT coefficients (i.e., chunk variance) as well as the statistics of the channel noise. Using the metadata, the denoised chunks are properly reassembled and undergo an inverse 3D-DCT, thereby providing the corresponding GoP. In principle, the preceding design and performance do not affect the content types (i.e., on-demand or live content). Regardless of the content type, SoftCast provides adaptive video delivery based on the channel quality between the sender and each receiver.
Table 2 shows the strengths, limitations, and use cases of the digital-based and soft delivery schemes. The digital-based delivery schemes are well suited for point-to-point communications over time-invariant channels. In addition, their buffering cost is relatively low compared to that of the soft delivery schemes because the required storage size for the compressed bitstream is small. Soft delivery schemes perform well in time-varying and diverse channels. In addition, soft delivery schemes have low delay because they do not need to perform an expensive motion search for compression, making them preferable for delay-sensitive applications. However, they require a high cost for modulation and demodulation in the PHY because the PHY of both the sender and the receiver must be modified for pseudo-analog modulation.
Table 2. Strengths, Limitations, and Use Cases for Digital-Based and Soft Delivery Schemes

Schemes | Channel Types | Communication Types | Latency | Buffering Cost | Modem Cost | Use Cases
Digital-based delivery | Time invariant | Point-to-point | High | Low | Low | Video streaming over wide area networks, HTTP adaptive streaming
Soft delivery | Time variant | Multicast, Broadcast | Low | High | High | Broadcasting, V2X communication, Video surveillance
In summary, the typical use cases of digital-based delivery schemes are streaming services over wide area networks and HTTP adaptive streaming. In contrast, soft delivery schemes are well suited for delay-sensitive applications such as video broadcasting, Vehicle-to-Everything (V2X) communication, and real-time video surveillance.

3.2 Details of Scaling and Inverse Scaling Operations

In SoftCast, a full-frame 3D-DCT is carried out for the video frames in each GoP to compact the energy of the video signals, and the resulting DCT coefficients are transmitted to the receivers using pseudo-analog modulation. Here, the transmitted analog-modulated symbols are degraded over wireless channels at each receiver. SoftCast should minimize the MSE between the received and transmitted DCT coefficients under the wireless channel SNR to reconstruct the highest-quality video frames at each receiver. For this purpose, SoftCast must design MSE-minimized power allocation and denoising filters (i.e., scaling and inverse scaling operations) for analog-modulated symbols.
Figure 5 illustrates the procedures for obtaining the reconstructed DCT coefficients at the receiver end. SoftCast implements chunk-wise power allocation and filter operations according to the statistics of the chunks and channel conditions. Let \(x_{i}\) denote the ith analog-modulated symbol. Each analog-modulated symbol is scaled by \(g_{i}\) for noise reduction:
\begin{equation} x_{i} = g_{i} \cdot s_{i}. \end{equation}
(2)
Here, \(s_{i}\) is the ith DCT coefficient and \(g_{i}\) is the scale factor for the coefficient power allocation. The sender performs optimal power control for \(g_i\) to achieve the highest video quality. Specifically, the best \(g_{i}\) is obtained by minimizing the MSE under the power constraint with the total power budget P as follows:
\begin{align} \min &\quad \mathsf {MSE} = \mathbb {E} \left[ \left(s_{i} - \hat{s}_{i}\right)^2\right] = \sum _{i}^{N} \frac{\sigma ^2 {\lambda }_{i}}{g_{i}^2{\lambda }_{i} + \sigma ^2} \nonumber \\ \mathrm{s.t.} &\quad \sum _{i}^{N} g_{i}^2{\lambda }_{i} = P, \end{align}
(3)
Fig. 5. Scaling and inverse scaling operations in soft delivery schemes.
where \(\mathbb {E}[\cdot ]\) denotes the expectation, \(\hat{s}_{i}\) is the receiver’s estimate of the ith DCT coefficient, \({\lambda }_{i}\) is the power of the ith DCT coefficient, N is the number of DCT coefficients, and \(\sigma ^2\) is the receiver noise variance. The near-optimal solution is expressed as
\begin{equation} g_{i} = {\lambda }_{i}^{-1/4} \sqrt {\frac{P}{\sum _j \sqrt {{\lambda }_{j}}}}. \end{equation}
(4)
After transmission over the wireless channel, each symbol at the receiver end can be modeled as \(y_{i} = x_{i} + n_i\), where \(y_{i}\) is the ith received symbol and \(n_{i}\) is an effective noise with a variance of \(\sigma ^2\). The receiver extracts DCT coefficients from the I and Q components and reconstructs the coefficients using the MMSE filter [11] as follows:
\begin{equation} \hat{s}_{i} = \frac{g_{i} {\lambda }_{i}}{g_{i}^2 {\lambda }_{i} + \sigma ^2} \cdot y_{i}. \end{equation}
(5)
The receiver then obtains the corresponding video sequence using the inverse 3D-DCT for the filter output \(\hat{s}_i\).
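A compact way to see Equations (2) through (5) working together is the following NumPy sketch, which allocates power over synthetic coefficient variances, passes the scaled symbols through an AWGN channel, and applies the MMSE filter at the receiver. The coefficient powers, power budget, and noise variance are arbitrary example values.

```python
# End-to-end sketch of the scaling and inverse scaling in Equations (2)-(5):
# per-coefficient power allocation g_i, transmission over an AWGN channel,
# and MMSE filtering at the receiver. Coefficient powers are synthetic.
import numpy as np

rng = np.random.default_rng(3)
n = 1024                                          # number of DCT coefficients
lam = np.sort(rng.exponential(5.0, n))[::-1]      # lambda_i: coefficient powers
s = rng.normal(0.0, np.sqrt(lam))                 # zero-mean DCT coefficients
P = float(n)                                      # total power budget
sigma2 = 0.5                                      # receiver noise variance

# Equation (4): near-optimal scale factors under the total power constraint.
g = lam ** (-0.25) * np.sqrt(P / np.sum(np.sqrt(lam)))
x = g * s                                         # Equation (2): scaling

y = x + rng.normal(0.0, np.sqrt(sigma2), n)       # AWGN channel

# Equation (5): MMSE (inverse scaling) filter using the chunk statistics.
s_hat = (g * lam) / (g ** 2 * lam + sigma2) * y

print("used power        :", np.sum(g ** 2 * lam))    # equals the budget P
print("reconstruction MSE:", np.mean((s - s_hat) ** 2))
```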

4 Technical Solutions for Soft Delivery

Because SoftCast skips the nonlinear digital-based encoding and decoding operations corresponding to motion estimation, quantization, and entropy coding, it realizes a linear quality improvement as the channel quality improves. In particular, SoftCast has shown outstanding performance compared with the conventional digital-based delivery schemes when receivers are highly diverse and/or the channel condition of each receiver varies drastically. However, the design of SoftCast is simple, so there remains much scope for improvement when adopting soft delivery in practical scenarios, including stable channel conditions and band-limited and/or error-prone environments. For this purpose, many studies have been conducted to improve the performance of soft delivery. The existing works on soft delivery schemes can be classified into seven types, as shown in Figure 1: energy compaction, optimal scaling, bandwidth utilization, resilience to packet loss, overhead reduction, hardware implementation, and extension for immersive experiences.

4.1 Energy Compaction of Source Signals

In soft delivery schemes via linear mapping (from source signals to channel signals), the reconstruction quality greatly depends on the performance of the energy compaction technique for the source signals. Specifically, the study by Prabhakaran et al. [40] clarified that the performance of soft delivery schemes degrades as the ratio of maximum energy to minimum energy of the source component increases. To yield better quality under both stable and unstable channel conditions, existing studies have adopted different energy compaction techniques listed in Table 3 for the source signals.
Table 3. Brief Introduction to Typical Energy Compaction Techniques for Soft Delivery Schemes

Techniques | Features | Pros | Cons
2D-DCT/2D-DWT | Take DCT/DWT operation for each video frame | Reduce spatial redundancy | No temporal filter
3D-DCT | Take DCT operation for each GoP | Reduce both spatial and temporal redundancy | Weak temporal filter
MCTF | Take wavelet transform for temporal filtering | Further reduce temporal redundancy | Computational cost for temporal filtering
Component protection | Send lower-frequency coefficients as metadata | Distribute power to higher-frequency coefficients | Communication overhead; significant degradation due to metadata error
Layered operation | Divide video frames into the BL and ELs and send them in digital and pseudo-analog ways, respectively | Provide baseline quality via the BL while enhancing quality via the ELs | ELs will be meaningless if bit errors occur in the BL
Coset coding | Partition coefficients into several cosets and transmit the coset residual codes | Bring lower entropy according to a coset step | Accuracy of coset step and side information is crucial for reconstruction
Typical solutions are to adopt wavelet-based signal decorrelation methods. Specifically, some studies [41, 42, 43, 44, 45, 46] have adopted a Motion-Compensated Temporal Filter (MCTF), which is a temporal wavelet transform method, to remove inter-frame redundancy by realizing motion compensation in soft delivery. The MCTF recursively decomposes video frames into low- and high-frequency frames according to a predefined level. For example, WaveCast [44] adopted a 3D-Discrete Wavelet Transform (DWT) (i.e., the integration of 2D-DWT and MCTF) to remove temporal and spatial redundancy. Although SoftCast exploits a full-frame 3D-DCT to remove the intra- and inter-frame redundancy for energy compaction, WaveCast can further improve the reconstruction quality by fully exploiting the inter-frame redundancy using motion compensation. A detailed discussion on the effects of other decorrelation methods is presented in the work of Xiong et al. [47, 48]. Trioux et al. [49] also utilized inter-frame redundancy by designing an adaptive GOP size mechanism. It adaptively controlled the GoP size based on shot changes and the spatio-temporal characteristics of the video frames. It then used a full-frame 3D-DCT for energy compaction across the video frames in one GoP.
Another typical solution is to send large-energy coefficients as metadata and thus avoid transmitting such coefficients using pseudo-analog modulation. Lin et al. [50] designed Advanced SoftCast (ASoftCast) to send low-frequency coefficients as the metadata. ASoftCast decomposed the original images into frequency components using 2D-DWT. The frequency component was then divided into two parts: the lowest-frequency sub-band and the other sub-bands. The wavelet coefficients in the lowest-frequency sub-band are processed by run-length coding. They are then channel coded and digitally modulated for additional metadata transmissions. The optimized power allocation for the SoftCast scheme in the work of He et al. [51] selected and sent high-energy coefficients as the metadata to reduce the energy of the analog-modulated symbols. This allows a higher transmission power to be assigned to the remaining low-energy coefficients, improving the received quality. Here, determining the high-energy coefficients for each GoP is computationally complex owing to the use of an exhaustive search. To reduce the computational complexity, Trioux et al. [52] adopted a zigzag scan to select the side information. Other studies [53, 54, 55, 56, 57, 58] divided the video into a BL and ELs, which were coded and sent in digital and pseudo-analog ways, respectively. For example, the BL in gradient-based image SoftCast (G-Cast) [57] sent the DC and low-frequency coefficients of the image, whereas the EL extracted and sent an image gradient, which represents the edge portion of the image, using a gradient transform. The receiver then created a final estimate of the image via a gradient-based reconstruction procedure, utilizing both the image gradient from the EL and the low-frequency coefficients provided by the BL.
Other solutions adopted a nonlinear encoder and decoder for the source signals to decrease the ratio of the maximum to the minimum energy of the analog-modulated symbols. The typical solution for soft delivery is to introduce coset coding [59, 60], which is a typical technique in distributed source coding. Coset coding partitions the set of possible source values into several cosets and transmits the coset residual codes to the receiver. With the received coset codes and the predictor, the receiver can recover the source value in the coset by choosing the one closest to the predictor. DCast [61, 62, 63, 64] first introduced coset coding for the soft delivery of inter frames. The coset coding in DCast divides each frequency-domain coefficient \(s_i\) by a coset step q and obtains the coset residual code \(l_i\) as \(l_i = s_i - \lfloor \frac{s_i}{q} + \frac{1}{2} \rfloor q\), where \(\lfloor \frac{s_i}{q} + \frac{1}{2} \rfloor\) represents the coset index. The sender then only needs to transmit the coset residual code for energy compaction. At the user side, with the received coset residual code \(\hat{l}_i\) and the side information \(\bar{s}_i\) (i.e., the predicted DCT coefficient obtained from the reference video frame), the receiver reconstructs the DCT coefficients by coset decoding. Given the coset residual code \(l_i\), there are multiple possible reconstructions of \(s_i\) that form a coset \(C = \lbrace \hat{l}_i, \hat{l}_i \pm q, \hat{l}_i \pm 2q, \hat{l}_i \pm 3q, \ldots \rbrace\). DCast then selects the candidate in the coset C that is nearest to the side information \(\bar{s}_i\) as the reconstruction of the DCT coefficient. In this case, the value of each coset step q is crucial for the coding performance of DCast. The value of q is calculated by estimating the noise at the receiver end, as shown in the work of Fan et al. [63, 64]. However, the reconstruction quality of DCast also depends on the quality of the side information. If the side information \(\bar{s}_i\) is error prone, the receiver may make wrong decisions with a smaller q. Huang et al. [65] introduced a side information refinement algorithm [66] to refine the side information for the quality enhancement of DCast.
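A toy version of this encode/decode loop is sketched below: the sender transmits only the residual l_i, and the receiver snaps the (noisy) residual to the coset candidate closest to its side information. The coefficient values, prediction error, and coset step are illustrative; as the text notes, choosing q well relative to the side-information error is what actually determines whether decoding succeeds.

```python
# Toy version of the coset encoding/decoding used in DCast-style schemes:
# the sender transmits only the coset residual l_i, and the receiver resolves
# the ambiguity with its side information (a predicted coefficient).
import numpy as np

def coset_encode(s: np.ndarray, q: float) -> np.ndarray:
    """Residual l_i = s_i - floor(s_i/q + 1/2) * q."""
    return s - np.floor(s / q + 0.5) * q

def coset_decode(l_hat: np.ndarray, side_info: np.ndarray, q: float) -> np.ndarray:
    """Pick the candidate l_hat + k*q closest to the side information."""
    k = np.floor((side_info - l_hat) / q + 0.5)
    return l_hat + k * q

rng = np.random.default_rng(4)
s = rng.normal(0.0, 20.0, 8)              # DCT coefficients of the current frame
side = s + rng.normal(0.0, 2.0, 8)        # prediction from the reference frame
q = 12.0                                  # coset step (must exceed the prediction error range)

l = coset_encode(s, q)
l_noisy = l + rng.normal(0.0, 0.3, 8)     # residual after pseudo-analog transmission
s_hat = coset_decode(l_noisy, side, q)
print("max reconstruction error:", np.max(np.abs(s - s_hat)))
```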
The concept of coset coding has been widely applied in other studies on soft delivery for the same purpose. For example, several works [67, 68, 69, 70, 71] utilized pseudo-coset coding for lower-frequency components and sent the coset index using the digital framework. Here, the residuals in the lowest-frequency components and other frequency components are sent using pseudo-analog modulation. The main difference between coset coding and pseudo-coset coding is the sending of the coset index as additional metadata. The layered coset coding and adaptive coset coding were applied to the soft delivery scheme in the work of Fan et al. [69] and Lv et al. [70], respectively. LayerCast [69] introduced layered coset coding to simultaneously accommodate heterogeneous users with diverse SNRs and bandwidths. The layered coset coding used large to small coset steps to obtain coarse to fine layers from each chunk. The coarse layer (i.e., BL) is sufficient to reconstruct a low-quality DCT chunk for narrowband users, whereas each fine layer (i.e., EL) provides refinement information of the DCT chunk for wideband users. Some works [72, 73, 74] utilized the coset coding for cooperative soft delivery systems (i.e., a three-node relay network). A sender broadcasts the DCT coefficients obtained from the video frames using pseudo-analog modulation to the relay node and the destination node. If the channel quality between the sender and the destination node is higher than a threshold, the destination node reconstructs the video frames from the soft-delivered DCT coefficients. If the channel condition is lower than the threshold, the relay node sends the coset residual code to the destination node, then the destination node reconstructs the video frames using the received coset residual code and the side information obtained from the softly delivered DCT coefficients from the sender.

4.2 Channel-Aware and Perception-Aware Power Allocation

As mentioned in Section 3.2, the power allocation in SoftCast minimizes the MSE between the original and reconstructed video signals over Additive White Gaussian Noise (AWGN) channels. There are several drawbacks to adopting SoftCast in practical scenarios: (1) practical wireless channels have more complex characteristics (e.g., fading caused by multipath, and impulse noise) than AWGN channels, and (2) the MSE is not an effective index for describing the perceptual fidelity of images/videos. To address these drawbacks, the existing studies listed in Table 4 propose power allocation schemes for practical wireless channels and for perceptual considerations.
Table 4. Overview of Power Allocation Techniques for Soft Delivery Schemes

Category | Papers | Channel Consideration | Optimization Metric
Channel-aware power allocation | [11] | AWGN | MSE
Channel-aware power allocation | [75] | Fading | MSE
Channel-aware power allocation | [76, 77] | OFDM | MSE
Channel-aware power allocation | [78, 79, 80] | MIMO | MSE
Channel-aware power allocation | [81, 82, 83] | MIMO-OFDM | MSE
Channel-aware power allocation | [84] | Impulse noise | MSE
Channel-aware power allocation | [85, 86] | NOMA | MSE
Channel-aware power allocation | [87] | Underwater acoustic networks | MSE
Channel-aware power allocation | [88] | UAV-enabled networks | MSE
Channel-aware power allocation | [89] | mmWave lens MIMO | MSE
Perception-aware power allocation | [90] | AWGN | SSIM
Perception-aware power allocation | [91] | AWGN and MIMO | FWD
Perception-aware power allocation | [92] | AWGN | EQMSE
Perception-aware power allocation | [93] | AWGN | Foreground and background distortions
For the first drawback, the existing studies redesigned the power allocation for practical wireless channels, including fading [75] and frequency-selective fading (i.e., Orthogonal Frequency-Division Multiplexing (OFDM)) [76, 77], impulse noise [84], Multiple-Input and Multiple-Output (MIMO) [78, 79, 80], and MIMO-OFDM channels [81, 82, 83]. Cui et al. [75] designed an optimal power allocation for fading channels. In fading channels, a fading effect (i.e., multiplicative noise) will degrade the reconstruction quality. Although SoftCast assumes that multiplicative noise can be canceled with exact channel estimation at the receiver end, no algorithm can guarantee an error-free channel estimation. In addition to the power allocation design, the authors analyzed the effect of the channel estimation error on the reconstruction quality at the receiver end.
For frequency-selective fading channels, such as OFDM and MIMO-OFDM channels, the key issue is how to match the analog-modulated symbols to the independent subcarriers/subchannels for high-quality image/video reconstruction. Liu et al. [81, 82] observed similarities between the source and channel characteristics and exploited the similarities for subcarrier/subchannel matching. ParCast [81] and the extended version of ParCast\(+\) [82] assigned the more important DCT coefficients to higher gain channel components and allocated power weights for each DCT coefficient with joint consideration of the source and channel for video unicast systems. ECast [83] extended the source and channel matching and power allocation for video multicast systems. For multicast systems, it is necessary to deal with the large overhead of channel feedback from multiple receivers. In ECast, multiple users simultaneously send tone signals for the channel feedback, and the sender receives the superposition of multiple tone signals. Although the sender cannot distinguish each of the channel gains, the weighted harmonic means of channel gains can be obtained from the superposed tone signals. Thus, ECast utilizes the channel gain for the source and channel matching and power allocation.
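The core of this source-channel matching can be illustrated with a simple sorting rule: order the source components by energy, order the subchannels by gain, and pair them off. The sketch below shows only that matching step with random energies and gains; ParCast/ParCast+ additionally perform joint power allocation per unicast link, which is omitted here.

```python
# Illustrative source-channel matching in the spirit of ParCast: sort the
# source components (e.g., DCT chunks) by energy and the OFDM subchannels by
# gain, then pair the strongest component with the strongest subchannel.
# The joint power allocation is omitted; energies and gains are random.
import numpy as np

rng = np.random.default_rng(5)
num = 16
chunk_energy = rng.exponential(10.0, num)          # energy of 16 DCT chunks
subchannel_gain = np.abs(rng.normal(0, 1, num))    # 16 subchannel gains

chunk_order = np.argsort(chunk_energy)[::-1]       # most important chunk first
channel_order = np.argsort(subchannel_gain)[::-1]  # strongest subchannel first

assignment = np.empty(num, dtype=int)
assignment[chunk_order] = channel_order            # chunk i -> subchannel assignment[i]

for i in chunk_order[:4]:
    print(f"chunk {i:2d} (energy {chunk_energy[i]:6.2f}) -> "
          f"subchannel {assignment[i]:2d} (gain {subchannel_gain[assignment[i]]:.2f})")
```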
Other studies solved power allocation problems in modern wireless systems, including Non-Orthogonal Multiple Access (NOMA) [85, 86], underwater acoustic OFDM [87], Unmanned Aerial Vehicle (UAV)-enabled [88], and mmWave lens MIMO systems [89]. For example, in NOMA systems, source signals are coded into a BL and ELs and then transmitted simultaneously through superposition coding. With successive interference cancellation, near users with strong channel gains can decode both the BL and EL signals, whereas far users with weak channel gains may only decode the BL signals. In the existing studies, both the BL and ELs are analog coded in the work of Jiang et al. [85], whereas the BL and ELs are digital coded and analog coded, respectively, in the work of Wu et al. [86]. They solved the power allocation across the BL and ELs to minimize the distortion for all receivers with heterogeneous channel conditions. In underwater acoustic OFDM [87] and mmWave lens MIMO systems [89], the error behavior differs substantially across channel components, similar to frequency-selective fading channels. These studies therefore solved the source and channel matching and power allocation problems, as also discussed for frequency-selective fading channels, to minimize the distortion at the receiver end.
For the second drawback, some studies [90, 91, 92, 93] also redesigned the power allocation with perceptual considerations, including Structural Similarity (SSIM) [90], foveation [91], and saliency [92]. In these studies, determining the perception-aware weights for each source component is challenging. Specifically, in SoftCast, the scaling factor for each coefficient is obtained from its power information to minimize the MSE: \(g_i \propto \lambda _i^{-1/4}\). These studies incorporate a perception-aware weight \(w_i\) for the ith coefficient into the scaling factor to minimize the perceptual distortion as \(g_i \propto w_i^{1/4} \lambda _i^{-1/4}\). For this purpose, Zhao et al. [90] demonstrated the relationship between the MSE in the DCT coefficients and the SSIM distortion to obtain the weight \(w_i\) for the ith DCT coefficients of all chunks. They found that the weight for the high-frequency coefficients was larger than that for the low-frequency coefficients, which was consistent with the characteristics of the Human Visual System (HVS). FoveaCast [91] introduced the foveation-based HVS [94] and the corresponding HVS-based visual perceptual quality metric, called Foveated Weighted Distortion (FWD), as the optimization objective. For a given foveation point \((f_x, f_y)\) in the pixel and frequency domains, the error sensitivity for each pixel/frequency coefficient at location \((x, y)\) can be defined in the foveation-based HVS. FoveaCast regarded the error sensitivity in the DWT domains as the weight \(w_i\) and performed foveation-aware power allocation. In the work of Hadizadeh [92], visual saliency maps were introduced for the perception-aware power allocation. Saliency maps represent the attended regions in an image when a user watches the image owing to the visual attention mechanism of the human brain. In this case, the weight \(w_i\) for the ith pixel is based on the normalized visual saliency defined from any arbitrary visual saliency model, such as the Itti–Koch–Niebur model [95]. Based on this weight, considerable transmission power is allocated to salient regions to minimize the Eye-Tracking Weighted Mean Square Error (EQMSE).
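The weighted allocation rule g_i proportional to w_i^(1/4) * lambda_i^(-1/4) can be applied in a few lines once the weights are available. In the sketch below the perceptual weights are random placeholders for whatever an SSIM-, foveation-, or saliency-based model would produce; the only real content is the weighting and the renormalization to the power budget.

```python
# Sketch of perception-weighted power allocation: scale factors follow
# g_i proportional to w_i^(1/4) * lambda_i^(-1/4), then normalized to the
# power budget. The weights w_i stand in for any perceptual model
# (SSIM-derived, foveation, or saliency weights); here they are random.
import numpy as np

rng = np.random.default_rng(6)
n = 256
lam = rng.exponential(5.0, n)               # coefficient powers lambda_i
w = rng.uniform(0.5, 2.0, n)                # perceptual weights w_i (illustrative)
P = float(n)                                # total power budget

g_unscaled = w ** 0.25 * lam ** (-0.25)
# Normalize so that the transmit-power constraint sum_i g_i^2 * lambda_i = P holds.
scale = np.sqrt(P / np.sum(g_unscaled ** 2 * lam))
g = scale * g_unscaled

print("power used:", np.sum(g ** 2 * lam))   # equals P up to floating-point error
```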

4.3 Bandwidth Utilization

The source bandwidth of soft delivery schemes depends on the number of analog-modulated symbols transmitted every second (i.e., the baud rate). In the aforementioned designs, the source bandwidth is mainly assumed to be sufficient to send all non-zero analog-modulated symbols over the wireless medium. However, when the channel bandwidth is lower than the source bandwidth, some analog-modulated symbols must be discarded before transmission. Hence, the loss of important coefficients (i.e., the low-frequency coefficients) may have a significant impact on the reconstruction quality. Specifically, the expected distortions of soft delivery schemes under bandwidth and transmission power constraints are discussed for single content in the work of Liu et al. [111] and for multiple content in the work of He et al. [112, 113]. Some existing studies have adopted the different techniques listed in Table 5 to meet the bandwidth constraint. The typical method is to selectively discard the chunks in higher-frequency components to fit the bandwidth [11, 96]. When the sender discards some chunks, the receiver regards all coefficients in the discarded chunks as zeros. Because the locations of the discarded chunks need to be sent to the receiver, SoftCast sends the location information as a bitmap. Although SoftCast assumes equal-size chunks across low- to high-frequency components, Li et al. [96] adopted smaller chunk sizes in high-frequency components to realize fine-grained control to meet the bandwidth limitation. Another study [97] used bandwidth-reducing Shannon–Kotelnikov (SK) mappings to increase the number of chunks transmitted over bandwidth-constrained channels. The SK mappings are typical N:1 bandwidth-reducing or 1:M bandwidth-expanding non-linear mappings. In this study, 2:1 SK mappings were used to encode pairs of low-energy chunks so that more medium-energy chunks could be sent within the channel bandwidth.
Table 5. Brief Introduction to the Existing Soft Delivery Techniques for Band-Limited Channels

Papers | Technique | Pros | Cons
[11] | Discarding low-energy chunks | Simple algorithm | Additional metadata for the location of discarded chunks and coefficients
[96] | Adaptive chunk division | Fully utilize available bandwidth by discarding small high-frequency chunks | Improper power allocation in low-frequency chunks
[97] | SK mapping | High reconstruction quality in middle and high channel SNRs | Low reconstruction quality in a low SNR regime
[98, 99, 100, 101, 102, 103, 104, 105] | Compressive sensing | Recover discarded coefficients using a reconstruction algorithm | Computational cost for the algorithm
[106, 107, 108, 109, 110] | Data-assisted communication | Reduce traffic by utilizing related images in a cloud | Same or correlated images are available in a cloud
Other studies [98, 99, 100, 101, 102, 103, 104, 105] introduced Compressive Sensing (CS) techniques [114, 115] for soft delivery over bandwidth-constrained wireless channels. Notably, CS is a sampling paradigm that allows the simultaneous measurement and compression of signals that are sparse or compressible in some domains. In general, recovering source signals from compressed signals is impossible because the system is underdetermined. However, if the source signals are sufficiently sparse in some domains, CS theory indicates that the source signals can be reconstructed from the compressed signals by solving the \(\ell _1\) minimization problem. The advantage of CS-based soft delivery is the recovery of chunks in high-frequency coefficients using CS-based signal reconstruction algorithms, such as approximate message passing and iterative thresholding, even though the chunks are discarded at the sender’s end. For high-quality reconstruction, adaptive rate control and reconstruction algorithms are mainly adopted for CS-based soft delivery. For instance, Yami and Hadizadeh [100] adaptively controlled the compression rate based on visual attention (i.e., both the texture complexity and visual saliency) to satisfy the bandwidth constraint while maintaining better perceptual quality. Liu et al. [104] adaptively selected reliable columns from the measurement matrix and compressed source signals using the selected columns. In view of the reconstruction algorithm, Hadizadeh and Bajic [101] designed an adaptive transform for noisy measurement signals to obtain sparser transform coefficients for clean reconstruction. Yin et al. [102] and Tung and Gunduz [103] designed grouping methods for measurement signals to utilize the similarity between video frames for the reconstruction.
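The flavor of CS recovery can be seen in a generic iterative soft-thresholding (ISTA) sketch, shown below. This is not the specific reconstruction algorithm of any scheme cited above (which typically use approximate message passing or tailored thresholding); it only illustrates recovering a sparse coefficient vector from fewer measurements than unknowns via \(\ell_1\) regularization.

```python
# Generic compressive-sensing recovery sketch using iterative soft
# thresholding (ISTA). Measurement matrix, sparsity level, and noise are toy
# values; real CS-based soft delivery uses more elaborate reconstruction.
import numpy as np

def ista(A, y, reg=0.05, n_iters=500):
    """Minimize 0.5*||y - A x||^2 + reg*||x||_1 by iterative soft thresholding."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - y)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - reg / L, 0.0)   # soft threshold
    return x

rng = np.random.default_rng(7)
n, m, k = 256, 96, 10                        # n coefficients, m measurements, k non-zeros
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(0, 1, k)

A = rng.normal(0, 1.0 / np.sqrt(m), (m, n))  # random measurement matrix
y = A @ x_true + rng.normal(0, 0.01, m)      # compressed, slightly noisy measurements

x_hat = ista(A, y)
print("relative recovery error:", np.linalg.norm(x_true - x_hat) / np.linalg.norm(x_true))
```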
Other studies utilized stored images/videos on the cloud to reduce the bandwidth requirement in soft delivery. Specifically, data-assisted communications of mobile images (DAC-Mobi) [106], data-assisted cloud radio access network (DaC-RAN) [107], and knowledge-enhanced mobile video broadcasting (KMV-Cast) schemes [108, 109, 110], which are referred to as data-assisted soft delivery schemes, have been proposed for high-quality image/video transmission. The main contributions of the data-assisted soft delivery schemes are (1) a sender sends a limited number of analog-modulated symbols and (2) the receiver reconstructs images/videos using correlated images (i.e., side information) obtained from a cloud.
In DAC-Mobi [106], successive coset encoders were introduced to divide the DCT coefficients into three layers of bit planes: Most Significant Bits (MSBs) in low-frequency coefficients, MSBs in other frequency coefficients and middle bits, and Least Significant Bits (LSBs). Here, MSBs in low-frequency coefficients and LSBs were transmitted to the receiver in digital and pseudo-analog manners, respectively, whereas MSBs in other frequency coefficients and middle bits were discarded. Based on the received MSB in the low-frequency coefficients, the receiver reconstructs a down-sampled image to retrieve correlated images in the cloud. The retrieved correlated images were used as side information to resolve ambiguity due to discarded bits and reconstruct the entire image. DaC-RAN [107] and the extended version of KMV-Cast [108, 109, 110] adopted Bayesian reconstruction algorithms that utilize correlated images/videos in the cloud as prior information to reduce the required bandwidth for soft delivery. The main difference between the DaC-RAN and KMV-Cast schemes is that the former assumes that the same images/videos exist in the cloud, whereas the latter does not require that the same images/videos exist at the receiver end by designing prior knowledge broadcasting in a digital manner.
The aforementioned studies considered the channel bandwidth to be lower than the source bandwidth. If the channel bandwidth is greater than the source bandwidth, the soft delivery schemes become less efficient. In this case, the soft delivery schemes utilize the extra bandwidth by retransmission. Lin et al. [116] and Tan et al. [117] designed an analog channel coding to use the extra channel bandwidth for quality enhancement. For example, Tan et al. [117] proposed a chaotic function-based analog encoding [118] for soft delivery. Although the existing chaotic function-based analog coding is designed for uniformly distributed sources, the analog coding for Gaussian distributed sources significantly amplifies source signals and thus consumes unnecessary transmission power. They designed a chaotic map function for Gaussian distributed source signals to prevent power increments compared to the input power. MCast [119] also utilized extra bandwidth for quality improvement. As mentioned earlier, the sender can send the source data multiple times if an extra bandwidth is available. In this case, the utilization of extra time slots for quality improvement is a key issue. To overcome this issue, MCast optimized the assignment of the chunks of the DCT coefficients to available channels in multiple time slots to fully exploit the time and frequency diversities.
In contrast to the aforementioned studies, Lan et al. [120] and He et al. [121] dealt with bandwidth variations. When the available bandwidth is less than the expected bandwidth at the sender’s end, some important chunks will not have the opportunity to be transmitted before the playback deadline. They grouped several chunks into a tile and sent the tile with a large variance and high priority to dispatch important coefficients before the playback deadline.

4.4 Packet Loss Resilience

Even when the channel bandwidth is sufficient to send all non-zero analog-modulated symbols, some analog-modulated symbols can be discarded at the receiver side owing to loss-prone wireless channels. Specifically, the packet loss owing to strong fading and interference may have a significant impact on the reconstruction quality if important chunks and coefficients are lost. SoftCast used Walsh–Hadamard transform to redistribute the energy of the source signals across whole packets for resilience against packet loss. However, each packet still contains a large amount of energy, and thus degradation owing to packet losses remains considerable.
To maintain better reconstruction quality in error-prone wireless channels, some related studies [122, 123, 124] have introduced CS techniques (i.e., block-wise CS [125]) for packet loss resilience. The CS technique is suitable for wireless transmission with random packet loss owing to its random measurement. Random measurement considers all packets as of equal importance. In contrast to typical CS techniques, block-wise CS can reduce the storage and computational costs of the reconstruction. A pioneering work on packet loss resilience is the distributed compressed sensing-based multicast scheme (DCS-Cast) [122]. In DCS-Cast, each image is first divided into blocks and the coefficients in each block are randomized using the same measurement matrix across the blocks. One coefficient in every block is packetized to normalize the importance across packets. Even though some packets may be lost over loss-prone wireless channels, the receiver obtains noisy pixel values using the same measurement matrix at the sender and reconstructs the lost pixel values using the CS reconstruction algorithm in the DCT/DWT domains. Because the lost pixel values can be recovered from the reconstruction algorithm, DCS-Cast maintains high image/video quality in loss-prone channels. To further improve the reconstruction quality, multi-scale [123] and adaptive [124] block-wise CS algorithms have been adopted for soft delivery. The multi-scale block-wise CS algorithm [123] decomposes each video frame into a multi-level 2D-DWT and then optimizes the sampling rate for each DWT level according to its importance. However, the adaptive block-wise CS algorithm [124] divides several video frames into one reference frame and subsequent non-reference frames and adaptively determines whether direct or predictive sampling should be used for each block in a non-reference frame. Direct sampling randomizes the signals in the block, whereas predictive sampling calculates the residuals between the blocks in the reference and non-reference frames and randomizes residuals to utilize the inter-frame similarity for the reconstruction.
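The packetization idea behind DCS-Cast, namely that every packet carries the same small share of every block's random measurements, can be sketched as follows. The block sizes, measurement matrix, and loss pattern are toy values, and the CS reconstruction stage is omitted; the point is only that a lost packet degrades all blocks slightly rather than any single block catastrophically.

```python
# Sketch of loss-resilient packetization in the spirit of DCS-Cast: each
# packet carries one (randomly measured) value from every block, so losing a
# packet removes the same small fraction of measurements from all blocks.
import numpy as np

rng = np.random.default_rng(8)
num_blocks, block_size = 16, 64
blocks = rng.normal(0, 1, (num_blocks, block_size))     # pixel blocks of one frame

phi = rng.normal(0, 1.0 / np.sqrt(block_size), (block_size, block_size))
measurements = blocks @ phi.T            # same measurement matrix for every block

# Packet j collects the j-th measurement of every block.
packets = [measurements[:, j] for j in range(block_size)]

lost = rng.choice(block_size, size=16, replace=False)   # 25% of packets lost
received = np.full_like(measurements, np.nan)
for j in range(block_size):
    if j not in lost:
        received[:, j] = packets[j]

per_block_received = np.count_nonzero(~np.isnan(received), axis=1)
print("measurements received per block:", per_block_received)   # uniform across blocks
```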

4.5 Overhead Reduction

In soft delivery schemes without chunk division, a sender needs to let the receiver know the power information of all the DCT coefficients to demodulate the signals. For the receiver to carry out the MMSE filtering in Equation (5), the sender needs to transmit \({\lambda }_{i}\) of all coefficients without errors as metadata, which may constitute a large overhead. For example, when the sender transmits eight video frames with a resolution of \(352\times 288\), the sender needs to transmit metadata for all DCT coefficients (i.e., \(352\times 288\times 8 = 811{,}008\) variables in total) to the receiver. This overhead may induce performance degradation owing to the rate and power losses in the transmission of analog-modulated symbols. To reduce the overhead, SoftCast divides the DCT coefficients into chunks and carries out chunk-wise power allocation using an MMSE filter. However, overheads are still high, and chunk division causes performance degradation due to improper power allocation.
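The scale of this overhead can be made concrete with the following sketch, which counts the metadata required when every coefficient variance is signaled and when only chunk-wise variances are signaled; the GoP size, frame resolution, and the spatial 8-by-8 chunk layout are assumptions used purely for illustration.

```python
import numpy as np
from scipy.fft import dctn

rng = np.random.default_rng(2)

frames = rng.standard_normal((8, 288, 352))   # one GoP of eight CIF frames
coeffs = dctn(frames, norm="ortho")           # 3D-DCT over the whole GoP

# Per-coefficient metadata: one variance value per DCT coefficient.
per_coeff_metadata = coeffs.size              # 8 * 288 * 352 = 811,008 values

# Chunk-wise metadata: split every frame plane into 8 x 8 chunks and signal
# one variance (lambda) per chunk instead.
chunk = 8
chunks = coeffs.reshape(8, 288 // chunk, chunk, 352 // chunk, chunk)
lam = chunks.var(axis=(2, 4))                 # chunk-wise variances
chunk_metadata = lam.size                     # 8 * 36 * 44 = 12,672 values

print(per_coeff_metadata, "->", chunk_metadata, "metadata values per GoP")
```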
To achieve better quality under a low overhead requirement, the related studies can be classified into two types, as shown in Figure 6: sender-side overhead reduction and receiver-side overhead reduction. Studies on sender-side overhead reduction [126, 127, 128, 129] designed fitting functions to represent the power information with fewer parameters; the sender and receiver share the same fitting function in advance, and only the fitted parameters are sent as metadata. Specifically, Song et al. [126] designed a fitting function with four parameters for each chunk, and Xiong et al. [127] designed a log-linear function with two parameters for each chunk. Another study [128] found that equal-size chunk division is not suitable for chunk-wise fitting and thus designed an adaptive chunk division (i.e., L-shaped chunk division) for accurate fitting. In addition, Fujihashi et al. [129] exploited a Lorentzian fitting function with seven parameters based on a Gaussian Markov Random Field (GMRF) for each GoP. The sender-side studies accurately predict the metadata using a fitting function with a limited number of parameters (i.e., low overhead), at the cost of additional computation for the fitting.
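A minimal sketch of the sender-side idea, assuming a log-linear decay of the sorted chunk variances; the exact functional forms, parameter counts, and chunk definitions differ among the cited schemes.

```python
import numpy as np

def fit_loglinear(lam_sorted):
    """Fit log(lambda) = a + b * index and return the two parameters."""
    idx = np.arange(lam_sorted.size)
    b, a = np.polyfit(idx, np.log(lam_sorted), 1)
    return a, b

def predict_loglinear(a, b, num_chunks):
    return np.exp(a + b * np.arange(num_chunks))

# Hypothetical chunk variances with a roughly exponential decay.
rng = np.random.default_rng(3)
lam = np.sort(np.exp(-0.05 * np.arange(256)) * (1 + 0.1 * rng.random(256)))[::-1]

a, b = fit_loglinear(lam)                    # only two values sent as metadata
lam_hat = predict_loglinear(a, b, lam.size)  # receiver rebuilds all variances
print("relative fitting error:", float(np.mean(np.abs(lam - lam_hat) / lam)))
```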
Fig. 6.
Fig. 6. Block diagram of the sender-side and receiver-side overhead reduction methods. (a) Sender utilizes fitting functions to obtain power information with fewer parameters. (b) Receiver estimates the power information only from the received symbols.
Studies on receiver-side overhead reduction [130, 131] estimate the power information only from the received signals without any additional computational cost at the sender side. The study of Li et al. [130] is a pioneering work that estimates the power information from the received signals. Blind data detection [131] was proposed to decode the received analog-modulated symbols without the power information at the receiver. Specifically, blind data detection uses a zero-forcing estimator and the sign of the received signals to approximate the source signals. One typical issue of receiver-side overhead reduction is that the reconstruction quality highly depends on the quality of the received signals.
We note that both types of overhead reduction cause quality degradation owing to estimation errors. In the work of Xiong et al. [127], the effect of modeling accuracy on the reconstruction quality in soft delivery was analyzed.

4.6 Implementation

The aforementioned studies mainly discussed performance improvements through theoretical analyses and simulations. Table 6 lists the categories of the existing studies in terms of performance evaluation. Existing studies [41, 45, 81, 82, 121, 134] used the software-defined radio platform SORA [141] to carry out emulations. In contrast to the simulations, the emulations obtain channel fading and noise traces from SORA to evaluate the performance under real wireless environments.
Table 6.
Evaluation | Papers
Simulations | [11, 12, 38, 39, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 61, 62, 63, 64, 65, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 96, 97, 98, 99, 100, 101, 102, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 116, 119, 120, 122, 123, 124, 126, 127, 128, 129, 130, 131, 132, 133]
Emulations | [41, 45, 81, 82, 121, 134]
Experiments | USRP [13, 103, 135], SOUP [117], Xilinx Virtex7 [136, 137, 138], MU-MIMO prototype [139], LTE prototype [140]
Table 6. Categories of the Existing Soft Delivery Schemes in Terms of Performance Evaluation
Some studies implemented a soft delivery scheme on a software-defined radio platform [13, 103, 117, 135] and a Field-Programmable Gate Array (FPGA) [136, 137, 138] to empirically demonstrate the benefits of soft delivery in practical wireless channels. In some works [13, 103, 135], the authors used Universal Software Radio Peripheral (USRP) 2, USRP NI2900, and USRP X310 devices with GNU Radio for implementation and evaluated the visual quality of soft delivery. In addition, Tan et al. [117] built an experimental system based on the OpenAirInterface (OAI) platform and the self-developed Software Universal Platform (SOUP): they migrated OAI to the SOUP software-defined radio and implemented the proposed scheme based on the OAI eMBMS codes. Conversely, other works [136, 137, 138] exploited the Xilinx Virtex7 FPGA for implementation and tested the reconstruction quality as a function of the wireless channel SNR.
Other studies [139, 140] implemented soft delivery on prototypes of multi-user MIMO (MU-MIMO) and Long-Term Evolution (LTE) systems. For example, in the work of Chen et al. [139], SoftCast was implemented on BUSH, a large-scale MU-MIMO prototype that performs scalable beam user selection with hybrid beamforming for phased-array antennas in legacy WLANs. They performed experiments to evaluate the video quality in terms of Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) over a lossy MU-MIMO channel.

4.7 Extension for Immersive Experiences

SoftCast and the other soft delivery schemes mentioned in the previous sections were designed for conventional images and video signals. In modern wireless and mobile communication scenarios, the streaming of immersive content will be a key application for reconstructing 3D perceptual scenes that provide full parallax and depth information to human eyes. Immersive content can be applied to various applications, such as 3 to 6 degrees-of-freedom entertainment, remote device operation, medical imaging, vehicular perception, VR/AR/MR, and simulated training. Typical immersive content includes free viewpoint video [142, 143, 144], 360-degree video, and point clouds [145]; Table 7 lists their features. Even for immersive content, the video frames are compressed in a digital manner, and the compressed bitstream is then channel coded and modulated in sequence. This means that cliff and leveling effects still occur in the streaming of immersive content owing to variations in the channel conditions. To prevent cliff, leveling, and staircase effects, some studies have extended soft delivery schemes toward immersive content for future wireless multimedia services.
Table 7.
Content | Acquisition | Display | Key Issues
Free viewpoint video | Large number of closely spaced RGB and infrared camera arrays | Synthesize virtual cameras using rendering and freely switch the viewing camera | Resource allocation for each RGB and infrared camera to maximize viewing quality
360-degree video | 360-degree camera | Playback viewport through a VR headset | Predict future viewport and allocate resource to the viewport for quality maximization
Point cloud | Laser scanner | Playback 3D points through AR and MR headsets and holographic display | Compress and send numerous and irregular structure of 3D points
Table 7. Typical Immersive Content and Its Features
One of the key advantages of soft delivery schemes for immersive content is that they simplify the optimization problem for image and video quality maximization. In immersive content delivery, the main issue is to maximize the image and video quality considering the user’s perspective. For example, the view synthesis distortion optimization problem and the viewport optimization problem should be solved in free viewpoint video and 360-degree video, respectively. In digital-based delivery schemes, a sender needs to find the best bit and transmission power allocation for video frames. However, it is often cumbersome to derive a solution. Soft delivery schemes simplify the optimization problems by reformulating them into a simple power allocation problem since bit allocation for quantization is not required in soft delivery schemes.

4.7.1 Free Viewpoint Video.

Free viewpoint videos enable us to observe a 3D scene from freely switchable angles/viewpoints. A free viewpoint video system consists of numerous closely spaced RGB and infrared cameras that capture the texture and depth frames of a 3D scene, such as a football game. Even though the number of deployed cameras in the field is limited owing to physical constraints, the receiver can synthesize intermediate virtual viewpoints using rendering techniques (e.g., depth image-based rendering [146, 147]) to obtain numerous switchable viewpoints. To synthesize intermediate virtual viewpoints using the rendering technique, the sender encodes and transmits the texture and depth frames of two or more adjacent viewpoints, a format known as Multi-View plus Depth (MVD) [148].
For conventional MVD video streaming over wireless links, digital video compression for MVD video frames (e.g., MVC+D [149] or 3D-AVC [150]) fully utilizes the redundancy between the cameras and texture-depth for compression. In this case, the streaming schemes need to solve view synthesis problems in addition to cliff and leveling effects to yield better video quality even in the synthesized virtual viewpoints. Specifically, the video quality of the virtual viewpoint is determined by the distortion of each texture and depth frame. In digital-based MVD schemes, the distortion depends on the bit and power assignments for each texture and depth frame. It is often cumbersome to achieve the best quality at a target virtual viewpoint using parameter optimization owing to the combinatorial problem with nonlinear quantization.
Some studies [151, 152, 153, 154, 155] designed soft delivery schemes for free viewpoint video. Specifically, FreeCast [151, 152] is the first such scheme for free viewpoint video. Because MVD video frames have redundancy across cameras and between texture and depth, FreeCast jointly transforms the texture and depth frames using 5D-DCT to exploit inter-view and texture-depth correlations for energy compaction. In addition, FreeCast can simplify the optimization problem of view synthesis by reformulating it into a simple power assignment problem because bit allocation (i.e., quantization) is not required. The authors found that the power assignment problem for the texture and depth frames can be solved using a quadratic function to yield the best quality at the desired virtual viewpoint. Furthermore, FreeCast introduces a fitting function obtained from a multi-dimensional GMRF at the sender and the receiver to obtain the power information with few parameters for overhead reduction.
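The joint decorrelation at the heart of FreeCast can be sketched with an n-dimensional DCT over a texture/depth, view, time, and space tensor; the tensor layout, sizes, and synthetic content below are assumptions, and the chunking, power assignment, and GMRF-based fitting steps are omitted.

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(4)

# Hypothetical MVD GoP: 2 components (texture, depth), 4 viewpoints,
# 8 frames, 64 x 64 pixels -> a smooth, highly correlated 5D tensor.
c = np.arange(2, dtype=float)[:, None, None, None, None]
v = np.linspace(0, 1, 4)[None, :, None, None, None]
t = np.linspace(0, 1, 8)[None, None, :, None, None]
y = np.linspace(0, 1, 64)[None, None, None, :, None]
x = np.linspace(0, 1, 64)[None, None, None, None, :]
mvd = (np.sin(2 * np.pi * (x + 0.1 * v + 0.05 * t)) * np.cos(2 * np.pi * y)
       + 0.2 * c + 0.01 * rng.standard_normal((2, 4, 8, 64, 64)))

# 5D-DCT jointly exploits texture-depth, inter-view, temporal, and spatial
# correlations for energy compaction.
coeffs = dctn(mvd, norm="ortho")
energy = np.sort(coeffs.ravel() ** 2)[::-1]
top = int(0.01 * energy.size)
print("energy captured by the top 1% of coefficients:",
      float(energy[:top].sum() / energy.sum()))

# The inverse transform recovers the tensor up to floating-point error.
rec = idctn(coeffs, norm="ortho")
print("max reconstruction error:", float(np.abs(rec - mvd).max()))
```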
3DV SoftCast [154] focused on the view synthesis problem under 3D-DCT operations for each camera's texture and depth frames and designed a power allocation method to solve it. The main differences between 3DV SoftCast and FreeCast are that 3DV SoftCast performs 3D-DCT for each camera and controls the transmission power to minimize the view synthesis distortion, whereas FreeCast performs 5D-DCT for better energy compaction of the analog-modulated symbols and multi-dimensional GMRF-based overhead reduction for reconstructing high-quality MVD frames in band-limited environments. Yang et al. [155] designed a soft delivery scheme for depth video. They found that a block-based DCT performs better on depth video than a full-frame DCT because depth video has different characteristics from texture video. Although a different soft delivery scheme is required for the texture video, the low-distortion depth video in the work of Yang et al. [155] can provide better virtual viewpoint quality in free viewpoint video.

4.7.2 360-Degree Video.

A 360-degree video builds a synthetic virtual environment that mimics the real world with which the users interact. Each user can watch 360-degree videos through a traditional computer-supported VR headset or an all-in-one headset (e.g., Oculus Go). When the user requests a 360-degree video, the sender sends the 360-degree video frames, and the user plays a part of each frame, referred to as the viewport, through the headset. Here, 360-degree videos are mainly captured by an omnidirectional camera or a combination of multiple cameras and saved in a spherical format. Before transmission, the spherical frames are mapped onto a 2D plane using a certain projection method (e.g., equirectangular or cube map projection).
In 360-degree video streaming, the major issue is to yield better video quality in the user's viewport by effectively reducing perceptual redundancy within the 360-degree video frames. Because each user only watches the viewport via the headset at each time instance, excessive video traffic is created if the sender sends the full resolution of the 2D-projected video frames with an identical quantization parameter. One of the simplest methods to reduce perceptual redundancy is viewport-only streaming [156]. During playback, the user moves the viewing viewport according to his or her head/eye movement. Based on the movement, the user requests a new viewport from the sender, and the sender sends back the corresponding viewport. Because the sender transmits one viewport at each time instant, viewport-only streaming can mitigate the video traffic. However, the user needs to receive a new viewport from the sender at every viewport switch, which causes a long switching delay, and a switching delay exceeding approximately 10 ms may cause simulator sickness [157]. Owing to the long delays of the standard Internet, it is difficult for viewport-only streaming schemes to satisfy this switching delay requirement. To prevent simulator sickness, conventional schemes [158] divide 360-degree video frames into multiple tiles and independently encode them with different quantization parameters to yield better viewport quality within the bandwidth constraint.
Studies [159, 160, 161, 162] on soft delivery schemes focus on the quality optimization of the user's viewport in addition to cliff and leveling effect prevention. Fujihashi et al. [159] presented the first scheme for viewport-aware soft 360-degree video delivery. According to the viewing viewport, the sender first adopts pixel-wise power allocation to reduce the perceptual redundancy in the 360-degree video frames and then carries out a combination of 1D-DCT and spherical wavelet transform for decorrelation to utilize the redundancy in the sphere and time domains. In the work of Zhao et al. [160], OmniCast further incorporates the features of 360-degree videos into the quality optimization. Specifically, the authors analyze the relationship between the distortion in the spherical and projected 2D domains as the spherical distortion for each projection method and design power allocation to realize the optimal quality of the 2D-projected 360-degree videos. 360Cast [161] and its extended version 360Cast\(+\) [162] adopt viewport prediction based on linear regression and foveation-aware power allocation within the predicted viewport to further reduce the perceptual redundancy. They evaluate 360Cast\(+\) against existing digital-based schemes in terms of weighted-to-spherically uniform PSNR [163]. Here, the digital-based schemes use HEVC Test Model 16.20 [164] and the BPSK modulation format. They found that 360Cast\(+\) improves the average weighted-to-spherically uniform PSNR compared with the digital-based schemes by preventing the cliff effect in low SNR regimes and gradually improving the received video quality as the wireless channel quality improves.
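A minimal sketch of the linear-regression viewport prediction used in 360Cast-like schemes; the yaw-only motion model, sampling interval, and prediction horizon are assumptions, and the foveation-aware power allocation is omitted.

```python
import numpy as np

def predict_viewport(t_history, yaw_history, t_future):
    """Fit yaw = a * t + b on recent head-orientation samples and
    extrapolate to the playback time of the next frame."""
    a, b = np.polyfit(t_history, yaw_history, 1)
    return a * t_future + b

# Hypothetical head-orientation trace sampled every 10 ms (yaw in degrees).
rng = np.random.default_rng(5)
t = np.arange(0.0, 0.1, 0.01)
yaw = 30.0 + 120.0 * t + rng.normal(0.0, 0.5, t.size)

yaw_next = predict_viewport(t, yaw, t_future=0.12)
print("predicted yaw at t = 120 ms:", round(float(yaw_next), 1), "deg")
```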

4.7.3 Point Cloud.

Volumetric content delivery provides highly immersive experiences for users through XR devices. The point cloud [145] is arguably the most popular volumetric data structure for representing 3D scenes and objects on holographic displays [165, 166]. A point cloud typically consists of a set of 3D points, and each point is defined by 3D coordinates (i.e., (X, Y, Z)) and color attributes (i.e., (R, G, B)). In contrast to conventional 2D images and videos, 3D point cloud data are neither well aligned nor uniformly distributed in space.
The major challenge in volumetric delivery over wireless channels is how to efficiently compress and send the numerous, irregularly structured 3D points of a point cloud within a limited bandwidth. Some compression methods have been proposed for point clouds to deliver 3D data. Specifically, Draco [167] employs kd-tree-based compression [168], and the Point Cloud Library adopts octree-based compression [169, 170, 171]. To further reduce the amount of data traffic in point cloud delivery, two transform techniques have been proposed for energy compaction of the non-ordered and non-uniformly distributed signals: Fourier-based transforms (e.g., the Graph Fourier Transform (GFT)) and wavelet-based transforms (e.g., the region-adaptive Haar transform) [172]. For example, recent studies used the GFT for the color components [173] and 3D coordinates [174] of graph signals for signal decorrelation. They used quantization and entropy coding for the compression of the decorrelated signals.
HoloCast [175] is a pioneering work on soft 3D point cloud delivery over unstable wireless channels. Motivated by the work of Rente et al. [174] and Zhang et al. [176], HoloCast regards the 3D points as vertices in a graph with edges between nearby vertices to deal with the irregular structure of the 3D points. It applies the GFT to such graph signals to exploit the underlying correlations among adjacent graph signals and directly transmits the linear-transformed graph signals over the channel as pseudo-analog modulation. In the evaluations, HoloCast was compared with conventional digital-based delivery based on the point cloud compression used in the Point Cloud Library [169]: HoloCast gradually improves the reconstruction quality as the wireless channel quality improves, and the GFT-based HoloCast achieves better quality than a DCT-based variant.
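The GFT step can be sketched as follows, assuming a k-nearest-neighbor graph with Gaussian edge weights over the 3D coordinates and a single scalar attribute per point; HoloCast's power allocation and pseudo-analog transmission are omitted.

```python
import numpy as np

def gft_basis(points, k=8, sigma=0.5):
    """Build a k-NN graph over the 3D points and return the GFT basis,
    i.e., the eigenvectors of the combinatorial graph Laplacian."""
    n = points.shape[0]
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(d2[i])[1:k + 1]          # k nearest neighbors
        W[i, nn] = np.exp(-d2[i, nn] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                        # symmetrize the weights
    L = np.diag(W.sum(1)) - W                     # graph Laplacian
    _, U = np.linalg.eigh(L)                      # columns: GFT basis vectors
    return U

rng = np.random.default_rng(6)
xyz = rng.random((200, 3))                        # hypothetical 3D points
attr = xyz @ np.array([1.0, 0.2, 0.1]) + 0.05 * rng.random(200)  # smooth attribute

U = gft_basis(xyz)
coeffs = U.T @ attr                               # decorrelated graph signal
print("energy in the 20 lowest graph frequencies:",
      float((coeffs[:20] ** 2).sum() / (coeffs ** 2).sum()))
```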
However, it has been found that graph-based coding schemes need to send the graph-based transform basis matrix used in GFT as additional metadata for signal decoding. For example, the sender needs to send \(N^2\) real elements of the graph-based transform basis matrix as the metadata when the number of 3D points is N. In some works [177, 178, 179], Givens rotation [180, 181] was used for GFT basis matrix compression. Givens rotation is used to selectively introduce zeros into a matrix to create an identity matrix from the basis matrix using angle parameters. The angle parameters are uniformly and non-uniformly quantized prior to the metadata transmission for overhead reduction. From the evaluations, Givens rotation with the uniform quantization reduces the overhead up to 89.8% [177] compared with HoloCast without the overhead reduction. In addition, Givens rotation with the non-uniform quantization further reduces the overhead up to 28.6% [178] compared with the uniform quantization.
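A minimal sketch of the Givens-rotation idea: the orthogonal GFT basis is reduced to a sign diagonal by plane rotations, and only the (quantized) rotation angles are sent as metadata. The rotation ordering, the 8-bit uniform quantizer, and the sign handling are assumptions rather than the designs in the cited works.

```python
import numpy as np

def givens_decompose(U):
    """Triangularize an orthogonal basis with Givens rotations; the angles
    (plus a residual sign diagonal) fully describe the basis."""
    R = U.copy()
    n = R.shape[0]
    planes, angles = [], []
    for j in range(n):
        for i in range(n - 1, j, -1):            # zero out R[i, j]
            theta = np.arctan2(R[i, j], R[i - 1, j])
            c, s = np.cos(theta), np.sin(theta)
            rows = R[[i - 1, i], :].copy()
            R[i - 1, :] = c * rows[0] + s * rows[1]
            R[i, :] = -s * rows[0] + c * rows[1]
            planes.append(i)
            angles.append(theta)
    return planes, np.array(angles), np.sign(np.diag(R))

def givens_reconstruct(planes, angles, signs):
    U = np.diag(signs)
    for i, theta in zip(reversed(planes), reversed(list(angles))):
        c, s = np.cos(theta), np.sin(theta)
        rows = U[[i - 1, i], :].copy()
        U[i - 1, :] = c * rows[0] - s * rows[1]   # apply the transposed rotation
        U[i, :] = s * rows[0] + c * rows[1]
    return U

Q, _ = np.linalg.qr(np.random.default_rng(7).standard_normal((16, 16)))
planes, angles, signs = givens_decompose(Q)

# Uniformly quantize the angles (8 bits over [-pi, pi]) before transmission.
step = 2 * np.pi / 256
Q_hat = givens_reconstruct(planes, np.round(angles / step) * step, signs)
print("basis reconstruction error:", float(np.abs(Q_hat - Q).max()))
```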

5 Future Directions

The existing soft delivery schemes have been studied to overcome the issues of conventional image and video streaming in modern wireless and mobile networks. In this section, we foresee the future directions of soft delivery. Table 8 lists the features and challenges of each future direction. Specifically, the integration of digital-based and DNN-based operations with soft delivery, referred to as HDA delivery and AI-empowered soft delivery, respectively, will be further discussed as a means to yield better reconstruction quality. In addition, our study and other studies find that soft delivery can improve the delivery quality of DNN architectures and tactile data, and soft delivery-based schemes may become a new standard for such delivery. Although neural network compression and haptic codecs have been designed for these purposes, they yield low reconstruction quality owing to insufficient energy compaction.
Table 8.
Future Direction | Advantage | Disadvantage | Challenges
HDA delivery | Compact signal energy by integrating with digital-based operations | Cause cliff effect if digital-coded symbols fail | Discuss tradeoff between coding delay and quality
AI-empowered soft delivery | Realize energy compaction and signal reconstruction by using DNN-based architectures | Large computational overhead | Deal with bandwidth heterogeneity; design an optimal architecture for semantic communication
Soft delivery for AI | Efficiently exchange model parameters by using simultaneous transmission | Require symbol-level synchronization | Design power allocation for low-energy FL
Soft delivery for Tactile Internet | Meet strict delay constraint | Consider a single vibrotactile sensor | Deal with multiple vibrotactile sensors; minimize the distortion of human tactile perception
Table 8. Future Directions and Challenges of Soft Delivery Schemes

5.1 HDA Delivery

For further quality improvement, the pioneering studies of HDA delivery [182, 183] integrate low-rate digital-based encoding and decoding into soft delivery. They proposed the superposition of analog-coded and digital-coded symbols to combine the advantages of conventional digital-based and soft delivery schemes. Specifically, the digital-coded symbols provide the baseline quality of the video frames, whereas the analog-coded symbols enhance the quality of the video frames according to the wireless channel quality. Here, the low-rate digital-based operations can significantly reduce the signal energy of the analog-coded symbols (e.g., by decreasing the ratio of the maximum to the minimum energy of the source components). A theoretical study [40] clarified that a lower ratio improves the reconstruction quality of the analog-coded symbols. This means that the quality improvement achieved as the wireless channel quality improves is more significant in HDA coding schemes than in pure soft delivery schemes. Nonetheless, the integration with digital-based encoding has one drawback: the cliff effect may occur when the decoding of the digital-coded symbols fails.
Figure 7 shows an overview of the HDA delivery schemes. HDA delivery schemes consist of the digital and analog coding parts. At the sender side, the video frames are first encoded by the digital video encoder and the digitally coded bitstream is channel coded, modulated, and assigned transmission power by the sender. Meanwhile, the residuals are coded, power assigned, and modulated by the soft delivery scheme. Both outputs from the digital and analog coding parts are superposed and transmitted over wireless channels. In this case, the transmitted signal \(x_i\) is the sum of the BPSK-modulated vector signal \(x^{\langle \mathsf {d}\rangle }_{i}\) and output vector signal of the soft delivery scheme \(x^{\langle \mathsf {a}\rangle }_{i}\) as follows:
\begin{equation} x_i = x^{\langle \mathsf {d}\rangle }_{i} + \jmath x^{\langle \mathsf {a}\rangle }_{i}. \tag{6} \end{equation}
The BPSK-modulated symbol and the analog-modulated symbol are scaled by \(\sqrt {P_\mathsf {d}}\) and \(g_{i}\), respectively:
\begin{equation} x^{\langle \mathsf {d}\rangle }_{i} = \sqrt {P_\mathsf {d}} \cdot b_{i}, \qquad x^{\langle \mathsf {a}\rangle }_{i} = g_{i} \cdot s_{i}, \tag{7} \end{equation}
where \(b_{i}\in \mathbb {X} = \left\lbrace \pm 1 \right\rbrace\) is the BPSK-modulated symbol and \(\jmath =\sqrt {-1}\) denotes the imaginary unit. Here, the near-optimal solution of \(g_i\) under the transmission power budget \(P_\mathsf {a}\) is based on Equation (4). We note that the budgets of the transmission power for the digital and analog parts need to satisfy the total power budget \(P_\mathsf {t}\) (i.e., \(P_\mathsf {t} = P_\mathsf {a} + P_\mathsf {d}\)).
Fig. 7.
Fig. 7. Typical framework of HDA delivery.
At the receiver side, the digital-modulated symbols are decoded first, and the analog-modulated symbols are then obtained by subtracting the digital-modulated symbols from the received symbols. Finally, the receiver reconstructs the baseline quality of the video frames from the output of the digital part and enhances the video quality by adding the output of the analog part.
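A minimal sketch of the superposition in Equations (6) and (7) over an AWGN channel; the power split, the Gaussian residual model, and the MMSE scaler for the analog part are assumptions, and channel coding of the digital bits is omitted. In this I/Q-orthogonal sketch the two parts do not interfere; practical schemes that superimpose both parts on the same component must treat the analog symbols as noise when decoding the digital part, as discussed next.

```python
import numpy as np

rng = np.random.default_rng(8)
N = 10_000

# Digital part: BPSK symbols carrying the base-layer bitstream.
bits = rng.integers(0, 2, N)
b = 2 * bits - 1
P_d, P_a = 0.8, 0.2                      # power split (P_t = P_d + P_a)

# Analog part: zero-mean residual coefficients scaled to the budget P_a.
s = rng.normal(0.0, 1.0, N)
lam = s.var()
g = np.sqrt(P_a / lam)

# Superposition (Eq. (6)): digital on the I component, analog on the Q component.
x = np.sqrt(P_d) * b + 1j * g * s
sigma2 = 0.05
noise = rng.normal(0, np.sqrt(sigma2 / 2), N) + 1j * rng.normal(0, np.sqrt(sigma2 / 2), N)
y = x + noise

# Receiver: decode BPSK from I, then MMSE-estimate the residuals from Q.
bits_hat = (y.real > 0).astype(int)
s_hat = (g * lam / (g**2 * lam + sigma2 / 2)) * y.imag

print("bit error rate:", float(np.mean(bits_hat != bits)))
print("residual MSE  :", float(np.mean((s_hat - s) ** 2)))
```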
A key issue in HDA delivery is the assignment of transmission power to the digital and analog parts [184]. Specifically, the power assigned to the digital part must guarantee the correct decoding of the symbols. By contrast, the digital decoder treats the superimposed analog-modulated symbols \(x^{\langle \mathsf {a}\rangle }_{i}\) as noise. To achieve better decoding performance, the I component of \(x^{\langle \mathsf {a}\rangle }_{i}\) should be kept as small as possible. In the work of Song et al. [185], the authors only select the high-frequency coefficients, which are expected to be very small values for superposition. The remaining low-frequency coefficients are delivered using pseudo-analog modulation. The HDA framework in the work of Tan et al. [186] regards the superposed symbols as three main parts: orthogonal analog symbols, digital symbols, and nonorthogonal analog symbols superimposed onto digital symbols. They designed resource allocation among these three parts to achieve a better balance between lowering interference and improving reconstruction quality. Another study [187] designs a prediction model to describe the relationship between the variance of residuals and the quantization parameter, and determines the optimal transmission power for the analog part, which maximizes the reconstruction quality with the correct decoding of the digital part. The HDA delivery scheme in the work of Zhang et al. [188] treats the imperfect decoding of the digital part and finds the best assignment of the transmission power for the digital and analog parts. This prevents too much power assignment for the digital part to ensure a low bit error rate. In contrast to the aforementioned studies, Liang et al. [189] treat the bandwidth of other digital traffic as hidden resources for HDA video delivery. Specifically, they superimpose the analog-modulated symbols and digital symbols of the other digital traffic to utilize the hidden resource under the constraint that the bit error rate requirement of the other digital traffic is not compromised.
Other studies have redesigned the power allocation in HDA delivery for practical wireless channel environments, including fading [190, 191], OFDM [192, 193], MIMO [194], and relay networks [195]. For example, the power allocation with perfect channel state information is designed in the work of Shen et al. [190], Yahampath [192], and Liu et al. [194], whereas the power allocation with imperfect channel state information is designed in other work by Yahampath [191, 193]. In view of the packet loss resilience in HDA delivery, the study of Fujihashi et al. [196] introduced CS for the residuals.
Other studies [197, 198, 199] extend HDA video delivery for immersive content. Swift [197] considers stereo video delivery and designs a zigzag coding structure for the stereo video to utilize both intra- and inter-view correlations. In the zigzag coding structure, the odd frames in the left view and the even frames in the right view are encoded digitally, and the rest of the frames are encoded in analog. Here, the reconstructions of the digitally coded frames are used as side information to further remove redundant information from the analog-coded frames. Another study [198] extends HDA delivery for MVD videos and solves the view synthesis optimization to yield the best quality from an intermediate virtual viewpoint. HoloCast+ [199] designs HDA delivery for point cloud delivery.
In future work, the recent coding standards such as H.266/VVC, learned video compression [200, 201], and point cloud coding can be used for the digital part. Although they have achieved significant energy compaction and can further improve the quality of the analog part, the recent coding standards require a long coding delay for compression. The tradeoff between coding delay and reconstructed image and video quality is an open question in HDA delivery.

5.2 AI-Empowered Soft Delivery

Some recent studies integrate DNN architectures for nonlinear encoding and decoding operations of soft delivery, namely AI-empowered soft delivery. AI-empowered soft delivery schemes utilize Deep Convolutional Neural Networks (DCNNs) [202, 203] and multi-layer perceptron networks for energy compaction, power allocation, and overhead reduction tasks.
The multi-layer perceptron auto-encoder was first adopted to reduce the overhead of soft delivery [204]. Specifically, the proposed encoder obtains a few latent variables from the pixel values, and the proposed decoder decodes the accurate power information from the received latent variables for proper power allocation. The reconstruction quality can be maintained even with only one metadata across one GoP. Another study [205] designs Deep Joint Source-Channel Coding (DJSCC) for the energy compaction of the image and video signals. The DJSCC schemes integrate a DCNN-based auto-encoder into a soft delivery scheme. The proposed encoder directly compresses each image into a limited number of latent variables, and the proposed decoder reconstructs the image from the latent variables. Here, the latent variables are transmitted over wireless channels using pseudo-analog modulation. Even though the latent variables are obtained by nonlinear functions and delivered over wireless channels with a lower SNR, cliff and leveling effects can be prevented via pseudo-analog modulation. Other studies have introduced the DNN architecture for power allocation [206] and decoding operations [207]. The study of Tang et al. [206] uses a YOLO (you-only-look-once) structure [208] to extract the Region of Interest (ROI) and non-ROI parts from each image and then assign unequal transmission power across ROI and non-ROI parts for perceptual quality enhancement. The proposed scheme in the work of Fujihashi et al. [207] integrates DCNN-based image denoising, specifically DIP (deep-image-prior) [209], into soft delivery. The DIP finds linear and nonlinear noise effects for reconstructing clean images from noisy images. The proposed scheme can remove fading and noise effects from the received images using DCNN-based image restoration. Another study [210] introduces the Graph Neural Network (GNN) [211] for wireless point cloud delivery. The GNN is a novel model for graph representation learning that allows the analysis of the irregular geometric structure of graph data. GNN-based auto-encoder [212, 213] was designed to encode 3D point clouds into a limited number of latent variables. One of the benefits of the GNN-based auto-encoder is that it allows graph signal reconstruction from a limited number of latent variables without requiring additional metadata.
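A minimal sketch of the DJSCC idea described above, using a small convolutional auto-encoder with an AWGN channel layer; the layer sizes, latent dimension, and training setup are illustrative assumptions rather than the architecture of any cited scheme.

```python
import torch
from torch import nn

class DJSCC(nn.Module):
    """Map images to a few latent symbols, normalize their power, pass them
    through an AWGN channel, and decode them back to pixels."""
    def __init__(self, latent_channels=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.PReLU(),
            nn.Conv2d(32, latent_channels, 5, stride=2, padding=2),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, 32, 5, stride=2,
                               padding=2, output_padding=1), nn.PReLU(),
            nn.ConvTranspose2d(32, 3, 5, stride=2,
                               padding=2, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, img, snr_db=10.0):
        z = self.encoder(img)
        z = z / z.pow(2).mean().sqrt()          # unit average transmit power
        sigma = 10 ** (-snr_db / 20)            # AWGN channel at the given SNR
        z_noisy = z + sigma * torch.randn_like(z)
        return self.decoder(z_noisy)

model = DJSCC()
img = torch.rand(4, 3, 32, 32)                  # hypothetical image batch
out = model(img, snr_db=5.0)
loss = nn.functional.mse_loss(out, img)         # end-to-end training loss
print(out.shape, float(loss))
```

Training such a model end to end with the channel layer in the loop is what lets the latent symbols degrade gracefully with noise instead of exhibiting a cliff effect.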
DJSCC schemes with the latest neural network architectures have been well studied for further energy compaction in recent years. As a result, they have achieved better image and video quality without cliff, leveling, and staircase effects. However, the existing DJSCC schemes need to deal with the bandwidth heterogeneity among the receivers. Here, how to adaptively improve the image and video quality according to the available bandwidth using the same architecture remains challenging work. In addition, DJSCC schemes have been considered as the fundamental techniques for realizing semantic communication [214, 215] in future wireless and mobile networks. Semantic communication has been envisioned as a new transmission paradigm that delivers semantic meaning rather than a bit stream of transmitted messages. Another challenging issue is to design an optimal DJSCC scheme for the given semantic triple.

5.3 Soft Delivery for AI

Our study and other studies find that soft delivery can be utilized to support various AI architectures. Specifically, many AI-based applications need to exchange trained DNN models between the sender and the receiver over wireless networks within a short delay, including viewport prediction [216] in untethered XR applications, dead reckoning in autonomous driving [217, 218], and online gaming services [219]. In recent years, digital-based model compression schemes [220, 221] have been designed and standardized for sharing trained models over networks. However, existing studies found that the cliff and leveling effects occur even in DNN model transmission. To prevent both effects, analog modulation is effective for model transmission. AirNet [222] adopts analog modulation to deliver the DNN model parameters over wireless networks. Specifically, AirNet directly maps the model parameters to the transmission symbols and sends the analog-modulated symbols via wireless channels. This process avoids the issues mentioned previously, and the model restoration quality faithfully corresponds to the instantaneous channel condition. In addition, AirNet adopts SK mappings [97] to reduce the number of transmission symbols for band-limited channels.
In addition, model parameter transmission is a key technique for realizing FL [223, 224] over wireless networks. Figure 8(a) shows a typical example of FL over wireless networks. FL is a decentralized learning approach that trains a model over a federation of distributed learners and an aggregator to obtain an accurate model even when each distributed learner has only a limited dataset. Each learner in the federation uses only locally available data for training. For training over the distributed learners and the aggregator, the learners and the aggregator exchange the model parameters over wireless channels. Existing studies found that analog modulation-based solutions are efficient for exchanging the model parameters in FL. AirComp (analog over-the-air computation) [225, 226] is a typical solution for model parameter transmission in FL. All learners simultaneously send the analog-modulated parameters with channel inversion to the aggregator, and the aggregator can recover the aggregated model parameters from the superimposed waveforms. Although simultaneous analog transmission can improve the throughput of the model parameter transmission, AirComp requires precise symbol-level synchronization among the distributed learners. In previous work [227], we proposed model parameter transmission in a round-robin manner for quasi-asynchronous FL systems. Specifically, the proposed Federated AirNet, which is based on HDA delivery, integrates low-rate model parameter compression with energy-compact analog modulation.
Fig. 8.
Fig. 8. FL over wireless channels: (a) overview and (b) top-1 classification accuracy of the global model for the digital-based, analog-based, and proposed Federated AirNet schemes as a function of the wireless channel quality.
Figure 8(b) shows the average top-1 classification accuracy of the global model as a function of the wireless channel SNR when the available number of transmission symbols is at most 6.0 Msymbols. The number of transmission iterations from the 10 learners to the aggregator is 10. We compare Federated AirNet with two state-of-the-art digital and analog approaches: DeepCABAC and AirNet. In the DeepCABAC scheme, the compressed bitstream is channel coded by a half-rate convolutional code with a constraint length of 8 and digitally modulated with BPSK or 4-QAM formats. The AirNet scheme directly maps each of the model parameters onto a transmission symbol in analog modulation. We found that the AirNet and Federated AirNet schemes prevent drastic degradation in accuracy because they do not rely on quantization and entropy coding. In addition, the proposed Federated AirNet scheme yields the best accuracy and, in higher channel SNR regimes, achieves near error-free performance.
Although soft delivery schemes have the potential to realize FL, they cause large energy consumption because they assume an identical transmission power at every distributed learner. Although each learner can limit its transmission power to reduce energy consumption, such a limitation may cause a long convergence delay and low global model performance. Transmission power allocation among the distributed learners will therefore be a key issue for the realization of FL with low energy consumption.
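A minimal sketch of the over-the-air aggregation idea and of why the per-learner transmit power differs under channel inversion; the channel model, noise level, and update statistics are assumptions, and synchronization issues are ignored.

```python
import numpy as np

rng = np.random.default_rng(9)
num_learners, dim = 10, 1_000

# Hypothetical local model updates held by the distributed learners.
local_updates = rng.normal(0.0, 1.0, (num_learners, dim))

# Flat-fading channel gains from each learner to the aggregator.
h = rng.rayleigh(1.0, num_learners)

# Channel inversion: each learner pre-scales its analog-modulated update so
# that all updates arrive with equal amplitude and add up over the air.
tx = local_updates / h[:, None]
noise = rng.normal(0.0, 0.05, dim)
rx = (h[:, None] * tx).sum(axis=0) + noise      # superimposed waveform

aggregate = rx / num_learners                   # over-the-air average
ideal = local_updates.mean(axis=0)
print("aggregation MSE:", float(np.mean((aggregate - ideal) ** 2)))

# Weak channels require much higher transmit power, which motivates the
# power allocation problem discussed above.
print("per-learner transmit power:", (tx ** 2).mean(axis=1).round(2))
```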

5.4 Soft Delivery for Tactile Internet

In addition to visual information, multiple sensorial media (mulsemedia) delivery can enhance the quality of immersive experiences in XR applications. Tactile information is a typical type of mulsemedia data; in particular, tactile communications can support untethered and immersive XR applications. In contrast to visual information, the sampling rate of tactile information is relatively high (i.e., above 1,000 Hz) and the delay requirement for tactile communications is strict. In this case, the sender does not retransmit the tactile signals under channel quality fluctuations. Although the haptic codec [228, 229], which is defined in the IEEE 1918.1.1 standard, is designed to compress tactile data (e.g., vibrotactile signals), cliff and leveling effects still occur owing to channel quality fluctuations. We apply analog modulation with DCT and DWT to the vibrotactile signals to discuss the feasibility of soft haptic delivery. Figure 9(a) shows an overview of the proposed soft haptic delivery scheme, and Figure 9(b) shows the reconstruction quality of the vibrotactile signal for the soft haptic and digital-based delivery schemes at channel SNRs of 10 dB and 20 dB under different available bandwidths. Here, the digital-based delivery scheme uses the BPSK modulation format and controls the quantization parameter to fit the transmission symbols into the available bandwidth, whereas the soft haptic delivery scheme discards high-frequency coefficients for the same purpose. We used “1spike_Probe_-_aluminumGrid_-_fast” as the reference vibrotactile data provided by the IEEE 1918.1.1 standard. From preliminary evaluations, the soft haptic delivery scheme yields better reconstruction quality of the vibrotactile signals irrespective of the available bandwidth.
Fig. 9.
Fig. 9. Soft haptic delivery. Overview (a) and reconstruction quality (b) over the digital-based and the soft haptic delivery schemes as a function of the number of transmission symbols.
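The soft haptic pipeline can be sketched as follows; the synthetic vibrotactile trace, symbol budget, and noise level are assumptions, the evaluation in Figure 9 instead uses the IEEE 1918.1.1 reference data, and the DWT variant and power allocation are omitted.

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(10)

# Hypothetical vibrotactile trace sampled at 1 kHz (a damped 150 Hz burst).
fs = 1000
t = np.arange(0, 1.0, 1 / fs)
sig = np.exp(-6 * t) * np.sin(2 * np.pi * 150 * t)

# Soft haptic delivery: 1D-DCT, discard high-frequency coefficients to fit
# the symbol budget, analog-modulate, and pass through an AWGN channel.
budget = 200                                   # available transmission symbols
coeffs = dct(sig, norm="ortho")
tx = coeffs[:budget]

snr_db = 10.0
noise_var = np.mean(tx ** 2) / (10 ** (snr_db / 10))
rx = tx + rng.normal(0.0, np.sqrt(noise_var), tx.size)

rec_coeffs = np.zeros_like(coeffs)
rec_coeffs[:budget] = rx
rec = idct(rec_coeffs, norm="ortho")
rec_snr = 10 * np.log10(np.sum(sig ** 2) / np.sum((sig - rec) ** 2))
print("reconstruction SNR:", round(float(rec_snr), 1), "dB")
```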
The soft haptic delivery scheme described above was designed only for a single vibrotactile sensor and minimizes the MSE between the original and reconstructed vibrotactile signals. However, vibrotactile signals from multiple sensors should be delivered to provide immersive experiences for users. In addition, the psychohaptic model in the work of Noll et al. [228] and Steinbach et al. [229] demonstrated that each frequency band of vibrotactile signals has unequal sensitivity for humans. How to design energy compaction for vibrotactile signals from multiple sensors and power allocation that minimizes the distortion perceived by human tactile perception are therefore challenging issues for the realization of haptic delivery.

6 Conclusion

In this article, we presented an exhaustive survey and research outlook of soft delivery schemes. We first reviewed conventional digital-based video delivery schemes and their critical issues, including the cliff, leveling, and staircase effects. We then provided an overview of soft delivery schemes and a taxonomy of the existing schemes from the perspectives of energy compaction, power allocation, bandwidth utilization, packet loss resilience, overhead reduction, and implementation. Some studies adapted existing energy compaction and overhead reduction techniques to soft delivery for immersive content, finding that the reconstruction quality outperforms that of digital-based delivery schemes even with HEVC-based source coding. Finally, we envisioned the future directions of soft delivery based on preliminary evaluations. We expect that soft delivery will be essential for sending high-quality model parameters and tactile information over wireless and mobile networks.

References

[1]
Ericsson. 2022. Ericsson Mobility Report.
[2]
Dan Grois, Detlev Marpe, Amit Mulayoff, Benaya Itzhaky, and Ofer Hadar. 2013. Performance comparison of H.265/MPEG-HEVC, VP9, and H.264/MPEG-AVC encoders. In Proceedings of the IEEE Picture Coding Symposium. 394–397.
[3]
Sebastian Schwarz, Marius Preda, Vittorio Baroncini, Madhukar Budagavi, Pablo Cesar, Philip A. Chou, Robert A. Cohen, et al. 2019. Emerging MPEG standards for point cloud compression. IEEE Journal of Emerging and Selected Topics in Circuits and Systems 9, 1 (2019), 133–148.
[4]
Danillo Bracco Graziosi, Ohji Nakagami, Satoru Kuma, Alexandre Zaghetto, Teruhiko Suzuki, and Ali Tabatabai. 2020. An overview of ongoing point cloud compression standardization activities: Video-based (V-PCC) and geometry-based (G-PCC). APSIPA Transactions on Signal and Information Processing 9 (2020), 1–17.
[5]
Igor Kozintsev and Kannan Ramchandran. 1997. A wavelet zerotree-based hybrid compressed/uncompressed framework for wireless image transmission. In Proceedings of the 31st Asilomar Conference on Signals, Systems, and Computers, Vol. 2. 1023–1027.
[6]
Michael Gastpar, Martin Vetterli, and Pier Luigi Dragotti. 2006. Sensing reality and communicating bits: A dangerous liaison. IEEE Signal Processing Magazine 23, 4 (2006), 70–83.
[7]
Silvija Kokalj-Filipović, Emina Soljanin, and Yang Gao. 2011. Cliff effect suppression through multiple-descriptions with split personality. In Proceedings of the IEEE International Symposium on Information Theory. 948–952.
[8]
Heiko Schwarz, Detlev Marpe, and Thomas Wiegand. 2007. Overview of the scalable video coding extension of the H.264/AVC standard. IEEE Transactions on Circuits and Systems for Video Technology 17, 9 (2007), 1103–1120.
[9]
Abdelhak Bentaleb, Bayan Taani, Ali C. Begen, Christian Timmerer, and Roger Zimmermann. 2019. A survey on bitrate adaptation schemes for streaming media over HTTP. IEEE Communications Surveys & Tutorials 21, 1 (2019), 562–585.
[10]
Tao Wen, Wenxia Cai, Anhong Wang, Jie Liang, and Lijun Zhao. 2020. HDA video transmission scheme for DASH. IEEE Access 8 (2020), 58345–58356.
[11]
Szymon Jakubczak and Dina Katabi. 2011. A cross-layer design for scalable mobile video. In Proceedings of the ACM Annual International Conference on Mobile Computing and Networking. 289–300.
[12]
Szymon Jakubczak, John Z. Sun, Dina Katabi, and Vivek K. Goyal. 2011. Performance regimes of uncoded linear communications over AWGN channels. In Proceedings of the 45th Annual Conference on Information Sciences and Systems. 1–6.
[13]
Szymon Jakubczak and Dina Katabi. 2010. SoftCast: One-size-fits-all wireless video. Computer Communication Review 40, 4 (2010), 449–450.
[14]
Helge Coward. 2002. Joint Source-Channel Coding: Development of Methods and Utilization in Image Communications. Ph.D. Dissertation. Norwegian University of Science and Technology.
[15]
Kyong-Hwa Lee and D. Petersen. 1976. Optimal linear coding for vector channels. IEEE Transactions on Communications 24, 12 (1976), 1283–1290.
[16]
Scott Pudlewski, Nan Cen, Zhangyu Guan, and Tommaso Melodia. 2015. Video transmission over lossy wireless networks: A cross-layer perspective. IEEE Journal of Selected Topics in Signal Processing 9, 1 (2015), 6–21.
[17]
Behrouz Jedari, Gopika Premsankar, Gazi Illahi, Mario Di Francesco, Abbas Mehrabi, and Antti Ylä-Jääski. 2020. Video caching, analytics, and delivery at the wireless edge: A survey and future directions. IEEE Communications Surveys & Tutorials 23, 1 (2020), 431–471.
[18]
Abid Yaqoob, Ting Bi, and Gabriel-Miro Muntean. 2020. A survey on adaptive 360 video streaming: Solutions, challenges and opportunities. IEEE Communications Surveys & Tutorials 22, 4 (2020), 2801–2838.
[19]
Ching-Ling Fan, Wen-Chih Lo, Yu-Tung Pai, and Cheng-Hsin Hsu. 2019. A survey on 360 video streaming: Acquisition, transmission, and display. ACM Computing Surveys 52, 4 (2019), 1–36.
[20]
Zhi Liu, Qiyue Li, Xianfu Chen, Celimuge Wu, Susumu Ishihara, Jie Li, and Yusheng Ji. 2021. Point cloud video streaming: Challenges and solutions. IEEE Network 35, 5 (2021), 202–209.
[21]
Thomas Stockhammer, Hrvoje Jenkac, and Gabriel Kuhn. 2004. Streaming video over variable bit-rate wireless channels. IEEE Transactions on Multimedia 6, 2 (2004), 268–277.
[22]
Zhili Guo, Yao Wang, Elza Erkip, and Shivendra Panwar. 2015. Wireless video multicast with cooperative and incremental transmission of parity packets. IEEE Transactions on Multimedia 17, 8 (2015), 1335–1346.
[23]
Saleh Almowuena, Md. Mahfuzur Rahman, Cheng Hsin Hsu, Ahmad AbdAllah Hassan, and Mohamed Hafeeda. 2016. Energy-aware and bandwidth-efficient hybrid video streaming over mobile networks. IEEE Transactions on Multimedia 18, 1 (2016), 102–115.
[24]
Thomas Wiegand, Gary J. Sullivan, Gisle Bjontegaard, and Ajay Luthra. 2003. Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology 13, 7 (2003), 560–576.
[25]
Maria Torres Vega, Cristian Perra, and Antonio Liotta. 2018. Resilience of video streaming services to network impairments. IEEE Transactions on Broadcasting 64, 2 (2018), 220–234.
[26]
Takumasa Ishioka, Kazuki Aiura, Ryota Shiina, Tatsuya Fukui, Tomohiro Taniguchi, Satoshi Narikawa, Katsuya Minami, et al. 2021. Design and prototype implementation of software-defined radio over fiber. IEEE Access 9 (2021), 72793–72807.
[27]
Qianqian Fan, David J. Lilja, and Sachin S. Sapatnekar. 2020. Adaptive-length coding of image data for low-cost approximate storage. IEEE Transactions on Computers 69, 2 (2020), 239–252.
[28]
Ying Li, Jun Wu, Bin Tan, Min Wang, and Wei Zhang. 2019. Compressive spinal codes. IEEE Transactions on Vehicular Technology 68, 12 (2019), 11944–11954.
[29]
Lingyu Liu, Jun Wu, and Jian Wu. 2018. COQRC: A rateless video transmission solution. In Proceedings of the International Conference on Computing, Networking, and Communications. 463–467.
[30]
Lu Wang, Hailiang Yang, Xiaoke Qi, Jun Xu, and Kaishun Wu. 2019. ICast: Fine-grained wireless video streaming over Internet of Intelligent Vehicles. IEEE Internet of Things Journal 6, 1 (2019), 111–123.
[31]
Guanhua Wang, Kaishun Wu, Qian Zhang, and Lionel M. Ni. 2014. SimCast: Efficient video delivery in MU-MIMO WLANs. In Proceedings of the IEEE Conference on Computer Communications. 2454–2462.
[32]
Siripuram T. Aditya and Sachin Katti. 2011. FlexCast: Graceful wireless video streaming. In Proceedings of the Annual International Conference on Mobile Computing and Networking. 277–288.
[33]
Tomas Kratochvil and Radim Stukavec. 2008. Hierarchical modulation in DVB-T/H mobile TV transmission over fading channels. In Proceedings of the International Symposium on Information Theory and Its Applications. 1–6.
[34]
Cornelius Hellge, Shpend Mirta, Thomas Schierl, and Thomas Wiegand. 2009. Mobile TV with SVC and hierarchical modulation for DVB-H broadcast services. In Proceedings of the IEEE International Symposium on Broadband Multimedia Systems and Broadcasting. 1–5.
[35]
Mahdi Ghandi and Mohammed Ghanbari. 2006. Layered H.264 video transmission with hierarchical QAM. Journal of Visual Communication and Image Representation 17, 2 (2006), 451–466.
[36]
Dongweon Yoon, Kyongkuk Cho, and Jinsock Lee. 2000. Bit error probability of M-ary quadrature amplitude modulation. In Proceedings of the Vehicular Technology Conference. 2422–2427.
[37]
Xiph. n.d. Xiph.org Video Test Media [Derf’s Collection]. Retrieved July 12, 2023 from http://media.xiph.org/video/derf/.
[38]
Anthony Trioux, Giuseppe Valenzise, Marco Cagnazzo, Michel Kieffer, François-Xavier Coudoux, Patrick Corlay, and Mohamed Gharbi. 2020. Subjective and objective quality assessment of the SoftCast video transmission scheme. In Proceedings of the IEEE International Conference on Visual Communications and Image Processing. 96–99.
[39]
Anthony Trioux, François Xavier Coudoux, Patrick Corlay, and Mohamed Gharbi. 2018. A comparative preprocessing study for SoftCast video transmission. In Proceedings of the International Symposium on Signal, Image, Video, and Communications. 54–59.
[40]
Vinod Prabhakaran, Rohit Puri, and Kannan Ramchandran. 2011. Hybrid digital-analog codes for source-channel broadcast of Gaussian sources over Gaussian channels. IEEE Transactions on Information Theory 57, 7 (2011), 4573–4588.
[41]
Hao Cui, Ruiqin Xiong, Chong Luo, Zhihai Song, and Feng Wu. 2015. Denoising and resource allocation in uncoded video transmission. IEEE Journal on Selected Topics in Signal Processing 9, 1 (2015), 102–112.
[42]
Quan Wang, Xiaocheng Lin, Yu Liu, Lin Zhang, and Xiaofei Wu. 2014. A scalable mobile video broadcast scheme using 3D wavelet transform. In Proceedings of the IEEE Vehicular Technology Conference. 3–7.
[43]
Quan Wang, Xiaocheng Lin, Yu Liu, Lin Zhang, and Xiaofei Wu. 2015. A scalable framework for mobile video broadcast using MCTF and 2D-DWT. In Proceedings of the International Symposium on Wireless Personal Multimedia Communications. 118–123.
[44]
Xiaopeng Fan, Ruiqin Xiong, Feng Wu, and Debin Zhao. 2012. WaveCast: Wavelet based wireless video broadcast using lossy transmission. In Proceedings of the IEEE Visual Communications and Image Processing Conference. 1–6.
[45]
Hao Cui, Zhihai Song, Zhe Yang, Chong Luo, Ruiqin Xiong, and Feng Wu. 2013. Cactus: A hybrid digital-analog wireless video communication system. In Proceedings of the ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems. 273–278.
[46]
Xiao Zhao, Hancheng Lu, Chang Wen Chen, and Jun Wu. 2016. Adaptive hybrid digital-analog video transmission in wireless fading channel. IEEE Transactions on Circuits and Systems for Video Technology 26, 6 (2016), 1117–1130.
[47]
Ruiqin Xiong, Feng Wu, Jizheng Xu, Xiaopeng Fan, Chong Luo, and Wen Gao. 2016. Analysis of decorrelation transform gain for uncoded wireless image and video communication. IEEE Transactions on Image Processing 25, 4 (2016), 1820–1833.
[48]
Ruiqin Xiong, Feng Wu, Jizheng Xu, and Wen Gao. 2013. Performance analysis of transform in uncoded wireless visual communication. In Proceedings of the IEEE International Symposium on Circuits and Systems. 1159–1162.
[49]
Anthony Trioux, François Xavier Coudoux, Patrick Corlay, and Mohamed Gharbi. 2020. Temporal information based GoP adaptation for linear video delivery schemes. Signal Processing: Image Communication 82 (2020), 1–17.
[50]
Xiaocheng Lin, Nianfei Fan, Yu Liu, Shuri Cai, and Xiaojing Wang. 2014. Soft wireless image/video broadcast based on component protection. In Proceedings of the IEEE International Conference on Network Infrastructure and Digital Content. 84–89.
[51]
Chenfeng He, Huachan Qin, Zhiqiang He, and Kai Niu. 2016. Adaptive GoP dividing video coding for wireless broadcast based on power allocation optimization. In Proceedings of the International Conference on Wireless Communications and Signal Processing. 1–5.
[52]
Anthony Trioux, Francois Xavier Coudoux, Patrick Corlay, and Mohamed Gharbi. 2019. A reduced complexity/side information preprocessing method for high quality SoftCast-based video delivery. In Proceedings of the European Workshop on Visual Information Processing. 205–210.
[53]
Zhihai Song, Ruiqin Xiong, Siwei Ma, Xiaopeng Fan, and Wen Gao. 2014. Layered image/video SoftCast with hybrid digital-analog transmission for robust wireless visual communication. In Proceedings of the IEEE International Conference on Multimedia and Expo. 1–6.
[54]
Jing Zhao, Jiyu Xie, and Ruiqin Xiong. 2018. Residual signals modeling for layered image/video SoftCast with hybrid digital-analog transmission. In Proceedings of the IEEE International Conference on Image Processing. 3284–3288.
[55]
Dongliang He, Chong Luo, Cuiling Lan, Feng Wu, and Wenjun Zeng. 2015. Structure-preserving hybrid digital-analog video delivery in wireless networks. IEEE Transactions on Multimedia 17, 9 (2015), 1658–1670.
[56]
Yuanyuan Li, Yu Liu, Yumei Wang, and Zhexin Li. 2016. Visual information exploited hybrid digital-analog scheme for wireless video multicast. In Proceedings of the Visual Communications and Image Processing Conference. 1–4.
[57]
Ruiqin Xiong, Hangfan Liu, Siwei Ma, Xiaopeng Fan, Feng Wu, and Wen Gao. 2014. G-CAST: Gradient based image SoftCast for perception-friendly wireless visual communication. In Proceedings of the Data Compression Conference. 133–142.
[58]
Hangfan Liu, Ruiqin Xiong, Xiaopeng Fan, Debin Zhao, Yongbing Zhang, and Wen Gao. 2019. CG-Cast: Scalable wireless image SoftCast using compressive gradient. IEEE Transactions on Circuits and Systems for Video Technology 29, 6 (2019), 1832–1843.
[59]
A. Sehgal, A. Jagmohan, and N. Ahuja. 2004. Wyner-Ziv coding of video: An error-resilient compression framework. IEEE Transactions on Multimedia 6, 2 (2004), 249–258.
[60]
S. Pradhan and K. Ramchandran. 2003. Distributed source coding using syndromes (DISCUS): Design and construction. IEEE Transactions on Information Theory 49, 3 (2003), 626–643.
[61]
Ailing Zhang, Xiaopeng Fan, Ruiqin Xiong, and Debin Zhao. 2013. Distributed soft video broadcast with variable block size motion estimation. In Proceedings of the IEEE International Conference on Visual Communications and Image Processing. 1–5.
[62]
Xiaopeng Fan, Feng Wu, Debin Zhao, Oscar C. Au, and Wen Gao. 2012. Distributed soft video broadcast (DCAST) with explicit motion. In Proceedings of the Data Compression Conference. 199–208.
[63]
Xiaopeng Fan, Feng Wu, and Debin Zhao. 2011. D-Cast: DSC based soft mobile video broadcast. In Proceedings of the International Conference on Mobile and Ubiquitous Multimedia. 226–235.
[64]
Xiaopeng Fan, Feng Wu, Debin Zhao, and Oscar C. Au. 2013. Distributed wireless visual communication with power distortion optimization. IEEE Transactions on Circuits and Systems for Video Technology 23, 6 (2013), 1040–1053.
[65]
Wei Huang, Xiaopeng Fan, and Debin Zhao. 2013. Soft mobile video broadcast based on side information refining. In Proceedings of the IEEE International Conference on Visual Communications and Image Processing.
[66]
Ricardo Martins, Catarina Brites, João Ascenso, and Fernando Pereira. 2009. Refining side information for improved transform domain Wyner-Ziv video coding. IEEE Transactions on Circuits and Systems for Video Technology 19, 9 (2009), 1327–1341.
[67]
Mengyang Lv, Yu Liu, and Yumei Wang. 2017. Scalable wireless video broadcast based on unequal protection. In Proceedings of the Visual Communication and Image Processing Conference. 1–4.
[68]
Xiaocheng Lin, Yu Liu, and Lin Zhang. 2015. Scalable video SoftCast using magnitude shift. In Proceedings of the IEEE Wireless Communications and Networking Conference. 1996–2001.
[69]
Xiaopeng Fan, Ruiqin Xiong, Debin Zhao, and Feng Wu. 2015. Layered soft video broadcast for heterogeneous receivers. IEEE Transactions on Circuits and Systems for Video Technology 25, 11 (2015), 1801–1814.
[70]
Mengyang Lv, Yu Liu, and Yumei Wang. 2017. Adaptive scalable wireless video coding based on unequal protection and quadtree partition. In Proceedings of the International Conference on Network Infrastructure and Digital Content, Vol. 9. 214–218.
[71]
Ahmed Hagag, Xiaopeng Fan, and Fathi E. Abd El-Samie. 2017. Hyperspectral image coding and transmission scheme based on wavelet transform and distributed source coding. Multimedia Tools and Applications 76, 22 (2017), 23757–23776.
[72]
Jian Shen, Fei Liang, Chong Luo, Houqiang Li, and Wenjun Zeng. 2018. Cooperative hybrid digital-analog video transmission in D2D networks. In Proceedings of the IEEE International Conference on Image Processing. 3274–3278.
[73]
Yumei Wang, Mengyao Sun, and Yu Liu. 2018. Distributed and adaptive analog coding for video broadcast in wireless cooperative system. Wireless Personal Communications 102, 3 (2018), 2287–2306.
[74]
Mengyao Sun, Yumei Wang, Hao Yu, and Yu Liu. 2015. Distributed cooperative video coding for wireless video broadcast system. In Proceedings of the IEEE International Conference on Multimedia and Expo. 1–6.
[75]
Hao Cui, Dian Liu, Yuqi Han, and Jun Wu. 2016. Robust uncoded video transmission under practical channel estimation. In Proceedings of the IEEE Global Communications Conference. 1–6.
[76]
Fan Zhang, Anhong Wang, Haidong Wang, Suyue Li, and Xiaoli Ma. 2015. Channel-aware video SoftCast scheme. In Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing. 578–581.
[77]
Zhilong Zhang, Danpu Liu, and Xin Wang. 2018. Joint carrier matching and power allocation for wireless video with general distortion measure. IEEE Transactions on Mobile Computing 17, 3 (2018), 577–589.
[78]
Jian Wu, Bin Tan, Jun Wu, and Rui Wang. 2019. Efficient soft video MIMO design to combine diversity and spatial multiplexing gain. IEEE Internet of Things Journal 6, 3 (2019), 5461–5472.
[79]
S. Zheng, M. Cagnazzo, and M. Kieffer. 2018. Precoding matrix design in linear video coding. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. 1198–1202.
[80]
S. Zheng, M. Antonini, M. Cagnazzo, L. Guerrieri, M. Kieffer, I. Nemoianu, R. Samy, and B. Zhang. 2016. SoftCast with per-carrier power-constrained channels. In Proceedings of the IEEE International Conference on Image Processing. 2122–2126.
[81]
Xiao Lin Liu, Wenjun Hu, Qifan Pu, Feng Wu, and Yongguang Zhang. 2012. ParCast: Soft video delivery in MIMO-OFDM WLANs. In Proceedings of the 18th Annual International Conference on Mobile Computing and Networking. 233–244.
[82]
Xiao Lin Liu, Wenjun Hu, Chong Luo, Qifan Pu, Feng Wu, and Yongguang Zhang. 2014. ParCast+: Parallel video unicast in MIMO-OFDM WLANs. IEEE Transactions on Multimedia 16, 7 (2014), 2038–2051.
[83]
Zhilong Zhang, Danpu Liu, Xiaoli Ma, and Xin Wang. 2015. ECast: An enhanced video transmission design for wireless multicast systems over fading channels. IEEE Systems Journal 11, 4 (2015), 2566–2577.
[84]
Shuo Zheng, Marco Cagnazzo, and Michel Kieffer. 2020. Channel impulsive noise mitigation for linear video coding schemes. IEEE Transactions on Circuits and Systems for Video Technology 30, 9 (2020), 3196–3209.
[85]
Xiaoda Jiang, Hancheng Lu, Chang Wen Chen, and Feng Wu. 2019. Receiver-driven video multicast over NOMA systems in heterogeneous environments. In Proceedings of the IEEE International Conference on Computer Communications. 982–990.
[86]
Jian Wu, Bin Tan, Jun Wu, and Min Wang. 2019. Video multicast: Integrating scalability of soft video delivery systems into NOMA. IEEE Wireless Communications Letters 8, 6 (2019), 1722–1726.
[87]
Rongxin Zhang, Yiming Kong, Xiaoli Ma, and Deqing Wang. 2018. Adaptive video transmission designs over underwater acoustic channels. In Proceedings of the International Conference on Computing, Networking, and Communications. 295–299.
[88]
Xiao-Wei Tang, Xin-Lin Huang, and Fei Hu. 2021. QoE-driven UAV-enabled pseudo-analog wireless video broadcast: A joint optimization of power and trajectory. IEEE Transactions on Multimedia 23 (2021), 2398–2412.
[89]
Yongqiang Gui, Hancheng Lu, Feng Wu, and Chang Wen Chen. 2020. LensCast: Robust wireless video transmission over mmWave MIMO with lens antenna array. IEEE Transactions on Multimedia 24 (2020), 33–48.
[90]
Jing Zhao, Ruiqin Xiong, Chong Luo, Feng Wu, and Wen Gao. 2018. Wireless image and video soft transmission via perception-inspired power distortion optimization. In Proceedings of the IEEE Visual Communications and Image Processing Conference. 1–4.
[91]
Jian Shen, Lei Yu, Li Li, and Houqiang Li. 2018. Foveation-based wireless soft image delivery. IEEE Transactions on Multimedia 20, 10 (2018), 2788–2800.
[92]
Hadi Hadizadeh. 2017. Saliency-guided wireless transmission of still images using SoftCast. In Proceedings of the International Symposium on Telecommunications. 506–509.
[93]
Yuanyuan Li, Zhexin Li, Yu Liu, and Yumei Wang. 2017. SCAST: Wireless video multicast scheme based on segmentation and SoftCast. In Proceedings of the IEEE Wireless Communications and Networking Conference. 1–6.
[94]
Zhou Wang and Alan Conrad Bovik. 2001. Embedded foveation image coding. IEEE Transactions on Image Processing 10, 10 (2001), 1397–1410.
[95]
Laurent Itti, Christof Koch, and Ernst Niebur. 1998. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 11 (1998), 1254–1259.
[96]
Zhexin Li, Yu Liu, and Yumei Wang. 2017. Unequal block for low bandwidth adaption in wireless video broadcast. In Proceedings of the International Conference on Network Infrastructure and Digital Content. 386–390.
[97]
Marco Cagnazzo and Michel Kieffer. 2015. Shannon-Kotelnikov mappings for SoftCast-based joint source-channel video coding. In Proceedings of the IEEE International Conference on Image Processing. 1085–1089.
[98]
Yongqiang Gui, Hancheng Lu, Xiaoda Jiang, Feng Wu, and Chang Wen Chen. 2020. Compressed pseudo-analog transmission system for remote sensing images over bandwidth-constrained wireless channels. IEEE Transactions on Circuits and Systems for Video Technology 30, 9 (2020), 3181–3195.
[99]
Yali Wang, Hancheng Lu, Zexue Li, and Jian Li. 2017. Robust satellite image transmission over bandwidth-constrained wireless channels. In Proceedings of the IEEE International Conference on Communications. 1–6.
[100]
Ahmad Shoja Yami and Hadi Hadizadeh. 2018. Visual attention-driven wireless multicasting of images using adaptive compressed sensing. In Proceedings of the Artificial Intelligence and Signal Processing Conference. 37–42.
[101]
Hadi Hadizadeh and Ivan V. Bajic. 2021. Soft video multicasting using adaptive compressed sensing. IEEE Transactions on Multimedia 23 (2021), 12–25.
[102]
Wenbin Yin, Xiaopeng Fan, Yunhui Shi, Ruiqin Xiong, and Debin Zhao. 2016. Compressive sensing based soft video broadcast using spatial and temporal sparsity. Mobile Networks and Applications 21, 6 (2016), 1002–1012.
[103]
Tze Yang Tung and Deniz Gunduz. 2018. SparseCast: Hybrid digital-analog wireless image transmission exploiting frequency-domain sparsity. IEEE Communications Letters 22, 12 (2018), 2451–2454.
[104]
Siyuan Liu, Kai Niu, and Chao Dong. 2019. Channel polarization based block compressive sensing SoftCast system. In Proceedings of the IEEE International Conference on Computer and Communications. 778–783.
[105]
Georgios Angelopoulos, Muriel Medard, and Anantha P. Chandrakasan. 2015. AdaptCast: An integrated source to transmission scheme for wireless sensor networks. In Proceedings of the IEEE International Conference on Communications. 2894–2899.
[106]
Jun Wu, Jian Wu, Hao Cui, Chong Luo, Xiaoyan Sun, and Feng Wu. 2016. DAC-Mobi: Data-assisted communications of mobile images with cloud computing support. IEEE Transactions on Multimedia 18, 5 (2016), 893–904.
[107]
Jun Wu, Dian Liu, Xin Lin Huang, Chong Luo, Hao Cui, and Feng Wu. 2015. DaC-RAN: A data-assisted cloud radio access network for visual communications. IEEE Wireless Communications 22, 3 (2015), 130–136.
[108]
Xin Lin Huang, Jun Wu, and Fei Hu. 2017. Knowledge-enhanced mobile video broadcasting framework with cloud support. IEEE Transactions on Circuits and Systems for Video Technology 27, 1 (2017), 6–18.
[109]
Xin Lin Huang, Xiaoning Huan, Jun Wu, Qingquan Sun, and Yingchun Yuan. 2016. Performance analysis of KMV-Cast with imperfect prior knowledge. In Proceedings of the IEEE Global Communications Conference. 1–5.
[110]
Xin Lin Huang, Xiaowei Tang, Xiaoning Huan, Ping Wang, and Jun Wu. 2018. Improved KMV-Cast with BM3D denoising. Mobile Networks and Applications 23, 1 (2018), 100–107.
[111]
Dian Liu, Jun Wu, Hao Cui, Dongdong Zhang, Chong Luo, and Feng Wu. 2018. Cost-distortion optimization and resource control in pseudo-analog visual communications. IEEE Transactions on Multimedia 20, 11 (2018), 3097–3110.
[112]
Chaofan He, Yang Hu, Yan Chen, Xiaopeng Fan, Houqiang Li, and Bing Zeng. 2020. MUcast: Linear uncoded multiuser video streaming with channel assignment and power allocation optimization. IEEE Transactions on Circuits and Systems for Video Technology 30, 4 (2020), 1136–1146.
[113]
Chaofan He, Yang Hu, Yan Chen, Xiaopeng Fan, Houqiang Li, and Bing Zeng. 2019. Exploiting channel assignment and power allocation for linear uncoded multiuser video streaming. In Proceedings of the IEEE International Conference on Communications.
[114]
David L. Donoho. 2006. Compressed sensing. IEEE Transactions on Information Theory 52, 4 (2006), 1289–1306.
[115]
Emmanuel J. Candes and Michael B. Wakin. 2008. An introduction to compressive sampling. IEEE Signal Processing Magazine 25, 2 (2008), 21–30.
[116]
Xiaocheng Lin, Yu Liu, and Mengyao Sun. 2015. Analog channel coding for wireless image/video SoftCast by data division. In Proceedings of the International Conference on Telecommunications. 353–357.
[117]
Bin Tan, Jun Wu, Ying Li, Hao Cui, Wei Yu, and Chang Wen Chen. 2017. Analog coded SoftCast: A network slice design for multimedia broadcast/multicast. IEEE Transactions on Multimedia 19, 10 (2017), 2293–2306.
[118]
Yang Liu, Jing Li, Xuanxuan Lu, Chau Yuen, and Jun Wu. 2015. A family of chaotic pure analog coding schemes based on Baker’s map function. EURASIP Journal on Advances in Signal Processing 2015 (2015), 58.
[119]
Chaofan He, Huiying Wang, Yang Hu, Yan Chen, Xiaopeng Fan, Houqiang Li, and Bing Zeng. 2018. MCast: High-quality linear video transmission with time and frequency diversities. IEEE Transactions on Image Processing 27, 7 (2018), 3599–3610.
[120]
Cuiling Lan, Dongliang He, Chong Luo, Feng Wu, and Wenjun Zeng. 2015. Progressive pseudo-analog transmission for mobile video live streaming. In Proceedings of the IEEE Visual Communications and Image Processing Conference. 1–4.
[121]
Dongliang He, Cuiling Lan, Chong Luo, Enhong Chen, Feng Wu, and Wenjun Zeng. 2017. Progressive pseudo-analog transmission for mobile video streaming. IEEE Transactions on Multimedia 19, 8 (2017), 1894–1907.
[122]
Anhong Wang, Bing Zeng, and Hua Chen. 2014. Wireless multicasting of video signals based on distributed compressed sensing. Signal Processing: Image Communication 29, 5 (2014), 599–606.
[123]
Anhong Wang, Qingdian Wu, Xiaoli Ma, and Bing Zeng. 2015. A wireless video multicasting scheme based on multi-scale compressed sensing. EURASIP Journal on Advances in Signal Processing 2015, 1 (2015), 1–11.
[124]
Shanshan Liu, Anhong Wang, Haidong Wang, Suyue Li, Meiling Li, and Jie Liang. 2017. Adaptive residual-based distributed compressed sensing for soft video multicasting over wireless networks. Multimedia Tools and Applications 76, 14 (2017), 15587–15606.
[125]
Sungkwang Mun and James E. Fowler. 2009. Block compressed sensing of images using directional transforms. In Proceedings of the IEEE International Conference on Image Processing. 3021–3024.
[126]
Zhihai Song, Ruiqin Xiong, Xiaopeng Fan, Siwei Ma, and Wen Gao. 2014. Transform domain energy modeling of natural images for wireless SoftCast optimization. In Proceedings of the IEEE International Symposium on Circuits and Systems. 1114–1117.
[127]
Ruiqin Xiong, Jian Zhang, Feng Wu, Jizheng Xu, and Wen Gao. 2017. Power distortion optimization for uncoded linear transformed transmission of images and videos. IEEE Transactions on Image Processing 26, 1 (2017), 222–236.
[128]
Ruiqin Xiong, Feng Wu, Xiaopeng Fan, Chong Luo, Siwei Ma, and Wen Gao. 2013. Power-distortion optimization for wireless image/video SoftCast by transform coefficients energy modeling with adaptive chunk division. In Proceedings of the IEEE International Conference on Visual Communications and Image Processing. 1–6.
[129]
Takuya Fujihashi, Toshiaki Koike-Akino, Takashi Watanabe, and Philip V. Orlik. 2018. High-quality soft video delivery with GMRF-based overhead reduction. IEEE Transactions on Multimedia 20, 2 (2018), 473–483.
[130]
Jingyu Li, Xiang E. Wen, Huizhu Jia, Xiaodong Xie, and Wen Gao. 2016. AnalogCast: Full linear coding and pseudo analog transmission for satellite remote-sensing images. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. 1362–1366.
[131]
Ticao Zhang and Shiwen Mao. 2019. Metadata reduction for soft video delivery. IEEE Networking Letters 1, 2 (2019), 84–88.
[132]
Yongqiang Gui, Hancheng Lu, Feng Wu, and Chang Wen Chen. 2021. Robust video broadcast for users with heterogeneous resolution in mobile networks. IEEE Transactions on Mobile Computing 20, 11 (2021), 3251–3266.
[133]
Xiao Lin, Wenjun Hu, Chong Luo, and Feng Wu. 2014. Compressive image broadcasting in MIMO systems with receiver antenna heterogeneity. Signal Processing: Image Communication 29, 3 (2014), 361–374.
[134]
Hao Cui, Chong Luo, Chang Wen Chen, and Feng Wu. 2016. Scalable video multicast for MU-MIMO systems with antenna heterogeneity. IEEE Transactions on Circuits and Systems for Video Technology 26, 5 (2016), 992–1003.
[135]
Xiao Wei Tang and Xin Lin Huang. 2020. A design of SDR-based pseudo-analog wireless video transmission system. Mobile Networks and Applications 25, 6 (2020), 2495–2505.
[136]
Shi Chen, Jun Wu, Haoqi Ren, Jian Wu, Baoye Zhang, and Fusheng Zhu. 2019. Hardware implementation of a pseudo-analog wireless video transmission system. In Proceedings of the IEEE International Conference on Communication Technology. 519–524.
[137]
Fengxiang Gao, Haoqi Gao, and Jun Wu. 2018. A reconfigurable SoC for SoftCast wireless video transmission. In Proceedings of the IEEE International Conference on Industrial Internet. 169–170.
[138]
Yao Jiang, Pengfei Xia, Jun Wu, Shi Chen, and Baoye Zhang. 2017. Pseudo-analog wireless stereo video transmission in hardware acceleration. In Proceedings of the International Conference on Wireless Communications and Signal Processing. 1–6.
[139]
Zhe Chen, Xu Zhang, Sulei Wang, Yuedong Xu, Jie Xiong, and Xin Wang. 2021. Enabling practical large-scale MIMO in WLANs with hybrid beamforming. IEEE/ACM Transactions on Networking 29, 4 (2021), 1605–1619.
[140]
Zhuang Ding, Jun Wu, Wei Yu, Yuqi Han, and Xianghuang Chen. 2016. Pseudo analog video transmission based on LTE physical layer. In Proceedings of the IEEE/CIC International Conference on Communications in China. 1–6.
[141]
Kun Tan, He Liu, Jiansong Zhang, Yongguang Zhang, Ji Fang, and Geoffrey Voelker. 2011. Sora: High-performance software radio using general-purpose multi-core processors. Communications of the ACM 54, 1 (2011), 99–107.
[142]
Zhe Chen, Xu Zhang, Yuedong Xu, Jie Xiong, Yu Zhu, and Xin Wang. 2017. MuVi: Multiview video aware transmission over MIMO wireless systems. IEEE Transactions on Multimedia 19, 12 (2017), 2788–2803.
[143]
Masayuki Tanimoto. 2012. FTV: Free-viewpoint television. Signal Processing: Image Communication 27, 6 (2012), 555–570.
[144]
Olgierd Stankiewicz, Marek Domanski, Adrian Dziembowski, Adam Grzelka, Dawid Mieloch, and Jarosław Samelak. 2018. A free-viewpoint television system for horizontal virtual navigation. IEEE Transactions on Multimedia 20, 9 (Aug. 2018), 2182–2195.
[145]
Rufael Mekuria and Lazar Bivolarsky. 2016. Overview of the MPEG activity on point cloud compression. In Proceedings of the Data Compression Conference. 620.
[146]
Christoph Fehn. 2004. Depth-image-based rendering (DIBR), compression, and transmission for a new approach on 3D-TV. In Proceedings Volume 5291, Stereoscopic Displays and Virtual Reality Systems. SPIE, 93–105.
[147]
Shuai Li, Ce Zhu, and Ming-Ting Sun. 2018. Hole filling with multiple reference views in DIBR view synthesis. IEEE Transactions on Multimedia 20, 8 (2018), 1948–1959.
[148]
Ying Chen, Miska M. Hannuksela, Teruhiko Suzuki, and Shinobu Hattori. 2014. Overview of the MVC+D 3D video coding standard. Journal of Visual Communication and Image Representation 25, 4 (May 2014), 679–688.
[149]
Ana De Abreu, Pascal Frossard, and Fernando Pereira. 2015. Optimizing multiview video plus depth prediction structures for interactive multiview video streaming. IEEE Journal of Selected Topics in Signal Processing 9, 3 (2015), 487–500.
[150]
Ying Chen and Sehoon Yen. 2013. 3D-AVC Draft Text 6, Document JCT3V-D1002.doc. JCT-3V.
[151]
Takuya Fujihashi, Toshiaki Koike-Akino, Takashi Watanabe, and Philip V. Orlik. 2017. Soft video delivery for free viewpoint video. In Proceedings of the IEEE International Conference on Communications. 1–7.
[152]
Takuya Fujihashi, Toshiaki Koike-Akino, Takashi Watanabe, and Philip V. Orlik. 2019. FreeCast: Graceful free-viewpoint video delivery. IEEE Transactions on Multimedia 21, 4 (2019), 1000–1010.
[153]
Ticao Zhang and Shiwen Mao. 2019. Joint power and channel resource optimization in soft multi-view video delivery. IEEE Access 7 (2019), 148084–148097.
[154]
Lei Luo, Taihai Yang, Ce Zhu, Zhi Jin, and Shu Tang. 2019. Joint texture/depth power allocation for 3-D video SoftCast. IEEE Transactions on Multimedia 21, 12 (2019), 2973–2984.
[155]
Taihai Yang, Lei Luo, Ce Zhu, and Shu Tang. 2019. Block DCT based optimization for wireless SoftCast of depth map. IEEE Access 7 (2019), 29484–29494.
[156]
Feng Qian, Bo Han, Lusheng Ji, and Vijay Gopalakrishnan. 2016. Optimizing 360 video delivery over cellular networks. In Proceedings of the 5th Workshop on All Things Cellular: Operations, Applications, and Challenges. 1–6.
[157]
Jason Moss and Eric Muth. 2011. Characteristics of head-mounted displays and their effects on simulator sickness. Journal of the Human Factors and Ergonomics Society 53, 3 (2011), 308–319.
[158]
Daisuke Ochi, Yutaka Kunita, Akio Kameda, Akira Kojima, and Shinnosuke Iwaki. 2015. Live streaming system for omnidirectional video. In Proceedings of the IEEE Virtual Reality Conference. 349–350.
[159]
Takuya Fujihashi, Makoto Kobayashi, Keiichi Endo, Shunsuke Saruwatari, Shinya Kobayashi, and Takashi Watanabe. 2018. Graceful quality improvement in wireless 360-degree video delivery. In Proceedings of the IEEE Global Communications Conference. 1–7.
[160]
Jing Zhao, Ruiqin Xiong, and Jizheng Xu. 2019. OmniCast: Wireless pseudo-analog transmission for omnidirectional video. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 9, 1 (March 2019), 58–70.
[161]
Yujun Lu, Takuya Fujihashi, Shunsuke Saruwatari, and Takashi Watanabe. 2020. 360Cast: Foveation-based wireless soft delivery for 360-degree video. In Proceedings of the IEEE International Conference on Communications. 1–6.
[162]
Yujun Lu, Takuya Fujihashi, Shunsuke Saruwatari, and Takashi Watanabe. 2021. 360Cast+: Viewport adaptive soft delivery for 360-degree videos. IEEE Access 9 (2021), 52684–52697.
[163]
Yule Sun, Ang Lu, and Lu Yu. 2017. Weighted-to-spherically-uniform quality evaluation for omnidirectional video. IEEE Signal Processing Letters 24, 9 (2017), 1408–1412.
[164]
Fraunhofer Heinrich Hertz Institute. n.d. High Efficiency Video Coding (HEVC). Retrieved July 12, 2023 from https://hevc.hhi.fraunhofer.de/.
[165]
P. A. Blanche, A. Bablumian, R. Voorakaranam, C. Christenson, W. Lin, T. Gu, D. Flores, et al. 2010. Holographic three-dimensional telepresence using large-area photorefractive polymer. Nature 468, 7320 (2010), 80–83.
[166]
Hyeonseung Yu, KyeoReh Lee, Jongchan Park, and YongKeun Park. 2017. Ultrahigh-definition dynamic 3D holographic display by active control of volume speckle fields. Nature Photonics 11, 3 (2017), 186–192.
[167]
Draco. 2022. Draco 3D Data Compression. Retrieved June 18, 2022 from https://google.github.io/draco/.
[168]
Olivier Devillers and Pierre-Marie Gandoin. 2000. Geometric compression for interactive transmission. In Proceedings of the IEEE Conference on Visualization. 319–326.
[169]
Julius Kammerl, Nico Blodow, Radu Bogdan Rusu, Suat Gedikli, Michael Beetz, and Eckehard Steinbach. 2012. Real-time compression of point cloud streams. In Proceedings of the IEEE International Conference on Robotics and Automation. 778–785.
[170]
Radu Bogdan Rusu and Steve Cousins. 2011. 3D is here: Point Cloud Library (PCL). In Proceedings of the IEEE International Conference on Robotics and Automation. 1–4.
[171]
Ruwen Schnabel and Reinhard Klein. 2006. Octree-based point-cloud compression. In Proceedings of the Eurographics Symposium on Point-Based Graphics. 111–121.
[172]
Ricardo L. De Queiroz and Philip A. Chou. 2016. Compression of 3D point clouds using a region-adaptive hierarchical transform. IEEE Transactions on Image Processing 25, 8 (Aug. 2016), 3947–3956.
[173]
C. Zhang, D. Florêncio, and C. Loop. 2014. Point cloud attribute compression with graph transform. In Proceedings of the IEEE International Conference on Image Processing. 2066–2070.
[174]
Paulo de Oliveira Rente, Catarina Brites, João Ascenso, and Fernando Pereira. 2019. Graph-based static 3D point clouds geometry coding. IEEE Transactions on Multimedia 21, 2 (2019), 284–299.
[175]
Takuya Fujihashi, Toshiaki Koike-Akino, Takashi Watanabe, and Philip Orlik. 2019. HoloCast: Graph signal processing for graceful point cloud delivery. In Proceedings of the IEEE International Conference on Communications. 1–7.
[176]
Cha Zhang, Dinei Florencio, and Charles Loop. 2014. Point cloud attribute compression with graph transform. In Proceedings of the IEEE International Conference on Image Processing. 2066–2070.
[177]
Takuya Fujihashi, Toshiaki Koike-Akino, Takashi Watanabe, and Philip Orlik. 2020. Overhead reduction in graph-based point cloud delivery. In Proceedings of the IEEE International Conference on Communications. 1–7.
[178]
Soushi Ueno, Takuya Fujihashi, Toshiaki Koike-Akino, and Takashi Watanabe. 2022. Overhead reduction for graph-based point cloud delivery using non-uniform quantization. In Proceedings of the IEEE International Conference on Consumer Electronics. 1–6.
[179]
Soushi Ueno, Takuya Fujihashi, Toshiaki Koike-Akino, and Takashi Watanabe. 2023. Point cloud soft multicast for untethered XR users. IEEE Transactions on Multimedia 25 (2023), 7185–7195.
[180]
Mehdi Ansari Sadrabadi, Amir Khandani, and Farshad Lahouti. 2006. Channel feedback quantization for high data rate MIMO systems. IEEE Transactions on Wireless Communications 5, 12 (2006), 3335–3338.
[181]
June Chul Roh and Bhaskar D. Rao. 2007. Efficient feedback methods for MIMO channels based on parameterization. IEEE Transactions on Wireless Communications 6, 1 (2007), 282–292.
[182]
Lei Yu, Houqiang Li, and Weiping Li. 2013. Hybrid digital-analog scheme for video transmission over wireless. In Proceedings of the IEEE International Symposium on Circuits and Systems. 1163–1166.
[183]
Lei Yu, Houqiang Li, and Weiping Li. 2014. Wireless scalable video coding using a hybrid digital-analog scheme. IEEE Transactions on Circuits and Systems for Video Technology 24, 2 (2014), 331–345.
[184]
Cuiling Lan, Chong Luo, Wenjun Zeng, and Feng Wu. 2018. A practical hybrid digital-analog scheme for wireless video transmission. IEEE Transactions on Circuits and Systems for Video Technology 28, 7 (2018), 1634–1647.
[185]
Zhihai Song, Ruiqin Xiong, Siwei Ma, and Wen Gao. 2014. Hybridcast: A wireless image/video SoftCast scheme using layered representation and hybrid digital-analog modulation. In Proceedings of the IEEE International Conference on Image Processing. 6001–6005.
[186]
Bin Tan, Jun Wu, Rui Wang, Wenlang Luo, and Jun Liu. 2019. An optimal resource allocation for hybrid digital-analog with combined multiplexing. IEEE Internet of Things Journal 6, 1 (2019), 1125–1135.
[187]
Bin Tan, Hao Cui, Jun Wu, and Chang Wen Chen. 2017. An optimal resource allocation for superposition coding-based hybrid digital–analog system. IEEE Internet of Things Journal 4, 4 (2017), 945–956.
[188]
Jing Zhang, Anhong Wang, Jie Liang, Haidong Wang, Suyue Li, and Xiong Zhang. 2019. Distortion estimation-based adaptive power allocation for hybrid digital-analog video transmission. IEEE Transactions on Circuits and Systems for Video Technology 29, 6 (2019), 1806–1818.
[189]
Fei Liang, Chong Luo, Ruiqin Xiong, Wenjun Zeng, and Feng Wu. 2018. Superimposed modulation for soft video delivery with hidden resources. IEEE Transactions on Circuits and Systems for Video Technology 28, 9 (2018), 2345–2358.
[190]
Jian Shen, Lei Yu, and Houqiang Li. 2016. Hybrid digital-analog scheme for video transmission over fading channel. In Proceedings of the IEEE International Symposium on Circuits and Systems. 1582–1585.
[191]
Pradeepa Yahampath. 2017. Hybrid digital-analog coding with bandwidth expansion for correlated Gaussian sources under Rayleigh fading. EURASIP Journal on Advances in Signal Processing 2017, 1 (2017), 1–16.
[192]
Pradeepa Yahampath. 2018. Digital-analog superposition coding for OFDM channels with application to video transmission. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. 1802–1806.
[193]
Pradeepa Yahampath. 2020. Video coding for OFDM systems with imperfect CSI: A hybrid digital–analog approach. Signal Processing: Image Communication 87 (2020), 1–22.
[194]
Yu Liu, Xiaocheng Lin, Nianfei Fan, and Lin Zhang. 2016. Hybrid digital-analog video transmission in wireless multicast and multiple-input multiple-output system. Journal of Electronic Imaging 25, 1 (2016), 1–14.
[195]
Lei Yu, Houqiang Li, and Weiping Li. 2015. Wireless cooperative video coding using a hybrid digital-analog scheme. IEEE Transactions on Circuits and Systems for Video Technology 25, 3 (2015), 436–450.
[196]
Takuya Fujihashi, Toshiaki Koike-Akino, Takashi Watanabe, and Philip V. Orlik. 2015. Compressive sensing for loss-resilient hybrid wireless video transmission. In Proceedings of the IEEE Global Communications Conference. 1–5.
[197]
Dongliang He, Chong Luo, Feng Wu, and Wenjun Zeng. 2015. Swift: A hybrid digital-analog scheme for low-delay transmission of mobile stereo video. In Proceedings of the ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems. 327–336.
[198]
Peihao Li, Fengbao Yang, Jing Zhang, Yun Guan, Anhong Wang, and Jie Liang. 2020. Synthesis-distortion-aware hybrid digital analog transmission for 3D videos. IEEE Access 8 (2020), 85128–85139.
[199]
Takuya Fujihashi, Toshiaki Koike-Akino, Takashi Watanabe, and Philip V. Orlik. 2021. HoloCast+: Hybrid digital-analog transmission for graceful point cloud delivery with graph Fourier transform. IEEE Transactions on Multimedia 24 (2021), 2179–2191.
[200]
Guo Lu, Xiaoyun Zhang, Wanli Ouyang, Li Chen, Zhiyong Gao, and Dong Xu. 2021. An end-to-end learning framework for video compression. IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 10 (2021), 3292–3308.
[201]
Oren Rippel, Alexander G. Anderson, Kedar Tatwawadi, Sanjay Nair, Craig Lytle, and Lubomir Bourdev. 2021. ELF-VC: Efficient learned flexible-rate video coding. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 14459–14468.
[202]
Yunjin Chen and Thomas Pock. 2017. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 6 (2017), 1256–1272.
[203]
Ying Tai, Jian Yang, Xiaoming Liu, and Chunyan Xu. 2018. MemNet: A persistent memory network for image restoration. In Proceedings of the IEEE International Conference on Computer Vision. 4549–4557.
[204]
Takuya Fujihashi, Toshiaki Koike-Akino, Philip V. Orlik, and Takashi Watanabe. 2019. DNN-based overhead reduction for high-quality soft delivery. In Proceedings of the IEEE Global Communications Conference. 1–6.
[205]
Eirina Bourtsoulatze, David Burth Kurka, and Deniz Gunduz. 2019. Deep joint source-channel coding for wireless image transmission. IEEE Transactions on Cognitive Communications and Networking 5, 3 (2019), 567–579.
[206]
Xiao-Wei Tang, Xin-Lin Huang, Fei Hu, and Qingjiang Shi. 2020. Human-perception-oriented pseudo analog video transmissions with deep learning. IEEE Transactions on Vehicular Technology 69, 9 (2020), 9896–9909.
[207]
Takuya Fujihashi, Toshiaki Koike-Akino, Takashi Watanabe, and Philip V. Orlik. 2020. High-quality soft image delivery with deep image denoising. In Proceedings of the IEEE International Conference on Communications. 1–6.
[208]
Joseph Redmon and Ali Farhadi. 2017. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7263–7271.
[209]
Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. 2018. Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 9446–9454.
[210]
Takuya Fujihashi, Toshiaki Koike-Akino, Siheng Chen, and Takashi Watanabe. 2021. Wireless 3D point cloud delivery using deep graph neural networks. In Proceedings of the IEEE International Conference on Communications. 1–6.
[211]
Chi Thang Duong, Thanh Dat Hoang, Ha Hien Dang, Quoc Viet Hung Nguyen, and Karl Aberer. 2019. On node features for graph neural networks. arXiv e-prints arXiv:1911.08795 (2019).
[212]
Siheng Chen, Chaojing Duan, Yaoqing Yang, Duanshun Li, Chen Feng, and Dong Tian. 2020. Deep unsupervised learning of 3D point clouds via graph topology inference and filtering. IEEE Transactions on Image Processing 29 (2020), 3183–3198.
[213]
Siheng Chen, Sufeng Niu, Tian Lan, and Baoan Liu. 2019. PCT: Large-scale 3D point cloud representations via graph inception networks with applications to autonomous driving. In Proceedings of the IEEE International Conference on Image Processing. 4395–4399.
[214]
Huiqiang Xie, Zhijin Qin, Geoffrey Ye Li, and Biing-Hwang Juang. 2021. Deep learning enabled semantic communication systems. IEEE Transactions on Signal Processing 69 (2021), 2663–2675.
[215]
Zhenzi Weng and Zhijin Qin. 2021. Semantic communication systems for speech transmission. IEEE Journal on Selected Areas in Communications 39, 8 (2021), 2434–2444.
[216]
Dawei Chen, Linda Jiang Xie, BaekGyu Kim, Li Wang, Choong Seon Hong, Li-Chun Wang, and Zhu Han. 2020. Federated learning based mobile edge computing for augmented reality applications. In Proceedings of the International Conference on Computing, Networking, and Communications. 767–773.
[217]
Yifang Ma, Zhenyu Wang, Hong Yang, and Lin Yang. 2020. Artificial intelligence applications in the development of autonomous vehicles: A survey. IEEE/CAA Journal of Automatica Sinica 7, 2 (2020), 315–329.
[218]
Ivar Bjørgo Saksvik, Alex Alcocer, and Vahid Hassani. 2021. A deep learning approach to dead-reckoning navigation for autonomous underwater vehicles with limited sensor payloads. In Proceedings of the IEEE OCEANS Conference. 1–9.
[219]
Elias P. Duarte, Aurora T. R. Pozo, and Pamela Beltrani. 2020. Smart reckoning: Reducing the traffic of online multiplayer games using machine learning for movement prediction. Entertainment Computing 33 (2020), 100336.
[220]
Simon Wiedemann, Heiner Kirchhoffer, Stefan Matlage, Paul Haase, Arturo Marban, Talmaj Marinč, David Neumann, et al. 2020. DeepCABAC: A universal compression algorithm for deep neural networks. IEEE Journal of Selected Topics in Signal Processing 14, 4 (2020), 700–714.
[221]
Heiner Kirchhoffer, Paul Haase, Wojciech Samek, Karsten Müller, Hamed Rezazadegan-Tavakoli, Francesco Cricri, Emre Aksu, et al. 2022. Overview of the neural network compression and representation (NNR) standard. IEEE Transactions on Circuits and Systems for Video Technology 32, 5 (2022), 3203–3216.
[222]
Mikolaj Jankowski, Deniz Gündüz, and Krystian Mikolajczyk. 2022. AirNet: Neural network transmission over the air. In Proceedings of the IEEE International Symposium on Information Theory. 2451–2456.
[223]
Solmaz Niknam, Harpreet S. Dhillon, and Jeffrey H. Reed. 2020. Federated learning for wireless communications: Motivation, opportunities, and challenges. IEEE Communications Magazine 58, 6 (2020), 46–51.
[224]
Mohammad Mohammadi Amiri and Deniz Gündüz. 2020. Federated learning over wireless fading channels. IEEE Transactions on Wireless Communications 19, 5 (2020), 3546–3557.
[225]
Guangxu Zhu, Yong Wang, and Kaibin Huang. 2020. Broadband analog aggregation for low-latency federated edge learning. IEEE Transactions on Wireless Communications 19, 1 (2020), 491–506.
[226]
Kai Yang, Tao Jiang, Yuanming Shi, and Zhi Ding. 2020. Federated learning via over-the-air computation. IEEE Transactions on Wireless Communications 19, 3 (2020), 2022–2035.
[227]
Takuya Fujihashi, Toshiaki Koike-Akino, and Takashi Watanabe. 2022. Federated AirNet: Hybrid digital-analog neural network transmission for federated learning. arXiv e-prints arXiv:2201.04557 (2022).
[228]
Andreas Noll, Basak Gülecyüz, Alexander Hofmann, and Eckehard Steinbach. 2020. A rate-scalable perceptual wavelet-based vibrotactile codec. In Proceedings of the IEEE Haptics Symposium. 854–859.
[229]
Eckehard Steinbach, Matti Strese, Mohamad Eid, Xun Liu, Amit Bhardwaj, Qian Liu, Mohammad Al-Ja’afreh, et al. 2019. Haptic codecs for the Tactile Internet. Proceedings of the IEEE 107, 2 (2019), 447–470.


Published In

ACM Computing Surveys, Volume 56, Issue 2, February 2024, 974 pages
EISSN: 1557-7341
DOI: 10.1145/3613559
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 14 September 2023
Online AM: 04 July 2023
Accepted: 22 June 2023
Revised: 23 March 2023
Received: 21 January 2022
Published in CSUR Volume 56, Issue 2

Author Tags

  1. Soft delivery
  2. hybrid digital–analog delivery
  3. extended reality

Qualifiers

  • Survey
