S$^3$Net: Self-Supervised Self-Ensembling Network for Semi-Supervised RGB-D Salient Object Detection
- Lei Zhu,
- Xiaoqiang Wang,
- Ping Li,
- Xin Yang,
- Qing Zhang,
- Weiming Wang,
- Carola-Bibiane Schönlieb,
- C. L. Philip Chen
RGB-D salient object detection aims to detect visually distinctive objects or regions from a pair of the RGB image and the depth image. State-of-the-art RGB-D saliency detectors are mainly based on convolutional neural networks but almost suffer from an ...
Multiple Description Coding for Best-Effort Delivery of Light Field Video Using GNN-Based Compression
In recent years, Light Field (LF) video has grabbed much attention as an emerging form of immersive media. LF collects, through a lens matrix, light information emanating in every direction, and obtains rich information about the scene, providing users ...
Multi-Panda Tracking
Multi-Panda Tracking (MPT) is a video-based tracking task for panda individuals, which is conducive to the observation and measurement of distribution and status of pandas. Different from tracking general objects such as pedestrians and vehicles, MPT is ...
Towards Task-Generic Image Compression: A Study of Semantics-Oriented Metrics
Instead of being observed by humans, multimedia data are now more and more fed into machines to perform different kinds of semantic analysis. One image may be analyzed multiple times by different machine vision algorithms for different purposes. While ...
User-Guided Personalized Image Aesthetic Assessment Based on Deep Reinforcement Learning
Personalized image aesthetic assessment (PIAA) has recently become a hot topic due to its wide applications, such as photography, film, television, e-commerce, fashion design, and so on. This task is more seriously affected by subjective factors and ...
Deep SR-HDR: Joint Learning of Super-Resolution and High Dynamic Range Imaging for Dynamic Scenes
The visual quality of a single image captured by a digital camera usually suffers from limited spatial resolution and low dynamic range (LDR) due to sensor constraints. To address these problems, recent works have independently applied convolutional ...
Adaptive Group-Wise Consistency Network for Co-Saliency Detection
Co-saliency detection focuses on detecting common and salient objects among a group of images. With the application of deep learning in co-saliency detection, more accurate and more effective models are proposed in an end-to-end manner. However, two major ...
VTON-SCFA: A Virtual Try-On Network Based on the Semantic Constraints and Flow Alignment
An image-based virtual try-on system transfers an in-shop garment to the corresponding garment region of a reference person, which has huge application potential and commercial value in online clothing shopping. Existing methods have difficulty preserving ...
Caching in Dynamic Environments: A Near-Optimal Online Learning Approach
The rapid growth of rich multimedia data in today’s Internet, especially video traffic, has challenged the content delivery networks (CDNs). Caching serves as an important means to reduce user access latency so as to enable faster content ...
Learning Sparse and Discriminative Multimodal Feature Codes for Finger Recognition
Compared with uni-modal biometrics systems, multimodal biometrics systems using multiple sources of information for establishing an individual’s identity have received considerable attention recently. However, most traditional multimodal biometrics ...
Image Compressed Sensing Using Non-Local Neural Network
Deep network-based image Compressed Sensing (CS) has attracted much attention in recent years. However, the existing deep network-based CS schemes either reconstruct the target image in a block-by-block manner that leads to serious block artifacts or ...
Anet: A Deep Neural Network for Automatic 3D Anthropometric Measurement Extraction
3D Anthropometric measurement extraction is of paramount importance for several applications such as clothing design, online garment shopping, and medical diagnosis, to name a few. State-of-the-art 3D anthropometric measurement extraction methods estimate ...
ChestXRayBERT: A Pretrained Language Model for Chest Radiology Report Summarization
Automatically generating the “impression” section of a radiology report given the “findings” section can summarize as much salient information of the “findings” section as possible, thus promoting more effective ...
Modality-Oriented Graph Learning Toward Outfit Compatibility Modeling
Outfit compatibility modeling, which aims to automatically evaluate the matching degree of an outfit, has drawn great research attention. Regarding the comprehensive evaluation, several previous studies have attempted to solve the task of outfit ...
Cross-Domain Recommendation Via User-Clustering and Multidimensional Information Fusion
Recently, recommendation systems have been widely used in online business scenarios, which can improve the online experience by learning the user or item characteristics to predict the user’s future behavior and to realize precision marketing. ...
Recognition of Emotions in User-Generated Videos through Frame-Level Adaptation and Emotion Intensity Learning
Recognition of emotions in user-generated videos has attracted considerable research attention. Most existing approaches focus on learning frame-level features and fail to consider frame-level emotion intensities which are critical for video ...
A Semi-Fragile Reversible Watermarking for Authenticating 3D Models Based on Virtual Polygon Projection and Double Modulation Strategy
Aiming to reduce the embedding distortion and improve tampering location precision of reversible watermarking for authenticating three-dimensional (3D) models, a semi-fragile reversible watermarking based on virtual polygon projection and double modulation ...
A Dynamic Residual Self-Attention Network for Lightweight Single Image Super-Resolution
Deep learning methods have shown outstanding performance in many applications, including single-image super-resolution (SISR). With residual connection architecture, deeply stacked convolutional neural networks provide a substantial performance boost for ...
Exploiting Multi-View Part-Wise Correlation via an Efficient Transformer for Vehicle Re-Identification
Image-based vehicle re-identification (ReID) has witnessed much progress in recent years. However, most existing works struggle to extract robust but discriminative features from a single image to represent one vehicle instance. We argue that images ...
FDA-GAN: Flow-Based Dual Attention GAN for Human Pose Transfer
Human pose transfer aims at transferring the appearance of the source person to the target pose. Existing methods utilizing flow-based warping for non-rigid human image generation have achieved great success. However, they fail to preserve the appearance ...
M2P2: Multimodal Persuasion Prediction Using Adaptive Fusion
Identifying persuasive speakers in an adversarial environment is a critical task. In a national election, politicians would like to have persuasive speakers campaign on their behalf. When a company faces adverse publicity, they would like to engage ...
A Generalized Zero-Shot Quantization of Deep Convolutional Neural Networks Via Learned Weights Statistics
Quantizing the floating-point weights and activations of deep convolutional neural networks to fixed-point representation yields reduced memory footprints and inference time. Recently, efforts have been afoot towards zero-shot quantization that does not ...
Depth-Distilled Multi-Focus Image Fusion
Homogeneous regions are smooth areas that lack blur clues to discriminate whether they are focused or non-focused. Therefore, they pose a great challenge to achieving highly accurate multi-focus image fusion (MFIF). Fortunately, we observe that depth maps ...
AMANet: Adaptive Multi-Path Aggregation for Learning Human 2D-3D Correspondences
Learning human 2D-3D correspondences aims to map all human 2D pixels to a 3D human template, namely human densepose estimation, involving surface patch recognition (i.e., Index-to-Patch (I)) and regression of patch-specific UV coordinates. Despite recent ...
Late Fusion Multiple Kernel Clustering With Local Kernel Alignment Maximization
Multi-view clustering, which appropriately integrates information from multiple sources to reveal data’s inherent structure, is gaining traction in clustering. Though existing procedures have yielded satisfactory results, we observe that they have ...
Consistent Multiple Graph Embedding for Multi-View Clustering
Graph-based multi-view clustering, aiming to obtain a partition of data across multiple views, has received considerable attention in recent years. Although great efforts have been made for graph-based multi-view clustering, it is still challenging to fuse ...
Distortion Map-Guided Feature Rectification for Efficient Video Semantic Segmentation
To leverage the strong cross-frame relations of videos, many video semantic segmentation methods tend to explore feature reuse and feature warping based on motion clues. However, since the video dynamics are too complex to model accurately, some warped ...
Causal Interventional Training for Image Recognition
Deep learning models often fit undesired dataset bias in training. In this paper, we formulate the bias using causal inference, which helps us uncover the ever-elusive causalities among the key factors in training, and thus pursue the ...
Trustable Co-Label Learning From Multiple Noisy Annotators
Supervised deep learning depends on massive accurately annotated examples, which is usually impractical in many real-world scenarios. A typical alternative is learning from multiple noisy annotators. Numerous earlier works assume that all labels are noisy, ...