TOM: Vol 25, No

Volume 25

2023

Publisher:

IEEE Press

ISSN:1520-9210

research-article

S³Net: Self-Supervised Self-Ensembling Network for Semi-Supervised RGB-D Salient Object Detection

Pages 676–689 https://doi.org/10.1109/TMM.2021.3129730

RGB-D salient object detection aims to detect visually distinctive objects or regions from a paired RGB image and depth image. State-of-the-art RGB-D saliency detectors are mainly based on convolutional neural networks but almost all suffer from an ...

research-article

Multiple Description Coding for Best-Effort Delivery of Light Field Video Using GNN-Based Compression

Pages 690–705 https://doi.org/10.1109/TMM.2021.3129918

In recent years, Light Field (LF) video has grabbed much attention as an emerging form of immersive media. LF collects, through a lens matrix, light information emanating in every direction, and obtains rich information about the scene, providing users ...

research-article

Multi-Panda Tracking

Pages 706–720 https://doi.org/10.1109/TMM.2021.3130414

Multi-Panda Tracking (MPT) is a video-based tracking task for panda individuals, which is conducive to the observation and measurement of distribution and status of pandas. Different from tracking general objects such as pedestrians and vehicles, MPT is ...

research-article

Towards Task-Generic Image Compression: A Study of Semantics-Oriented Metrics

Pages 721–735 https://doi.org/10.1109/TMM.2021.3130754

Instead of being observed by humans, multimedia data are now increasingly fed into machines to perform different kinds of semantic analysis. One image may be analyzed multiple times by different machine vision algorithms for different purposes. While ...

research-article

User-Guided Personalized Image Aesthetic Assessment Based on Deep Reinforcement Learning

Pages 736–749 https://doi.org/10.1109/TMM.2021.3130752

Personalized image aesthetic assessment (PIAA) has recently become a hot topic due to its wide applications, such as photography, film, television, e-commerce, fashion design, and so on. This task is more seriously affected by subjective factors and ...

research-article

Deep SR-HDR: Joint Learning of Super-Resolution and High Dynamic Range Imaging for Dynamic Scenes

Pages 750–763 https://doi.org/10.1109/TMM.2021.3132165

The visual quality of a single image captured by a digital camera usually suffers from limited spatial resolution and low dynamic range (LDR) due to sensor constraints. To address these problems, recent works have independently applied convolutional ...

research-article

Adaptive Group-Wise Consistency Network for Co-Saliency Detection

Pages 764–776 https://doi.org/10.1109/TMM.2021.3138246

Co-saliency detection focuses on detecting common and salient objects among a group of images. With the application of deep learning in co-saliency detection, more accurate and more effective models are proposed in an end-to-end manner. However, two major ...

research-article

VTON-SCFA: A Virtual Try-On Network Based on the Semantic Constraints and Flow Alignment

Pages 777–791 https://doi.org/10.1109/TMM.2022.3152367

An image-based virtual try-on system transfers an in-shop garment to the corresponding garment region of a reference person, which has huge application potential and commercial value in online clothing shopping. Existing methods have difficulty preserving ...

research-article

Caching in Dynamic Environments: A Near-Optimal Online Learning Approach

Pages 792–804 https://doi.org/10.1109/TMM.2021.3132156

The rapid growth of rich multimedia data in today’s Internet, especially video traffic, has challenged the content delivery networks (CDNs). Caching serves as an important means to reduce user access latency so as to enable faster content ...

research-article

Learning Sparse and Discriminative Multimodal Feature Codes for Finger Recognition

Pages 805–815 https://doi.org/10.1109/TMM.2021.3132166

Compared with uni-modal biometrics systems, multimodal biometrics systems using multiple sources of information for establishing an individual’s identity have received considerable attention recently. However, most traditional multimodal biometrics ...

research-article

Image Compressed Sensing Using Non-Local Neural Network

Pages 816–830 https://doi.org/10.1109/TMM.2021.3132489

Deep network-based image Compressed Sensing (CS) has attracted much attention in recent years. However, the existing deep network-based CS schemes either reconstruct the target image in a block-by-block manner that leads to serious block artifacts or ...

research-article

Anet: A Deep Neural Network for Automatic 3D Anthropometric Measurement Extraction

Pages 831–844 https://doi.org/10.1109/TMM.2021.3132487

3D Anthropometric measurement extraction is of paramount importance for several applications such as clothing design, online garment shopping, and medical diagnosis, to name a few. State-of-the-art 3D anthropometric measurement extraction methods estimate ...

research-article

ChestXRayBERT: A Pretrained Language Model for Chest Radiology Report Summarization

Pages 845–855 https://doi.org/10.1109/TMM.2021.3132724

Automatically generating the “impression” section of a radiology report given the “findings” section can summarize as much salient information of the “findings” section as possible, thus promoting more effective ...

research-article

Modality-Oriented Graph Learning Toward Outfit Compatibility Modeling

Pages 856–867 https://doi.org/10.1109/TMM.2021.3134164

Outfit compatibility modeling, which aims to automatically evaluate the matching degree of an outfit, has drawn great research attention. Regarding the comprehensive evaluation, several previous studies have attempted to solve the task of outfit ...

research-article

Cross-Domain Recommendation Via User-Clustering and Multidimensional Information Fusion

Pages 868–880 https://doi.org/10.1109/TMM.2021.3134161

Recently, recommendation systems have been widely used in online business scenarios, where they can improve the online experience by learning user or item characteristics to predict the user’s future behavior and to realize precision marketing. ...

research-article

Recognition of Emotions in User-Generated Videos through Frame-Level Adaptation and Emotion Intensity Learning

Pages 881–891 https://doi.org/10.1109/TMM.2021.3134167

Recognition of emotions in user-generated videos has attracted considerable research attention. Most existing approaches focus on learning frame-level features and fail to consider frame-level emotion intensities which are critical for video ...

research-article

A Semi-Fragile Reversible Watermarking for Authenticating 3D Models Based on Virtual Polygon Projection and Double Modulation Strategy

Pages 892–906 https://doi.org/10.1109/TMM.2021.3134159

Aiming to reduce the embedding distortion and improve the tampering location precision of reversible watermarking for authenticating three-dimensional (3D) models, a semi-fragile reversible watermarking based on virtual polygon projection and double modulation ...

research-article

A Dynamic Residual Self-Attention Network for Lightweight Single Image Super-Resolution

Pages 907–918 https://doi.org/10.1109/TMM.2021.3134172

Deep learning methods have shown outstanding performance in many applications, including single-image super-resolution (SISR). With residual connection architecture, deeply stacked convolutional neural networks provide a substantial performance boost for ...

research-article

Exploiting Multi-View Part-Wise Correlation via an Efficient Transformer for Vehicle Re-Identification

Pages 919–929 https://doi.org/10.1109/TMM.2021.3134839

Image-based vehicle re-identification (ReID) has witnessed much progress in recent years. However, most existing works struggle to extract robust but discriminative features from a single image to represent one vehicle instance. We argue that images ...

research-article

FDA-GAN: Flow-Based Dual Attention GAN for Human Pose Transfer

Pages 930–941 https://doi.org/10.1109/TMM.2021.3134157

Human pose transfer aims at transferring the appearance of the source person to the target pose. Existing methods utilizing flow-based warping for non-rigid human image generation have achieved great success. However, they fail to preserve the appearance ...

research-article

M2P2: Multimodal Persuasion Prediction Using Adaptive Fusion

Pages 942–952 https://doi.org/10.1109/TMM.2021.3134168

Identifying persuasive speakers in an adversarial environment is a critical task. In a national election, politicians would like to have persuasive speakers campaign on their behalf. When a company faces adverse publicity, they would like to engage ...

research-article

A Generalized Zero-Shot Quantization of Deep Convolutional Neural Networks Via Learned Weights Statistics

Pages 953–965 https://doi.org/10.1109/TMM.2021.3134158

Quantizing the floating-point weights and activations of deep convolutional neural networks to fixed-point representation yields reduced memory footprints and inference time. Recently, efforts have been afoot towards zero-shot quantization that does not ...

research-article

Depth-Distilled Multi-Focus Image Fusion

Pages 966–978 https://doi.org/10.1109/TMM.2021.3134565

Homogeneous regions are smooth areas that lack the blur clues needed to discriminate whether they are focused or non-focused. Therefore, they pose a great challenge to achieving highly accurate multi-focus image fusion (MFIF). Fortunately, we observe that depth maps ...

research-article

AMANet: Adaptive Multi-Path Aggregation for Learning Human 2D-3D Correspondences

Pages 979–992 https://doi.org/10.1109/TMM.2021.3135145

Learning human 2D-3D correspondences aims to map all human 2D pixels to a 3D human template, namely human densepose estimation, involving surface patch recognition (i.e., Index-to-Patch (I)) and regression of patch-specific UV coordinates. Despite recent ...

research-article

Late Fusion Multiple Kernel Clustering With Local Kernel Alignment Maximization

Pages 993–1007 https://doi.org/10.1109/TMM.2021.3136094

Multi-view clustering, which appropriately integrates information from multiple sources to reveal data’s inherent structure, is gaining traction in clustering. Though existing procedures have yielded satisfactory results, we observe that they have ...

research-article

Consistent Multiple Graph Embedding for Multi-View Clustering

Pages 1008–1018 https://doi.org/10.1109/TMM.2021.3136098

Graph-based multi-view clustering, which aims to obtain a partition of data across multiple views, has received considerable attention in recent years. Although great efforts have been made for graph-based multi-view clustering, it is still challenging to fuse ...

research-article

Distortion Map-Guided Feature Rectification for Efficient Video Semantic Segmentation

Pages 1019–1032 https://doi.org/10.1109/TMM.2021.3136085

To leverage the strong cross-frame relations of videos, many video semantic segmentation methods tend to explore feature reuse and feature warping based on motion clues. However, since the video dynamics are too complex to model accurately, some warped ...

research-article

Causal Interventional Training for Image Recognition

Pages 1033–1044 https://doi.org/10.1109/TMM.2021.3136717

Deep learning models often fit undesired dataset bias in training. In this paper, we formulate the bias using causal inference, which helps us uncover the ever-elusive causalities among the key factors in training, and thus pursue the ...

research-article

Trustable Co-Label Learning From Multiple Noisy Annotators

Pages 1045–1057 https://doi.org/10.1109/TMM.2021.3137752

Supervised deep learning depends on massive numbers of accurately annotated examples, which are usually impractical to obtain in many real-world scenarios. A typical alternative is learning from multiple noisy annotators. Numerous earlier works assume that all labels are noisy, ...

opinion

Editorial

Jiebo Luo

Pages 1058–1059 https://doi.org/10.1109/TMM.2023.3264050

IEEE Transactions on Multimedia
