Introduction to the Special Section on Learning Representations, Similarity, and Associations in Dynamic Multimedia Environments

Authors: Xun Yang, Liang Zheng, Elisa Ricci, Meng Wang

ACM Transactions on Multimedia Computing, Communications and Applications, Volume 18, Issue 2s

Article No.: 127e, Pages 1 - 2

https://doi.org/10.1145/3569952

Published: 06 January 2023

In recent years, with the widespread availability of digital sensors (e.g., cameras) and an increasing need for urban artificial intelligence applications, the ability to learn the representations, similarities, and associations of multimedia data in dynamic environments has become critically important in many multimedia applications. The goal is to design flexible learning machines that learn environmentally robust descriptors of multimedia data and model the complex relationships among them in complex application scenarios, benefiting diverse tasks such as visual object re-identification, cross-modal retrieval, and human pose estimation. The aim of this Special Section on “Learning Representations, Similarity, and Associations in Dynamic Multimedia Environments” is to bring academic researchers and industry developers together to share recent advances and future trends in representation and similarity learning and in the association of complex multimedia data.

The Special Section attracted 25 submissions, and after a rigorous review, six papers were accepted for publication. Two papers address person re-identification. The remaining four address cross-modal matching, human pose estimation, few-shot classification, and compatible representation learning, respectively. These papers bring novel algorithms, insights, and meaningful discussions to the tasks they study.

In the article entitled “Rank-in-Rank Loss for Person Re-identification”, Xu et al. propose a Differentiable Retrieval-Sort Loss (DRSL) to optimize the re-ID model. Because ranking and sorting operations are non-differentiable and non-convex, the DRSL is designed so that it can still be optimized with automatic differentiation and backpropagation. The DRSL not only maintains the inter-class distance distribution but also preserves the intra-class similarity structure in terms of angle constraints. In the article entitled “3D Skeleton and Two Streams Approach to Person Re-identification Using Optimized Region Matching”, Han et al. propose a 3D skeleton and two-stream approach for person re-ID. The first stream uses the 3D skeleton for background filtering and region segmentation, and the second stream uses a Siamese network for global descriptor extraction. The two streams are then fused to improve distance learning with an optimized region matching strategy.
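To illustrate why ranking is hard to optimize directly, as noted above for the rank-based re-ID loss, the following is a minimal sketch of one common workaround: replacing the step function inside a rank computation with a sigmoid. It is not the DRSL formulation of Xu et al.; the function names, the temperature value, and the similarity scores are assumptions made only for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hard_rank(scores, i):
    """1-based rank of item i under descending similarity.

    Counting with a strict comparison is a step function, so its
    gradient with respect to the scores is zero almost everywhere.
    """
    return 1 + sum(1 for j, s in enumerate(scores) if j != i and s > scores[i])


def soft_rank(scores, i, temperature=0.05):
    """Smooth surrogate (an assumption of this sketch, not DRSL itself).

    Each step comparison is replaced by a sigmoid, so the result
    varies smoothly with the scores and admits useful gradients.
    """
    return 1.0 + sum(
        sigmoid((s - scores[i]) / temperature)
        for j, s in enumerate(scores)
        if j != i
    )


# Hypothetical query-to-gallery similarities.
scores = [0.91, 0.35, 0.78, 0.12]
hard = [hard_rank(scores, i) for i in range(len(scores))]
soft = [soft_rank(scores, i) for i in range(len(scores))]
print(hard)
print([round(r, 3) for r in soft])
```

A smaller temperature brings the soft rank closer to the hard rank but makes the gradients sharper, which is the usual trade-off with this kind of relaxation.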

In the article entitled “Guided Graph Attention Learning for Video-Text Matching”, Li et al. propose a Guided Graph Attention Learning (GGAL) model to enhance video embedding learning by capturing important region-level semantic concepts within the spatial-temporal space. The GGAL model builds connections between object regions and performs hierarchical graph reasoning on both frame-level and video-level region graphs. The global context is used to guide attention learning on this hierarchical graph topology. The learned video embedding can then be better aligned with text captions.

In the article entitled “GLPose: Global-Local Representation Learning for Human Pose Estimation”, Jiao et al. propose a global-local enhanced pose estimation (GLPose) network to tackle the challenging multi-frame human pose estimation task. The GLPose framework consists of a feature processing module, which conditionally incorporates global semantic information and local visual context to generate a robust human representation, and a feature enhancement module, which mines complementary information from this aggregated representation to enhance keyframe features for precise estimation.

In the article entitled “Revisiting Local Descriptor for Improved Few-Shot Classification”, He et al. propose a Dense Classification and Attentive Pooling (DCAP) method for few-shot visual object classification. Specifically, the method formulates meta-learning as a two-stage training paradigm: a dense classification pre-training stage reduces the semantic discrepancy among local descriptors, and an attentive pooling strategy in meta-finetuning selects the more informative local descriptors for few-shot classification.
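As a rough illustration of the general idea of attentive pooling mentioned above, the sketch below weights local descriptors by a softmax over their dot-product scores against a query vector and then averages them. This is not the DCAP design of He et al.; the scoring rule, the function name `attentive_pool`, and the toy descriptors are assumptions chosen for the example.

```python
import math


def attentive_pool(descriptors, query):
    """Pool local descriptors into one vector using attention weights.

    Weights are a softmax over dot products with `query` (a hypothetical
    choice for this sketch), so descriptors that agree with the query
    contribute more than they would under plain average pooling.
    """
    scores = [sum(d_k * q_k for d_k, q_k in zip(d, query)) for d in descriptors]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(descriptors[0])
    pooled = [
        sum(w * d[k] for w, d in zip(weights, descriptors)) for k in range(dim)
    ]
    return pooled, weights


# Three hypothetical 2-D local descriptors and a query direction.
descriptors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query = [2.0, 0.0]
pooled, weights = attentive_pool(descriptors, query)
print([round(w, 3) for w in weights])
print([round(p, 3) for p in pooled])
```

Compared with uniform averaging, the pooled vector here leans toward the descriptors aligned with the query, which is the sense in which attention "selects" informative local descriptors.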

In the article entitled “CL²R: Compatible Lifelong Learning Representations”, Biondi et al. propose a method that partially mimics natural intelligence to address the problem of learning compatible representations in a lifelong setting. The authors identify stationarity as the property that the feature representation must have to achieve compatibility and propose a novel training procedure that encourages local and global stationarity in the learned representation. Because of stationarity, the statistical properties of the learned features do not change over time, making them interoperable with previously learned features.

In closing, the guest editors would like to thank all the authors who contributed to this Special Section and the reviewers for their constructive reviews and their efforts in respecting deadlines. We are also grateful to the Editor-in-Chief, Abdulmotaleb El Saddik, and the Information Director, Mohammad Anwar Hossain, for their support. We hope this Special Section will inspire further research and development ideas for learning representations, similarity, and associations in dynamic multimedia environments.

Xun Yang

University of Science and Technology of China, China

Liang Zheng

Australian National University, Australia

Elisa Ricci

University of Trento, Italy

Meng Wang

Hefei University of Technology, China

Guest Editors

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 18, Issue 2s

June 2022

383 pages

ISSN: 1551-6857

EISSN: 1551-6865

DOI: 10.1145/3561949

Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Publisher

Association for Computing Machinery

New York, NY, United States