
Introduction to the Special Section on Learning Representations, Similarity, and Associations in Dynamic Multimedia Environments

Published: 06 January 2023
In recent years, with the widespread availability of digital sensors (e.g., cameras) and a growing need for urban artificial intelligence applications, the ability to learn the representations, similarities, and associations of multimedia data in dynamic environments has become critically important in many multimedia applications. The goal is to design flexible learning machines that learn environmentally robust descriptors of multimedia data and model the complex relationships among them in complex application scenarios, benefiting diverse tasks such as visual object re-identification, cross-modal retrieval, and human pose estimation. The aim of this Special Section on “Learning Representations, Similarity, and Associations in Dynamic Multimedia Environments” is to bring academic researchers and industry developers together to share recent advances and future trends in representation/similarity learning and association of complex multimedia data.
The Special Section attracted 25 submissions, and after a rigorous review, six papers were accepted for publication. Two of them address person re-identification. The remaining four address cross-modal matching, human pose estimation, few-shot classification, and compatible representation learning, respectively. These papers bring novel algorithms, insights, and meaningful discussions to the tasks they study.
In the article entitled “Rank-in-Rank Loss for Person Re-identification”, Xu et al. propose a Differentiable Retrieval-Sort Loss (DRSL) to optimize the re-ID model. Because ranking and sorting operations are non-differentiable and non-convex, the DRSL is designed so that it can still be optimized through automatic differentiation and backpropagation. The DRSL not only maintains the inter-class distance distribution but also preserves the intra-class similarity structure in terms of angle constraints. In the article entitled “3D Skeleton and Two Streams Approach to Person Re-identification Using Optimized Region Matching”, Han et al. propose a 3D skeleton and two-stream approach for person re-ID. The first stream uses the 3D skeleton for background filtering and region segmentation, and the second stream uses a Siamese network for global descriptor extraction. The two streams are then fused to improve distance learning with an optimized region matching strategy.
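The non-differentiability of ranking noted for the first article can be illustrated with a common, generic workaround: replacing each hard comparison with a sigmoid. The sketch below is not the DRSL from Xu et al.; it is only a minimal smooth-rank surrogate under assumed names (`hard_rank`, `soft_rank`) and an assumed temperature parameter `tau`, shown to make the underlying problem concrete.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def hard_rank(scores, i):
    # 1-based rank of item i under descending score order.
    # Built from step functions, so its gradient is zero almost everywhere.
    return 1 + sum(1 for j, s in enumerate(scores) if j != i and s > scores[i])

def soft_rank(scores, i, tau=0.01):
    # Smooth surrogate (an assumption, not the authors' loss): each hard
    # comparison [s_j > s_i] is replaced by sigmoid((s_j - s_i) / tau),
    # which is differentiable with respect to every score.
    return 1.0 + sum(
        sigmoid((s - scores[i]) / tau) for j, s in enumerate(scores) if j != i
    )

# Hypothetical similarity scores of four gallery items for one query.
scores = [0.9, 0.2, 0.6, 0.4]
hard = [hard_rank(scores, i) for i in range(len(scores))]
soft = [soft_rank(scores, i) for i in range(len(scores))]
```

With a small `tau` the soft ranks approach the hard ranks, while a larger `tau` gives smoother gradients at the cost of a looser approximation.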
In the article entitled “Guided Graph Attention Learning for Video-Text Matching”, Li et al. propose a Guided Graph Attention Learning (GGAL) model to enhance video embedding learning by capturing important region-level semantic concepts within the spatial-temporal space. The GGAL model builds connections between object regions and performs hierarchical graph reasoning on both frame-level and whole-video-level region graphs. The global context is used to guide attention learning on this hierarchical graph topology. As a result, the learned video embedding can be better aligned with text captions.
In the article entitled “GLPose: Global-Local Representation Learning for Human Pose Estimation”, Jiao et al. propose a global-local enhanced pose estimation (GLPose) network to tackle the challenging multi-frame human pose estimation task. The GLPose framework consists of a feature processing module, which conditionally incorporates global semantic information and local visual context to generate a robust human representation, and a feature enhancement module, which mines complementary information from this aggregated representation to enhance keyframe features for precise estimation.
In the article entitled “Revisiting Local Descriptor for Improved Few-Shot Classification”, He et al. propose a Dense Classification and Attentive Pooling (DCAP) method for few-shot visual object classification. Specifically, the method formulates meta-learning as a two-stage training paradigm: a dense classification pre-training stage reduces the semantic discrepancy among local descriptors, and an attentive pooling strategy in meta-finetuning selects more informative local descriptors for few-shot classification.
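The idea of pooling local descriptors with attention weights, rather than averaging them uniformly, can be sketched generically as follows. This is not the DCAP implementation from He et al.; the dot-product scoring against a `query` vector, the function name `attentive_pool`, and the toy data are all assumptions made for illustration.

```python
import math

def softmax(xs):
    # Subtract the maximum for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_pool(descriptors, query):
    # Score each local descriptor by its dot product with the query
    # (an assumed scoring rule), then normalize the scores with softmax.
    scores = [sum(d_k * q_k for d_k, q_k in zip(d, query)) for d in descriptors]
    weights = softmax(scores)
    # The pooled feature is the attention-weighted sum of the descriptors.
    dim = len(descriptors[0])
    pooled = [sum(w * d[k] for w, d in zip(weights, descriptors)) for k in range(dim)]
    return pooled, weights

# Three hypothetical 2-D local descriptors and a query that favors dimension 0.
descriptors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
pooled, weights = attentive_pool(descriptors, query)
```

Descriptors that align with the query receive larger weights, so they contribute more to the pooled feature than they would under plain average pooling.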
In the article entitled “CL2R: Compatible Lifelong Learning Representations”, Biondi et al. propose a method that partially mimics natural intelligence for the problem of lifelong learning of compatible representations. The authors identify stationarity as the property that the feature representation must hold to achieve compatibility, and they propose a novel training procedure that encourages local and global stationarity in the learned representation. Because of stationarity, the statistical properties of the learned features do not change over time, which makes them interoperable with previously learned features.
In closing, the guest editors would like to thank all the authors who contributed to this Special Section, and the reviewers for their constructive reviews and their efforts in respecting deadlines. We are also grateful to the Editor-in-Chief, Abdulmotaleb El Saddik, and the Information Director, Mohammad Anwar Hossain, for their support. We hope this Special Section will inspire further research and development ideas for learning representations, similarity, and associations in dynamic multimedia environments.
Xun Yang
University of Science and Technology of China, China
Liang Zheng
Australian National University, Australia
Elisa Ricci
University of Trento, Italy
Meng Wang
Hefei University of Technology, China
Guest Editors

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 2s
June 2022
383 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3561949
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

Introduction, Refereed
