Introduction to the Special Issue on Fine-Grained Visual Recognition and Re-Identification

Published: 25 January 2022
Ubiquitous cameras are generating a huge amount of visual data. Automatic visual content analysis and recognition are thus desirable for making effective use of this data. Fine-Grained Visual Recognition and Re-Identification (FGVRID) aims to accurately analyze and identify visual objects and to match re-appearing targets, e.g., persons and vehicles, across a large set of images and videos. It has the potential to offer unprecedented possibilities for intelligent video processing and analysis.
Compared with traditional visual search and classification tasks, FGVRID has the following properties that make it more challenging. First, proper object detection algorithms should be designed to locate objects, their local parts, or meaningful spatial contexts in videos before proceeding to the identification step. Second, the visual appearance of an object is easily affected by many factors, such as viewpoint changes, illumination changes, occlusions, and camera parameter differences. Third, annotating fine-grained identity or category cues is expensive and time-consuming. Finally, to cope with large-scale visual data, scalable indexing or feature coding algorithms should be designed to ensure online recognition efficiency. In recent years, FGVRID tasks such as person re-identification (re-id), vehicle re-id, multi-object multi-camera tracking, and fine-grained image classification have exhibited impressive performance thanks to the development of Convolutional Neural Networks (CNNs) and self-supervised learning strategies. In addition, novel neural network architectures such as brain-inspired networks and spiking neural networks have exhibited advantages in the detection and recognition of fast-moving objects.
A total of 17 submissions were received by this Special Issue, and each paper was assigned three to four reviewers. After one or two rounds of revision, 13 papers were finally accepted. They cover a variety of FGVRID tasks. Specifically, four papers are about vehicle and person re-id, four papers are about fine-grained classification, and two papers are about dataset construction for visual recognition. In addition, two papers work on visual representation learning for fine-grained classification, and one paper is about crowd counting. These papers bring novel algorithms, insights, and meaningful discussions to their studied topics, and have advanced the state-of-the-art performance on several commonly used datasets.
Zhang et al. explore the complex intra-modality and cross-modality variations in visible-infrared person re-id. They propose a comprehensive hybrid modality metric learning framework based on both class-level and modality-level similarity constraints. A new binary neural network for efficient person re-id (BiRe-ID) is proposed in the study by Xu et al. In the study of Zhao et al., the incompatibility between sample generation and re-id accuracy in a GAN architecture is investigated. JoT-GAN, a generative adversarial training framework, is presented so that the generator and the re-id model mutually benefit from each other. In the study of Liang et al., a simple yet powerful deep model (EIA-Net) is introduced for vehicle model verification, which learns a more discriminative image representation by localizing key vehicle parts and jointly incorporating two distance metrics, i.e., a vehicle-level embedding and a vehicle-part-sensitive embedding.
For fine-grained visual recognition, Yan et al. propose a Multi-feature Fusion and Decomposition (MFD) framework for age-invariant face recognition. Zhai et al. propose to incorporate a rectified meta-learning module into a common CNN paradigm to train a noise-robust deep network for image-based plant disease classification. Tan et al. develop a fine-grained image classification model, namely Multi-scale Selective Hierarchical biQuadratic Pooling (MSHQP), which uses hierarchical biquadratic pooling to ensure robust feature interaction. In the work of Cucchiara et al., the problem of fine-grained human analysis under occlusions and perspective constraints is studied. They present possible solutions for effectively detecting people through fine-grained analysis, with the aim of detecting people under occlusions both in the 2D image plane and in 3D space using single monocular cameras.
A novel Instance Correlation Graph for Unsupervised Domain Adaptation, referred to as ICGDA, is proposed in the study by Wu and Ling et al. It is trained end-to-end by jointly optimizing three types of losses, i.e., a Supervised Classification loss, a Centroid Alignment loss, and an ICG Alignment loss. Mugnai et al. introduce a Semi-Supervised Learning (SSL) method that leverages ideas from adversarial entropy optimization and second-order pooling. Their main goal is to reduce the prohibitive annotation cost of FGVC under the SSL setting. Luo et al. propose a novel self-supervised method, called Exploring Relations in Untrimmed Videos (ERUV), which can be straightforwardly applied to untrimmed videos to learn spatio-temporal features.
Finally, Wang et al. propose a neural architecture search framework for finding efficient crowd counting network structures. A novel search-from-pre-trained strategy enables their cross-task architecture search to efficiently explore the large and flexible search space. Li et al. formulate video summarization as a hierarchical refining process. They propose a hierarchical summarization network with deep Q-learning (HQSN) to carry out the refining process and explore temporal dependency. In addition, they collect a new dataset that consists of structured game videos with fine-grained actions and importance annotations.
To summarize, these papers illustrate the effectiveness of self-supervised learning, semi-supervised learning, transfer learning, reinforcement learning, and neural architecture search in FGVRID tasks. This Special Issue may benefit a broad readership of researchers, practitioners, and students who are interested in FGVRID. We would like to thank the authors for their contributions to this Special Issue. We also thank the journal, ACM Transactions on Multimedia Computing, Communications, and Applications, for its support!
Shiliang Zhang
Peking University
Guorong Li
University of Chinese Academy of Sciences
Weigang Zhang
Harbin Institute of Technology
Qingming Huang
University of Chinese Academy of Sciences
Tiejun Huang
Peking University
Mubarak Shah
University of Central Florida
Nicu Sebe
University of Trento
Guest Editors

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 18, Issue 1s
February 2022
352 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3505206

Publisher

Association for Computing Machinery

New York, NY, United States
