Introduction to the Special Issue on Fine-Grained Visual Recognition and Re-Identification

Authors:

Nicu Sebe

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 18, Issue 1s

Article No.: 24, Pages 1 - 3

https://doi.org/10.1145/3505280

Published: 25 January 2022

Ubiquitous cameras generate a huge amount of visual data, so automatic visual content analysis and recognition are needed to use those data effectively. Fine-Grained Visual Recognition and Re-Identification (FGVRID) aims to accurately analyze and identify visual objects and to match re-appearing targets, e.g., persons and vehicles, across a large set of images and videos. It has the potential to open new possibilities for intelligent video processing and analysis.

Compared with traditional visual search and classification tasks, FGVRID has several properties that make it more challenging. First, proper object detection algorithms must be designed to locate objects, their local parts, or meaningful spatial contexts in videos before proceeding to the identification step. Second, the visual appearance of an object is easily affected by many factors, such as viewpoint changes, illumination changes, occlusions, and camera parameter differences. Third, annotating fine-grained identity or category cues is expensive and time-consuming. Finally, to cope with large-scale visual data, scalable indexing or feature coding algorithms must be designed to ensure efficient online recognition. In recent years, FGVRID tasks such as person re-identification (re-id), vehicle re-id, multi-object multi-camera tracking, and fine-grained image classification have exhibited impressive performance thanks to the development of Convolutional Neural Networks (CNNs) and self-supervised learning strategies. In addition, novel neural network architectures such as brain-inspired networks and spiking neural networks have exhibited advantages in the detection and recognition of fast-moving objects.

A total of 17 submissions were received by this Special Issue, and each paper was assigned to three or four reviewers. After one or two rounds of revision, 13 papers were finally accepted. They cover a variety of FGVRID tasks. Specifically, four papers are about vehicle and person re-id, four papers are about fine-grained classification, and two papers are about dataset construction for visual recognition. In addition, two papers work on visual representation learning for fine-grained classification, and one paper is about crowd counting. These papers bring novel algorithms, insights, and meaningful discussions to their studied topics, and have advanced the state-of-the-art performance on several commonly used datasets.

Zhang et al. explore the complex within-modality and cross-modality variations in visible-infrared person re-id. They propose a comprehensive hybrid modality metric learning framework based on both class-level and modality-level similarity constraints. Xu et al. propose a new binary neural network for efficient person re-id (BiRe-ID). Zhao et al. investigate the incompatibility between sample generation and re-id accuracy in a GAN architecture. They present JoT-GAN, a generative adversarial training framework in which the generator and the re-id model mutually benefit from each other. Liang et al. introduce a simple yet powerful deep model (EIA-Net) for vehicle model verification, which learns a more discriminative image representation by localizing key vehicle parts and jointly incorporating two distance metrics, i.e., a vehicle-level embedding and a vehicle-part-sensitive embedding.

For fine-grained visual recognition, Yan et al. propose a Multi-feature Fusion and Decomposition (MFD) framework for age-invariant face recognition. Zhai et al. propose to incorporate a rectified meta-learning module into a common CNN paradigm to train a noise-robust deep network for image-based plant disease classification. Tan et al. develop a fine-grained image classification model, namely Multi-scale Selective Hierarchical biQuadratic Pooling (MSHQP), which uses hierarchical biquadratic pooling to ensure robust feature interaction. Cucchiara et al. study the problem of fine-grained human analysis under occlusions and perspective constraints. They present possible solutions for detecting people under occlusions, both in the 2D image plane and in 3D space, using single monocular cameras.

A novel Instance Correlation Graph for Unsupervised Domain Adaptation, referred to as ICGDA, is proposed in the study by Wu and Ling et al. It is trained end-to-end by jointly optimizing three types of losses, i.e., a Supervised Classification loss, a Centroid Alignment loss, and an ICG Alignment loss. Mugnai et al. introduce a Semi-Supervised Learning (SSL) method that leverages ideas from adversarial entropy optimization and second-order pooling. Their main goal is to use the SSL setting to reduce the prohibitive annotation cost of fine-grained visual categorization (FGVC). Luo et al. propose a novel self-supervised method, called Exploring Relations in Untrimmed Videos (ERUV), which can be straightforwardly applied to untrimmed videos to learn spatio-temporal features.

Finally, Wang et al. propose a neural architecture search framework for finding efficient crowd counting network structures. A novel search-from-pre-trained strategy enables their cross-task architecture search to efficiently explore the large and flexible search space. Li et al. formulate video summarization as a hierarchical refining process. They propose a hierarchical summarization network with deep Q-learning (HQSN) to carry out the refining process and explore temporal dependency. They also collect a new dataset that consists of structured game videos with fine-grained actions and importance annotations.

To summarize, these papers illustrate the effectiveness of self-supervised learning, semi-supervised learning, transfer learning, reinforcement learning, and neural architecture search in FGVRID tasks. This Special Issue may benefit a broad readership of researchers, practitioners, and students who are interested in FGVRID. We would like to thank the authors for their contributions to this Special Issue. We also thank the journal, ACM Transactions on Multimedia Computing, Communications, and Applications, for its support.

Shiliang Zhang

Peking University

Guorong Li

University of Chinese Academy of Sciences

Weigang Zhang

Harbin Institute of Technology

Qingming Huang

University of Chinese Academy of Sciences

Tiejun Huang

Peking University

Mubarak Shah

University of Central Florida

Nicu Sebe

University of Trento

Guest Editors

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 18, Issue 1s

February 2022

352 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3505206

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Publisher

Association for Computing Machinery

New York, NY, United States
