DOI: 10.1145/3664647.3681583 — ACM Multimedia Conference Proceedings
research-article

Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting

Published: 28 October 2024

Abstract

In Video-based Facial Expression Recognition (V-FER), models are typically trained on closed-set datasets with a fixed number of known classes. However, these models struggle with unknown classes common in real-world scenarios. In this paper, we introduce a challenging Open-set Video-based Facial Expression Recognition (OV-FER) task, aiming to identify both known and new, unseen facial expressions. While existing approaches use large-scale vision-language models like CLIP to identify unseen classes, we argue that these methods may not adequately capture the subtle human expressions needed for OV-FER. To address this limitation, we propose a novel Human Expression-Sensitive Prompting (HESP) mechanism to significantly enhance CLIP's ability to model video-based facial expression details effectively. Our proposed HESP comprises three components: 1) a textual prompting module with learnable prompts to enhance CLIP's textual representation of both known and unknown emotions, 2) a visual prompting module that encodes temporal emotional information from video frames using expression-sensitive attention, equipping CLIP with a new visual modeling ability to extract emotion-rich information, and 3) an open-set multi-task learning scheme that promotes interaction between the textual and visual modules, improving the understanding of novel human emotions in video sequences. Extensive experiments conducted on four OV-FER task settings demonstrate that HESP can significantly boost CLIP's performance (a relative improvement of 17.93% on AUROC and 106.18% on OSCR) and outperform other state-of-the-art open-set video understanding methods by a large margin. Code is available at https://github.com/cosinehuang/HESP.
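The textual prompting module described above builds on the general idea of learnable prompt contexts for CLIP-style vision-language models. As a hedged illustration only (not the authors' implementation), the sketch below mimics that mechanism in NumPy: shared learnable context vectors are prepended to per-class name embeddings, encoded into text features, and matched against a video feature by cosine similarity. The encoder `encode_text` and all names here are stand-ins, not HESP's actual API.

```python
# Illustrative sketch of learnable textual prompts (CoOp-style), NOT the
# paper's code. `encode_text` is a toy stand-in for CLIP's text Transformer.
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM, N_CTX = 8, 4  # toy sizes; real CLIP uses 512-dim embeddings

def encode_text(token_embeddings):
    """Toy text encoder: mean-pool the token embeddings."""
    return token_embeddings.mean(axis=0)

# Learnable context vectors shared across classes (optimized during training),
# plus one hypothetical name embedding per known/unknown class.
ctx = rng.normal(size=(N_CTX, EMBED_DIM))
class_name_emb = {c: rng.normal(size=(1, EMBED_DIM))
                  for c in ["happy", "sad", "unknown"]}

def class_text_features():
    """Encode [ctx_1 .. ctx_N, CLASS] for every class; L2-normalize each."""
    feats = []
    for name_emb in class_name_emb.values():
        tokens = np.concatenate([ctx, name_emb], axis=0)
        f = encode_text(tokens)
        feats.append(f / np.linalg.norm(f))
    return np.stack(feats)

def logits(video_feature):
    """Cosine similarity between a video feature and each class text feature."""
    v = video_feature / np.linalg.norm(video_feature)
    return class_text_features() @ v

video_feat = rng.normal(size=EMBED_DIM)
print(logits(video_feat).shape)  # one similarity score per class
```

Training would update `ctx` (and, in HESP's case, the visual prompts as well) by backpropagating a classification loss through these similarities; the sketch only shows the forward pass.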

Supplemental Material

MP4 File - 4871-video.mp4
This video presents a brief overview of our work titled "Open Set Video-based Facial Expression Recognition with Human Expression-Sensitive Prompting." We begin by introducing the Open Set Video-based Facial Expression Recognition task, followed by an analysis of the challenges it presents, which motivated the development of our HESP approach. We then describe our framework, comprising three key modules: a text prompting module, a visual prompting module, and an open-set multi-task learning scheme. The video also covers comparative experiments, ablation studies, and visualization analyses. Finally, we conclude with a summary of our work.
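The abstract reports gains in AUROC and OSCR. As background (not code from the paper), the sketch below shows how these two open-set metrics are commonly computed from per-sample confidence scores: AUROC via the rank-sum formulation, and OSCR as the area under the curve of correct-classification rate on known classes versus false-positive rate on unknown classes, swept over a confidence threshold. All variable names are illustrative.

```python
# Background sketch of open-set metrics (AUROC, OSCR); illustrative only.
import numpy as np

def auroc(known_scores, unknown_scores):
    """AUROC for separating knowns (positives) from unknowns (negatives),
    via the Mann-Whitney U / rank-sum formulation (no ties assumed)."""
    scores = np.concatenate([known_scores, unknown_scores])
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos, n_neg = len(known_scores), len(unknown_scores)
    u = ranks[:n_pos].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def oscr(known_scores, known_correct, unknown_scores):
    """Open-Set Classification Rate: area under the (FPR, CCR) curve,
    sweeping a confidence threshold over all observed scores."""
    thresholds = np.sort(np.concatenate([known_scores, unknown_scores]))[::-1]
    ccr = [np.mean(known_correct & (known_scores >= t)) for t in thresholds]
    fpr = [np.mean(unknown_scores >= t) for t in thresholds]
    area = 0.0
    for i in range(1, len(thresholds)):  # trapezoidal rule
        area += (fpr[i] - fpr[i - 1]) * (ccr[i] + ccr[i - 1]) / 2.0
    return area

known = np.array([0.9, 0.8, 0.7, 0.4])      # max-confidence on known samples
correct = np.array([True, True, False, True])  # whether the argmax class was right
unknown = np.array([0.3, 0.5, 0.2])         # max-confidence on unknown samples
print(round(auroc(known, unknown), 3), round(oscr(known, correct, unknown), 3))
```

Higher is better for both: AUROC measures only known/unknown separability, while OSCR additionally penalizes misclassifying known samples, which is why the two can move independently.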




      Published In

      MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
      October 2024
      11719 pages
      ISBN:9798400706868
      DOI:10.1145/3664647

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Author Tags

      1. clip
      2. open-set recognition
      3. textual prompting
      4. video-based facial expression recognition
      5. visual prompting

      Qualifiers

      • Research-article

      Conference

MM '24: The 32nd ACM International Conference on Multimedia
      October 28 - November 1, 2024
      Melbourne VIC, Australia

      Acceptance Rates

      MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

      Article Metrics

      • Total Citations: 0
      • Total Downloads: 49
      • Downloads (last 12 months): 49
      • Downloads (last 6 weeks): 49
      Reflects downloads up to 10 Dec 2024
