Computer Science > Computer Vision and Pattern Recognition

arXiv:2406.02774 (cs)

[Submitted on 4 Jun 2024 (v1), last revised 18 Jul 2024 (this version, v2)]

Title: Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following

Authors: Qiaomu Miao, Alexandros Graikos, Jingwei Zhang, Sounak Mondal, Minh Hoai, Dimitris Samaras

Abstract: Training gaze following models requires a large number of images with gaze target coordinates annotated by human annotators, which is a laborious and inherently ambiguous process. We propose the first semi-supervised method for gaze following by introducing two novel priors to the task. We obtain the first prior using a large pretrained Visual Question Answering (VQA) model, where we compute Grad-CAM heatmaps by 'prompting' the VQA model with a gaze following question. These heatmaps can be noisy and not suited for use in training. The need to refine these noisy annotations leads us to incorporate a second prior. We utilize a diffusion model trained on limited human annotations and modify the reverse sampling process to refine the Grad-CAM heatmaps. By tuning the diffusion process we achieve a trade-off between the human annotation prior and the VQA heatmap prior, which retains the useful VQA prior information while exhibiting similar properties to the training data distribution. Our method outperforms simple pseudo-annotation generation baselines on the GazeFollow image dataset. More importantly, our pseudo-annotation strategy, applied to a widely used supervised gaze following model (VAT), reduces the annotation need by 50%. Our method also performs the best on the VideoAttentionTarget dataset.
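
As a rough illustration of the first prior, the sketch below shows the standard Grad-CAM combination step, which turns a layer's activations and the gradients of a score with respect to them into a heatmap. The function name `grad_cam` and the plain-array inputs are assumptions made for illustration; the abstract does not state which VQA model, layer, or prompt the authors use, and this is not their implementation.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Combine feature maps and their gradients into a Grad-CAM heatmap.

    Both inputs are hypothetical arrays of shape (C, H, W): the feature maps
    of one layer, and the gradient of a scalar score with respect to them.
    """
    if activations.ndim != 3 or activations.shape != gradients.shape:
        raise ValueError("expected two arrays of identical shape (C, H, W)")

    # Channel weights: global average of the gradients over spatial positions.
    weights = gradients.mean(axis=(1, 2))  # shape (C,)

    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=([0], [0]))  # shape (H, W)

    # ReLU keeps only regions that contribute positively to the score.
    cam = np.maximum(cam, 0.0)

    # Scale to [0, 1] when the map is not identically zero.
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return cam
```

A heatmap produced this way is the kind of raw, possibly noisy map that the abstract says requires refinement before it can serve as a pseudo-annotation.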

Comments:	Accepted to ECCV 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2406.02774 [cs.CV]
	(or arXiv:2406.02774v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.02774

Submission history

From: Qiaomu Miao
[v1] Tue, 4 Jun 2024 20:43:26 UTC (14,975 KB)
[v2] Thu, 18 Jul 2024 16:59:08 UTC (10,176 KB)
