Computer Science > Computer Vision and Pattern Recognition

arXiv:2301.11100 (cs)

[Submitted on 26 Jan 2023]

Title: Vision-Language Models Performing Zero-Shot Tasks Exhibit Gender-based Disparities

Authors: Melissa Hall, Laura Gustafson, Aaron Adcock, Ishan Misra, Candace Ross


Abstract: We explore the extent to which zero-shot vision-language models exhibit gender bias for different vision tasks. Vision models traditionally required task-specific labels for representing concepts, as well as finetuning; zero-shot models like CLIP instead perform tasks with an open vocabulary, meaning they do not need a fixed set of labels, by using text embeddings to represent concepts. With these capabilities in mind, we ask: Do vision-language models exhibit gender bias when performing zero-shot image classification, object detection and semantic segmentation? We evaluate different vision-language models with multiple datasets across a set of concepts and find that (i) all models evaluated show distinct performance differences based on the perceived gender of the person co-occurring with a given concept in the image, and aggregating analyses over all concepts can mask these concerns; (ii) model calibration (i.e., the relationship between accuracy and confidence) also differs distinctly by perceived gender, even when evaluating on similar representations of concepts; and (iii) these observed disparities align with existing gender biases in word embeddings from language models. These findings suggest that, while language greatly expands the capability of vision tasks, it can also contribute to social biases in zero-shot vision settings. Furthermore, these biases can propagate when foundational models like CLIP are used by other models to enable zero-shot capabilities.
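As an illustration of findings (i) and (ii), the sketch below shows one way to break accuracy and calibration down by concept and group instead of aggregating over all concepts. It is not the authors' evaluation code. The record fields (`concept`, `group`, `confidence`, `correct`), the function names, and the toy data are all assumptions made for this example, and the calibration measure is the standard binned expected calibration error (ECE), which may differ from the metric used in the paper.

```python
from collections import defaultdict


def accuracy(records):
    """Fraction of records whose prediction was correct."""
    return sum(r["correct"] for r in records) / len(records)


def expected_calibration_error(records, n_bins=10):
    """Binned ECE: size-weighted mean of |accuracy - mean confidence| per bin."""
    bins = defaultdict(list)
    for r in records:
        # Clamp the index so that confidence == 1.0 lands in the last bin.
        idx = min(int(r["confidence"] * n_bins), n_bins - 1)
        bins[idx].append(r)
    ece = 0.0
    for members in bins.values():
        mean_conf = sum(m["confidence"] for m in members) / len(members)
        ece += len(members) / len(records) * abs(accuracy(members) - mean_conf)
    return ece


def per_group_report(records):
    """Accuracy and ECE for every (concept, group) pair, without aggregation."""
    grouped = defaultdict(list)
    for r in records:
        grouped[(r["concept"], r["group"])].append(r)
    return {
        key: {
            "n": len(members),
            "accuracy": accuracy(members),
            "ece": expected_calibration_error(members),
        }
        for key, members in grouped.items()
    }


# Hypothetical toy predictions; the values are made up for illustration only.
records = [
    {"concept": "umbrella", "group": "A", "confidence": 0.9, "correct": True},
    {"concept": "umbrella", "group": "A", "confidence": 0.9, "correct": True},
    {"concept": "umbrella", "group": "B", "confidence": 0.9, "correct": True},
    {"concept": "umbrella", "group": "B", "confidence": 0.9, "correct": False},
]

report = per_group_report(records)
for (concept, group), stats in sorted(report.items()):
    print(concept, group, stats)
```

In this toy data both groups receive the same confidence for the same concept, yet their accuracy differs, so the per-group ECE differs as well. A single score computed over all records would average that gap away.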

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2301.11100 [cs.CV]
	(or arXiv:2301.11100v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2301.11100

Submission history

From: Melissa Hall
[v1] Thu, 26 Jan 2023 13:44:31 UTC (136 KB)
