DOI: 10.1109/ICCV.2015.279
Article

VQA: Visual Question Answering

Published: 07 December 2015

Abstract

We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words, or come from a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing ~0.25M images, ~0.76M questions, and ~10M answers (www.visualqa.org), and discuss the information it provides. Numerous baselines for VQA are provided and compared with human performance.
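Because most answers are only a few words long, scoring a prediction can be reduced to string matching against several human answers. The sketch below assumes the consensus-style accuracy used in the full VQA paper, where a predicted answer scores min(n / 3, 1) and n is the number of human annotators who gave that exact answer. The helper names (`normalize`, `vqa_accuracy`, `pick_multiple_choice`) are hypothetical, and the normalization step (lowercasing and trimming whitespace) is a deliberate simplification, not the official evaluation code.

```python
from collections import Counter
from typing import Callable, Sequence


def normalize(answer: str) -> str:
    # Simplified normalization (an assumption): lowercase and trim whitespace.
    return answer.strip().lower()


def vqa_accuracy(predicted: str, human_answers: Sequence[str]) -> float:
    # Consensus accuracy: full credit if at least 3 humans gave the same
    # answer, partial credit (n / 3) if only n < 3 humans did.
    counts = Counter(normalize(a) for a in human_answers)
    return min(counts[normalize(predicted)] / 3.0, 1.0)


def pick_multiple_choice(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    # In the multiple-choice setting the system only ranks a closed set of
    # candidate answers; this returns the candidate with the highest score.
    return max(candidates, key=score)


humans = ["yes"] * 8 + ["no"] * 2
print(vqa_accuracy("Yes", humans))    # 1.0
print(vqa_accuracy("no", humans))     # 0.6666666666666666
print(vqa_accuracy("maybe", humans))  # 0.0

# Toy model scores over a closed candidate set (illustrative values only).
toy_scores = {"red": 0.1, "blue": 0.7, "green": 0.2}
print(pick_multiple_choice(list(toy_scores), toy_scores.get))  # blue
```

Capping the score at 3 matching annotators gives full credit to answers that several people agree on, while still rewarding plausible minority answers with partial credit.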

Information

Published In

Guide Proceedings
ICCV '15: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV)
December 2015
4730 pages
ISBN: 9781467383912

Publisher

IEEE Computer Society

United States

Publication History

Published: 07 December 2015

Qualifiers

  • Article

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 11 Dec 2024

Cited By

  • (2024) DCF–VQA. International Journal of Applied Mathematics and Computer Science 34:3, pp. 453-466. DOI: 10.61822/amcs-2024-0032. Online publication date: 1-Sep-2024.
  • (2024) Knowledge Editing for Large Language Models: A Survey. ACM Computing Surveys 57:3, pp. 1-37. DOI: 10.1145/3698590. Online publication date: 11-Nov-2024.
  • (2024) A Benchmark Dataset for Evaluating Spatial Perception in Multimodal Large Models. Proceedings of the First International Workshop on IoT Datasets for Multi-modal Large Model, pp. 37-43. DOI: 10.1145/3698385.3699875. Online publication date: 4-Nov-2024.
  • (2024) Adversarial Sample Synthesis for Visual Question Answering. ACM Transactions on Multimedia Computing, Communications, and Applications 20:12, pp. 1-24. DOI: 10.1145/3688848. Online publication date: 16-Sep-2024.
  • (2024) AI-Vision: A Three-Layer Accessible Image Exploration System for People with Visual Impairments in China. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8:3, pp. 1-27. DOI: 10.1145/3678537. Online publication date: 9-Sep-2024.
  • (2024) Enhancing Visual Question Answering with Prompt-based Learning: A Cross-modal Approach for Deep Semantic Understanding. Proceedings of the International Conference on Algorithms, Software Engineering, and Network Security, pp. 713-717. DOI: 10.1145/3677182.3677310. Online publication date: 26-Apr-2024.
  • (2024) TADACap: Time-series Adaptive Domain-Aware Captioning. Proceedings of the 5th ACM International Conference on AI in Finance, pp. 54-62. DOI: 10.1145/3677052.3698690. Online publication date: 14-Nov-2024.
  • (2024) HCCL: Hierarchical Counterfactual Contrastive Learning for Robust Visual Question Answering. ACM Transactions on Multimedia Computing, Communications, and Applications 20:10, pp. 1-21. DOI: 10.1145/3673902. Online publication date: 27-Jun-2024.
  • (2024) DCMFNet: Deep Cross-Modal Fusion Network for Different Modalities with Iterative Gated Fusion. Proceedings of the 50th Graphics Interface Conference, pp. 1-12. DOI: 10.1145/3670947.3670956. Online publication date: 3-Jun-2024.
  • (2024) Revisiting Vision-Language Features Adaptation and Inconsistency for Social Media Popularity Prediction. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 11464-11469. DOI: 10.1145/3664647.3689000. Online publication date: 28-Oct-2024.