DOI: 10.5555/3171642.3171825
Article

Explicit knowledge-based reasoning for visual question answering

Published: 19 August 2017

Abstract

We describe a method for visual question answering which is capable of reasoning about an image on the basis of information extracted from a large-scale knowledge base. The method not only answers natural language questions using concepts not contained in the image, but can explain the reasoning by which it developed its answer. It is capable of answering far more complex questions than the predominant long short-term memory-based approach, and outperforms it significantly in testing. We also provide a dataset and a protocol by which to evaluate general visual question answering methods.
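
The abstract does not spell out the query machinery, but the general pattern it describes (grounding visual concepts in the image, then consulting an external knowledge base to answer and explain) can be illustrated with a small sketch. The sketch below is not the authors' pipeline: it assumes DBpedia as the knowledge base, SPARQL as the query language, the SPARQLWrapper Python library, and a hypothetical upstream detector that has already linked an image region to a DBpedia entity.

    # Illustrative sketch only (not the paper's implementation): once a visual
    # concept has been grounded to a knowledge-base entity, facts about that
    # entity can be retrieved with a structured query and combined with the
    # parsed question to answer things not visible in the image itself.
    from SPARQLWrapper import SPARQLWrapper, JSON

    DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # assumed knowledge base

    def related_categories(entity_uri, limit=10):
        """Fetch DBpedia categories linked to a grounded visual concept."""
        sparql = SPARQLWrapper(DBPEDIA_ENDPOINT)
        sparql.setReturnFormat(JSON)
        sparql.setQuery(f"""
            PREFIX dct: <http://purl.org/dc/terms/>
            SELECT DISTINCT ?category WHERE {{
                <{entity_uri}> dct:subject ?category .
            }}
            LIMIT {limit}
        """)
        bindings = sparql.query().convert()["results"]["bindings"]
        return [b["category"]["value"] for b in bindings]

    if __name__ == "__main__":
        # Hypothetical detector output: an image region linked to the
        # DBpedia entity for "Umbrella".
        for category in related_categories("http://dbpedia.org/resource/Umbrella"):
            print(category)

A full system would additionally parse the question into a query template and keep track of which retrieved facts support the final answer, which is what makes the reasoning explainable.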

Published In

IJCAI'17: Proceedings of the 26th International Joint Conference on Artificial Intelligence
August 2017
5253 pages
ISBN: 9780999241103

Sponsors

  • Australian Computer Society
  • National Science Foundation (NSF)
  • Griffith University
  • University of Technology Sydney
  • AI Journal

Publisher

AAAI Press

Qualifiers

  • Article

Cited By

  • (2023) Benchmarks for Automated Commonsense Reasoning: A Survey. ACM Computing Surveys 56(4), 1-41. DOI: 10.1145/3615355
  • (2020) RCE-HIL. ACM Transactions on Multimedia Computing, Communications, and Applications 16(1), 1-21. DOI: 10.1145/3365003
  • (2019) Image Captioning by Asking Questions. ACM Transactions on Multimedia Computing, Communications, and Applications 15(2s), 1-19. DOI: 10.1145/3313873
  • (2019) BTDP. ACM Transactions on Multimedia Computing, Communications, and Applications 15(2s), 1-21. DOI: 10.1145/3282469
  • (2018) Out of the box. Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2659-2670. DOI: 10.5555/3327144.3327190
  • (2018) Neural-symbolic VQA. Proceedings of the 32nd International Conference on Neural Information Processing Systems, 1039-1050. DOI: 10.5555/3326943.3327039
  • (2018) Non-monotonic Logical Reasoning and Deep Learning for Explainable Visual Question Answering. Proceedings of the 6th International Conference on Human-Agent Interaction, 11-19. DOI: 10.1145/3284432.3284456
