DOI: 10.5555/3171642.3171825
Article

Explicit knowledge-based reasoning for visual question answering

Published: 19 August 2017

Abstract

We describe a method for visual question answering which is capable of reasoning about an image on the basis of information extracted from a large-scale knowledge base. The method not only answers natural language questions using concepts not contained in the image, but can explain the reasoning by which it developed its answer. It is capable of answering far more complex questions than the predominant long short-term memory-based approach, and outperforms it significantly in testing. We also provide a dataset and a protocol by which to evaluate general visual question answering methods.
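
The abstract does not spell out the query machinery, but the general pattern it describes (grounding visual concepts in the image, then consulting an external knowledge base to answer and explain) can be illustrated with a small sketch. The sketch below is not the authors' pipeline: it assumes DBpedia as the knowledge base, SPARQL as the query language, the SPARQLWrapper Python library, and a hypothetical upstream detector that has already linked an image region to a DBpedia entity.

    # Illustrative sketch only (not the paper's implementation): once a visual
    # concept has been grounded to a knowledge-base entity, facts about that
    # entity can be retrieved with a structured query and combined with the
    # parsed question to answer things not visible in the image itself.
    from SPARQLWrapper import SPARQLWrapper, JSON

    DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # assumed knowledge base

    def related_categories(entity_uri, limit=10):
        """Fetch DBpedia categories linked to a grounded visual concept."""
        sparql = SPARQLWrapper(DBPEDIA_ENDPOINT)
        sparql.setReturnFormat(JSON)
        sparql.setQuery(f"""
            PREFIX dct: <http://purl.org/dc/terms/>
            SELECT DISTINCT ?category WHERE {{
                <{entity_uri}> dct:subject ?category .
            }}
            LIMIT {limit}
        """)
        bindings = sparql.query().convert()["results"]["bindings"]
        return [b["category"]["value"] for b in bindings]

    if __name__ == "__main__":
        # Hypothetical detector output: an image region linked to the
        # DBpedia entity for "Umbrella".
        for category in related_categories("http://dbpedia.org/resource/Umbrella"):
            print(category)

A full system would additionally parse the question into a query template and keep track of which retrieved facts support the final answer, which is what makes the reasoning explainable.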

Published In

IJCAI'17: Proceedings of the 26th International Joint Conference on Artificial Intelligence
August 2017
5253 pages
ISBN: 9780999241103

Sponsors

  • Australian Computer Society
  • National Science Foundation (NSF)
  • Griffith University
  • University of Technology Sydney
  • AI Journal

Publisher

AAAI Press

Qualifiers

  • Article

Cited By

  • (2023) Benchmarks for Automated Commonsense Reasoning: A Survey. ACM Computing Surveys 56(4), 1-41. DOI: 10.1145/3615355
  • (2020) RCE-HIL. ACM Transactions on Multimedia Computing, Communications, and Applications 16(1), 1-21. DOI: 10.1145/3365003
  • (2019) Image Captioning by Asking Questions. ACM Transactions on Multimedia Computing, Communications, and Applications 15(2s), 1-19. DOI: 10.1145/3313873
  • (2019) BTDP. ACM Transactions on Multimedia Computing, Communications, and Applications 15(2s), 1-21. DOI: 10.1145/3282469
  • (2018) Out of the box. Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2659-2670. DOI: 10.5555/3327144.3327190
  • (2018) Neural-symbolic VQA. Proceedings of the 32nd International Conference on Neural Information Processing Systems, 1039-1050. DOI: 10.5555/3326943.3327039
  • (2018) Non-monotonic Logical Reasoning and Deep Learning for Explainable Visual Question Answering. Proceedings of the 6th International Conference on Human-Agent Interaction, 11-19. DOI: 10.1145/3284432.3284456
