DOI: 10.1609/aaai.v33i01.33018076

TallyQA: answering complex counting questions

Published: 27 January 2019

Abstract

Most counting questions in visual question answering (VQA) datasets are simple and require no more than object detection. Here, we study algorithms for complex counting questions that involve relationships between objects, attribute identification, reasoning, and more. To do this, we created TallyQA, the world's largest dataset for open-ended counting. We propose a new algorithm for counting that uses relation networks with region proposals. Our method lets relation networks be efficiently used with high-resolution imagery. It yields state-of-the-art results compared to baseline and recent systems on both TallyQA and the HowMany-QA benchmark.
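To make the method concrete, the sketch below shows how a relation network can be applied to region-proposal features conditioned on a question embedding: every ordered pair of proposal features is scored jointly with the question, the pairwise scores are aggregated, and a count is predicted. This is a minimal illustration under stated assumptions, not the authors' implementation; the use of PyTorch, the layer sizes, the question encoder, and the treatment of counting as classification over 0..15 are all assumptions.

# Minimal sketch (not the authors' code): a relation network over region-proposal
# features, conditioned on a question embedding, producing a count distribution.
# Layer sizes, names, and the classification head are assumptions for illustration.
import torch
import torch.nn as nn

class RegionRelationCounter(nn.Module):
    def __init__(self, region_dim=2048, question_dim=300, hidden=512, max_count=15):
        super().__init__()
        # g scores each ordered pair of region proposals together with the question.
        self.g = nn.Sequential(
            nn.Linear(2 * region_dim + question_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # f maps the aggregated pairwise representation to a count distribution.
        self.f = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, max_count + 1),  # counts 0..max_count as classes
        )

    def forward(self, regions, question):
        # regions: (B, N, region_dim) pooled features for N region proposals
        # question: (B, question_dim) encoded question (encoder not shown here)
        B, N, D = regions.shape
        q = question.unsqueeze(1).unsqueeze(1).expand(B, N, N, -1)
        o_i = regions.unsqueeze(2).expand(B, N, N, D)
        o_j = regions.unsqueeze(1).expand(B, N, N, D)
        pairs = torch.cat([o_i, o_j, q], dim=-1)    # (B, N, N, 2*D + question_dim)
        relations = self.g(pairs).sum(dim=(1, 2))   # aggregate over all region pairs
        return self.f(relations)                    # (B, max_count + 1) logits

# Usage with random tensors standing in for detector and question features.
model = RegionRelationCounter()
logits = model(torch.randn(2, 36, 2048), torch.randn(2, 300))
print(logits.shape)  # torch.Size([2, 16])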

References

[1]
Anderson, P.; He, X.; Buehler, C.; Teney, D.; Johnson, M.; Gould, S.; and Zhang, L. 2018. Bottom-up and top-down attention for image captioning and visual question answering. In CVPR.
[2]
Andreas, J.; Rohrbach, M.; Darrell, T.; and Klein, D. 2016. Learning to compose neural networks for question answering. In NAACL.
[3]
Antol, S.; Agrawal, A.; Lu, J.; Mitchell, M.; Batra, D.; Zitnick, C. L.; and Parikh, D. 2015. VQA: Visual question answering. In ICCV.
[4]
Ben-younes, H.; Cadene, R.; Cord, M.; and Thome, N. 2017. Mutan: Multimodal tucker fusion for visual question answering. In ICCV.
[5]
Chattopadhyay, P.; Vedantam, R.; RS, R.; Batra, D.; and Parikh, D. 2017. Counting everyday objects in everyday scenes. In CVPR.
[6]
Dalal, N., and Triggs, B. 2005. Histograms of oriented gradients for human detection. In CVPR.
[7]
Fukui, A.; Park, D. H.; Yang, D.; Rohrbach, A.; Darrell, T.; and Rohrbach, M. 2016. Multimodal compact bilinear pooling for visual question answering and visual grounding.
[8]
Girshick, R. 2015. Fast R-CNN. In ICCV.
[9]
Goyal, Y.; Khot, T.; Summers-Stay, D.; Batra, D.; and Parikh, D. 2017. Making the V in VQA matter: Elevating the role of image understanding in visual question answering. In CVPR.
[10]
He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR.
[11]
Johnson, J.; Hariharan, B.; van der Maaten, L.; Fei-Fei, L.; Zitnick, C. L.; and Girshick, R. 2017. Clevr: A diagnostic dataset for compositional language and elementary visual reasoning. In CVPR.
[12]
Kafle, K., and Kanan, C. 2016. Answer-type prediction for visual question answering. In CVPR.
[13]
Kafle, K., and Kanan, C. 2017a. Visual question answering: Datasets, algorithms, and future challenges. CVIU.
[14]
Kafle, K., and Kanan, C. 2017b. An analysis of visual question answering algorithms. In ICCV.
[15]
Kafle, K.; Cohen, S.; Price, B.; and Kanan, C. 2018. DVQA: Understanding data visualizations via question answering. In CVPR.
[16]
Kafle, K.; Yousefhussien, M.; and Kanan, C. 2017. Data augmentation for visual question answering. In INLG.
[17]
Kim, J.-H.; Jun, J.; and Zhang, B.-T. 2018. Bilinear attention networks. arXiv preprint arXiv:1805.07932.
[18]
Lu, J.; Yang, J.; Batra, D.; and Parikh, D. 2016. Hierarchical question-image co-attention for visual question answering. In NIPS.
[19]
Malinowski, M., and Fritz, M. 2014. A multi-world approach to question answering about real-world scenes based on uncertain input. In NIPS.
[20]
Pennington, J.; Socher, R.; and Manning, C. 2014. Glove: Global vectors for word representation. In EMNLP.
[21]
Redmon, J., and Farhadi, A. 2017. YOLO9000: better, faster, stronger. In CVPR.
[22]
Ren, M., and Zemel, R. S. 2017. End-to-end instance segmentation with recurrent attention. In CVPR.
[23]
Ren, S.; He, K.; Girshick, R.; and Sun, J. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS.
[24]
Ren, M.; Kiros, R.; and Zemel, R. 2015. Exploring models and data for image question answering. In NIPS.
[25]
Ryan, D.; Denman, S.; Fookes, C.; and Sridharan, S. 2009. Crowd counting using multiple local features. In Digital Image Computing: Techniques and Applications, 2009. DICTA'09., 81–88. IEEE.
[26]
Santoro, A.; Raposo, D.; Barrett, D. G.; Malinowski, M.; Pascanu, R.; Battaglia, P.; and Lillicrap, T. 2017. A simple neural network module for relational reasoning. In NIPS.
[27]
Selvaraju, R. R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D.; et al. 2017. Grad-cam: Visual explanations from deep networks via gradient-based localization. In ICCV.
[28]
Teney, D.; Anderson, P.; He, X.; and Hengel, A. v. d. 2017. Tips and tricks for visual question answering: Learnings from the 2017 challenge. arXiv preprint arXiv:1708.02711.
[29]
Trott, A.; Xiong, C.; and Socher, R. 2018. Interpretable counting for visual question answering. In ICLR.
[30]
Wang, M., and Wang, X. 2011. Automatic adaptation of a generic pedestrian detector to a specific traffic scene. In CVPR.
[31]
Zhang, C.; Li, H.; Wang, X.; and Yang, X. 2015. Cross-scene crowd counting via deep convolutional neural networks. In CVPR.
[32]
Zhang, P.; Goyal, Y.; Summers-Stay, D.; Batra, D.; and Parikh, D. 2016. Yin and yang: Balancing and answering binary visual questions. In CVPR.
[33]
Zhang, Y.; Hare, J.; and Prügel-Bennett, A. 2018. Learning to count objects in natural images for visual question answering. In ICLR.

Published In

AAAI'19/IAAI'19/EAAI'19: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence
January 2019
10088 pages
ISBN: 978-1-57735-809-1

Sponsors

• Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Qualifiers

• Research-article
• Research
• Refereed limited

Article Metrics

• Downloads (last 12 months): 35
• Downloads (last 6 weeks): 9

Reflects downloads up to 02 Mar 2025

Cited By

• MMT-bench. Proceedings of the 41st International Conference on Machine Learning (2024), 57116–57198. DOI: 10.5555/3692070.3694429. Online publication date: 21-Jul-2024.
• A Symbolic Characters Aware Model for Solving Geometry Problems. Proceedings of the 31st ACM International Conference on Multimedia (2023), 7767–7775. DOI: 10.1145/3581783.3612570. Online publication date: 26-Oct-2023.
• QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension. ACM Computing Surveys 55(10) (2023), 1–45. DOI: 10.1145/3560260. Online publication date: 2-Feb-2023.
• Inner Knowledge-based Img2Doc Scheme for Visual Question Answering. ACM Transactions on Multimedia Computing, Communications, and Applications 18(3) (2022), 1–21. DOI: 10.1145/3489142. Online publication date: 4-Mar-2022.
