DOI: 10.1609/aaai.v33i01.33018076

TallyQA: answering complex counting questions

Published: 27 January 2019

Abstract

Most counting questions in visual question answering (VQA) datasets are simple and require no more than object detection. Here, we study algorithms for complex counting questions that involve relationships between objects, attribute identification, reasoning, and more. To do this, we created TallyQA, the world's largest dataset for open-ended counting. We propose a new algorithm for counting that uses relation networks with region proposals. Our method lets relation networks be efficiently used with high-resolution imagery. It yields state-of-the-art results compared to baseline and recent systems on both TallyQA and the HowMany-QA benchmark.
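To make the method concrete, the sketch below shows how a relation network can be applied to region-proposal features conditioned on a question embedding: every ordered pair of proposal features is scored jointly with the question, the pairwise scores are aggregated, and a count is predicted. This is a minimal illustration under stated assumptions, not the authors' implementation; the use of PyTorch, the layer sizes, the question encoder, and the treatment of counting as classification over 0..15 are all assumptions.

# Minimal sketch (not the authors' code): a relation network over region-proposal
# features, conditioned on a question embedding, producing a count distribution.
# Layer sizes, names, and the classification head are assumptions for illustration.
import torch
import torch.nn as nn

class RegionRelationCounter(nn.Module):
    def __init__(self, region_dim=2048, question_dim=300, hidden=512, max_count=15):
        super().__init__()
        # g scores each ordered pair of region proposals together with the question.
        self.g = nn.Sequential(
            nn.Linear(2 * region_dim + question_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # f maps the aggregated pairwise representation to a count distribution.
        self.f = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, max_count + 1),  # counts 0..max_count as classes
        )

    def forward(self, regions, question):
        # regions: (B, N, region_dim) pooled features for N region proposals
        # question: (B, question_dim) encoded question (encoder not shown here)
        B, N, D = regions.shape
        q = question.unsqueeze(1).unsqueeze(1).expand(B, N, N, -1)
        o_i = regions.unsqueeze(2).expand(B, N, N, D)
        o_j = regions.unsqueeze(1).expand(B, N, N, D)
        pairs = torch.cat([o_i, o_j, q], dim=-1)    # (B, N, N, 2*D + question_dim)
        relations = self.g(pairs).sum(dim=(1, 2))   # aggregate over all region pairs
        return self.f(relations)                    # (B, max_count + 1) logits

# Usage with random tensors standing in for detector and question features.
model = RegionRelationCounter()
logits = model(torch.randn(2, 36, 2048), torch.randn(2, 300))
print(logits.shape)  # torch.Size([2, 16])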

References

[1]
Anderson, P.; He, X.; Buehler, C.; Teney, D.; Johnson, M.; Gould, S.; and Zhang, L. 2018. Bottom-up and top-down attention for image captioning and visual question answering. In CVPR.
[2]
Andreas, J.; Rohrbach, M.; Darrell, T.; and Klein, D. 2016. Learning to compose neural networks for question answering. In NAACL.
[3]
Antol, S.; Agrawal, A.; Lu, J.; Mitchell, M.; Batra, D.; Zitnick, C. L.; and Parikh, D. 2015. VQA: Visual question answering. In ICCV.
[4]
Ben-younes, H.; Cadene, R.; Cord, M.; and Thome, N. 2017. Mutan: Multimodal tucker fusion for visual question answering. In ICCV.
[5]
Chattopadhyay, P.; Vedantam, R.; RS, R.; Batra, D.; and Parikh, D. 2017. Counting everyday objects in everyday scenes. In CVPR.
[6]
Dalal, N., and Triggs, B. 2005. Histograms of oriented gradients for human detection. In CVPR.
[7]
Fukui, A.; Park, D. H.; Yang, D.; Rohrbach, A.; Darrell, T.; and Rohrbach, M. 2016. Multimodal compact bilinear pooling for visual question answering and visual grounding.
[8]
Girshick, R. 2015. Fast R-CNN. In ICCV.
[9]
Goyal, Y.; Khot, T.; Summers-Stay, D.; Batra, D.; and Parikh, D. 2017. Making the V in VQA matter: Elevating the role of image understanding in visual question answering. In CVPR.
[10]
He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR.
[11]
Johnson, J.; Hariharan, B.; van der Maaten, L.; Fei-Fei, L.; Zitnick, C. L.; and Girshick, R. 2017. Clevr: A diagnostic dataset for compositional language and elementary visual reasoning. In CVPR.
[12]
Kafle, K., and Kanan, C. 2016. Answer-type prediction for visual question answering. In CVPR.
[13]
Kafle, K., and Kanan, C. 2017a. Visual question answering: Datasets, algorithms, and future challenges. CVIU.
[14]
Kafle, K., and Kanan, C. 2017b. An analysis of visual question answering algorithms. In ICCV.
[15]
Kafle, K.; Cohen, S.; Price, B.; and Kanan, C. 2018. DVQA: Understanding data visualizations via question answering. In CVPR.
[16]
Kafle, K.; Yousefhussien, M.; and Kanan, C. 2017. Data augmentation for visual question answering. In INLG.
[17]
Kim, J.-H.; Jun, J.; and Zhang, B.-T. 2018. Bilinear attention networks. arXiv preprint arXiv:1805.07932.
[18]
Lu, J.; Yang, J.; Batra, D.; and Parikh, D. 2016. Hierarchical question-image co-attention for visual question answering. In NIPS.
[19]
Malinowski, M., and Fritz, M. 2014. A multi-world approach to question answering about real-world scenes based on uncertain input. In NIPS.
[20]
Pennington, J.; Socher, R.; and Manning, C. 2014. Glove: Global vectors for word representation. In EMNLP.
[21]
Redmon, J., and Farhadi, A. 2017. YOLO9000: better, faster, stronger. In CVPR.
[22]
Ren, M., and Zemel, R. S. 2017. End-to-end instance segmentation with recurrent attention. In CVPR.
[23]
Ren, S.; He, K.; Girshick, R.; and Sun, J. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS.
[24]
Ren, M.; Kiros, R.; and Zemel, R. 2015. Exploring models and data for image question answering. In NIPS.
[25]
Ryan, D.; Denman, S.; Fookes, C.; and Sridharan, S. 2009. Crowd counting using multiple local features. In Digital Image Computing: Techniques and Applications, 2009. DICTA'09., 81–88. IEEE.
[26]
Santoro, A.; Raposo, D.; Barrett, D. G.; Malinowski, M.; Pascanu, R.; Battaglia, P.; and Lillicrap, T. 2017. A simple neural network module for relational reasoning. In NIPS.
[27]
Selvaraju, R. R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D.; et al. 2017. Grad-cam: Visual explanations from deep networks via gradient-based localization. In ICCV.
[28]
Teney, D.; Anderson, P.; He, X.; and Hengel, A. v. d. 2017. Tips and tricks for visual question answering: Learnings from the 2017 challenge. arXiv preprint arXiv:1708.02711.
[29]
Trott, A.; Xiong, C.; and Socher, R. 2018. Interpretable counting for visual question answering. In ICLR.
[30]
Wang, M., and Wang, X. 2011. Automatic adaptation of a generic pedestrian detector to a specific traffic scene. In CVPR.
[31]
Zhang, C.; Li, H.; Wang, X.; and Yang, X. 2015. Cross-scene crowd counting via deep convolutional neural networks. In CVPR.
[32]
Zhang, P.; Goyal, Y.; Summers-Stay, D.; Batra, D.; and Parikh, D. 2016. Yin and yang: Balancing and answering binary visual questions. In CVPR.
[33]
Zhang, Y.; Hare, J.; and Prügel-Bennett, A. 2018. Learning to count objects in natural images for visual question answering. In ICLR.

Published In

AAAI'19/IAAI'19/EAAI'19: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence
January 2019
10088 pages
ISBN: 978-1-57735-809-1

Sponsors

• Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Qualifiers

• Research-article
• Research
• Refereed limited

Article Metrics

• Downloads (last 12 months): 35
• Downloads (last 6 weeks): 9

Reflects downloads up to 02 Mar 2025

Cited By

• MMT-bench. Proceedings of the 41st International Conference on Machine Learning (2024), 57116–57198. DOI: 10.5555/3692070.3694429. Online publication date: 21-Jul-2024.
• A Symbolic Characters Aware Model for Solving Geometry Problems. Proceedings of the 31st ACM International Conference on Multimedia (2023), 7767–7775. DOI: 10.1145/3581783.3612570. Online publication date: 26-Oct-2023.
• QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension. ACM Computing Surveys 55(10) (2023), 1–45. DOI: 10.1145/3560260. Online publication date: 2-Feb-2023.
• Inner Knowledge-based Img2Doc Scheme for Visual Question Answering. ACM Transactions on Multimedia Computing, Communications, and Applications 18(3) (2022), 1–21. DOI: 10.1145/3489142. Online publication date: 4-Mar-2022.
