A Neuro-Symbolic AI System for Visual Question Answering in Pedestrian Video Sequences

Jaeil Park,
Seok-Jun Bu &
Sung-Bae Cho

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13469)

Included in the following conference series:

International Conference on Hybrid Artificial Intelligent Systems

Abstract

As the volume of video data grows rapidly, efficient object recognition becomes essential for any system that answers questions about video automatically. Real-world video scenes, which contain many object types and complex relationships among them, call for rich knowledge representation and for inference algorithms that operate on object properties and relations. In this paper, we propose a hybrid neuro-symbolic AI system that operates on scene graphs of real-world video data. The method combines neural networks, which generate scene graphs that capture the relationships between objects on real roads, with symbolic inference algorithms that answer questions. We define object properties, relationships, and question coverage for the real-world objects in pedestrian video, and we traverse the scene graph to perform complex visual question answering. In a pedestrian video environment, the proposed method answered five types of questions with 99.71% accuracy.
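
The symbolic half of such a pipeline can be illustrated with a minimal sketch. This is not the authors' implementation: the class SceneGraph, the function run_program, the operation names (filter, relate, count, exist, query), and the example frame are all hypothetical, chosen only to show how a question, once parsed into a short program, might be answered by traversing a scene graph. The scene graph is written by hand here; in the proposed system it would come from the neural network.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy scene graph: attributed nodes plus (subject, predicate, object) edges."""
    nodes: dict = field(default_factory=dict)  # node id -> {attribute: value}
    edges: list = field(default_factory=list)  # (subject id, predicate, object id)

    def filter(self, ids, key, value):
        """Keep only the nodes whose attribute `key` equals `value`."""
        return [i for i in ids if self.nodes[i].get(key) == value]

    def relate(self, ids, predicate):
        """Follow `predicate` edges from the given subjects to their objects."""
        targets = []
        for subj, pred, obj in self.edges:
            if subj in ids and pred == predicate and obj not in targets:
                targets.append(obj)
        return targets


def run_program(graph, program):
    """Execute a list of (operation, *args) steps, starting from all nodes."""
    current = list(graph.nodes)
    for op, *args in program:
        if op == "filter":
            current = graph.filter(current, *args)
        elif op == "relate":
            current = graph.relate(current, *args)
        elif op == "count":
            return len(current)
        elif op == "exist":
            return len(current) > 0
        elif op == "query":
            return [graph.nodes[i].get(args[0]) for i in current]
        else:
            raise ValueError(f"unknown operation: {op}")
    return current


# Hypothetical frame (an assumption, not data from the paper):
# two pedestrians, one car, one crosswalk.
graph = SceneGraph(
    nodes={
        "p1": {"category": "pedestrian", "action": "walking"},
        "p2": {"category": "pedestrian", "action": "standing"},
        "c1": {"category": "car", "color": "red"},
        "cw": {"category": "crosswalk"},
    },
    edges=[("p1", "on", "cw"), ("c1", "near", "p2")],
)

# "How many pedestrians are there?"
count_answer = run_program(
    graph, [("filter", "category", "pedestrian"), ("count",)]
)

# "What is the walking pedestrian on?"
query_answer = run_program(graph, [
    ("filter", "category", "pedestrian"),
    ("filter", "action", "walking"),
    ("relate", "on"),
    ("query", "category"),
])

# "Is there a car near a pedestrian?"
exist_answer = run_program(graph, [
    ("filter", "category", "car"),
    ("relate", "near"),
    ("filter", "category", "pedestrian"),
    ("exist",),
])
```

Because each answer is produced by an explicit sequence of graph operations, the intermediate node sets can be inspected step by step, which is the usual argument for pairing a learned scene-graph generator with symbolic inference.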

Acknowledgment

This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2020-0-01361, Artificial Intelligence Graduate School Program (Yonsei University); No. 2021-0-02068, Artificial Intelligence Innovation Hub).

Author information

Authors and Affiliations

Department of Computer Science, Yonsei University, Seoul, South Korea
Jaeil Park, Seok-Jun Bu & Sung-Bae Cho

Corresponding author

Correspondence to Sung-Bae Cho.

Editor information

Editors and Affiliations

University of Deusto, Bilbao, Spain
Pablo García Bringas
University of León, León, Spain
Hilde Pérez García
University of La Rioja, Logroño, La Rioja, Spain
Francisco Javier Martínez de Pisón
University of Oviedo, Oviedo, Spain
José Ramón Villar Flecha
Data Science and Big Data Analytics Lab, Pablo de Olavide University, Sevilla, Spain
Alicia Troncoso Lora
Department of Computer Science, University of Oviedo, Oviedo, Spain
Enrique A. de la Cal
Applied Computational Intelligence, University of Burgos, Burgos, Burgos, Spain
Álvaro Herrero
Universidad Pablo de Olavide, Seville, Spain
Francisco Martínez Álvarez
DIGIP, University of Bergamo, Dalmine, Bergamo, Italy
Giuseppe Psaila
Department of Industrial Engineering, University of A Coruña, Ferrol, Spain
Héctor Quintián
University of Salamanca, Salamanca, Spain
Emilio Corchado

Cite this paper

Park, J., Bu, S.-J., Cho, S.-B. (2022). A Neuro-Symbolic AI System for Visual Question Answering in Pedestrian Video Sequences. In: García Bringas, P., et al. (eds.) Hybrid Artificial Intelligent Systems. HAIS 2022. Lecture Notes in Computer Science (LNAI), vol 13469. Springer, Cham. https://doi.org/10.1007/978-3-031-15471-3_38

DOI: https://doi.org/10.1007/978-3-031-15471-3_38
Published: 12 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15470-6
Online ISBN: 978-3-031-15471-3
eBook Packages: Computer Science, Computer Science (R0)
