Abstract
With the rapid increase in the amount of video data, efficient object recognition is essential for any system that answers questions automatically. In particular, real-world video environments with numerous types of objects and complex relationships require extensive knowledge representation and inference algorithms that account for the properties and relations of objects. In this paper, we propose a hybrid neuro-symbolic AI system that operates on scene graphs of real-world video data. The method combines neural networks, which generate scene graphs that capture the relationships between objects on real roads, with symbol-based inference algorithms that respond to questions. We define object properties, relationships, and question coverage for the real-world objects in pedestrian video, and traverse the scene graph to perform complex visual question answering. We demonstrate the superiority of the proposed method by confirming that it answers five types of questions in a pedestrian video environment with 99.71% accuracy.
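The idea of answering a question by traversing a scene graph can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the graph representation, the question types, or the inference operations, so the `SceneGraph` class, the `filter_objects` and `subjects_of` helpers, and the object categories, attributes, and relations below are all hypothetical. The neural scene-graph generator is also omitted, and the graph is written by hand.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical scene graph: objects with properties plus relation triples."""
    nodes: dict = field(default_factory=dict)   # object id -> property dict
    edges: list = field(default_factory=list)   # (subject id, relation, object id)

    def add_object(self, obj_id, category, **attrs):
        self.nodes[obj_id] = {"category": category, **attrs}

    def add_relation(self, subj, relation, obj):
        self.edges.append((subj, relation, obj))


def filter_objects(graph, category=None, **attrs):
    """Return ids of objects matching the category and all given properties."""
    result = []
    for obj_id, props in graph.nodes.items():
        if category is not None and props.get("category") != category:
            continue
        if all(props.get(key) == value for key, value in attrs.items()):
            result.append(obj_id)
    return result


def subjects_of(graph, relation, target_ids):
    """Return ids of objects that stand in `relation` to any target object."""
    targets = set(target_ids)
    return [s for s, r, o in graph.edges if r == relation and o in targets]


# Hand-built graph for one frame (assumed labels, not taken from the paper).
graph = SceneGraph()
graph.add_object("p1", "pedestrian", color="red")
graph.add_object("p2", "pedestrian", color="blue")
graph.add_object("c1", "car", color="white")
graph.add_object("x1", "crosswalk")
graph.add_relation("p1", "on", "x1")
graph.add_relation("p2", "near", "c1")
graph.add_relation("c1", "near", "x1")

# "How many pedestrians are on the crosswalk?" as a chain of symbolic steps.
crosswalks = filter_objects(graph, category="crosswalk")
on_crosswalk = subjects_of(graph, "on", crosswalks)
count_answer = sum(
    1 for obj_id in on_crosswalk
    if graph.nodes[obj_id]["category"] == "pedestrian"
)

# "Is there a white car?" as an existence check.
exists_answer = len(filter_objects(graph, category="car", color="white")) > 0
```

Each question is reduced to a short chain of explicit graph operations, so every intermediate result (for example, the list of objects on the crosswalk) can be inspected.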
Acknowledgment
This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2020-0-01361, Artificial Intelligence Graduate School Program (Yonsei University); No. 2021-0-02068, Artificial Intelligence Innovation Hub).
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Park, J., Bu, SJ., Cho, SB. (2022). A Neuro-Symbolic AI System for Visual Question Answering in Pedestrian Video Sequences. In: García Bringas, P., et al. Hybrid Artificial Intelligent Systems. HAIS 2022. Lecture Notes in Computer Science, vol 13469. Springer, Cham. https://doi.org/10.1007/978-3-031-15471-3_38
DOI: https://doi.org/10.1007/978-3-031-15471-3_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15470-6
Online ISBN: 978-3-031-15471-3
eBook Packages: Computer Science (R0)