
Scene Graph Prediction with Concept Knowledge Base

  • Conference paper
Cognitive Systems and Information Processing (ICCSIP 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1515)


Abstract

Image understanding is an emerging research direction in computer vision, and scene graphs are its most widely adopted representation. A scene graph is a topological graph whose nodes are the objects in a scene and whose edges are the relationships between them; it describes the composition and semantic associations of the objects in an image. Scene graph prediction therefore requires not only object detection but also relationship prediction.
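The graph structure described above can be illustrated with a minimal sketch. The class names (`Relationship`, `SceneGraph`) and the example objects and predicates are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Relationship:
    """A directed, labelled edge: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


@dataclass
class SceneGraph:
    """Objects are nodes; relationships are edges between them."""
    objects: Set[str] = field(default_factory=set)
    relationships: List[Relationship] = field(default_factory=list)

    def add_relationship(self, subject: str, predicate: str, obj: str) -> None:
        # Adding an edge also registers both endpoints as nodes.
        self.objects.update((subject, obj))
        self.relationships.append(Relationship(subject, predicate, obj))

    def triples(self) -> List[Tuple[str, str, str]]:
        # Return edges as (subject, predicate, object) triples.
        return [(r.subject, r.predicate, r.obj) for r in self.relationships]


# Hypothetical example scene: a man wearing a hat rides a horse.
graph = SceneGraph()
graph.add_relationship("man", "riding", "horse")
graph.add_relationship("man", "wearing", "hat")
```

In this representation, object detection would supply the node set, while relationship prediction would supply the labelled edges.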

In this work, we propose a scene graph prediction method based on a concept knowledge base, which uses the condensed human understanding stored in the knowledge base to assist scene graph generation. We design a simple model that fuses image features, label features, and knowledge features. The data filtered by this model is then used as input to a classic scene graph generation model, yielding better prediction results. Finally, we analyze the reasons for the slight improvement in the results, summarize the work, and discuss future directions.
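The abstract does not specify how the three feature types are fused or how the filtering is done. As a rough sketch of one possible arrangement, the example below assumes fusion by simple concatenation followed by a logistic score and a threshold. The function names, the weights, the threshold, and the toy feature values are all hypothetical, not the authors' model.

```python
import math
from typing import List, Sequence, Tuple

# One candidate: (name, image features, label features, knowledge features).
Candidate = Tuple[str, Sequence[float], Sequence[float], Sequence[float]]


def fuse(image_feat: Sequence[float],
         label_feat: Sequence[float],
         knowledge_feat: Sequence[float]) -> List[float]:
    # Assumption: fusion is plain concatenation of the three vectors.
    return list(image_feat) + list(label_feat) + list(knowledge_feat)


def score(fused: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    # Assumption: a single linear layer followed by a sigmoid.
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def filter_candidates(candidates: Sequence[Candidate],
                      weights: Sequence[float],
                      threshold: float = 0.5) -> List[str]:
    # Keep only candidates whose fused score reaches the threshold.
    kept = []
    for name, img, lab, kn in candidates:
        if score(fuse(img, lab, kn), weights) >= threshold:
            kept.append(name)
    return kept


# Toy one-dimensional features; the knowledge feature decides the outcome here.
weights = [1.0, 1.0, 2.0]
candidates = [
    ("man-riding-horse", [0.5], [0.5], [1.0]),
    ("hat-riding-horse", [0.5], [0.5], [-1.0]),
]
kept = filter_candidates(candidates, weights)
```

Under these assumptions, the surviving candidates would be what is passed on to the downstream scene graph generation model.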

Supported by Major Project of the New Generation of Artificial Intelligence (No. 2018AAA0102900).

Acknowledgement

This work was supported by Major Project of the New Generation of Artificial Intelligence (No. 2018AAA0102900).

Author information

Correspondence to Runqing Miao.

Copyright information

© 2022 Springer Nature Singapore Pte Ltd.

Cite this paper

Miao, R., Jia, Q. (2022). Scene Graph Prediction with Concept Knowledge Base. In: Sun, F., Hu, D., Wermter, S., Yang, L., Liu, H., Fang, B. (eds) Cognitive Systems and Information Processing. ICCSIP 2021. Communications in Computer and Information Science, vol 1515. Springer, Singapore. https://doi.org/10.1007/978-981-16-9247-5_23

  • DOI: https://doi.org/10.1007/978-981-16-9247-5_23

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-9246-8

  • Online ISBN: 978-981-16-9247-5

  • eBook Packages: Computer Science (R0)
