Abstract
Unlike natural images, diagrams are a highly abstract vehicle for knowledge representation, and Diagram Question Answering (DQA) requires complex reasoning steps such as diagram element detection. However, because annotated diagram data are scarce, extracting diagram elements efficiently is challenging. Moreover, vision tasks depend on image feature extraction, and most feature extractors are pretrained on real-scene images such as ImageNet, which differ substantially from diagrams. To address these problems, we programmatically synthesize a diagram dataset, train a diagram element prediction model on it, and reuse its feature extraction module in the downstream task. In the downstream task, the elements predicted from the diagram are fed explicitly into the model. Experimental comparison shows that our model improves significantly over the baseline.
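The paper does not reproduce its data generator here, but the idea of programmatic diagram synthesis with element-level labels can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the element set, sizes, and file names are assumptions, and the output is an image plus bounding-box annotations suitable for training an element detector.

```python
# Minimal sketch of programmatic diagram synthesis (assumed element set and
# layout rules; not the paper's actual generator). Uses Pillow to draw random
# diagram elements and records their bounding boxes as detection labels.
import json
import random
from PIL import Image, ImageDraw

ELEMENT_TYPES = ["rectangle", "ellipse", "arrow"]  # hypothetical element vocabulary

def synth_diagram(width=480, height=320, n_elements=5, seed=0):
    """Draw random diagram elements and return (image, annotations)."""
    rng = random.Random(seed)
    img = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(img)
    annotations = []
    for _ in range(n_elements):
        etype = rng.choice(ELEMENT_TYPES)
        x0, y0 = rng.randint(0, width - 80), rng.randint(0, height - 60)
        x1, y1 = x0 + rng.randint(40, 80), y0 + rng.randint(30, 60)
        if etype == "rectangle":
            draw.rectangle([x0, y0, x1, y1], outline="black", width=2)
        elif etype == "ellipse":
            draw.ellipse([x0, y0, x1, y1], outline="black", width=2)
        else:  # crude arrow: a line segment with a small triangular head
            draw.line([x0, y0, x1, y1], fill="black", width=2)
            draw.polygon([(x1, y1), (x1 - 8, y1 - 3), (x1 - 3, y1 - 8)], fill="black")
        annotations.append({"type": etype, "bbox": [x0, y0, x1, y1]})
    return img, annotations

if __name__ == "__main__":
    image, anns = synth_diagram(seed=42)
    image.save("diagram_000.png")
    with open("diagram_000.json", "w") as f:
        json.dump(anns, f, indent=2)  # element type + bounding box per element
```

A detector (e.g., Cascade R-CNN) trained on such synthetic pairs can then supply both the predicted elements and a diagram-adapted feature extractor for the downstream question-answering model, which is the transfer described in the abstract.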
Acknowledgements
We appreciate the support from Pudong New Area Science & Technology Development Fund (Project Number: PKX2021-R05).