Abstract
Obtaining contextual information about a scene is an essential ability for semantic segmentation. GloRe [1] learns to infer context from a graph-based feature built by its Global Reasoning unit: the graph nodes are the features of regions segmented in image space, and the edges represent the relationships between those nodes. A failure to construct this graph therefore results in poor performance. To resolve this problem, we propose a novel unit, the Multi-scale Global Reasoning Unit, that constructs the graph from multi-scale information and thus considers the relationships between regions while retaining detailed multi-scale spatial information. Specifically, the proposed unit consists of a Feature Aggregation Module and a Global Reasoning Module: the former selects the features required to construct the graph from the multi-scale features, and the latter uses GloRe to infer the relationships between them. The unit is trained in an end-to-end manner. In experiments on the Cityscapes and PASCAL-Context datasets, we confirm that the proposed method outperforms the original GloRe.
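As a rough illustration of the architecture described above, the following PyTorch sketch pairs a GloRe-style reasoning unit with a simple multi-scale aggregation step. The class names, the number of graph nodes, and the resize-concatenate-fuse aggregation are illustrative assumptions on our part; the abstract does not specify the exact design of the Feature Aggregation Module.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GloReUnit(nn.Module):
    # GloRe-style global reasoning [1]: project coordinate-space features to a
    # small set of graph nodes, reason over the nodes, and project back.
    def __init__(self, in_channels, num_nodes=64, node_channels=128):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, node_channels, 1)        # node states
        self.project = nn.Conv2d(in_channels, num_nodes, 1)           # projection matrix B
        self.node_conv = nn.Conv1d(num_nodes, num_nodes, 1)           # adjacency A_g over nodes
        self.state_conv = nn.Conv1d(node_channels, node_channels, 1)  # state update W_g
        self.extend = nn.Conv2d(node_channels, in_channels, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        feats = self.reduce(x).flatten(2)                   # (B, C', H*W)
        assign = self.project(x).flatten(2)                 # (B, N, H*W)
        nodes = torch.bmm(assign, feats.transpose(1, 2))    # (B, N, C')
        nodes = nodes / feats.size(-1)                      # normalize by pixel count
        nodes = nodes - self.node_conv(nodes)               # (I - A_g) V over node dimension
        nodes = F.relu(self.state_conv(nodes.transpose(1, 2)))  # (B, C', N)
        out = torch.bmm(nodes, assign).view(b, -1, h, w)    # back to coordinate space
        return x + self.extend(out)                         # residual connection

class MultiScaleGlobalReasoningUnit(nn.Module):
    # Sketch of the proposed unit: a Feature Aggregation Module (assumed here to
    # be resize + concatenate + 1x1 fusion) followed by a Global Reasoning
    # Module built on GloRe.
    def __init__(self, in_channels_list, out_channels, num_nodes=64):
        super().__init__()
        self.fuse = nn.Conv2d(sum(in_channels_list), out_channels, 1)
        self.reason = GloReUnit(out_channels, num_nodes=num_nodes)

    def forward(self, features):
        target = features[0].shape[-2:]                     # finest resolution
        aligned = [F.interpolate(f, size=target, mode="bilinear",
                                 align_corners=False) for f in features]
        x = self.fuse(torch.cat(aligned, dim=1))            # aggregate scales
        return self.reason(x)                               # reason over regions

For example, passing two feature maps of shapes (2, 256, 64, 64) and (2, 512, 32, 32) to MultiScaleGlobalReasoningUnit([256, 512], 256) yields an output of shape (2, 256, 64, 64).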
References
Chen, Y., Rohrbach, M., Yan, Z., Yan, S., Feng, J., Kalantidis, Y.: Graph-based global reasoning networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 433–442 (2019)
Li, Y., Gupta, A.: Beyond grids: learning graph representations for visual recognition. In: Advances in Neural Information Processing Systems, pp. 9225–9235 (2018)
Li, Z., Bai, W., Zhang, J., Xu, C.: Deep grouping model for unified perceptual parsing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4053–4063 (2020)
Li, X., Yang, Y., Zhao, Q., Shen, T., Lin, Z., Liu, H.: Spatial pyramid based graph reasoning for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8950–8959 (2020)
Yu, C., Liu, Y., Gao, C., Shen, C., Sang, N.: Representative graph neural network. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 379–396. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58571-6_23
Wu, T., et al.: GINet: graph interaction network for scene parsing. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12362, pp. 34–51. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58520-4_3
Hu, H., Ji, D., Gan, W., Bai, S., Wu, W., Yan, J.: Class-wise dynamic graph convolution for semantic segmentation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12362, pp. 1–17. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58520-4_1
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations, pp. 1–14 (2017)
Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. In: International Conference on Learning Representations, pp. 1–14 (2016)
Chen, L.C., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017)
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Object detectors emerge in deep scene CNNs. In: International Conference on Learning Representations, pp. 1–14 (2015)
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
Fu, J., et al.: Dual attention network for scene segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3146–3154 (2019)
Yuan, Y., Wang, J.: OCNet: object context network for scene parsing. arXiv preprint arXiv:1809.00916 (2018)
Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., Liu, W.: CCNet: criss-cross attention for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 603–612 (2019)
Cordts, M., et al.: The Cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213–3223 (2016)
Mottaghi, R., et al.: The role of context for object detection and semantic segmentation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 891–898 (2014)
Sun, K., et al.: High-resolution representations for labeling pixels and regions. arXiv preprint arXiv:1904.04514 (2019)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Wu, Z., Shen, C., van den Hengel, A.: High-performance semantic segmentation using very deep fully convolutional networks. arXiv preprint arXiv:1604.04339 (2016)
Cite this paper
Domae, Y., Aizawa, H., Kato, K. (2021). Multi-scale Global Reasoning Unit for Semantic Segmentation. In: Jeong, H., Sumi, K. (eds.) Frontiers of Computer Vision. IW-FCV 2021. Communications in Computer and Information Science, vol. 1405. Springer, Cham. https://doi.org/10.1007/978-3-030-81638-4_4