Abstract
Visual affordance detection aims to understand the functional attributes of objects, which is crucial for robots performing interactive tasks. Most existing affordance detection methods rely mainly on global image features and do not fully exploit the features of locally relevant objects in the image, which often leads to suboptimal detection accuracy under the interference of cluttered backgrounds and neighbouring objects. Numerous studies have shown that the accuracy of affordance detection largely depends on the quality of the extracted image features. In this paper, we propose a novel affordance detection network with object shape mask guided feature encoders. The masks act as an attention mechanism that forces the network to focus on the shape regions of target objects in the image, which helps it obtain high-quality features. Specifically, we first propose a shape mask guided encoder, which uses masks to effectively locate all target objects and thus extract more expressive features. Building on this encoder, we then propose a dual enhance feature aggregation module consisting of two branches: the first encodes the global features of the original image, while the second locates each locally relevant object and encodes its precise features. Aggregating these features enhances the representation of each object, further improving feature quality and suppressing interference. Quantitative and qualitative evaluations against state-of-the-art methods demonstrate that the proposed method achieves superior performance on two commonly used affordance detection datasets.
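The two ideas sketched in the abstract — a shape mask used as multiplicative attention over a feature map, and the aggregation of a global branch with mask-gated local-object branches — can be illustrated with a minimal numpy sketch. This is not the authors' implementation; the function names, the elementwise gating, and the additive aggregation are simplifying assumptions made for illustration only.

```python
import numpy as np

def shape_mask_gate(features: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Gate a C x H x W feature map with a binary H x W object-shape mask.

    The mask plays the role of spatial attention: activations outside the
    object's shape region are suppressed to zero. (Illustrative only.)
    """
    return features * mask[None, :, :]  # broadcast mask over channels

def dual_branch_aggregate(global_feat: np.ndarray,
                          object_masks: list) -> np.ndarray:
    """Combine a global branch with per-object mask-gated local branches.

    Each local branch re-encodes the responses inside one object's shape
    region; summing them onto the global features strengthens object
    regions relative to background clutter. (Illustrative only.)
    """
    local = sum(shape_mask_gate(global_feat, m) for m in object_masks)
    return global_feat + local

# Toy demo: a 4-channel 8x8 feature map and one object occupying
# the top-left quadrant of the image.
feat = np.ones((4, 8, 8))
mask = np.zeros((8, 8))
mask[:4, :4] = 1.0
agg = dual_branch_aggregate(feat, [mask])
# Responses inside the object region are amplified; background is unchanged.
```

Under this toy setup the aggregated response inside the masked region is twice the background response, which mirrors (in a crude additive way) how the second branch is meant to enhance each object's representation while leaving interference suppressed.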
Data availability
The processed data required to reproduce these findings cannot be shared at this time as the data also forms part of an ongoing study.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grants 62172022 and U21B2038, and in part by the Beijing Outstanding Young Scientists Project under Grant BJJWZYJH01201910005018.
Ethics declarations
Conflicts of interest
The authors declare that they have no competing financial interests in the subject matter or materials discussed in this paper.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, D., Kong, D., Li, J. et al. ADOSMNet: a novel visual affordance detection network with object shape mask guided feature encoders. Multimed Tools Appl 83, 31629–31653 (2024). https://doi.org/10.1007/s11042-023-16898-2