
ADOSMNet: a novel visual affordance detection network with object shape mask guided feature encoders

Multimedia Tools and Applications

Abstract

Visual affordance detection aims to understand the functional attributes of objects, which is crucial for robots performing interactive tasks. Most existing affordance detection methods rely mainly on global image features and do not fully exploit the features of relevant local objects in the image, which often leads to suboptimal detection accuracy under interference from cluttered backgrounds and neighbouring objects. Numerous studies have shown that the accuracy of affordance detection largely depends on the quality of the extracted image features. In this paper, we propose a novel affordance detection network with object shape mask guided feature encoders. The masks act as an attention mechanism that forces the network to focus on the shape regions of target objects in the image, which facilitates the extraction of high-quality features. Specifically, we first propose a shape mask guided encoder, which uses masks to effectively locate all target objects and thereby extract more expressive features. Building on this encoder, we then propose a dual enhance feature aggregation module consisting of two branches: the first encodes the global features of the original image, while the second locates each relevant local object and encodes its precise features. Aggregating these features enhances the representation of each object, further improving feature quality and suppressing interference. Quantitative and qualitative comparisons with state-of-the-art methods demonstrate that the proposed method achieves superior performance on two commonly used affordance detection datasets.
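To make the two components concrete, the following is a minimal PyTorch sketch of the ideas the abstract describes: a shape-mask-guided encoder that uses the mask as soft spatial attention, and a dual-branch aggregation that fuses global image features with per-object mask-gated features. All class and variable names here (MaskGuidedEncoder, DualBranchAggregation, the tiny convolutional backbone) are hypothetical illustrations under simplified assumptions, not the authors' actual ADOSMNet implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskGuidedEncoder(nn.Module):
    """Encode an image while an object-shape mask steers attention toward
    the masked regions (a soft spatial-attention reading of the paper's
    'shape mask guided' encoder; the real backbone is much deeper)."""

    def __init__(self, in_ch: int = 3, feat_ch: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, image: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(image)                      # B x C x H x W
        mask = F.interpolate(mask, size=feats.shape[-2:], mode="nearest")
        # Residual gating: amplify features inside the object shape while
        # keeping a skip path so background context is not erased.
        return feats * mask + feats


class DualBranchAggregation(nn.Module):
    """Fuse a global branch (whole image) with a local branch (one pass
    per object mask), as the dual enhance aggregation is described."""

    def __init__(self, feat_ch: int = 64, num_affordances: int = 10):
        super().__init__()
        self.global_enc = MaskGuidedEncoder(feat_ch=feat_ch)
        self.local_enc = MaskGuidedEncoder(feat_ch=feat_ch)
        self.fuse = nn.Conv2d(2 * feat_ch, feat_ch, 1)
        self.head = nn.Conv2d(feat_ch, num_affordances, 1)  # per-pixel labels

    def forward(self, image, object_masks):
        # Global branch: full image with an all-ones mask (no suppression).
        g = self.global_enc(image, torch.ones_like(image[:, :1]))
        # Local branch: encode each relevant object under its own mask,
        # then sum so every object contributes (assumes >= 1 mask).
        l = sum(self.local_enc(image, m) for m in object_masks)
        fused = self.fuse(torch.cat([g, l], dim=1))
        return self.head(F.relu(fused))


# Toy usage: one image, two hypothetical binary object masks.
img = torch.randn(1, 3, 128, 128)
masks = [torch.rand(1, 1, 128, 128).round() for _ in range(2)]
logits = DualBranchAggregation()(img, masks)  # 1 x num_affordances x 128 x 128
```

In this reading, the mask acts as a residual gate (feats * mask + feats), so object regions are emphasized rather than the background being discarded outright, which matches the abstract's goal of suppressing, not removing, interference.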



Data availability

The processed data required to reproduce these findings cannot be shared at this time as the data also forms part of an ongoing study.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 62172022 and U21B2038, and in part by the Beijing Outstanding Young Scientists Project under Grant BJJWZYJH01201910005018.

Author information

Corresponding author

Correspondence to Dehui Kong.

Ethics declarations

Conflicts of interest

The authors declare that they have no competing financial interests in the subject matter or materials discussed in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chen, D., Kong, D., Li, J. et al. ADOSMNet: a novel visual affordance detection network with object shape mask guided feature encoders. Multimed Tools Appl 83, 31629–31653 (2024). https://doi.org/10.1007/s11042-023-16898-2

