
Towards user-specific multimodal recommendation via cross-modal attention-enhanced graph convolution network

Published in Applied Intelligence.

Abstract

Multimodal Recommendation (MR) exploits multimodal features of items (e.g., visual or textual features) to provide personalized recommendations for users. Recently, scholars have integrated Graph Convolutional Networks (GCNs) into MR to model complicated multimodal relationships, but two significant challenges remain: (1) Most MR methods fail to consider the correlations between different modalities, which significantly hinders modal alignment and results in poor performance on MR tasks. (2) Most MR methods leverage multimodal features to enhance item representation learning, yet the connection between multimodal features and user representations remains largely unexplored. To this end, we propose a novel yet effective Cross-modal Attention-enhanced graph convolution network for user-specific Multimodal Recommendation, named CAMR. Specifically, we design a cross-modal attention mechanism to mine cross-modal correlations. In addition, we devise a modality-aware user feature learning method that uses rich item information to learn user feature representations. Experimental results on four real-world datasets demonstrate the superiority of CAMR over several state-of-the-art methods. The code for this work is available at https://github.com/ZZY-GraphMiningLab/CAMR.
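The two ideas named in the abstract — cross-modal attention between visual and textual item features, and user features derived from the multimodal features of interacted items — can be illustrated with a minimal sketch. This is a hypothetical illustration of the general techniques, not the authors' implementation; all function names, dimensions, and the simple mean-pooling fusion are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(visual, textual):
    """Sketch of cross-modal attention: each modality attends to the
    other via scaled dot-product, mining cross-modal correlations."""
    d_k = visual.shape[-1]
    attn_v2t = softmax(visual @ textual.T / np.sqrt(d_k))  # visual queries textual
    attn_t2v = softmax(textual @ visual.T / np.sqrt(d_k))  # textual queries visual
    # enhance each modality's representation with the other's information
    visual_enh = visual + attn_v2t @ textual
    textual_enh = textual + attn_t2v @ visual
    return visual_enh, textual_enh

def modality_aware_user_features(interactions, visual_enh, textual_enh):
    """Sketch of modality-aware user feature learning: aggregate the
    multimodal features of the items each user has interacted with."""
    item_repr = (visual_enh + textual_enh) / 2             # fuse the two modalities
    # row-normalize the user-item interaction matrix -> mean pooling per user
    weights = interactions / interactions.sum(axis=1, keepdims=True)
    return weights @ item_repr                             # (n_users, dim)

rng = np.random.default_rng(0)
visual = rng.standard_normal((5, 8))    # 5 items, 8-dim visual features
textual = rng.standard_normal((5, 8))   # 5 items, 8-dim textual features
interactions = np.array([[1, 0, 1, 0, 0],
                         [0, 1, 0, 1, 1],
                         [1, 1, 0, 0, 1]], dtype=float)  # 3 users x 5 items

v_enh, t_enh = cross_modal_attention(visual, textual)
user_feats = modality_aware_user_features(interactions, v_enh, t_enh)
```

In the paper these components sit inside a graph convolution pipeline over the user-item graph; the sketch only shows the attention and aggregation steps in isolation.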



Data Availability

The data that support the findings of this study are openly available in Amazon at http://jmcauley.ucsd.edu/data/amazon/links.html.


Acknowledgements

This research is supported by the National Natural Science Foundation of China (Grant No. 62472263, 62072288), the Taishan Scholar Program of Shandong Province, Shandong Youth Innovation Team, the Natural Science Foundation of Shandong Province (Grant No. ZR2024MF034, ZR2022MF268).

Author information

Contributions

Ruidong Wang: Conceptualization, Investigation, Methodology, Writing - original draft. Zhongying Zhao: Methodology, Writing - review & editing, Supervision, Funding acquisition. Chao Li: Writing - review & editing.

Corresponding author

Correspondence to Zhongying Zhao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest related to this work.

Ethical and informed consent for data used

The datasets used for this experiment have been made publicly available by the respective organizations/authors to advance the Multimodal Recommendation research field. Thus, informed consent is not required to use the datasets. References and citations to the relevant datasets are included in the manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, R., Li, C. & Zhao, Z. Towards user-specific multimodal recommendation via cross-modal attention-enhanced graph convolution network. Appl Intell 55, 2 (2025). https://doi.org/10.1007/s10489-024-06061-1

