Abstract
Multimodal Recommendation (MR) exploits multimodal features of items (e.g., visual or textual features) to provide personalized recommendations for users. Recently, scholars have integrated Graph Convolutional Networks (GCNs) into MR to model complicated multimodal relationships, but two significant challenges remain: (1) Most MR methods fail to consider the correlations between different modalities, which degrades modal alignment and results in poor performance on MR tasks. (2) Most MR methods leverage multimodal features to enhance item representation learning; however, the connection between multimodal features and user representations remains largely unexplored. To this end, we propose a novel yet effective Cross-modal Attention-enhanced graph convolution network for user-specific Multimodal Recommendation, named CAMR. Specifically, we design a cross-modal attention mechanism to mine the cross-modal correlations. In addition, we devise a modality-aware user feature learning method that uses rich item information to learn user feature representations. Experimental results on four real-world datasets demonstrate the superiority of CAMR compared with several state-of-the-art methods. The code of this work is available at https://github.com/ZZY-GraphMiningLab/CAMR
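To make the cross-modal attention idea concrete, the following is a minimal sketch of one plausible realization in PyTorch: the item features of one modality attend to those of the other modality, and the two enhanced views are fused into a single item representation. The layer names, dimensions, and the averaging fusion are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch of a cross-modal attention block for item features.
# All names, shapes, and the fusion step are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalAttention(nn.Module):
    """Let one modality attend to the other to capture cross-modal correlations."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, query_feats: torch.Tensor, context_feats: torch.Tensor) -> torch.Tensor:
        # query_feats:   (num_items, dim), features of one modality (e.g., visual)
        # context_feats: (num_items, dim), features of the other modality (e.g., textual)
        q = self.q_proj(query_feats)
        k = self.k_proj(context_feats)
        v = self.v_proj(context_feats)
        attn = F.softmax(q @ k.t() * self.scale, dim=-1)  # (num_items, num_items)
        return attn @ v                                   # context-enhanced features


if __name__ == "__main__":
    num_items, dim = 128, 64
    visual = torch.randn(num_items, dim)   # e.g., projected image features
    textual = torch.randn(num_items, dim)  # e.g., projected sentence embeddings
    v2t, t2v = CrossModalAttention(dim), CrossModalAttention(dim)
    # Each modality is enhanced with information from the other, then fused.
    visual_enhanced = visual + v2t(visual, textual)
    textual_enhanced = textual + t2v(textual, visual)
    fused_item_repr = (visual_enhanced + textual_enhanced) / 2
    print(fused_item_repr.shape)  # torch.Size([128, 64])
```

In such a design, the fused item representations could then feed a GCN over the user-item graph, so that user embeddings aggregate modality-aware item information, which is the spirit of the modality-aware user feature learning described above.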
Data Availability
The data that support the findings of this study are openly available from the Amazon review dataset at http://jmcauley.ucsd.edu/data/amazon/links.html.
Acknowledgements
This research is supported by the National Natural Science Foundation of China (Grant No. 62472263, 62072288), the Taishan Scholar Program of Shandong Province, Shandong Youth Innovation Team, the Natural Science Foundation of Shandong Province (Grant No. ZR2024MF034, ZR2022MF268).
Author information
Contributions
Ruidong Wang: Conceptualization, Investigation, Methodology, Writing - original draft. Zhongying Zhao: Methodology, Writing - review & editing, Supervision, Funding acquisition. Chao Li: Writing - review & editing.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest related to this work.
Ethical and informed consent for data used
The datasets used in this experiment were made publicly available by the respective organizations/authors to advance research on Multimodal Recommendation. Thus, informed consent is not required to use them. References and citations to the relevant datasets are included in the manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, R., Li, C. & Zhao, Z. Towards user-specific multimodal recommendation via cross-modal attention-enhanced graph convolution network. Appl Intell 55, 2 (2025). https://doi.org/10.1007/s10489-024-06061-1