research-article

Breaking Isolation: Multimodal Graph Fusion for Multimedia Recommendation by Edge-wise Modulation

Published: 10 October 2022

Abstract

In a multimedia recommender system, the rich multimodal dynamics of user-item interactions are worth exploiting, and Graph Convolutional Networks (GCNs) have made this easier. Yet GCN-based models typically perform multimodal fusion in one of two ways: graph mergence fusion, which delivers insufficient inter-modal dynamics, or node alignment fusion, which introduces noise that can harm multimodal modelling. Unlike existing works, we propose EgoGCN, a structure that seeks to enhance multimodal learning of user-item interactions. At its core is a simple yet effective fusion operation dubbed EdGe-wise mOdulation (EGO) fusion. EGO fusion adaptively distils edge-wise multimodal information and learns to modulate each unimodal node under the supervision of other modalities. It breaks isolated unimodal propagation and allows the most informative inter-modal messages to spread, whilst preserving intra-modal processing. We present a hard modulation and a soft modulation to investigate the underlying multimodal dynamics. Experiments on two real-world datasets show that EgoGCN comfortably beats prior methods.
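
The abstract gives no equations, so the following is only an illustrative sketch of what edge-wise modulation could look like for a single node, not the paper's formulation. It assumes that "hard" modulation means an element-wise maximum over the other modalities' per-edge messages, that "soft" modulation means a softmax-weighted sum of them, and that the result is added to the target modality's own messages before mean aggregation. The function name `ego_fuse`, the array layout, and the additive combination are all hypothetical.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def ego_fuse(messages, target, mode="hard"):
    """Hypothetical edge-wise modulation for one node (illustrative only).

    messages: array of shape (M, E, D) holding, for each of M modalities,
              the messages arriving over E edges with D features each.
    target:   index of the modality whose node representation is updated.
    mode:     "hard" (assumed: element-wise max over the other modalities)
              or "soft" (assumed: softmax-weighted sum over them).
    Returns an array of shape (D,).
    """
    others = np.delete(messages, target, axis=0)  # (M-1, E, D)
    if mode == "hard":
        # Per edge and per feature, keep the strongest inter-modal signal.
        inter = others.max(axis=0)
    elif mode == "soft":
        # Per edge and per feature, blend the other modalities by softmax weight.
        weights = softmax(others, axis=0)
        inter = (weights * others).sum(axis=0)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    intra = messages[target]  # intra-modal messages are kept as they are
    # Assumed combination: add the inter-modal signal edge by edge, then average.
    return (intra + inter).mean(axis=0)


# Toy input: 3 modalities, 2 edges, 2 features.
example = np.array([
    [[1.0, 0.0], [0.0, 1.0]],   # modality 0 (the target below)
    [[2.0, -1.0], [0.0, 0.0]],  # modality 1
    [[0.0, 3.0], [1.0, 1.0]],   # modality 2
])
hard_out = ego_fuse(example, target=0, mode="hard")  # equals [2.0, 2.5]
soft_out = ego_fuse(example, target=0, mode="soft")
```

Under these assumptions the soft variant never exceeds the hard one element-wise, because a softmax-weighted sum is a convex combination of the values the hard variant takes the maximum of.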

Supplementary Material

MP4 File (MM22-fp3047.mp4)
Presentation video of paper 3047

    Information

    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN:9781450392037
    DOI:10.1145/3503161
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. graph fusion
    2. multimedia recommendation
    3. multimodal dynamics

    Qualifiers

    • Research-article

    Conference

    MM '22
    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 208
    • Downloads (Last 6 weeks): 19
    Reflects downloads up to 03 Jan 2025

    Cited By

    • (2024) UPGCN: User Perception-Guided Graph Convolutional Network for Multimodal Recommendation. Applied Sciences 14:22 (10187). DOI: 10.3390/app142210187. Online publication date: 6-Nov-2024
    • (2024) The 2nd International Workshop on Deep Multi-modal Generation and Retrieval. Proceedings of the 2nd International Workshop on Deep Multimodal Generation and Retrieval (1-6). DOI: 10.1145/3689091.3690093. Online publication date: 28-Oct-2024
    • (2024) Modality-Balanced Learning for Multimedia Recommendation. Proceedings of the 32nd ACM International Conference on Multimedia (7551-7560). DOI: 10.1145/3664647.3680626. Online publication date: 28-Oct-2024
    • (2024) Formalizing Multimedia Recommendation through Multimodal Deep Learning. ACM Transactions on Recommender Systems. DOI: 10.1145/3662738. Online publication date: 29-Apr-2024
    • (2024) MMBee: Live Streaming Gift-Sending Recommendations via Multi-Modal Fusion and Behaviour Expansion. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (4896-4905). DOI: 10.1145/3637528.3671511. Online publication date: 25-Aug-2024
    • (2024) Improving Item-side Fairness of Multimodal Recommendation via Modality Debiasing. Proceedings of the ACM Web Conference 2024 (4697-4705). DOI: 10.1145/3589334.3648156. Online publication date: 13-May-2024
    • (2024) Domain-Oriented Knowledge Transfer for Cross-Domain Recommendation. IEEE Transactions on Multimedia 26 (9539-9550). DOI: 10.1109/TMM.2024.3394686. Online publication date: 2024
    • (2024) A Hierarchical Multi-modal Content-Based approach to Graph-based Recommendation System. 2024 International Conference on Computational Intelligence and Computing Applications (ICCICA) (394-399). DOI: 10.1109/ICCICA60014.2024.10585224. Online publication date: 23-May-2024
    • (2024) CMC-MMR: multi-modal recommendation model with cross-modal correction. Journal of Intelligent Information Systems 62:5 (1187-1211). DOI: 10.1007/s10844-024-00848-x. Online publication date: 1-Oct-2024
    • (2023) Adaptive Multi-Modalities Fusion in Sequential Recommendation Systems. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (843-853). DOI: 10.1145/3583780.3614775. Online publication date: 21-Oct-2023
