research-article

Breaking Isolation: Multimodal Graph Fusion for Multimedia Recommendation by Edge-wise Modulation

Authors:

Jie Shao

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 385 - 394

https://doi.org/10.1145/3503161.3548399

Published: 10 October 2022

Abstract

In a multimedia recommender system, rich multimodal dynamics of user-item interactions are worth availing ourselves of and have been facilitated by Graph Convolutional Networks (GCNs). Yet, the typical way of conducting multimodal fusion with GCN-based models is either through graph mergence fusion that delivers insufficient inter-modal dynamics, or through node alignment fusion that brings in noises which potentially harm multimodal modelling. Unlike existing works, we propose EgoGCN, a structure that seeks to enhance multimodal learning of user-item interactions. At its core is a simple yet effective fusion operation dubbed EdGe-wise mOdulation (EGO) fusion. EGO fusion adaptively distils edge-wise multimodal information and learns to modulate each unimodal node under the supervision of other modalities. It breaks isolated unimodal propagations, allows the most informative inter-modal messages to spread, whilst preserving intra-modal processing. We present a hard modulation and a soft modulation to fully investigate the multimodal dynamics behind. Experiments on two real-world datasets show that EgoGCN comfortably beats prior methods.
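The abstract does not spell out the EGO fusion equations, so the following is only a minimal sketch of what edge-wise hard and soft modulation could look like. Everything in it is an assumption made for illustration: the function names `distil_edge` and `modulate`, the use of an element-wise maximum for the hard variant, a softmax-weighted sum for the soft variant, and the additive way the distilled cross-modal message is applied to each unimodal message. It is not the authors' implementation.

```python
import math


def distil_edge(messages, mode="soft"):
    """Distil one cross-modal vector from the per-modality messages on one edge.

    messages: dict mapping a modality name to a list of floats (same length).
    mode: "hard" keeps the largest value per dimension (assumed hard variant);
          "soft" takes a softmax-weighted sum per dimension (assumed soft variant).
    """
    names = sorted(messages)
    dim = len(messages[names[0]])
    fused = []
    for d in range(dim):
        values = [messages[m][d] for m in names]
        if mode == "hard":
            fused.append(max(values))
        elif mode == "soft":
            peak = max(values)  # subtract the peak for numerical stability
            exps = [math.exp(v - peak) for v in values]
            total = sum(exps)
            fused.append(sum(v * e / total for v, e in zip(values, exps)))
        else:
            raise ValueError("mode must be 'hard' or 'soft'")
    return fused


def modulate(messages, mode="soft"):
    """Add the distilled cross-modal vector to every unimodal message.

    Each modality keeps its own message (intra-modal processing) and receives
    the same edge-level cross-modal signal (inter-modal message). The additive
    form is an assumption of this sketch.
    """
    fused = distil_edge(messages, mode)
    return {
        m: [own + cross for own, cross in zip(vec, fused)]
        for m, vec in messages.items()
    }


if __name__ == "__main__":
    edge_messages = {"visual": [1.0, 0.0], "textual": [0.0, 2.0]}
    print(modulate(edge_messages, mode="hard"))
    print(modulate(edge_messages, mode="soft"))
```

In this sketch the hard variant passes on only the strongest modality per feature dimension, while the soft variant blends all modalities with data-dependent weights; a real model would presumably learn these scores rather than read them off the raw message values.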

Supplementary Material

MP4 File (MM22-fp3047.mp4)

Presentation video of paper 3047


Cited By

Li P, Zhan W, Gao L, Wang S, Yang L (2025). Multimodal Recommendation System Based on Cross Self-Attention Fusion. Systems 13:1 (57). Online publication date: 17-Jan-2025.
https://doi.org/10.3390/systems13010057
Yu L, Hu J, Du Q, Niu X (2025). MVideoRec: Micro Video Recommendations through Modality Decomposition and Contrastive Learning. ACM Transactions on Information Systems 43:3 (1-27). Online publication date: 24-Jan-2025.
https://dl.acm.org/doi/10.1145/3711855
Wang Z, Feng Y, Zhang X, Yang R, Du B (2025). Multi-Modal Correction Network for Recommendation. IEEE Transactions on Knowledge and Data Engineering 37:2 (810-822). Online publication date: Feb-2025.
https://doi.org/10.1109/TKDE.2024.3493374

Index Terms

Breaking Isolation: Multimodal Graph Fusion for Multimedia Recommendation by Edge-wise Modulation
1. Information systems
  1. Information systems applications
    1. Multimedia information systems

Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

October 2022

7537 pages

ISBN:9781450392037

DOI:10.1145/3503161

General Chairs:
João Magalhães (NOVA University of Lisbon, Portugal)
Alberto del Bimbo (University of Florence, Italy)
Shin'ichi Satoh (National Institute of Informatics, Japan)
Nicu Sebe (University of Trento, Italy)

Program Chairs:
Xavier Alameda-Pineda (Inria, Grenoble, France)
Qin Jin (Renmin University of China, China)
Vincent Oria (New Jersey Institute of Technology, USA)
Laura Toni (University College London, UK)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Natural Science Foundation of Guangdong Province
Shenzhen Fundamental Research Program
National Natural Science Foundation of China
Beijing Academy of Artificial Intelligence
Overseas Cooperation Research Fund of Tsinghua Shenzhen International Graduate School

Conference

MM '22

Sponsor:

SIGMM

MM '22: The 30th ACM International Conference on Multimedia

October 10 - 14, 2022

Lisboa, Portugal

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Total Citations: 19
Total Downloads: 708
Downloads (Last 12 months): 210
Downloads (Last 6 weeks): 19

Reflects downloads up to 05 Mar 2025

Citations

Cited By

Zhou B, Liang Y (2024). UPGCN: User Perception-Guided Graph Convolutional Network for Multimodal Recommendation. Applied Sciences 14:22 (10187). Online publication date: 6-Nov-2024.
https://doi.org/10.3390/app142210187
Ji W, Fei H, Wei Y, Zheng Z, Li J, Chen L, Liao L, Zhuang Y, Zimmermann R, Ji W, Fei H, Zheng Z, Fei H, Wei Y, Zheng Z (2024). The 2nd International Workshop on Deep Multi-modal Generation and Retrieval. Proceedings of the 2nd International Workshop on Deep Multimodal Generation and Retrieval (1-6). Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3689091.3690093
Zhang J, Liu G, Liu Q, Wu S, Wang L, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). Modality-Balanced Learning for Multimedia Recommendation. Proceedings of the 32nd ACM International Conference on Multimedia (7551-7560). Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3664647.3680626
Malitesta D, Cornacchia G, Pomo C, Merra F, Di Noia T, Di Sciascio E (2024). Formalizing Multimedia Recommendation through Multimodal Deep Learning. ACM Transactions on Recommender Systems. Online publication date: 29-Apr-2024.
https://dl.acm.org/doi/10.1145/3662738
Deng J, Wang S, Wang Y, Qi J, Zhao L, Zhou G, Meng G, Baeza-Yates R, Bonchi F (2024). MMBee: Live Streaming Gift-Sending Recommendations via Multi-Modal Fusion and Behaviour Expansion. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (4896-4905). Online publication date: 25-Aug-2024.
https://dl.acm.org/doi/10.1145/3637528.3671511
Shang Y, Gao C, Chen J, Jin D, Li Y, Chua T, Ngo C, Ka-Wei Lee R, Kumar R, Lauw H (2024). Improving Item-side Fairness of Multimodal Recommendation via Modality Debiasing. Proceedings of the ACM Web Conference 2024 (4697-4705). Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3589334.3648156
Zhao G, Zhang X, Tang H, Shen J, Qian X (2024). Domain-Oriented Knowledge Transfer for Cross-Domain Recommendation. IEEE Transactions on Multimedia 26 (9539-9550). Online publication date: 2024.
https://doi.org/10.1109/TMM.2024.3394686