DOI: 10.1145/3664647.3680626

Modality-Balanced Learning for Multimedia Recommendation

Published: 28 October 2024

Abstract

Many recommender models have been proposed to investigate how to effectively incorporate multimodal content information into the traditional collaborative filtering framework. Multimodal information is expected to provide a more comprehensive view of items and thus lead to superior performance. However, integrating multiple modalities often encounters the modal imbalance problem: because the amount of information differs across modalities, optimizing a shared objective over all of them leaves the weaker modalities under-optimized, with slower convergence or lower performance. Worse still, we find that in multimodal recommendation models all modalities suffer from insufficient optimization. To address these issues, we propose a Counterfactual Knowledge Distillation method that resolves the imbalance problem and makes the best use of all modalities. Through modality-specific knowledge distillation, it guides the multimodal model to learn modality-specific knowledge from uni-modal teachers. We also design a novel generic-and-specific distillation loss that guides the multimodal student to learn wider-and-deeper knowledge from the teachers. Additionally, to adaptively recalibrate the focus of the multimodal model toward weaker modalities during training, we estimate the causal effect of each modality on the training objective using counterfactual inference, which lets us identify the weak modalities, quantify the degree of imbalance, and re-weight the distillation loss accordingly. Our method can serve as a plug-and-play module for both late-fusion and early-fusion backbones. Extensive experiments on six backbones show that the proposed method improves performance by a large margin. The source code will be released at https://github.com/CRIPAC-DIG/Balanced-Multimodal-Rec.
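To make the re-weighting idea concrete, below is a minimal PyTorch sketch of one plausible instantiation: each modality's causal effect on the training objective is estimated by counterfactually removing that modality, and the modality-specific distillation losses are then re-weighted so that weaker modalities receive more focus. All names, shapes, and loss choices here (modality_effect, distill_weights, kd_loss, the BCE objective, the cosine feature-matching loss) are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only; hypothetical names/shapes, not the authors' code.
import torch
import torch.nn.functional as F


def modality_effect(score_full: torch.Tensor,
                    score_counterfactual: torch.Tensor,
                    label: torch.Tensor) -> torch.Tensor:
    """Estimate modality m's causal effect as the increase in the training
    loss when m is counterfactually removed (e.g., its features replaced
    by a zero/reference input)."""
    loss_full = F.binary_cross_entropy_with_logits(score_full, label)
    loss_cf = F.binary_cross_entropy_with_logits(score_counterfactual, label)
    return (loss_cf - loss_full).clamp(min=0.0)  # larger => modality matters more


def distill_weights(effects: list, temperature: float = 1.0) -> torch.Tensor:
    """Map causal effects to distillation weights: weaker modalities
    (smaller effect) get larger weights, recalibrating the model's focus."""
    return F.softmax(-torch.stack(effects) / temperature, dim=0)


def kd_loss(student_emb: torch.Tensor, teacher_emb: torch.Tensor) -> torch.Tensor:
    """Simple feature-matching distillation (cosine distance) between the
    student's modality branch and its frozen uni-modal teacher."""
    s = F.normalize(student_emb, dim=-1)
    t = F.normalize(teacher_emb, dim=-1)
    return (1.0 - (s * t).sum(dim=-1)).mean()


# Toy usage with two modalities: visual ("v") and textual ("t").
B, d = 8, 16
label = torch.randint(0, 2, (B,)).float()
score_full = torch.randn(B)                              # all modalities present
score_wo = {"v": torch.randn(B), "t": torch.randn(B)}    # each modality removed
effects = [modality_effect(score_full, score_wo[m], label) for m in ("v", "t")]
weights = distill_weights(effects)

student = {"v": torch.randn(B, d), "t": torch.randn(B, d)}
teacher = {"v": torch.randn(B, d), "t": torch.randn(B, d)}
total_kd = sum(weights[i] * kd_loss(student[m], teacher[m])
               for i, m in enumerate(("v", "t")))
```

Under these assumptions, a modality whose removal barely changes the loss is treated as under-optimized and gets a larger distillation weight, which is the recalibration behavior the abstract describes.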

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN:9798400706868
DOI:10.1145/3664647
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. balanced multimodal learning
  2. knowledge distillation
  3. multimedia recommendation

Qualifiers

  • Research-article

Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions (26%)
Overall Acceptance Rate: 2,145 of 8,556 submissions (25%)
