Abstract
Multimodal sentiment analysis leverages multiple modalities, including text, audio, and video, to determine human sentiment tendencies, which is significant for fields such as intention understanding and opinion analysis. However, multimodal sentiment analysis faces two critical challenges: one is how to effectively extract and integrate information from the various modalities, which is important for narrowing the heterogeneity gap among them; the other is how to overcome information forgetting when modelling long sequences, which causes significant information loss and adversely affects modality fusion. To address these issues, this paper proposes a multimodal heterogeneous fusion network based on graph convolutional neural networks (HFNGC). A shared convolutional aggregation mechanism is used to bridge the semantic gap among modalities and reduce the noise caused by modality heterogeneity. In addition, the model applies Dynamic Routing to convert modality features into graph structures. By learning semantic information in the graph representation space, our model improves its capability to capture long-range dependencies. Furthermore, the model integrates complementary information among modalities and explores intra- and inter-modal interactions during the fusion stage. To validate the effectiveness of our model, we conduct experiments on two benchmark datasets. The experimental results demonstrate that our method outperforms existing methods, exhibiting strong generalisation capability and high competitiveness.
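To make the idea in the abstract concrete, the sketch below shows a minimal graph-convolution-based fusion of text, audio, and video features: per-modality 1x1 convolutions project heterogeneous features into a shared space, each time step becomes a graph node, and one graph-convolution layer aggregates across nodes before a read-out regresses the sentiment score. This is a hypothetical illustration only, not the authors' HFNGC; the module names, feature dimensions, and the fully connected adjacency are assumptions.

```python
# Hypothetical sketch (not the authors' HFNGC implementation): minimal
# graph-convolution-based fusion of text/audio/video sequences.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleGCNFusion(nn.Module):
    def __init__(self, d_text=300, d_audio=74, d_video=35, d_shared=64):
        super().__init__()
        # Shared-space projection: a 1x1 convolution per modality maps the
        # heterogeneous feature dimensions into one common dimension.
        self.proj_t = nn.Conv1d(d_text, d_shared, kernel_size=1)
        self.proj_a = nn.Conv1d(d_audio, d_shared, kernel_size=1)
        self.proj_v = nn.Conv1d(d_video, d_shared, kernel_size=1)
        # One graph-convolution weight matrix applied to the node features.
        self.gcn_weight = nn.Linear(d_shared, d_shared)
        self.regressor = nn.Linear(d_shared, 1)  # sentiment score, e.g. in [-3, 3]

    def forward(self, text, audio, video):
        # Each input: (batch, seq_len, feat_dim); Conv1d expects channels first.
        nodes = torch.cat(
            [
                self.proj_t(text.transpose(1, 2)).transpose(1, 2),
                self.proj_a(audio.transpose(1, 2)).transpose(1, 2),
                self.proj_v(video.transpose(1, 2)).transpose(1, 2),
            ],
            dim=1,
        )  # (batch, total_steps, d_shared): every time step becomes a graph node
        n = nodes.size(1)
        # Toy adjacency: fully connected graph with self-loops, row-normalised.
        adj = torch.ones(n, n, device=nodes.device) / n
        # One GCN layer: aggregate neighbouring nodes, then transform.
        hidden = F.relu(self.gcn_weight(adj @ nodes))
        pooled = hidden.mean(dim=1)  # graph read-out by mean pooling
        return self.regressor(pooled).squeeze(-1)


if __name__ == "__main__":
    model = SimpleGCNFusion()
    t = torch.randn(2, 20, 300)  # toy text features (e.g. GloVe-sized)
    a = torch.randn(2, 20, 74)   # toy audio features (e.g. COVAREP-sized)
    v = torch.randn(2, 20, 35)   # toy video features (e.g. Facet-sized)
    print(model(t, a, v).shape)  # torch.Size([2])
```

In the paper's actual design the graph is built via Dynamic Routing rather than a fixed fully connected adjacency; the fixed adjacency here only keeps the sketch short and self-contained.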
Availability of data and materials
In this work, we have used two publicly available datasets, CMU-MOSI and CMU-MOSEI, both of which are available at https://github.com/A2Zadeh/CMU-MultimodalSDK.
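For reference, one possible way to obtain the pre-extracted CMU-MOSI features is through the CMU-MultimodalSDK linked above. The recipe attribute (cmu_mosi.highlevel) and the alignment key ("glove_vectors") below are taken from the SDK's README and are assumptions that may differ across SDK versions.

```python
# Hedged example: downloading pre-extracted CMU-MOSI features with the
# CMU-MultimodalSDK; recipe and key names may vary between SDK versions.
from mmsdk import mmdatasdk

# Download the pre-extracted feature set for CMU-MOSI into ./cmumosi/.
dataset = mmdatasdk.mmdataset(mmdatasdk.cmu_mosi.highlevel, "cmumosi/")

# Word-align the audio and visual streams to the textual modality.
dataset.align("glove_vectors")
```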
Acknowledgements
The authors would like to acknowledge the funding from the Open Project Program of Shanghai Key Laboratory of Data Science (No. 2020090600004), as well as the resources and technical support provided by the High Performance Computing Center of Shanghai University and the Shanghai Engineering Research Center of Intelligent Computing System (No. 19DZ2252600).
Funding
This study was supported by the Open Project Program of Shanghai Key Laboratory of Data Science (No. 2020090600004) and the High Performance Computing Center of Shanghai University, and Shanghai Engineering Research Center of Intelligent Computing System (No. 19DZ2252600).
Author information
Authors and Affiliations
Contributions
Tong Zhao: Conceptualization of this study, Methodology, Software, Writing - Original Draft. Junjie Peng: Conceptualization of this study, Writing - Review & Editing, Supervision. Yansong Huang: Formal analysis, Visualization. Lan Wang: Validation, Investigation. Huiran Zhang: Conceptualization of this study, Resources. Zesu Cai: Conceptualization of this study, Writing - Review & Editing.
Corresponding author
Ethics declarations
Ethics approval
This article is original and has not been submitted to more than one journal for simultaneous consideration.
Consent to participate
All authors approved this article before submission, including the names and order of authors.
Consent for publication
The authors agreed with the content and gave explicit consent to submit.
Competing interests
The authors declare that they have no conflict of interest related to this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhao, T., Peng, J., Huang, Y. et al. A graph convolution-based heterogeneous fusion network for multimodal sentiment analysis. Appl Intell 53, 30455–30468 (2023). https://doi.org/10.1007/s10489-023-05151-w