Abstract
Multimodal sentiment analysis in social media has attracted considerable research interest. Existing methods rely mainly on mining global/local information in the image and fusing it with text information, while overlooking the semantic information inherent in the text. To address this, the Emotion-label Guiding and Similarity Reasoning Network (EGSRNet) is proposed. It introduces emotion-label guided text features to extract hidden semantic information, improves local image-text interaction, and achieves a deeper understanding of image-text pairs by incorporating contextual information. Specifically, an Image-Text Feature Extraction module fully extracts global and local-entity image-text features to improve the utilization of vital features, and an emotion label is introduced to strengthen the deep semantic representation of the text. A Local-Entity Similarity Reasoning module based on the attention mechanism is then designed to explicitly compute the similarity between text and local-entity image features, capturing image-text correlations and enabling full cross-modal interaction. Finally, multimodal interaction is completed by combining the global image-text context, and data- and label-based contrastive learning is introduced to further improve performance. Experimental results show that the proposed model outperforms baseline methods on three public datasets.
Supported by the National Natural Science Foundation of China under Grant 62162065, the Joint Special Project Research Foundation of Yunnan Province (202401BF070001-023), the Yunnan Fundamental Research Projects (202201AT070167), and the Yunnan University Research Innovation Project for Recommended Exempt Postgraduates (TM-23236964).
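As a rough illustration of the attention-based local-entity similarity reasoning described in the abstract, the following PyTorch sketch shows one way a module could explicitly compute text-region similarities and fuse the attended regions back into the text representation. The class name, feature dimensions, and the concatenation-based fusion are assumptions made for illustration, not the authors' implementation.

```python
# A minimal sketch (not the authors' code) of attention-based
# local-entity similarity reasoning between text tokens and image regions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalEntitySimilarityReasoning(nn.Module):
    def __init__(self, dim: int = 768):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)   # projects text tokens (queries)
        self.k_proj = nn.Linear(dim, dim)   # projects region features (keys)
        self.v_proj = nn.Linear(dim, dim)   # projects region features (values)
        self.out = nn.Linear(2 * dim, dim)  # fuses text with attended regions

    def forward(self, text_feats, region_feats):
        # text_feats:   (B, Lt, dim) emotion-label guided text token features
        # region_feats: (B, Lr, dim) local-entity image (region) features
        q = self.q_proj(text_feats)
        k = self.k_proj(region_feats)
        v = self.v_proj(region_feats)
        # Explicit text-region similarity, then attention over regions.
        sim = torch.matmul(q, k.transpose(1, 2)) / q.size(-1) ** 0.5  # (B, Lt, Lr)
        attn = F.softmax(sim, dim=-1)
        attended = torch.matmul(attn, v)                              # (B, Lt, dim)
        # Concatenate each text token with its attended region context and fuse.
        fused = self.out(torch.cat([text_feats, attended], dim=-1))
        return fused, sim

# Example: 2 samples, 20 text tokens, 36 detected regions, 768-d features.
if __name__ == "__main__":
    module = LocalEntitySimilarityReasoning(dim=768)
    t = torch.randn(2, 20, 768)
    r = torch.randn(2, 36, 768)
    fused, sim = module(t, r)
    print(fused.shape, sim.shape)  # (2, 20, 768) and (2, 20, 36)
```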
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhan, C., Qian, W., Liu, P. (2025). EGSRNet: Emotion-Label Guiding and Similarity Reasoning Network for Multimodal Sentiment Analysis. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15035. Springer, Singapore. https://doi.org/10.1007/978-981-97-8620-6_25
DOI: https://doi.org/10.1007/978-981-97-8620-6_25
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-8619-0
Online ISBN: 978-981-97-8620-6