Abstract
With the explosion of multi-modal data, multi-modal sentiment analysis (MSA) has emerged and attracted widespread attention. Unfortunately, conventional multi-modal research relies on large-scale datasets: collecting and annotating such datasets is challenging and resource-intensive, and training on them further increases the research cost. In contrast, few-shot MSA (FMSA), which has been proposed recently, requires only a few samples for training and is therefore more practical and realistic. Prompt-based methods have been explored for FMSA, but they do not sufficiently consider or exploit the specific information carried by the visual modality. We therefore propose a graph-structured, vision-enhanced prompt-based model that better exploits visual information for fusion and collaboration when encoding and optimizing prompt representations. Specifically, we first design an aggregation-based multi-modal attention module. Then, building on this module and biaffine attention, we construct a syntax-semantic dual-channel graph convolutional network that optimizes the encoding of learnable prompts by modeling vision-enhanced information in both semantic and syntactic knowledge. Finally, we propose a collaboration-based optimization module built on a collaborative attention mechanism, which employs visual information to collaboratively optimize prompt representations. Extensive experiments on both coarse-grained and fine-grained MSA datasets demonstrate that our model significantly outperforms the baseline models.
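The paper's implementation is not reproduced here; the sketch below is only a minimal PyTorch illustration of the kind of pipeline the abstract describes: a syntactic channel and a semantic channel (the latter built from biaffine attention scores) refine the concatenated text and learnable prompt representations through graph convolutions, after which a collaborative cross-attention step lets visual features optimize the prompt tokens. All class names, tensor shapes, the gated channel fusion, and the single-layer design are illustrative assumptions, not the authors' architecture.

```python
# Illustrative sketch only (not the authors' released code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCNLayer(nn.Module):
    """One graph-convolution layer: H' = ReLU(A_norm @ H @ W)."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # Row-normalize the adjacency so each node averages its neighbours.
        adj = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        return F.relu(self.linear(torch.bmm(adj, h)))


class DualChannelPromptRefiner(nn.Module):
    """Hypothetical syntax/semantic dual-channel GCN + visual co-attention."""
    def __init__(self, dim):
        super().__init__()
        self.syn_gcn = GCNLayer(dim)               # syntactic channel
        self.sem_gcn = GCNLayer(dim)               # semantic channel
        self.biaffine = nn.Bilinear(dim, dim, 1)   # scores for the semantic graph
        # dim is assumed divisible by the number of attention heads.
        self.co_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, text_h, prompt_h, syn_adj, vis_h):
        # text_h: (B, T, D) token states; prompt_h: (B, P, D) learnable prompts
        # syn_adj: (B, T+P, T+P) dependency-based adjacency; vis_h: (B, R, D)
        h = torch.cat([text_h, prompt_h], dim=1)   # (B, T+P, D)
        n, d = h.size(1), h.size(-1)
        # Semantic adjacency from biaffine scores between all node pairs.
        left = h.unsqueeze(2).expand(-1, -1, n, -1).reshape(-1, d)
        right = h.unsqueeze(1).expand(-1, n, -1, -1).reshape(-1, d)
        sem_adj = torch.sigmoid(self.biaffine(left, right)).view(-1, n, n)
        # Two graph channels, then a gated fusion of their outputs.
        h_syn = self.syn_gcn(h, syn_adj)
        h_sem = self.sem_gcn(h, sem_adj)
        g = torch.sigmoid(self.gate(torch.cat([h_syn, h_sem], dim=-1)))
        h = g * h_syn + (1 - g) * h_sem
        # Collaborative optimization: prompt positions attend to visual regions.
        prompts = h[:, -prompt_h.size(1):]
        refined, _ = self.co_attn(prompts, vis_h, vis_h)
        return prompts + refined                    # (B, P, D)
```

As a usage example under these assumptions, with `dim=128`, ten text tokens, four prompt tokens, and 49 visual region features, `DualChannelPromptRefiner(128)(text_h, prompt_h, syn_adj, vis_h)` returns refined prompt embeddings of shape (B, 4, 128), which would presumably replace the original prompt tokens before the prompt-based prediction step.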
Data availability
We have provided the official websites accompanying the datasets; the complete data will be made available upon reasonable request.
Acknowledgments
The study was funded by the National Natural Science Foundation of China (Grant no. 61672144). Correspondence should be addressed to Baiyou Qiao.
Ethics declarations
Conflict of interest
We declare that there are no known competing financial interests or personal relationships that could have influenced the results presented in this manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhou, Z., Qiao, B., Feng, H. et al. Attention-optimized vision-enhanced prompt learning for few-shot multi-modal sentiment analysis. Neural Comput & Applic 36, 21091–21105 (2024). https://doi.org/10.1007/s00521-024-10297-w
DOI: https://doi.org/10.1007/s00521-024-10297-w