Attention-optimized vision-enhanced prompt learning for few-shot multi-modal sentiment analysis

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

With the explosion of multi-modal data, multi-modal sentiment analysis (MSA) has emerged and attracted widespread attention. Unfortunately, conventional multi-modal research relies on large-scale datasets. On the one hand, collecting and annotating large-scale datasets is challenging and resource-intensive; on the other hand, training on large-scale datasets further increases research costs. In contrast, the recently proposed few-shot MSA (FMSA) requires only a few samples for training and is therefore more practical and realistic. Existing work has investigated prompt-based methods for FMSA, but it has not sufficiently considered or leveraged the information specificity of the visual modality. We therefore propose a vision-enhanced prompt-based model built on a graph structure that better utilizes visual information for fusion and collaboration when encoding and optimizing prompt representations. Specifically, we first design an aggregation-based multi-modal attention module. Then, based on this module and biaffine attention, we construct a syntax–semantic dual-channel graph convolutional network that optimizes the encoding of learnable prompts by modeling vision-enhanced semantic and syntactic knowledge. Finally, we propose a collaboration-based optimization module built on a collaborative attention mechanism, which employs visual information to collaboratively optimize prompt representations. Extensive experiments on both coarse-grained and fine-grained MSA datasets demonstrate that our model significantly outperforms the baseline models.
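As a rough illustration of the collaboration-based optimization idea described in the abstract, the sketch below shows how learnable prompt embeddings could be refined by attending to visual features via cross-attention. This is a minimal, hypothetical PyTorch sketch: the module name, dimensions, and residual update are assumptions for illustration and are not taken from the paper.

```python
# Hypothetical sketch (not the authors' code): learnable soft prompts are
# refined by cross-attending to visual patch features, loosely mirroring a
# collaboration-based prompt-optimization module.
import torch
import torch.nn as nn


class PromptVisionCollaboration(nn.Module):
    def __init__(self, prompt_len: int = 16, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # Learnable continuous prompt vectors (soft prompts).
        self.prompts = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        # Cross-attention: prompts act as queries, visual features as keys/values.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual_feats: torch.Tensor) -> torch.Tensor:
        # visual_feats: (batch, num_patches, dim), e.g. patch embeddings
        # from a vision encoder.
        batch = visual_feats.size(0)
        queries = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        attended, _ = self.cross_attn(queries, visual_feats, visual_feats)
        # Residual update: prompts keep their learned content while being
        # collaboratively adjusted by visual evidence.
        return self.norm(queries + attended)


if __name__ == "__main__":
    module = PromptVisionCollaboration()
    fake_visual = torch.randn(2, 49, 768)  # e.g. a 7x7 patch grid
    refined_prompts = module(fake_visual)
    print(refined_prompts.shape)  # torch.Size([2, 16, 768])
```

The refined prompt vectors would then be prepended to the textual input of a prompt-based language model; the exact fusion with the dual-channel graph convolutional network described in the abstract is not specified here.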


Data availability

The official websites of the datasets are provided in the notes below; the complete data will be made available upon reasonable request.

Notes

  1. https://github.com/jefferyYu/TomBERT.

  2. http://mcrlab.net/research/mvsa-sentiment-analysis-on-multi-view-social-data/.


Acknowledgments

The study was funded by the National Natural Science Foundation of China (Grant no. 61672144). Correspondence should be addressed to Baiyou Qiao.

Author information

Corresponding author

Correspondence to Baiyou Qiao.

Ethics declarations

Conflict of interest

The authors declare that there are no known competing financial interests or personal relationships that could have influenced the results presented in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhou, Z., Qiao, B., Feng, H. et al. Attention-optimized vision-enhanced prompt learning for few-shot multi-modal sentiment analysis. Neural Comput & Applic 36, 21091–21105 (2024). https://doi.org/10.1007/s00521-024-10297-w


  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00521-024-10297-w

Keywords
