A context-sensitive multi-tier deep learning framework for multimodal sentiment analysis

Published in: Multimedia Tools and Applications

Abstract

Sentiment Analysis (SA) is one of the most appealing multidisciplinary research areas in Artificial Intelligence (AI). Owing to the intricate and complementary interactions between modalities, Multimodal Sentiment Analysis (MSA) is an extremely challenging task with a wide range of applications. Numerous deep learning models and techniques have been proposed for multimodal sentiment analysis, but they do not investigate the explicit context of words and cannot model the diverse components of a sentence; hence, the full potential of such diverse data has not been explored. In this research, a Context-Sensitive Multi-Tier Deep Learning Framework (CS-MDF) is proposed for sentiment analysis on multimodal data. The CS-MDF uses a three-tier architecture to extract context-sensitive information. The first tier extracts unimodal features from the utterances, ignoring context at this stage: a Convolutional Neural Network (CNN) extracts text-based features, since CNNs are particularly good at identifying local patterns and dependencies; a 3D-CNN extracts visual features; and the open-Source Media Interpretation by Large feature-space Extraction (openSMILE) toolkit extracts audio features. The second tier takes the features produced by the first tier and extracts context-sensitive unimodal features using a Bi-directional Gated Recurrent Unit (BiGRU), which captures inter-utterance links and uncovers contextual evidence. The outputs of the second tier are combined and passed to the third tier, which fuses the features from the different modalities and trains a single BiGRU that produces the final classification, applying the BiGRU to the sequential data so as to exploit the advantages of each modality and capture their interdependencies. Experimental results on six real-life datasets (the Flickr Images, Multi-View Sentiment Analysis, Getty Images, Balanced Twitter for Sentiment Analysis, and CMU-MOSI datasets) show that the proposed CS-MDF model achieves better performance than ten state-of-the-art approaches, as validated by the F1 score, precision, accuracy, and recall metrics. An ablation study carried out on the proposed framework demonstrates the viability of the design, and the GradCAM visualization technique is applied to visualize the aligned input image-text pairs learned by the CS-MDF model.
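To make the three-tier data flow concrete, below is a minimal PyTorch sketch of the architecture the abstract describes. It is an illustrative reconstruction, not the authors' implementation: the layer sizes, the 384-dimensional openSMILE vectors, the per-utterance feature shapes, and the concatenation-based fusion are all assumptions of this sketch. A small TextCNN module illustrates the tier-1 text branch; the 3D-CNN visual features and openSMILE audio features are assumed to be precomputed per utterance.

```python
# Minimal sketch of the CS-MDF three-tier flow in PyTorch.
# Layer sizes and the concatenation-based fusion are illustrative
# assumptions, not the configuration reported in the paper.
import torch
import torch.nn as nn


class TextCNN(nn.Module):
    """Tier 1 (text): 1D convolution over word embeddings of one utterance."""

    def __init__(self, embed_dim=300, n_filters=100, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)

    def forward(self, x):                     # x: (batch, seq_len, embed_dim)
        h = torch.relu(self.conv(x.transpose(1, 2)))
        return self.pool(h).squeeze(-1)       # (batch, n_filters)


class ContextBiGRU(nn.Module):
    """Tiers 2 and 3: BiGRU over the sequence of per-utterance features."""

    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, x):                     # x: (batch, n_utterances, in_dim)
        out, _ = self.gru(x)
        return out                            # (batch, n_utterances, 2*hidden)


class CSMDFSketch(nn.Module):
    """Tier 2 per modality, then tier-3 fusion and classification."""

    def __init__(self, text_dim=100, video_dim=128, audio_dim=384, n_classes=2):
        super().__init__()
        self.text_ctx = ContextBiGRU(text_dim)    # context-sensitive text
        self.video_ctx = ContextBiGRU(video_dim)  # context-sensitive video
        self.audio_ctx = ContextBiGRU(audio_dim)  # context-sensitive audio
        self.fusion = ContextBiGRU(3 * 128)       # tier 3: fused BiGRU
        self.classifier = nn.Linear(2 * 64, n_classes)

    def forward(self, text_feats, video_feats, audio_feats):
        # Each input holds tier-1 unimodal features with shape
        # (batch, n_utterances, modality_dim); the video features would
        # come from a 3D-CNN and the audio features from openSMILE.
        t = self.text_ctx(text_feats)
        v = self.video_ctx(video_feats)
        a = self.audio_ctx(audio_feats)
        fused = torch.cat([t, v, a], dim=-1)        # (batch, n_utt, 3*128)
        return self.classifier(self.fusion(fused))  # per-utterance logits


# Demo with random tier-1 features: 2 videos, 10 utterances each.
model = CSMDFSketch()
logits = model(torch.randn(2, 10, 100), torch.randn(2, 10, 128),
               torch.randn(2, 10, 384))
print(logits.shape)                           # torch.Size([2, 10, 2])
```

The bidirectional GRU in tiers two and three lets each utterance representation be conditioned on both the preceding and the following utterances, which is the "context-sensitive" behaviour the abstract emphasizes.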



Code availability

Not Applicable

Availability of data and materials

Not Applicable


Acknowledgements

This work was supported by the National Research Foundation (NRF) of Korea under Grant 2020R1A2C1012196.

Author information

Contributions

Conceptualization: Jothi Prakash V and Arul Antran Vijay S; Methodology: Jothi Prakash V and Arul Antran Vijay S; Formal analysis and investigation: GaneshKumar P, Anand Paul and Anand Nayyar; Writing - original draft preparation: Jothi Prakash V and Arul Antran Vijay S; Writing - review and editing: GaneshKumar P, Anand Paul and Anand Nayyar.

Corresponding authors

Correspondence to Arul Antran Vijay S or Jothi Prakash V.

Ethics declarations

Competing interests

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

P, G.K., S, A.A.V., V, J.P. et al. A context-sensitive multi-tier deep learning framework for multimodal sentiment analysis. Multimed Tools Appl 83, 54249–54278 (2024). https://doi.org/10.1007/s11042-023-17601-1

