DOI: 10.1145/3477495.3531900

An Efficient Fusion Mechanism for Multimodal Low-resource Setting

Published: 07 July 2022

Abstract

The effective fusion of multiple modalities (i.e., text, acoustic, and visual) is a non-trivial task, as these modalities often carry specific and diverse information and do not contribute equally. Fusing different modalities can be even more challenging in the low-resource setting, where fewer samples are available for training. This paper proposes a multi-representative fusion mechanism that generates diverse fusions from multiple modalities and then chooses the best fusion among them. To achieve this, we first apply convolution filters to the multimodal inputs to generate multiple, diverse representations of each modality. We then fuse each pair of modalities across these multiple representations to obtain multiple candidate fusions. Finally, we propose an attention mechanism that selects only the most appropriate fusion, which helps address the noise problem by ignoring noisy fusions. We evaluate our proposed approach on three low-resource multimodal sentiment analysis datasets, i.e., YouTube, MOUD, and ICT-MMMO. Experimental results show the effectiveness of our approach, with accuracies of 59.3%, 83.0%, and 84.1% on the YouTube, MOUD, and ICT-MMMO datasets, respectively.
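The abstract describes a three-step pipeline: convolution filters produce several diverse representations per modality, pairwise fusions are formed across those representations, and an attention mechanism keeps only the most useful fusion. The sketch below is one minimal way such a mechanism could be wired up in PyTorch; the module name, dimensions, the element-wise-product fusion, and the softmax-based selection are illustrative assumptions inferred from the abstract, not the authors' implementation.

```python
# Minimal sketch of a multi-representative fusion mechanism, assuming
# element-wise product as the pairwise fusion and a soft attention over
# candidate fusions. All names and shapes are illustrative only.
import torch
import torch.nn as nn


class MultiRepresentativeFusion(nn.Module):
    def __init__(self, in_dim=300, rep_dim=100, n_reps=4):
        super().__init__()
        # Step 1: several 1-D convolution filters per modality produce
        # n_reps diverse representations of the same input sequence.
        self.convs = nn.ModuleList(
            [nn.Conv1d(in_dim, rep_dim, kernel_size=k, padding=k // 2)
             for k in range(1, n_reps + 1)]
        )
        # Step 3: scoring layer for the attention that picks the best fusion.
        self.score = nn.Linear(rep_dim, 1)

    def _representations(self, x):
        # x: (batch, seq_len, in_dim) -> list of (batch, rep_dim) pooled reps
        x = x.transpose(1, 2)  # (batch, in_dim, seq_len)
        return [conv(x).max(dim=-1).values for conv in self.convs]

    def forward(self, mod_a, mod_b):
        # Step 2: fuse every pair of representations of the two modalities
        # (element-wise product is one simple choice of pairwise fusion).
        reps_a = self._representations(mod_a)
        reps_b = self._representations(mod_b)
        fusions = torch.stack(
            [ra * rb for ra in reps_a for rb in reps_b], dim=1
        )  # (batch, n_fusions, rep_dim)
        # Step 3: attention over the candidate fusions; the softmax
        # down-weights noisy fusions, approximating a hard selection.
        weights = torch.softmax(self.score(fusions).squeeze(-1), dim=1)
        return (weights.unsqueeze(-1) * fusions).sum(dim=1)  # (batch, rep_dim)


if __name__ == "__main__":
    text = torch.randn(8, 20, 300)   # e.g. textual utterance features
    audio = torch.randn(8, 20, 300)  # e.g. acoustic utterance features
    fused = MultiRepresentativeFusion()(text, audio)
    print(fused.shape)               # torch.Size([8, 100])
```

The same pairwise module would be applied to the text-visual and acoustic-visual pairs; replacing the softmax with a hard argmax selection would correspond to keeping a single best fusion, as the abstract suggests.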

Supplementary Material

MP4 File (SIGIR22-sp2044.mp4)
Presentation video - short version


Cited By

  • (2024) Multimodal Boosting: Addressing Noisy Modalities and Identifying Modality Contribution. IEEE Transactions on Multimedia 26, 3018-3033. https://doi.org/10.1109/TMM.2023.3306489 (online publication date: 1 Jan 2024)
  • (2022) An emoji-aware multitask framework for multimodal sarcasm detection. Knowledge-Based Systems 257:C. https://doi.org/10.1016/j.knosys.2022.109924 (online publication date: 5 Dec 2022)

Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022
3569 pages
ISBN: 9781450387323
DOI: 10.1145/3477495
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 July 2022

Author Tags

  1. deep learning
  2. low-resource dataset
  3. multi-representative fusion
  4. sentiment analysis

Qualifiers

  • Short-paper

Conference

SIGIR '22

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions (20%)

Article Metrics

  • Downloads (last 12 months): 36
  • Downloads (last 6 weeks): 5
Reflects downloads up to 03 Jan 2025
