DOI: 10.1145/3477495.3531900

An Efficient Fusion Mechanism for Multimodal Low-resource Setting

Published: 07 July 2022

Abstract

The effective fusion of multiple modalities (i.e., text, acoustic, and visual) is a non-trivial task, as these modalities often carry specific and diverse information and do not contribute equally. Fusing different modalities can be even more challenging in the low-resource setting, where fewer samples are available for training. This paper proposes a multi-representative fusion mechanism that generates diverse fusions from multiple modalities and then chooses the best fusion among them. To achieve this, we first apply convolution filters to the multimodal inputs to generate multiple, diverse representations of each modality. We then fuse each pair of modalities across these multiple representations to obtain multiple candidate fusions. Finally, we propose an attention mechanism that selects only the most appropriate fusion, which helps address the noise problem by ignoring noisy fusions. We evaluate our proposed approach on three low-resource multimodal sentiment analysis datasets, i.e., YouTube, MOUD, and ICT-MMMO. Experimental results show the effectiveness of our approach, with accuracies of 59.3%, 83.0%, and 84.1% on the YouTube, MOUD, and ICT-MMMO datasets, respectively.
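The abstract describes a three-step pipeline: convolution filters produce several diverse representations per modality, pairwise fusions are formed across those representations, and an attention mechanism keeps only the most useful fusion. The sketch below is one minimal way such a mechanism could be wired up in PyTorch; the module name, dimensions, the element-wise-product fusion, and the softmax-based selection are illustrative assumptions inferred from the abstract, not the authors' implementation.

```python
# Minimal sketch of a multi-representative fusion mechanism, assuming
# element-wise product as the pairwise fusion and a soft attention over
# candidate fusions. All names and shapes are illustrative only.
import torch
import torch.nn as nn


class MultiRepresentativeFusion(nn.Module):
    def __init__(self, in_dim=300, rep_dim=100, n_reps=4):
        super().__init__()
        # Step 1: several 1-D convolution filters per modality produce
        # n_reps diverse representations of the same input sequence.
        self.convs = nn.ModuleList(
            [nn.Conv1d(in_dim, rep_dim, kernel_size=k, padding=k // 2)
             for k in range(1, n_reps + 1)]
        )
        # Step 3: scoring layer for the attention that picks the best fusion.
        self.score = nn.Linear(rep_dim, 1)

    def _representations(self, x):
        # x: (batch, seq_len, in_dim) -> list of (batch, rep_dim) pooled reps
        x = x.transpose(1, 2)  # (batch, in_dim, seq_len)
        return [conv(x).max(dim=-1).values for conv in self.convs]

    def forward(self, mod_a, mod_b):
        # Step 2: fuse every pair of representations of the two modalities
        # (element-wise product is one simple choice of pairwise fusion).
        reps_a = self._representations(mod_a)
        reps_b = self._representations(mod_b)
        fusions = torch.stack(
            [ra * rb for ra in reps_a for rb in reps_b], dim=1
        )  # (batch, n_fusions, rep_dim)
        # Step 3: attention over the candidate fusions; the softmax
        # down-weights noisy fusions, approximating a hard selection.
        weights = torch.softmax(self.score(fusions).squeeze(-1), dim=1)
        return (weights.unsqueeze(-1) * fusions).sum(dim=1)  # (batch, rep_dim)


if __name__ == "__main__":
    text = torch.randn(8, 20, 300)   # e.g. textual utterance features
    audio = torch.randn(8, 20, 300)  # e.g. acoustic utterance features
    fused = MultiRepresentativeFusion()(text, audio)
    print(fused.shape)               # torch.Size([8, 100])
```

The same pairwise module would be applied to the text-visual and acoustic-visual pairs; replacing the softmax with a hard argmax selection would correspond to keeping a single best fusion, as the abstract suggests.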

Supplementary Material

MP4 File (SIGIR22-sp2044.mp4)
Presentation video - short version


Cited By

  • (2024) Multimodal Boosting: Addressing Noisy Modalities and Identifying Modality Contribution. IEEE Transactions on Multimedia 26, 3018-3033. https://doi.org/10.1109/TMM.2023.3306489 (online publication date: 1 Jan 2024)
  • (2022) An emoji-aware multitask framework for multimodal sarcasm detection. Knowledge-Based Systems 257:C. https://doi.org/10.1016/j.knosys.2022.109924 (online publication date: 5 Dec 2022)

Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022
3569 pages
ISBN: 9781450387323
DOI: 10.1145/3477495
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 July 2022

Author Tags

  1. deep learning
  2. low-resource dataset
  3. multi-representative fusion
  4. sentiment analysis

Qualifiers

  • Short-paper

Conference

SIGIR '22

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions (20%)

Article Metrics

  • Downloads (last 12 months): 36
  • Downloads (last 6 weeks): 5
Reflects downloads up to 03 Jan 2025
