
AMSA: Adaptive Multimodal Learning for Sentiment Analysis

Published: 24 February 2023

Abstract

Efficient emotion recognition has attracted extensive research interest and enables new applications in many fields, such as human-computer interaction, disease diagnosis, and service robots. Although existing sentiment analysis work relying on sensors or unimodal methods performs well in simple contexts such as business recommendation and facial expression recognition, it falls far below expectations in complex scenes involving sarcasm, disdain, and metaphor. In this article, we propose a novel two-stage multimodal learning framework, called AMSA, which adaptively learns the correlation and complementarity between modalities for dynamic fusion, yielding more stable and precise sentiment analysis results. Specifically, in the first stage, a multiscale attention model with a slice positioning scheme is proposed to obtain stable sentiment quintuplets from images, text, and speech. In the second stage, a Transformer-based self-adaptive network is proposed to flexibly assign weights for multimodal fusion and to update the parameters of the loss function through compensation iteration. To quickly locate key regions for efficient affective computing, a patch-based selection scheme iteratively removes redundant information through a novel loss function before fusion. Extensive experiments have been conducted on both weakly machine-labeled and manually annotated datasets: our self-made Video-SA as well as CMU-MOSEI and CMU-MOSI. The results demonstrate the superiority of our approach over baselines.
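To make the second-stage idea concrete, the sketch below shows one way a Transformer-based fusion module can assign adaptive weights to text, image, and speech embeddings before prediction. This is a minimal illustrative example, not the authors' implementation: the module names, feature dimensions, and the softmax gating design are assumptions introduced here for clarity.

```python
# Illustrative sketch of adaptive multimodal fusion (assumed design, not AMSA's code).
import torch
import torch.nn as nn


class AdaptiveFusion(nn.Module):
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # One learnable gate per modality token; softmax turns gates into fusion weights.
        self.gate = nn.Linear(dim, 1)
        self.regressor = nn.Linear(dim, 1)  # sentiment intensity score

    def forward(self, text_feat, image_feat, audio_feat):
        # Each input: (batch, dim) unimodal embedding from a first-stage encoder.
        tokens = torch.stack([text_feat, image_feat, audio_feat], dim=1)  # (B, 3, dim)
        fused = self.encoder(tokens)                    # cross-modal context via self-attention
        weights = torch.softmax(self.gate(fused), dim=1)  # (B, 3, 1) adaptive modality weights
        pooled = (weights * fused).sum(dim=1)           # weighted fusion over modalities
        return self.regressor(pooled)


if __name__ == "__main__":
    model = AdaptiveFusion()
    t, v, a = (torch.randn(8, 256) for _ in range(3))
    print(model(t, v, a).shape)  # torch.Size([8, 1])
```

In this toy version, the per-modality weights are recomputed for every sample, so the contribution of each modality can shift with the input, which is the behavior the abstract attributes to the self-adaptive fusion stage.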


Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 3s
June 2023, 270 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3582887
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 February 2023
Online AM: 01 December 2022
Accepted: 20 November 2022
Revised: 06 October 2022
Received: 17 July 2022
Published in TOMM Volume 19, Issue 3s

Author Tags

  1. Sentiment analysis
  2. multimodal fusion
  3. self-adaptive mechanism
  4. Transformer
  5. patch-based selection

Qualifiers

  • Research-article

Funding Sources

  • Natural Science Foundation of China
