DOI: 10.1145/3577190.3614163

Early Classifying Multimodal Sequences

Published: 09 October 2023

Abstract

Often, pieces of information are received sequentially over time. When has one collected enough such pieces to classify? Trading wait time for decision certainty leads to early classification problems, which have recently gained attention as a means of adapting classification to more dynamic environments. So far, however, results have been limited to unimodal sequences. In this pilot study, we expand into early classification of multimodal sequences by combining existing methods. Spatial-temporal transformers trained in the supervised framework of Classifier-Induced Stopping outperform exploration-based methods. We show that our new method yields experimental AUC advantages of up to 8.7%.
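To make the stopping rule concrete, below is a minimal sketch of confidence-threshold early stopping over an aligned multimodal sequence, in the spirit of the Classifier-Induced Stopping framework mentioned above. Everything in it is an illustrative assumption rather than the authors' implementation: a GRU stands in for the paper's spatial-temporal transformer, the two modalities are fused by simple concatenation, and names such as `EarlyClassifier` and the 0.9 threshold are hypothetical.

```python
# Minimal sketch (assumed, not the paper's code): classify a multimodal
# sequence step by step and halt as soon as the classifier is confident.
import torch
import torch.nn as nn


class EarlyClassifier(nn.Module):
    """Early classification with a classifier-confidence stopping rule."""

    def __init__(self, text_dim, image_dim, hidden_dim, num_classes, threshold=0.9):
        super().__init__()
        # Simple concatenation fusion of per-step text and image features.
        self.fuse = nn.Linear(text_dim + image_dim, hidden_dim)
        # A GRU stands in for the spatial-temporal transformer of the paper.
        self.encoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)
        self.threshold = threshold  # halt once max class probability exceeds this

    @torch.no_grad()
    def classify_early(self, text_seq, image_seq):
        # text_seq: (T, text_dim); image_seq: (T, image_dim), aligned per step.
        fused = torch.tanh(self.fuse(torch.cat([text_seq, image_seq], dim=-1)))
        hidden = None
        for t in range(fused.size(0)):
            step = fused[t].view(1, 1, -1)            # (batch=1, len=1, hidden)
            out, hidden = self.encoder(step, hidden)  # carry state across steps
            probs = self.head(out[:, -1]).softmax(dim=-1).squeeze(0)
            if probs.max() >= self.threshold:         # classifier-induced halt
                return int(probs.argmax()), t + 1     # label, steps consumed
        return int(probs.argmax()), fused.size(0)     # fall back to full sequence


# Toy usage: random features stand in for per-step text/image embeddings.
model = EarlyClassifier(text_dim=16, image_dim=16, hidden_dim=32, num_classes=3)
label, steps_used = model.classify_early(torch.randn(10, 16), torch.randn(10, 16))
print(f"predicted class {label} after {steps_used} of 10 steps")
```

The wait-time-versus-certainty trade-off described in the abstract is controlled entirely by the threshold in this sketch: raising it defers the decision to later timesteps in exchange for higher classifier confidence.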

Published In

ICMI '23: Proceedings of the 25th International Conference on Multimodal Interaction
October 2023
858 pages
ISBN: 9798400700552
DOI: 10.1145/3577190

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Early Classification
  2. Multimodal Sequences
  3. Sequence Classification

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMI '23

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%
