
CoMO-NAS: Core-Structures-Guided Multi-Objective Neural Architecture Search for Multi-Modal Classification

Published: 28 October 2024
DOI: 10.1145/3664647.3681351

Abstract

Most existing NAS-based multi-modal classification (NAS-MMC) methods are optimized for classification accuracy alone. They cannot simultaneously provide multiple models with diverse trade-offs, such as between model complexity and classification performance, to meet different users' demands. Combining NAS-MMC with multi-objective optimization is a natural way to address this issue; the challenge, however, is its high computational cost. In multi-objective optimization, the computational bottleneck is the Pareto front search. Some higher-quality MMC models (namely core structures, CSs), consisting of high-quality features and fusion operators, are easier to identify. We find that CSs are closely related to the Pareto front (PF): the individuals lying on the PF contain the CSs. Based on this finding, we propose CoMO-NAS, an efficient multi-objective neural architecture search method for multi-modal classification that uses CSs to guide the PF search. Experimental results thoroughly demonstrate the effectiveness of CoMO-NAS: compared to state-of-the-art competitors on benchmark multi-modal tasks, it achieves comparable performance with lower model complexity in shorter search time.
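To make the abstract's mechanism concrete, the sketch below illustrates the general idea of a two-objective Pareto-front search over (classification error, model complexity), with the population seeded from easy-to-identify core structures so that the search starts near the front rather than from random points. This is a minimal illustration, not the paper's implementation: the names `Candidate`, `pareto_front`, `seed_with_core_structures`, and the jitter-based mutation are assumptions made for the example.

```python
# Illustrative sketch of CS-guided Pareto-front search (not the authors' code).
# Both objectives are minimized: error = 1 - accuracy, complexity = normalized size.
from dataclasses import dataclass
import random

@dataclass(frozen=True)
class Candidate:
    error: float       # lower is better
    complexity: float  # lower is better

def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    return (a.error <= b.error and a.complexity <= b.complexity
            and (a.error < b.error or a.complexity < b.complexity))

def pareto_front(pop):
    """Return the non-dominated subset of the population (the Pareto front)."""
    return [p for p in pop if not any(dominates(q, p) for q in pop if q is not p)]

def seed_with_core_structures(core_structures, pop_size, mutate):
    """Initialize the population around easy-to-find core structures (CSs),
    so non-dominated individuals are reached with less search effort."""
    pop = list(core_structures)
    while len(pop) < pop_size:
        pop.append(mutate(random.choice(core_structures)))
    return pop

# Hypothetical CSs and a toy mutation operator, for demonstration only.
cs = [Candidate(0.12, 0.30), Candidate(0.10, 0.50)]
jitter = lambda c: Candidate(max(0.0, c.error + random.uniform(-0.02, 0.05)),
                             max(0.0, c.complexity + random.uniform(-0.10, 0.10)))
population = seed_with_core_structures(cs, pop_size=20, mutate=jitter)
print(pareto_front(population))
```

In a full evolutionary loop this front extraction would be repeated each generation over real trained architectures; the point of the sketch is only the CS-seeding step and the dominance relation that defines the PF.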



    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. classification
    2. core structures
    3. multi-modal fusion
    4. multi-objective optimization
    5. neural architecture search

    Qualifiers

    • Research-article


    Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

    Acceptance Rates

MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions, 26%
Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%
