
CoMO-NAS: Core-Structures-Guided Multi-Objective Neural Architecture Search for Multi-Modal Classification

Published: 28 October 2024
DOI: 10.1145/3664647.3681351

Abstract

Most existing NAS-based multi-modal classification (NAS-MMC) methods are optimized for classification accuracy alone. They cannot simultaneously provide multiple models with diverse trade-offs, such as between model complexity and classification performance, to meet different users' demands. Combining NAS-MMC with multi-objective optimization is a natural way to address this issue; the challenge, however, is its high computational cost. In multi-objective optimization, the computational bottleneck is the Pareto front search. Some higher-quality MMC models (namely core structures, CSs), consisting of high-quality features and fusion operators, are easier to identify. We find that CSs are closely related to the Pareto front (PF): the individuals lying on the PF contain the CSs. Based on this finding, we propose CoMO-NAS, an efficient multi-objective neural architecture search method for multi-modal classification that uses CSs to guide the PF search. Experimental results thoroughly demonstrate the effectiveness of CoMO-NAS: compared to state-of-the-art competitors on benchmark multi-modal tasks, it achieves comparable performance with lower model complexity in shorter search time.
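To make the abstract's mechanism concrete, the sketch below illustrates the general idea of a two-objective Pareto-front search over (classification error, model complexity), with the population seeded from easy-to-identify core structures so that the search starts near the front rather than from random points. This is a minimal illustration, not the paper's implementation: the names `Candidate`, `pareto_front`, `seed_with_core_structures`, and the jitter-based mutation are assumptions made for the example.

```python
# Illustrative sketch of CS-guided Pareto-front search (not the authors' code).
# Both objectives are minimized: error = 1 - accuracy, complexity = normalized size.
from dataclasses import dataclass
import random

@dataclass(frozen=True)
class Candidate:
    error: float       # lower is better
    complexity: float  # lower is better

def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    return (a.error <= b.error and a.complexity <= b.complexity
            and (a.error < b.error or a.complexity < b.complexity))

def pareto_front(pop):
    """Return the non-dominated subset of the population (the Pareto front)."""
    return [p for p in pop if not any(dominates(q, p) for q in pop if q is not p)]

def seed_with_core_structures(core_structures, pop_size, mutate):
    """Initialize the population around easy-to-find core structures (CSs),
    so non-dominated individuals are reached with less search effort."""
    pop = list(core_structures)
    while len(pop) < pop_size:
        pop.append(mutate(random.choice(core_structures)))
    return pop

# Hypothetical CSs and a toy mutation operator, for demonstration only.
cs = [Candidate(0.12, 0.30), Candidate(0.10, 0.50)]
jitter = lambda c: Candidate(max(0.0, c.error + random.uniform(-0.02, 0.05)),
                             max(0.0, c.complexity + random.uniform(-0.10, 0.10)))
population = seed_with_core_structures(cs, pop_size=20, mutate=jitter)
print(pareto_front(population))
```

In a full evolutionary loop this front extraction would be repeated each generation over real trained architectures; the point of the sketch is only the CS-seeding step and the dominance relation that defines the PF.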



    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. classification
    2. core structures
    3. multi-modal fusion
    4. multi-objective optimization
    5. neural architecture search

    Qualifiers

    • Research-article


    Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

    Acceptance Rates

MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions, 26%
Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%
