DOI: 10.1145/3664647.3681226
research-article
Open access

Cefdet: Cognitive Effectiveness Network Based on Fuzzy Inference for Action Detection

Published: 28 October 2024

Abstract

Action detection and understanding provide the foundation for the generation and interaction of multimedia content. However, existing methods mainly focus on constructing complex relational inference networks and overlook judging whether their detections are effective. Moreover, these methods frequently generate detection results with cognitive abnormalities. To address these problems, this study proposes a cognitive effectiveness network based on fuzzy inference (Cefdet), which introduces the concept of 'cognition-based detection' to simulate human cognition. First, a fuzzy-driven cognitive effectiveness evaluation module (FCM) is established to introduce fuzzy inference into action detection. FCM is combined with human action features to simulate the cognition-based detection process, which clearly locates the frames with cognitive abnormalities. Then, a fuzzy cognitive update strategy (FCS) is proposed based on the FCM, which uses fuzzy logic to re-detect the cognition-based detection results and update those with cognitive abnormalities. Experimental results on public datasets show that Cefdet outperforms several mainstream algorithms, validating its effectiveness.
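
The abstract does not specify how FCM scores a frame, so the following is only an illustrative sketch of fuzzy inference applied to per-frame detection quality, not the authors' module. It assumes two hypothetical inputs in [0, 1] (a detector confidence and a temporal-consistency score), linear membership functions, a three-rule base, and a weighted-average (zero-order Sugeno-style) defuzzification. All names, rules, and the 0.5 threshold are assumptions made for illustration.

```python
from typing import List, Tuple


def clamp(x: float) -> float:
    """Restrict a value to the unit interval [0, 1]."""
    return max(0.0, min(1.0, x))


def high(x: float) -> float:
    """Membership in the fuzzy set 'high' (linear, assumed)."""
    return clamp(x)


def low(x: float) -> float:
    """Membership in the fuzzy set 'low' (linear, assumed)."""
    return 1.0 - clamp(x)


def effectiveness(confidence: float, consistency: float) -> float:
    """Fuzzy effectiveness score for one frame (hypothetical rule base).

    R1: IF confidence is high AND consistency is high THEN score = 1.0
    R2: IF confidence is high AND consistency is low  THEN score = 0.5
    R3: IF confidence is low                          THEN score = 0.0
    AND is modeled with min; outputs are combined by weighted average.
    """
    w1 = min(high(confidence), high(consistency))
    w2 = min(high(confidence), low(consistency))
    w3 = low(confidence)
    # The firing strengths never sum to zero for inputs in [0, 1].
    return (w1 * 1.0 + w2 * 0.5 + w3 * 0.0) / (w1 + w2 + w3)


def flag_abnormal(frames: List[Tuple[float, float]],
                  threshold: float = 0.5) -> List[int]:
    """Return indices of frames whose score falls below the threshold."""
    return [i for i, (conf, cons) in enumerate(frames)
            if effectiveness(conf, cons) < threshold]


if __name__ == "__main__":
    # (confidence, consistency) per frame; values are made up.
    frames = [(0.9, 0.8), (0.2, 0.9), (0.95, 0.9)]
    print(flag_abnormal(frames))
```

In a pipeline of the kind the abstract describes, the flagged indices would be the frames handed to a re-detection step such as FCS; how that update is actually performed is not stated on this page.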

    Information

    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN:9798400706868
    DOI:10.1145/3664647
    This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. action detection
    2. feature fusion
    3. fuzzy inference
    4. multimedia content
    5. visual cognition

    Qualifiers

    • Research-article

    Conference

    MM '24
    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Bibliometrics

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 39
    • Downloads (Last 12 months): 39
    • Downloads (Last 6 weeks): 39
    Reflects downloads up to 11 Dec 2024
