research-article

Open access

Cefdet: Cognitive Effectiveness Network Based on Fuzzy Inference for Action Detection

Authors:

Muhammad Saqib,

Khan Muhammad

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

Pages 7985 - 7994

https://doi.org/10.1145/3664647.3681226

Published: 28 October 2024

Abstract

Action detection and understanding provide the foundation for the generation and interaction of multimedia content. However, existing methods mainly focus on constructing complex relational inference networks and overlook the question of whether a detection is effective. Moreover, these methods frequently generate detection results with cognitive abnormalities. To address these problems, this study proposes a cognitive effectiveness network based on fuzzy inference (Cefdet), which introduces the concept of "cognition-based detection" to simulate human cognition. First, a fuzzy-driven cognitive effectiveness evaluation module (FCM) is established to introduce fuzzy inference into action detection. FCM is combined with human action features to simulate the cognition-based detection process, which locates the frames that exhibit cognitive abnormalities. Then, a fuzzy cognitive update strategy (FCS) is proposed based on the FCM; it uses fuzzy logic to re-detect the cognition-based detection results and update the results that exhibit cognitive abnormalities. Experimental results demonstrate that Cefdet outperforms several mainstream algorithms on public datasets, validating its effectiveness.
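The abstract does not give the membership functions, rules, or features used by FCM, so the following is only a minimal sketch of the general idea of scoring per-frame detections with fuzzy inference and flagging low-scoring frames. Everything in it is an assumption made for illustration: the two inputs (`confidence` and `consistency`), the ramp-shaped membership functions, the two rules, the 0.5 threshold, and the function names `effectiveness` and `flag_frames`. It is not the authors' implementation, and the re-detection step of FCS is not modelled.

```python
from typing import List, Tuple


def low(x: float) -> float:
    """Membership in the fuzzy set 'low': 1 for x <= 0.2, 0 for x >= 0.6 (assumed shape)."""
    return max(0.0, min(1.0, (0.6 - x) / 0.4))


def high(x: float) -> float:
    """Membership in the fuzzy set 'high': 0 for x <= 0.4, 1 for x >= 0.8 (assumed shape)."""
    return max(0.0, min(1.0, (x - 0.4) / 0.4))


def effectiveness(confidence: float, consistency: float) -> float:
    """Zero-order Sugeno-style inference with two hypothetical rules.

    Rule 1: IF confidence is high AND consistency is high THEN effectiveness = 1
    Rule 2: IF confidence is low  OR  consistency is low  THEN effectiveness = 0
    AND is modelled with min, OR with max; the output is the weighted
    average of the rule consequents.
    """
    w_good = min(high(confidence), high(consistency))
    w_bad = max(low(confidence), low(consistency))
    total = w_good + w_bad
    if total == 0.0:
        return 0.5  # no rule fires: fall back to a neutral score
    return (w_good * 1.0 + w_bad * 0.0) / total


def flag_frames(frames: List[Tuple[float, float]], threshold: float = 0.5) -> List[int]:
    """Return indices of frames whose effectiveness score falls below the threshold."""
    return [
        i
        for i, (confidence, consistency) in enumerate(frames)
        if effectiveness(confidence, consistency) < threshold
    ]


if __name__ == "__main__":
    # Each tuple is (detector confidence, temporal consistency), both in [0, 1].
    frames = [(0.9, 0.9), (0.1, 0.9), (0.7, 0.75), (0.3, 0.8)]
    print(flag_frames(frames))
```

In this sketch, the flagged indices would be the frames handed to a second detection pass; how that pass works in Cefdet is described in the paper itself, not here.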

Index Terms

Cefdet: Cognitive Effectiveness Network Based on Fuzzy Inference for Action Detection
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Activity recognition and understanding

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

October 2024

11719 pages

ISBN:9798400706868

DOI:10.1145/3664647

General Chairs:
Jianfei Cai (Monash University, Australia),
Mohan Kankanhalli (NUS, Singapore),
Balakrishnan Prabhakaran (UT Dallas, USA),
Susanne Boll (University of Oldenburg, Germany)

Program Chairs:
Ramanathan Subramanian (University of Canberra & IIT Ropar, Australia),
Liang Zheng (Australian National University, Australia),
Vivek K. Singh (Rutgers University, USA),
Pablo Cesar (Centrum Wiskunde & Informatica, Netherlands),
Lexing Xie (Australian National University, Australia),
Dong Xu (University of Hong Kong, Hong Kong)

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '24

Sponsor:

SIGMM

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Article Metrics

Total Citations: 0
Total Downloads: 161
Downloads (Last 12 months): 161
Downloads (Last 6 weeks): 104

Reflects downloads up to 03 Mar 2025
