
Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection

Published: 05 September 2024

Abstract

Weakly supervised video anomaly detection (WS-VAD) aims to locate abnormal activities in untrimmed videos without frame-level supervision. Prior work has paired graph convolutional networks or self-attention mechanisms with a multiple instance learning (MIL)-based classification loss to model temporal relations and learn discriminative features. However, these approaches are limited in two respects: 1) multi-branch parallel architectures, while capturing multi-scale temporal dependencies, inevitably incur higher parameter and computational costs; and 2) the binarized MIL constraint only ensures inter-class separability while neglecting fine-grained discriminability within anomalous classes. To this end, we introduce a novel WS-VAD framework that focuses on efficient temporal modeling and intra-class discriminability among anomalies. We first construct a Temporal Context Aggregation (TCA) module that captures local and global dependencies simultaneously by reusing a single attention matrix and applying adaptive context fusion. In addition, we propose a Prompt-Enhanced Learning (PEL) module that incorporates semantic priors through knowledge-based prompts, boosting the discrimination of visual features while ensuring separability across anomaly subclasses. Extensive experiments validate the proposed components, demonstrating superior performance on three challenging datasets (UCF-Crime, XD-Violence, and ShanghaiTech) with fewer parameters and lower computational cost. Notably, our method significantly improves detection accuracy for certain anomaly subclasses and reduces the false alarm rate. Our code is available at https://github.com/yujiangpu20/PEL4VAD.
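
The TCA module's efficiency argument is that one attention matrix, computed once, can serve both local and global temporal modeling, in contrast to multi-branch designs that duplicate this computation. The following PyTorch sketch illustrates that idea under stated assumptions: the class name, window size, residual connection, and sigmoid-gated fusion weight are hypothetical choices for exposition, not the authors' released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class TCASketch(nn.Module):
    """Illustrative local-global temporal attention that reuses one
    similarity matrix (a hypothetical sketch, not the paper's code)."""

    def __init__(self, dim: int, window: int = 9):
        super().__init__()
        self.qk = nn.Linear(dim, 2 * dim)  # shared query/key projection
        self.v = nn.Linear(dim, dim)
        self.window = window
        # learnable scalar gate to adaptively fuse local and global context
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, dim) snippet-level video features
        B, T, D = x.shape
        q, k = self.qk(x).chunk(2, dim=-1)
        v = self.v(x)
        # one attention matrix, computed once and reused twice
        attn = (q @ k.transpose(-2, -1)) / D ** 0.5          # (B, T, T)
        # global branch: full softmax over all snippets
        ctx_global = attn.softmax(dim=-1) @ v
        # local branch: the same matrix, masked to a temporal window
        idx = torch.arange(T, device=x.device)
        local = (idx[None, :] - idx[:, None]).abs() <= self.window // 2
        ctx_local = attn.masked_fill(~local, float("-inf")).softmax(dim=-1) @ v
        # adaptive context fusion of the two views
        a = torch.sigmoid(self.alpha)
        return x + a * ctx_local + (1 - a) * ctx_global
```

Because the local branch is just a masked view of the same T-by-T similarity matrix, the overhead relative to single-branch global attention is essentially one extra masked softmax, which is the kind of parameter and compute saving the abstract attributes to TCA.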
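
PEL's central step is aligning snippet-level visual features with text embeddings of anomaly-class prompts so that the classifier inherits semantic structure from language. A minimal sketch of that CLIP-style alignment follows; the function name and temperature value are assumptions, the embeddings in the usage line are random stand-ins, and the paper's prompts are knowledge-enhanced rather than the raw class names implied here.

```python
import torch
import torch.nn.functional as F

def prompt_alignment_logits(
    visual: torch.Tensor,       # (T, dim) snippet features after projection
    text_embeds: torch.Tensor,  # (C, dim) frozen prompt embeddings from a text encoder
    temperature: float = 0.07,  # hypothetical value
) -> torch.Tensor:
    """Cosine-similarity logits between T snippets and C class prompts."""
    v = F.normalize(visual, dim=-1)
    t = F.normalize(text_embeds, dim=-1)
    return v @ t.t() / temperature  # (T, C)

# Toy usage: 32 snippets, 512-dim features, 14 classes
# (e.g., UCF-Crime's 13 anomaly subclasses plus "normal").
logits = prompt_alignment_logits(torch.randn(32, 512), torch.randn(14, 512))
```

Supervising these per-subclass logits, rather than a single binary score, is what gives the framework separability across anomaly subclasses beyond the normal-versus-abnormal split that a plain MIL loss enforces.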



Information

Published In

IEEE Transactions on Image Processing, Volume 33, 2024, 5933 pages

Publisher

IEEE Press


Qualifiers

• Research-article
