
Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection

Published: 05 September 2024

Abstract

Weakly supervised video anomaly detection (WS-VAD) aims to locate abnormal activities in untrimmed videos without frame-level supervision. Prior work has paired graph convolutional networks or self-attention mechanisms with a multiple instance learning (MIL)-based classification loss to model temporal relations and learn discriminative features. However, these approaches are limited in two respects: 1) multi-branch parallel architectures, while capturing multi-scale temporal dependencies, inevitably incur higher parameter and computational costs; and 2) the binarized MIL constraint only ensures inter-class separability while neglecting fine-grained discriminability within anomalous classes. To this end, we introduce a novel WS-VAD framework that focuses on efficient temporal modeling and intra-class discriminability among anomalies. We first construct a Temporal Context Aggregation (TCA) module that captures local and global dependencies simultaneously by reusing a single attention matrix and applying adaptive context fusion. In addition, we propose a Prompt-Enhanced Learning (PEL) module that incorporates semantic priors through knowledge-based prompts, boosting the discrimination of visual features while ensuring separability across anomaly subclasses. Extensive experiments validate the proposed components, demonstrating superior performance on three challenging datasets (UCF-Crime, XD-Violence, and ShanghaiTech) with fewer parameters and lower computational cost. Notably, our method significantly improves detection accuracy for certain anomaly subclasses and reduces the false alarm rate. Our code is available at https://github.com/yujiangpu20/PEL4VAD.
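
The TCA module's efficiency argument is that one attention matrix, computed once, can serve both local and global temporal modeling, in contrast to multi-branch designs that duplicate this computation. The following PyTorch sketch illustrates that idea under stated assumptions: the class name, window size, residual connection, and sigmoid-gated fusion weight are hypothetical choices for exposition, not the authors' released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class TCASketch(nn.Module):
    """Illustrative local-global temporal attention that reuses one
    similarity matrix (a hypothetical sketch, not the paper's code)."""

    def __init__(self, dim: int, window: int = 9):
        super().__init__()
        self.qk = nn.Linear(dim, 2 * dim)  # shared query/key projection
        self.v = nn.Linear(dim, dim)
        self.window = window
        # learnable scalar gate to adaptively fuse local and global context
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, dim) snippet-level video features
        B, T, D = x.shape
        q, k = self.qk(x).chunk(2, dim=-1)
        v = self.v(x)
        # one attention matrix, computed once and reused twice
        attn = (q @ k.transpose(-2, -1)) / D ** 0.5          # (B, T, T)
        # global branch: full softmax over all snippets
        ctx_global = attn.softmax(dim=-1) @ v
        # local branch: the same matrix, masked to a temporal window
        idx = torch.arange(T, device=x.device)
        local = (idx[None, :] - idx[:, None]).abs() <= self.window // 2
        ctx_local = attn.masked_fill(~local, float("-inf")).softmax(dim=-1) @ v
        # adaptive context fusion of the two views
        a = torch.sigmoid(self.alpha)
        return x + a * ctx_local + (1 - a) * ctx_global
```

Because the local branch is just a masked view of the same T-by-T similarity matrix, the overhead relative to single-branch global attention is essentially one extra masked softmax, which is the kind of parameter and compute saving the abstract attributes to TCA.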
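
PEL's central step is aligning snippet-level visual features with text embeddings of anomaly-class prompts so that the classifier inherits semantic structure from language. A minimal sketch of that CLIP-style alignment follows; the function name and temperature value are assumptions, the embeddings in the usage line are random stand-ins, and the paper's prompts are knowledge-enhanced rather than the raw class names implied here.

```python
import torch
import torch.nn.functional as F

def prompt_alignment_logits(
    visual: torch.Tensor,       # (T, dim) snippet features after projection
    text_embeds: torch.Tensor,  # (C, dim) frozen prompt embeddings from a text encoder
    temperature: float = 0.07,  # hypothetical value
) -> torch.Tensor:
    """Cosine-similarity logits between T snippets and C class prompts."""
    v = F.normalize(visual, dim=-1)
    t = F.normalize(text_embeds, dim=-1)
    return v @ t.t() / temperature  # (T, C)

# Toy usage: 32 snippets, 512-dim features, 14 classes
# (e.g., UCF-Crime's 13 anomaly subclasses plus "normal").
logits = prompt_alignment_logits(torch.randn(32, 512), torch.randn(14, 512))
```

Supervising these per-subclass logits, rather than a single binary score, is what gives the framework separability across anomaly subclasses beyond the normal-versus-abnormal split that a plain MIL loss enforces.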



Information

Published In

IEEE Transactions on Image Processing, Volume 33, 2024, 5933 pages

Publisher

IEEE Press


Qualifiers

• Research-article
