research-article

Learning dual updatable memory modules for video anomaly detection

Published: 05 December 2024

Abstract

We propose a video anomaly detection method that uses two updatable memory modules to learn and update prototypical patterns of normal and abnormal data within an autoencoder (AE) framework. To make the model more robust, we employ a pseudo anomaly synthesizer to generate synthetic anomalies from normal data, and train the AE to minimize the reconstruction loss on normal data while maximizing it on pseudo anomalies. The memory modules are optimized with a feature compactness loss and a separateness loss to refine the representation of details, and skip connections are incorporated so that the model does not record only the most prototypical patterns. Additionally, a memory loss is proposed to sharpen the distinction between the two memory modules, thereby enabling effective anomaly detection. Experimental results demonstrate the efficacy of our approach and underscore the importance of the two updatable memory modules in achieving state-of-the-art performance in video anomaly detection. Our code is available at https://github.com/SVIL2024/Memup.git.
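
The abstract names a feature compactness loss and a separateness loss but does not give their form. As a rough illustration only, the NumPy sketch below assumes formulations common in earlier memory-guided autoencoders: a softmax-weighted memory read, a compactness term that pulls each query feature toward its nearest memory item, and a triplet-style separateness term with a margin. The function names, the use of squared Euclidean distance, and the `margin` parameter are assumptions made for this example and are not taken from the paper or its repository.

```python
import numpy as np

def read_memory(queries, memory):
    """Soft read: each query attends over all memory items (assumed dot-product softmax)."""
    scores = queries @ memory.T                    # (N, M) similarity scores
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ memory, weights               # read features (N, C), weights (N, M)

def _squared_distances(queries, memory):
    """Pairwise squared Euclidean distances between queries and memory items, shape (N, M)."""
    diff = queries[:, None, :] - memory[None, :, :]
    return (diff ** 2).sum(axis=2)

def compactness_loss(queries, memory):
    """Mean squared distance from each query to its nearest memory item."""
    return _squared_distances(queries, memory).min(axis=1).mean()

def separateness_loss(queries, memory, margin=1.0):
    """Triplet-style hinge: the nearest item should beat the second nearest by `margin`."""
    d_sorted = np.sort(_squared_distances(queries, memory), axis=1)
    return np.maximum(d_sorted[:, 0] - d_sorted[:, 1] + margin, 0.0).mean()
```

Under these assumptions, a query that coincides with one memory item and lies far from the others incurs zero compactness and zero separateness loss, while a query equidistant from two items is penalized by the full margin. In the dual-memory setting described above, one such memory would hold normal prototypes and the other abnormal prototypes; how the paper's memory loss couples the two is not specified here and is not modeled in this sketch.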

Published In

Multimedia Systems, Volume 31, Issue 1
Feb 2025
1408 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 05 December 2024
Accepted: 22 November 2024
Received: 11 June 2024

Author Tags

  1. Video anomaly detection
  2. Pseudo anomaly
  3. Memory modules
  4. Updating strategy

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • Science and Technology Research Project of the Department of Education of Liaoning Province
  • Social Science Planning Fund of Liaoning Province
