research-article

Video anomaly detection with spatio-temporal dissociation

Published: 01 February 2022

Highlights

We propose a novel autoencoder architecture that dissociates the spatio-temporal representation and learns regularity in both the spatial and motion feature spaces to detect anomalies in videos.
We design an efficient motion autoencoder that takes consecutive video frames as input and produces the RGB difference as output to imitate the movement captured by optical flow. The method is much faster than optical-flow-based motion representation learning, running at 32 fps on average.
We use a variance attention module that automatically assigns importance weights to the moving parts of video clips, which improves the performance of the motion autoencoder.
To learn normality in both the spatial and motion feature spaces, we concatenate the representations extracted by the two streams at the same spatial location and, following an early fusion strategy, jointly optimize the two streams and the deep K-means clustering.
We fuse the spatio-temporal information and its distance from the deep K-means clusters at the pixel level to compute the anomaly score. Experimental results show improved performance over our prior frame-level fusion scheme.
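
The early-fusion clustering and pixel-level scoring described in the last two highlights can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' released implementation: all function names are hypothetical, the clustering term uses a hard nearest-centroid assignment, and the pixel-level fusion is assumed to be a weighted sum (equal weights by default) of a spatial error map, a motion error map and the cluster-distance map, max-pooled into a single frame score.

```python
import numpy as np


def fuse_features(spatial: np.ndarray, motion: np.ndarray) -> np.ndarray:
    """Early fusion: concatenate (H, W, C_s) and (H, W, C_m) features per location."""
    if spatial.shape[:2] != motion.shape[:2]:
        raise ValueError("both streams must share the same spatial grid")
    return np.concatenate([spatial, motion], axis=-1)          # (H, W, C_s + C_m)


def kmeans_distance_map(fused: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Squared Euclidean distance from each location to its nearest centroid.

    fused: (H, W, D) fused features; centroids: (K, D). Returns an (H, W) map.
    """
    diff = fused[:, :, None, :] - centroids[None, None, :, :]  # (H, W, K, D)
    return (diff ** 2).sum(axis=-1).min(axis=-1)


def deep_kmeans_loss(fused: np.ndarray, centroids: np.ndarray) -> float:
    """Assumed clustering term: mean nearest-centroid distance over all locations."""
    return float(kmeans_distance_map(fused, centroids).mean())


def pixel_anomaly_score(err_spatial, err_motion, cluster_dist, weights=(1.0, 1.0, 1.0)):
    """Hypothetical pixel-level fusion of three (H, W) maps into one frame score.

    Assumes all maps are already at the same resolution (e.g. the cluster
    distance map has been resized); the frame score is the maximum of the fused map.
    """
    w_s, w_m, w_c = weights
    score_map = w_s * err_spatial + w_m * err_motion + w_c * cluster_dist
    return float(score_map.max()), score_map
```

In this sketch, minimizing the clustering term jointly with the reconstruction losses pulls the fused features of normal training clips toward a few centroids, so at test time locations whose features lie far from every centroid raise the score. A soft, softmin-weighted assignment is a common alternative that smooths the nearest-centroid choice during training.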

Abstract

Anomaly detection in videos remains a challenging task because anomalies are ambiguously defined and real-world visual scenes are complex. Unlike previous work that uses reconstruction or prediction as an auxiliary task to learn temporal regularity, this work explores a novel convolutional autoencoder architecture that dissociates the spatio-temporal representation to capture spatial and temporal information separately, since abnormal events usually differ from normal ones in appearance and/or motion behavior. Specifically, the spatial autoencoder models normality in the appearance feature space by learning to reconstruct its input, the first individual frame (FIF), while the temporal part takes the first four consecutive frames as input and the RGB difference as output, efficiently simulating the motion of optical flow. Abnormal events, which are irregular in appearance or motion behavior, lead to a large reconstruction error. To improve detection of fast-moving outliers, we insert a variance-based attention module into the motion autoencoder to highlight areas of large movement. In addition, we propose a deep K-means clustering strategy that forces the spatial and motion encoders to extract compact representations. Extensive experiments on several publicly available datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance. The code is publicly released at https://github.com/ChangYunPeng/VideoAnomalyDetection.
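
The input/target pair for the motion branch can be sketched as follows. This NumPy snippet is illustrative only: the function name is hypothetical, frames are assumed to be stacked along the channel axis, and the RGB-difference target is assumed to be the three differences between adjacent frames among the first four; the paper's exact target layout may differ.

```python
import numpy as np


def motion_training_pair(clip: np.ndarray):
    """Build a hypothetical (input, target) pair for the motion autoencoder.

    clip: array of shape (T, H, W, 3) holding T >= 4 consecutive RGB frames.
    Assumption: the target stacks I[t+1] - I[t] for t = 0, 1, 2.
    """
    if clip.ndim != 4 or clip.shape[0] < 4 or clip.shape[-1] != 3:
        raise ValueError("expected at least 4 RGB frames of shape (T, H, W, 3)")
    frames = clip[:4].astype(np.float32)       # first four frames; float avoids uint8 wrap-around
    rgb_diff = frames[1:] - frames[:-1]         # (3, H, W, 3) adjacent-frame differences
    # Stack along the channel axis, a common layout for 2D convolutional encoders.
    net_input = np.concatenate(list(frames), axis=-1)      # (H, W, 12)
    net_target = np.concatenate(list(rgb_diff), axis=-1)   # (H, W, 9)
    return net_input, net_target
```

With a fixed camera, static background pixels cancel out in the difference, so the target is non-zero only where something changed. That is what lets this cheap signal stand in for optical flow, which is far more expensive to compute.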
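
The idea behind the variance-based attention module, giving more weight to regions that change strongly across the clip, can be illustrated with the hypothetical NumPy sketch below. It is not the module from the paper: it assumes the attention map is the per-pixel temporal variance of the input frames averaged over colour channels, min-max scaled to [0, 1], and applied residually to a feature map on the same spatial grid.

```python
import numpy as np


def variance_attention(clip: np.ndarray, features: np.ndarray, eps: float = 1e-8):
    """Re-weight features by how much each pixel varies over time (illustrative).

    clip:     (T, H, W, 3) consecutive frames.
    features: (H, W, C) feature map on the same H x W grid (assumed).
    Returns the re-weighted features and the (H, W) attention map in [0, 1].
    """
    frames = clip.astype(np.float32)
    var = frames.var(axis=0).mean(axis=-1)                     # temporal variance, (H, W)
    attn = (var - var.min()) / (var.max() - var.min() + eps)   # min-max scaled to [0, 1]
    # Residual weighting: static regions keep their features (factor 1),
    # the fastest-changing regions are amplified up to a factor of 2.
    weighted = features * (1.0 + attn[..., None])
    return weighted, attn
```

The residual form `1 + attn` is a design choice of this sketch: it keeps static regions unchanged rather than zeroing them out, so the autoencoder still sees the whole scene while large movements are emphasized. A fully static clip yields an all-zero attention map, and `eps` avoids division by zero in that case.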

Cited By

• (2024) A Multilevel Guidance-Exploration Network and Behavior-Scene Matching Method for Human Behavior Anomaly Detection. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 5865-5873. DOI: 10.1145/3664647.3680592. Online publication date: 28-Oct-2024.
• (2024) EOGT: Video Anomaly Detection with Enhanced Object Information and Global Temporal Dependency. ACM Transactions on Multimedia Computing, Communications, and Applications 20 (10), pp. 1-21. DOI: 10.1145/3662185. Online publication date: 12-Sep-2024.
• (2024) Generalized Video Anomaly Event Detection: Systematic Taxonomy and Comparison of Deep Models. ACM Computing Surveys 56 (7), pp. 1-38. DOI: 10.1145/3645101. Online publication date: 9-Apr-2024.

Published In

Pattern Recognition, Volume 122, Issue C
Feb 2022
1294 pages

Publisher

Elsevier Science Inc., United States

Author Tags

1. Video anomaly detection
2. Spatio-temporal dissociation
3. Simulate motion of optical flow
4. Deep K-means cluster
