research-article

Video anomaly detection with spatio-temporal dissociation

Authors:

Yunpeng Chang,

Zhigang Tu,

Wei Xie,

Bin Luo,

Shifu Zhang,

Haigang Sui,

Junsong YuanAuthors Info & Claims

Volume 122, Issue C

https://doi.org/10.1016/j.patcog.2021.108213

Published: 01 February 2022 Publication History

Highlights

•

We propose a novel autoencoder architecture to dissociate the spatio temporal representation and learn the regularity in both the spatial and motion feature spaces to detect anomaly in videos.

•

We design an efficient motion autoencoder, which takes consecutive video frames as input and RGB difference as output to imitate the movement of optical flow. The proposed method is much faster than the optical flow-based motion representation learning approach, where its average running time is 32fps.

•

We exploit a variance attention module to automatically assign an importance weight to the moving part of video clips, which is useful to improve the performance of the motion autoencoder.

•

To learn the normality in both the spatial and motion feature spaces, we concatenate these representations extracted from the two streams at the same spatial location, and optimize the two streams and the deep K-means cluster jointly with the early fusion strategy.

•

We fuse the spatio-temporal information with their distance from the deep K-means cluster in the pixel level to calculate the anomaly score. Compared with our prior frame level fusion scheme, experimental results show that the performance of the new architecture is improved.

Abstract

Anomaly detection in videos remains a challenging task due to the ambiguous definition of anomaly and the complexity of visual scenes from real video data. Different from the previous work which utilizes reconstruction or prediction as an auxiliary task to learn the temporal regularity, in this work, we explore a novel convolution autoencoder architecture that can dissociate the spatio-temporal representation to separately capture the spatial and the temporal information, since abnormal events are usually different from the normality in appearance and/or motion behavior. Specifically, the spatial autoencoder models the normality on the appearance feature space by learning to reconstruct the input of the first individual frame (FIF), while the temporal part takes the first four consecutive frames as the input and the RGB difference as the output to simulate the motion of optical flow in an efficient way. The abnormal events, which are irregular in appearance or in motion behavior, lead to a large reconstruction error. To improve detection performance on fast moving outliers, we exploit a variance-based attention module and insert it into the motion autoencoder to highlight large movement areas. In addition, we propose a deep K-means cluster strategy to force the spatial and the motion encoder to extract a compact representation. Extensive experiments on some publicly available datasets have demonstrated the effectiveness of our method which achieves the state-of-the-art performance. The code is publicly released at the link https://github.com/ChangYunPeng/VideoAnomalyDetection .

References

[1]

M. Hasan, J. Choi, J. Neumann, A.K. Roy-Chowdhury, L.S. Davis, Learning temporal regularity in video sequences, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 733–742.

Highlights

Abstract

References

Cited By

Index Terms

Recommendations

Clustering Driven Deep Autoencoder for Video Anomaly Detection

Spatio-Temporal AutoEncoder for Video Anomaly Detection

SATJiP: Spatial and Augmented Temporal Jigsaw Puzzles for Video Anomaly Detection

Comments

Information

Published In

Publisher

Publication History

Author Tags

Qualifiers

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

View options

Login options

Full Access

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations