Suspicious Behavior Detection with Temporal Feature Extraction and Time-Series Classification for Shoplifting Crime Prevention
Figure 1. Extracted dataset format.
Figure 2. Description of the columns in Figure 1.
Figure 3. Proposed model pipeline.
Figure 4. Capturing bounding boxes with YOLOv5 with Deep Sort.
Figure 5. Inception module proposed in [28].
Figure 6. Inception network proposed in [28].
Figure 7. XceptionTime module proposed in [29].
Figure 8. Overall XceptionTime architecture proposed in [29].
Figure 9. XCM architecture proposed in [30].
Figure 10. Confusion matrix for the best models.
Figure 11. F1 score distributions across 10-fold cross validation for each model.
Figure 12. RTFM confusion matrix.
Figure 13. RTFM misclassifications that the proposed method classified correctly.
Abstract
1. Introduction
- How effective is using a sequence of video frames depicting a person’s activity and behavior in increasing the likelihood of detecting suspicious behavior?
- How can time-series deep learning classification models be trained to track and learn sequences of individual actions and movements to improve detection performance?
- How does the proposed method compare to the I3D preprocessing method and the state-of-the-art Robust Temporal Feature Magnitude (RTFM) deep learning anomaly detection method in terms of detection performance on shoplifting incidents?
2. Related Work
3. Proposed Method
4. Data
4.1. Datasets
4.2. Dataset Preprocessing
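As a rough illustration of the kind of preprocessing this section covers, the sketch below groups per-frame tracker output by track ID and cuts each person's track into fixed-length windows that a time-series classifier could consume. The record layout `(frame, track_id, x, y, w, h)`, the chosen features (box centre and box size), the function name `tracks_to_windows` and the window length are all assumptions made for illustration; they are not the paper's extracted dataset format.

```python
from collections import defaultdict

def tracks_to_windows(detections, window=4):
    """Group (frame, track_id, x, y, w, h) rows by track and cut
    each track into non-overlapping windows of `window` frames."""
    by_track = defaultdict(list)
    # Sorting the tuples orders rows by frame number first.
    for frame, track_id, x, y, w, h in sorted(detections):
        # Hypothetical features: box centre and box size.
        by_track[track_id].append((x + w / 2, y + h / 2, w, h))

    windows = []
    for track_id, series in by_track.items():
        # Drop the incomplete tail so every window has the same length.
        for start in range(0, len(series) - window + 1, window):
            windows.append((track_id, series[start:start + window]))
    return windows

# Toy input: track 1 is seen for 5 frames, track 2 for 4 frames.
detections = [(f, 1, 10 + f, 20, 4, 8) for f in range(5)]
detections += [(f, 2, 50, 60 + f, 6, 10) for f in range(4)]
windows = tracks_to_windows(detections, window=4)
```

Each window is one multivariate series (here four channels over four time steps) tied to a single tracked person, which is the general shape of input that models such as InceptionTime or MiniRocket expect.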
5. Experimental Setup
5.1. Time-Series Deep-Learning Classification Models
5.1.1. InceptionTime
5.1.2. XceptionTime
5.1.3. MiniRocket
5.1.4. Explainable Convolutional Network (XCM)
5.2. Baseline Comparison: Robust Temporal Feature Magnitude (RTFM)
5.3. Evaluation Metrics
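For reference, the precision, recall and F1 columns in the result tables follow the standard definitions, which can be sketched from confusion-matrix counts as below. The counts are hypothetical and the function name `binary_metrics` is illustrative; the sketch scores only the positive (abnormal) class, and the paper may report a different averaging scheme.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall and F1 for the positive (abnormal) class.

    `tn` is accepted for completeness; none of the three metrics uses it.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical confusion-matrix counts, not results from the paper.
precision, recall, f1 = binary_metrics(tp=9, fp=1, fn=3, tn=68)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because true negatives do not enter any of the three quantities, these metrics stay informative on an imbalanced dataset where normal instances far outnumber abnormal ones.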
6. Results and Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Kirichenko, L.; Radivilova, T.; Sydorenko, B.; Yakovlev, S. Detection of Shoplifting on Video Using a Hybrid Network. Computation 2022, 10, 199.
- Gandapur, M.Q. E2E-VSDL: End-to-end video surveillance-based deep learning model to detect and prevent criminal activities. Image Vis. Comput. 2022, 123, 104467.
- Qin, Z.; Liu, H.; Song, B.; Alazab, M.; Kumar, P.M. Detecting and preventing criminal activities in shopping malls using massive video surveillance based on deep learning models. Ann. Oper. Res. 2021, 1–8.
- Wu, Y. The impact of criminal psychology trend prediction based on deep learning algorithm and three-dimensional convolutional neural network. J. Ambient. Intell. Humaniz. Comput. 2021, 1–2.
- Wu, P.; Liu, J.; Shi, Y.; Sun, Y.; Shao, F.; Wu, Z.; Yang, Z. Not only look, but also listen: Learning multimodal violence detection under weak supervision. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 322–339.
- Ullah, W.; Ullah, A.; Haq, I.U.; Muhammad, K.; Sajjad, M.; Baik, S.W. CNN features with bi-directional LSTM for real-time anomaly detection in surveillance networks. Multimed. Tools Appl. 2021, 80, 16979–16995.
- Lin, W.; Liu, H.; Liu, S.; Li, Y.; Qian, R.; Wang, T.; Xu, N.; Xiong, H.; Qi, G.J.; Sebe, N. Human in events: A large-scale benchmark for human-centric video analysis in complex events. arXiv 2020, arXiv:2005.04490.
- Zhang, Y.; Zhou, D.; Chen, S.; Gao, S.; Ma, Y. Single-image crowd counting via multi-column convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016, Las Vegas, NV, USA, 27–30 June 2016; pp. 589–597.
- Liu, W.; Luo, W.; Lian, D.; Gao, S. Future frame prediction for anomaly detection–A new baseline. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6536–6545.
- Chen, C.; Xie, Y.; Lin, S.; Yao, A.; Jiang, G.; Zhang, W.; Qu, Y.; Qiao, R.; Ren, B.; Ma, L. Comprehensive Regularization in a Bi-directional Predictive Network for Video Anomaly Detection. In Proceedings of the American Association for Artificial Intelligence 2022, Osaka, Japan, 17–19 December 2022; pp. 1–9.
- Yu, J.; Lee, Y.; Yow, K.C.; Jeon, M.; Pedrycz, W. Abnormal event detection and localization via adversarial event prediction. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 3572–3586.
- Wang, X.; Che, Z.; Jiang, B.; Xiao, N.; Yang, K.; Tang, J.; Ye, J.; Wang, J.; Qi, Q. Robust unsupervised video anomaly detection by multipath frame prediction. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 2301–2312.
- Liu, Z.; Nie, Y.; Long, C.; Zhang, Q.; Li, G. A hybrid video anomaly detection framework via memory-augmented flow reconstruction and flow-guided frame prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2021, Montreal, QC, Canada, 11–17 October 2021; pp. 13588–13597.
- Georgescu, M.I.; Barbalau, A.; Ionescu, R.T.; Khan, F.S.; Popescu, M.; Shah, M. Anomaly detection in video via self-supervised and multi-task learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021, Virtual, 19–25 June 2021; pp. 12742–12752.
- Cai, R.; Zhang, H.; Liu, W.; Gao, S.; Hao, Z. Appearance-motion memory consistency network for video anomaly detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; Volume 35, pp. 938–946.
- Liu, K.; Ma, H. Exploring background-bias for anomaly detection in surveillance videos. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 1490–1499.
- Landi, F.; Snoek, C.G.M.; Cucchiara, R. Anomaly locality in video surveillance. arXiv 2019, arXiv:1901.10364.
- Lu, Y.; Yu, F.; Reddy, M.K.; Wang, Y. Few-shot scene-adaptive anomaly detection. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 125–141.
- Tian, Y.; Pang, G.; Chen, Y.; Singh, R.; Verjans, J.W.; Carneiro, G. Weakly-supervised video anomaly detection with contrastive learning of long and short-range temporal features. arXiv 2021, arXiv:2101.10030.
- Li, S.; Liu, F.; Jiao, L. Self-training multi-sequence learning with Transformer for weakly supervised video anomaly detection. In Proceedings of the AAAI, Virtual, 6–10 November 2022.
- Feng, J.C.; Hong, F.T.; Zheng, W.S. Mist: Multiple instance self-training framework for video anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021, Virtual, 19–25 June 2021; pp. 14009–14018.
- Wu, J.; Zhang, W.; Li, G.; Wu, W.; Tan, X.; Li, Y.; Ding, E.; Lin, L. Weakly-supervised spatio-temporal anomaly detection in surveillance video. arXiv 2021, arXiv:2108.03825.
- Lv, H.; Zhou, C.; Cui, Z.; Xu, C.; Li, Y.; Yang, J. Localizing anomalies from weakly-labeled videos. IEEE Trans. Image Process. 2021, 30, 4505–4515.
- Wu, P.; Liu, J. Learning causal temporal relation and feature discrimination for anomaly detection. IEEE Trans. Image Process. 2021, 30, 3513–3527.
- Ansari, M.A.; Singh, D.K. ESAR, An Expert Shoplifting Activity Recognition System. Cybern. Inf. Technol. 2022, 22, 190–200.
- Wang, Y.; Yang, H. Multi-target Pedestrian Tracking Based on YOLOv5 and DeepSORT. In Proceedings of the 2022 IEEE Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC), Dalian, China, 14–16 April 2022; pp. 508–514.
- Sultani, W.; Chen, C.; Shah, M. Real-world anomaly detection in surveillance videos. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018.
- Fawaz, H.I.; Lucas, B.; Forestier, G.; Pelletier, C.; Schmidt, D.F.; Weber, J.; Webb, G.I.; Idoumghar, L.; Muller, P.-A.; Petitjean, F. InceptionTime: Finding AlexNet for time series classification. Data Min. Knowl. Discov. 2020, 34, 1936–1962.
- Rahimian, E.; Zabihi, S.; Atashzar, S.F.; Asif, A.; Mohammadi, A. XceptionTime: Independent Time-window Xceptiontime architecture for hand gesture classification. In Proceedings of the ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020.
- Fauvel, K.; Lin, T.; Masson, V.; Fromont, É.; Termier, A. XCM: An explainable convolutional neural network for multivariate time series classification. Mathematics 2021, 9, 3137.
- Dempster, A.; Schmidt, D.F.; Webb, G.I. MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Singapore, 14–18 August 2021.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 2019, 128, 336–359.
- Tian, Y.; Pang, G.; Chen, Y.; Singh, R.; Verjans, J.W.; Carneiro, G. Weakly-supervised video anomaly detection with robust temporal feature magnitude learning. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021.

Dataset Class | No. of Instances |
---|---|
Abnormal | 81 |
Normal | 462 |

No. of Instances | Dataset Class | Dataset |
---|---|---|
12 | Abnormal | Testing |
69 | Normal | Testing |
69 | Abnormal | Training |
393 | Normal | Training |
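The counts in the two tables above can be cross-checked with a few lines of arithmetic. The sketch below, with an illustrative helper named `held_out_fraction`, recomputes the class totals and the share of each class held out for testing; the numbers are copied from the tables. The result is consistent with a stratified split of roughly 85/15, although the text shown here does not say how the split was made, so that reading is an inference.

```python
# Instance counts copied from the training/testing table above.
counts = {
    ("Abnormal", "Training"): 69,
    ("Abnormal", "Testing"): 12,
    ("Normal", "Training"): 393,
    ("Normal", "Testing"): 69,
}

def class_total(cls):
    """Total instances of a class across both subsets."""
    return counts[(cls, "Training")] + counts[(cls, "Testing")]

def held_out_fraction(cls):
    """Share of a class that is placed in the testing subset."""
    return counts[(cls, "Testing")] / class_total(cls)

totals = {cls: class_total(cls) for cls in ("Abnormal", "Normal")}
fractions = {cls: round(held_out_fraction(cls), 3) for cls in totals}
imbalance = totals["Normal"] / totals["Abnormal"]

print(totals)      # class totals, matching the first table
print(fractions)   # held-out share per class
print(f"normal-to-abnormal ratio: {imbalance:.1f}")
```

The normal-to-abnormal ratio of about 5.7 to 1 is the reason class-sensitive metrics such as F1 are more informative here than plain accuracy.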

No. of Instances | Dataset Class | Dataset |
---|---|---|
10 | Abnormal | Testing |
40 | Normal | Testing |
40 | Abnormal | Training |
228 | Normal | Training |

Model Name | Precision | Recall | F1 Score | AUC |
---|---|---|---|---|
InceptionTime | 0.64 | 0.77 | 0.56 | 0.77 |
XceptionTime | 0.99 | 0.96 | 0.97 | 0.96 |
XCM | 0.99 | 0.96 | 0.97 | 0.96 |
MiniRocket | 0.43 | 0.50 | 0.46 | 0.50 |

Model Name | Precision | Recall | F1 Score | Median F1 Score |
---|---|---|---|---|
InceptionTime | 0.86 ± 0.17 | 0.85 ± 0.14 | 0.84 ± 0.09 | 0.85 |
XceptionTime | 0.96 ± 0.04 | 0.92 ± 0.06 | 0.87 ± 0.13 | 0.92 |
MiniRocket | 0.91 ± 0.08 | 0.88 ± 0.09 | 0.89 ± 0.09 | 0.92 |
XCM | 0.63 ± 0.24 | 0.63 ± 0.19 | 0.60 ± 0.20 | 0.46 |
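The table above reports both a mean ± standard deviation and a median F1 score across the 10 folds. A minimal sketch of how such a summary can be computed is given below. The per-fold scores are hypothetical, the helper name `summarize` is illustrative, and the use of the sample standard deviation is an assumption, since the text shown here does not say which estimator was used.

```python
from statistics import mean, median, stdev

def summarize(scores):
    """Return (mean, sample standard deviation, median) of fold scores."""
    return mean(scores), stdev(scores), median(scores)

# Hypothetical per-fold F1 scores for one model over 10 folds,
# with one deliberately poor fold.
fold_f1 = [0.95, 0.92, 0.93, 0.91, 0.94, 0.92, 0.55, 0.93, 0.90, 0.92]
avg, sd, med = summarize(fold_f1)
print(f"F1 = {avg:.2f} ± {sd:.2f}, median = {med:.2f}")
```

A single poor fold pulls the mean down and widens the spread while leaving the median unchanged, which is one plausible reason for reporting the median alongside the mean.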

Group 1 | Group 2 | Mean Difference | p-Value | CI Lower | CI Upper | Reject H0 |
---|---|---|---|---|---|---|
InceptionTime | MiniRocket | 0.0446 | 0.8783 | −0.1167 | 0.2059 | False |
InceptionTime | XCM | −0.2413 | 0.0015 | −0.4026 | −0.0800 | True |
InceptionTime | XceptionTime | 0.0259 | 0.9725 | −0.1354 | 0.1872 | False |
MiniRocket | XCM | −0.2859 | 0.0002 | −0.4472 | −0.1246 | True |
MiniRocket | XceptionTime | −0.0187 | 0.9892 | −0.1800 | 0.1426 | False |
XCM | XceptionTime | 0.2672 | 0.0004 | 0.1059 | 0.4284 | True |
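The last column of the pairwise comparison table can be read directly off the interval bounds: a difference is flagged as significant exactly when the interval between the lower and upper bound excludes zero. The sketch below re-derives that column from the values in the table; the helper name `excludes_zero` is illustrative. It assumes the bounds are simultaneous confidence intervals from a Tukey-style post hoc test at a 0.05 significance level, which is the usual convention but is not stated in the text shown here.

```python
# (group 1, group 2, mean difference, lower bound, upper bound, reported decision)
# Values copied from the pairwise comparison table above.
rows = [
    ("InceptionTime", "MiniRocket", 0.0446, -0.1167, 0.2059, False),
    ("InceptionTime", "XCM", -0.2413, -0.4026, -0.0800, True),
    ("InceptionTime", "XceptionTime", 0.0259, -0.1354, 0.1872, False),
    ("MiniRocket", "XCM", -0.2859, -0.4472, -0.1246, True),
    ("MiniRocket", "XceptionTime", -0.0187, -0.1800, 0.1426, False),
    ("XCM", "XceptionTime", 0.2672, 0.1059, 0.4284, True),
]

def excludes_zero(lower, upper):
    """True when the interval [lower, upper] does not contain zero."""
    return lower > 0 or upper < 0

decisions = []
for g1, g2, diff, lower, upper, reported in rows:
    reject = excludes_zero(lower, upper)
    decisions.append(reject)
    half_width = (upper - lower) / 2
    print(f"{g1} vs {g2}: reject={reject} (reported {reported}), "
          f"half-width={half_width:.4f}")
```

Every interval has the same half-width of about 0.16, and each mean difference sits at the midpoint of its interval, so only the three comparisons involving XCM show a difference large enough to be flagged.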

Model Name | Feature Type | F1 Score |
---|---|---|
RTFM | I3D RGB | 0.89 |
Ours–XceptionTime | Custom | 0.92 |
Ours–MiniRocket | Custom | 0.92 |

Model Name | Preprocessing Time (s) | Total Inference Time (s) |
---|---|---|
RTFM | 13.7 | 13.9 |
Ours–MiniRocket | 1.5 | 1.6 |
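The timing table above implies a speedup factor and shows where the time goes. The short sketch below works both out from the reported figures; the dictionary keys and helper names are illustrative, and only the four numbers come from the table.

```python
# Times in seconds, copied from the timing table above.
timings = {
    "RTFM": {"preprocessing": 13.7, "total": 13.9},
    "Ours-MiniRocket": {"preprocessing": 1.5, "total": 1.6},
}

def speedup(stage):
    """How many times faster the proposed method is for a given stage."""
    return timings["RTFM"][stage] / timings["Ours-MiniRocket"][stage]

def preprocessing_share(model):
    """Fraction of total inference time spent on preprocessing."""
    return timings[model]["preprocessing"] / timings[model]["total"]

total_speedup = speedup("total")
prep_speedup = speedup("preprocessing")
shares = {model: preprocessing_share(model) for model in timings}

print(f"total speedup: {total_speedup:.1f}x")
print(f"preprocessing speedup: {prep_speedup:.1f}x")
for model, share in shares.items():
    print(f"{model}: {share:.0%} of time in preprocessing")
```

For both methods preprocessing accounts for well over 90% of the total time, so the roughly ninefold gain comes almost entirely from cheaper feature extraction and not from the classifier itself.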
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Nazir, A.; Mitra, R.; Sulieman, H.; Kamalov, F. Suspicious Behavior Detection with Temporal Feature Extraction and Time-Series Classification for Shoplifting Crime Prevention. Sensors 2023, 23, 5811. https://doi.org/10.3390/s23135811