Abstract
Background subtraction extracts moving objects from a video sequence and is a prerequisite for high-level analysis of surveillance video. Developing a robust background subtraction approach is challenging because of dynamic backgrounds, illumination changes, shadows, camera jitter, and similar factors. In this paper, we propose an encoder-decoder deep neural network for moving object detection in video sequences. The encoder is built on VGG-16 and ResNet-50 and extracts hierarchical features from a raw image; these features are more robust than handcrafted ones. The decoder uses a transposed convolutional neural network to map the features to a prediction result for foreground and background classification. We also design an adapted focal loss function that balances the loss contributions of positive and negative classes, and of hard and easy samples, in different frames according to their degree of imbalance. The model is evaluated on the CDnet2014 and SBI2015 datasets using only a small number of training frames for various challenging scenes. The experimental results show that our method outperforms state-of-the-art methods without any post-processing.
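To make the role of the focal loss concrete, the following is a minimal sketch of a binary focal loss whose class weight is derived from the foreground fraction of each frame. It is an illustration only: the function name `frame_focal_loss`, the rule `alpha = 1 - foreground fraction`, the clamping bounds, and the default `gamma = 2.0` are assumptions made here, not the adapted loss defined in the paper.

```python
import math


def frame_focal_loss(probs, labels, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over the pixels of one frame.

    probs  : predicted foreground probabilities, one per pixel (flattened)
    labels : ground-truth labels, 1 = foreground, 0 = background
    gamma  : focusing parameter; larger values down-weight easy pixels more
    """
    n = len(labels)
    fg_fraction = sum(labels) / n

    # Assumption (not from the paper): weight the foreground class by the
    # background fraction of this frame, so the rarer class counts more.
    # The clamp keeps either class from receiving zero weight.
    alpha = min(max(1.0 - fg_fraction, 0.05), 0.95)

    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
        p_t = p if y == 1 else 1.0 - p           # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # Standard focal term: -alpha_t * (1 - p_t)^gamma * log(p_t)
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / n
```

With `gamma = 0` the modulating factor disappears and the function reduces to a class-weighted cross-entropy; increasing `gamma` shrinks the contribution of pixels that are already classified confidently, which is the hard/easy balancing the abstract refers to.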
Data Availability
The datasets analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
The authors would like to thank all the anonymous reviewers for their comments. This work was partly supported by the Natural Science Basic Research Program of Shaanxi under Grant 2022JM-378.
Ethics declarations
Conflicts of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Dai, Y., Yang, L. Background subtraction for video sequence using deep neural network. Multimed Tools Appl 83, 82281–82302 (2024). https://doi.org/10.1007/s11042-024-18843-3
DOI: https://doi.org/10.1007/s11042-024-18843-3