Abstract
Background estimation is a fundamental step in many high-level vision applications, such as tracking and surveillance. Existing background estimation techniques degrade in the presence of challenges such as dynamic backgrounds, photometric variations, camera jitter, and shadows. To handle these challenges and estimate the background accurately, we propose a unified method based on a Generative Adversarial Network (GAN) and image inpainting. The proposed method is built on a context prediction network, an unsupervised visual feature learning hybrid GAN model. Context prediction is followed by a semantic inpainting network for texture enhancement. We also propose a solution for arbitrary region inpainting that combines the center region inpainting method with the Poisson blending technique. The proposed algorithm is compared with existing state-of-the-art methods for background estimation and foreground segmentation and outperforms them by a significant margin.
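To make the Poisson blending step mentioned above more concrete, the following is a minimal sketch of gradient-domain blending on a single-channel image, solved with plain Jacobi iterations. It is an illustration under stated assumptions, not the implementation used in the paper: the function names `neighbor_sum` and `poisson_blend`, the `iters` parameter, and the choice of a Jacobi solver are all hypothetical. One plausible reading of the pipeline is that `source` would be an inpainted patch and `target` the frame it is pasted into, but the abstract does not specify this.

```python
import numpy as np


def neighbor_sum(img):
    """Sum of the 4-connected neighbours of every pixel.

    np.roll wraps around at the image border; that is harmless here
    because only values at interior (masked) pixels are ever used.
    """
    return (np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)
            + np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1))


def poisson_blend(source, target, mask, iters=2000):
    """Blend `source` into `target` inside `mask` (hypothetical helper).

    Solves the discrete Poisson equation so that the Laplacian of the
    result matches the Laplacian of `source` inside the mask, while the
    result equals `target` everywhere outside the mask.
    """
    source = np.asarray(source, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)
    if mask[0].any() or mask[-1].any() or mask[:, 0].any() or mask[:, -1].any():
        raise ValueError("mask must not touch the image border")

    # Guidance field: discrete Laplacian of the source image.
    lap = 4.0 * source - neighbor_sum(source)

    result = target.copy()
    result[mask] = source[mask]  # initial guess inside the region
    for _ in range(iters):
        # Jacobi step: every masked pixel is updated from the previous iterate.
        update = (neighbor_sum(result) + lap) / 4.0
        result[mask] = update[mask]
    return result


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.uniform(0.0, 255.0, size=(8, 8))
    patch = frame + 50.0                 # same gradients, different brightness
    region = np.zeros((8, 8), dtype=bool)
    region[2:6, 2:6] = True
    blended = poisson_blend(patch, frame, region, iters=500)
    # Largest deviation from the frame after blending.
    print(np.abs(blended - frame).max())
```

Because only the gradients of the source are transferred, a constant brightness offset between the patch and the frame is absorbed at the region boundary, which is the property that makes this kind of blending attractive for hiding seams around an inpainted region. For real images, a sparse linear solver or a library routine would usually be preferred over Jacobi iterations, which converge slowly on large regions.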
Acknowledgements
This research was supported by the Development project of leading technology for future vehicle of the business of Daegu metropolitan city (No. 20171105).
About this article
Cite this article
Sultana, M., Mahmood, A., Javed, S. et al. Unsupervised deep context prediction for background estimation and foreground segmentation. Machine Vision and Applications 30, 375–395 (2019). https://doi.org/10.1007/s00138-018-0993-0
DOI: https://doi.org/10.1007/s00138-018-0993-0