Abstract
Visual smoke segmentation is widely used in fire detection, simulation, human evacuation and pollution monitoring. However, accurately separating smoke from a single image is challenging because smoke has a highly complicated appearance: it is blurry, semi-transparent and varies in shape. We design several multi-resolution convolutional paths that generate multi-scale feature maps to obtain scale invariance. To capture feature representations of blurriness and semi-transparency, we present a Multi-scale Residual Module (MRM) that significantly increases the width and depth of the residual paths. We combine learnable deconvolution with bilinear interpolation to generate multi-scale features, which help to progressively produce up-sampled and middle predictions that serve as training surveillances (supervision signals). By designing a monotonically decreasing function of the independent pre-training error of each prediction, we automatically regulate the relative importance of the predictions to accelerate training. Experimental results show that our method achieves the best results among state-of-the-art methods on three virtual smoke data sets, with 79.03%, 78.30% and 78.65%. In addition, the proposed multi-prediction loss makes the training of our network converge faster and more stably.
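The weighting idea described in the abstract can be illustrated with a minimal sketch. The abstract states only that the weighting function decreases monotonically with each prediction's independent pre-training error; it does not give the function itself. The softmax over negated errors used below, the `temperature` parameter, and the names `prediction_weights` and `weighted_multi_prediction_loss` are therefore assumptions made for illustration, not the authors' formulation.

```python
import math


def prediction_weights(pretrain_errors, temperature=1.0):
    """Map each prediction's pre-training error to a relative weight.

    Assumption: a softmax over negated errors. Any function that is
    monotonically decreasing in the error would fit the description
    in the abstract; this particular form is hypothetical.
    """
    if not pretrain_errors:
        raise ValueError("at least one error is required")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Shift by the smallest error so every exponent is <= 0 (numerical stability).
    smallest = min(pretrain_errors)
    scores = [math.exp(-(e - smallest) / temperature) for e in pretrain_errors]
    total = sum(scores)
    return [s / total for s in scores]


def weighted_multi_prediction_loss(losses, weights):
    """Combine the per-prediction losses into one scalar training loss."""
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    return sum(w * l for w, l in zip(weights, losses))


# Usage: three predictions (e.g. two middle predictions and the final one),
# each with a hypothetical pre-training error and a current per-prediction loss.
errors = [0.30, 0.20, 0.10]
weights = prediction_weights(errors)
total_loss = weighted_multi_prediction_loss([0.9, 0.7, 0.5], weights)
```

In this sketch a prediction with a smaller pre-training error receives a larger weight, and the weights sum to one, so the combined loss stays on the same scale as the individual losses.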
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (62272308) and the Capacity Construction Project of Shanghai Local Colleges (23010504100).
About this article
Cite this article
Yuan, F., Zhang, L. & Xia, X. Smoke semantic segmentation with multi-scale residual paths and weighted middle surveillances. Multimed Tools Appl 83, 47199–47224 (2024). https://doi.org/10.1007/s11042-023-17260-2
DOI: https://doi.org/10.1007/s11042-023-17260-2