Abstract
This paper presents a novel compressed-domain saliency estimation method based on analyzing block motion vectors and transform residuals extracted from the bitstream of H.264/AVC compressed videos. The orientations of the block motion vectors are modeled using Dual Cross Patterns, a feature descriptor originally developed for face recognition, to obtain the motion saliency map. The transform residuals are analyzed by applying the lifting wavelet transform to the luminance component of the macroblocks to obtain the spatial saliency map. The motion saliency map and the spatial saliency map are then fused via the Dempster–Shafer combination rule to generate the final saliency map. Our experiments show that Dual Cross Patterns and lifting wavelet transform features fused via the Dempster–Shafer rule predict fixations more accurately than existing state-of-the-art saliency models.
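The fusion step can be illustrated with a minimal sketch of Dempster's combination rule over the binary frame {salient, non-salient}. Here each map value is treated directly as the mass assigned to "salient", with the remainder on "non-salient"; this singleton-only mass assignment is an illustrative assumption, not necessarily the paper's exact formulation.

```python
import numpy as np

def dempster_fuse(m_motion, m_spatial, eps=1e-8):
    """Fuse two per-pixel saliency maps (values in [0, 1]) with
    Dempster's combination rule over {salient, non-salient}.

    Assumption (for illustration): each map value is the mass on
    'salient' and (1 - value) is the mass on 'non-salient'.
    """
    s1, s2 = np.asarray(m_motion, float), np.asarray(m_spatial, float)
    # Conflict mass K: one source votes salient while the other votes non-salient.
    K = s1 * (1.0 - s2) + (1.0 - s1) * s2
    # Combined belief in 'salient', renormalized by the non-conflicting mass (1 - K).
    return (s1 * s2) / (1.0 - K + eps)
```

When both sources agree that a pixel is salient (e.g. masses 0.8 and 0.7), the fused belief is reinforced above either input (about 0.90), while disagreement is absorbed into the conflict term K, which is the behavior that makes the rule attractive for combining independent motion and spatial evidence.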
Acknowledgements
This research was supported by SERB, Government of India, under Grant No. ECR/2016/000112. We express our sincere gratitude to the Associate Editor and the anonymous reviewers, whose insightful reviews and suggestions helped us improve the paper.
Cite this article
Sandula, P., Okade, M. A novel video saliency estimation method in the compressed domain. Pattern Anal Applic 25, 867–878 (2022). https://doi.org/10.1007/s10044-022-01081-4