Abstract
In computer vision, video saliency detection is the task of identifying and highlighting the regions of video frames most likely to attract human attention. Despite its importance, the task poses significant challenges because video content is dynamic, with varying spatial and temporal features. To address these challenges, this work designs Dynamic Attentive Integration for Spatial-Temporal Saliency (DAISTS), a novel methodology that significantly enhances the precision of video saliency detection. DAISTS introduces a dual-path spatial-temporal feature hierarchy that merges deep and shallow features to improve saliency-map accuracy. A spatial- and channel-attention-aided integration is also designed, which adaptively refines feature fusion based on scene context and training data. Moreover, DAISTS incorporates a frame-based attention model that selectively prioritizes frames for improved temporal saliency prediction. Training is guided by a comprehensive loss function that combines relative entropy (Kullback-Leibler divergence), the linear correlation coefficient, and normalized scanpath saliency, ensuring a balanced and effective optimization. Together, these components make DAISTS a robust, efficient, and adaptable solution for video saliency detection, positioned to make a significant impact in computer vision.
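The combined training objective described in the abstract can be made concrete with a short sketch. The following is an illustrative PyTorch implementation, not the authors' published code: the function names, the loss weights w_kl, w_cc, and w_nss, and the binary fixation-map input are assumptions introduced here for clarity.

# Illustrative sketch (not the authors' code): a composite saliency loss
# combining KL divergence, the linear correlation coefficient (CC), and
# normalized scanpath saliency (NSS), as described in the abstract.
# The weights w_kl, w_cc, and w_nss are hypothetical hyperparameters.
import torch

def kl_divergence(pred, gt, eps=1e-8):
    # Treat both saliency maps as probability distributions over pixels.
    p = pred / (pred.sum() + eps)
    q = gt / (gt.sum() + eps)
    return (q * torch.log(q / (p + eps) + eps)).sum()

def correlation_coefficient(pred, gt, eps=1e-8):
    # Pearson linear correlation between predicted and ground-truth maps.
    p = pred - pred.mean()
    g = gt - gt.mean()
    return (p * g).sum() / (torch.sqrt((p ** 2).sum() * (g ** 2).sum()) + eps)

def nss(pred, fixations, eps=1e-8):
    # Normalized scanpath saliency: mean z-scored response at fixation points.
    p = (pred - pred.mean()) / (pred.std() + eps)
    return p[fixations > 0].mean()

def saliency_loss(pred, gt_map, fixations, w_kl=1.0, w_cc=0.5, w_nss=0.5):
    # KL divergence is minimized; CC and NSS measure similarity,
    # so they are subtracted to be maximized during training.
    return (w_kl * kl_divergence(pred, gt_map)
            - w_cc * correlation_coefficient(pred, gt_map)
            - w_nss * nss(pred, fixations))

Subtracting the CC and NSS terms turns these similarity metrics into penalties, so a single optimizer step balances distribution-level matching (KL divergence) against fixation-level accuracy (NSS).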
Data Availability
The datasets generated and analyzed during this study are available from the corresponding author upon reasonable request.
Acknowledgements
The authors acknowledge REVA University, Bengaluru, Karnataka, India, for supporting this research work by providing the necessary facilities.
Funding
No funding was involved in this research work.
Author information
Contributions
This research was a collective effort, with all authors contributing collaboratively to its completion.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ravishankar, H., AnithaKumari, R., Sarvamangala, D. et al. Video Compression through Advanced Video Saliency Aware Spatial-Temporal Integration and Attention Mechanisms. SN COMPUT. SCI. 5, 926 (2024). https://doi.org/10.1007/s42979-024-03279-1