Abstract
Data-driven methods have recently achieved great success in saliency prediction thanks to convolutional neural networks. In this paper, a novel end-to-end deep saliency prediction method named VGG-SSM is proposed. The model comprises three key components: feature extraction, a self-attention module, and multi-level integration. An encoder-decoder architecture serves as the baseline for feature extraction. The multi-level integration constructs a symmetric expanding path that enables precise localization. Global information in the deep layers is refined by a self-attention module, which coordinates it with fine details in distant portions of a feature map. Each component makes its own contribution, and its effectiveness is validated in the experiments. Additionally, to capture several quality factors, the loss function is defined as a linear combination of saliency evaluation metrics. In comparison with other works, VGG-SSM achieves competitive performance on the public SALICON benchmark (2017 version). The PyTorch implementation is available at https://github.com/caoge5844/Saliency.
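To make the multi-level integration concrete, the sketch below shows one U-Net-style decoder step along a symmetric expanding path: the deeper (coarser) feature map is upsampled to the resolution of the matching encoder feature, concatenated with it, and fused by a convolution. This is a minimal PyTorch sketch under assumed channel sizes; the class name DecoderBlock and its layer configuration are hypothetical, as the abstract does not give the paper's exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    """One step of a symmetric expanding path: upsample the deep
    feature map to the skip feature's resolution, concatenate along
    channels, and fuse with a 3x3 convolution (hypothetical sizes)."""

    def __init__(self, deep_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(deep_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, deep: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        # Bring the coarse map to the encoder feature's spatial size.
        deep = F.interpolate(deep, size=skip.shape[-2:], mode="bilinear",
                             align_corners=False)
        # Concatenate deep and shallow features, then fuse them.
        return self.fuse(torch.cat([deep, skip], dim=1))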
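The self-attention module can be sketched as a non-local block in the style of SAGAN (Zhang et al.): every spatial position attends to every other position, so fine details in distant portions of the feature map can inform the refinement of deep-layer features. The channel reduction factor and the learned residual weight gamma below are illustrative assumptions, not the paper's exact settings.

import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """SAGAN-style self-attention over a feature map (illustrative)."""

    def __init__(self, in_channels: int, reduction: int = 8):
        super().__init__()
        inter = in_channels // reduction
        self.query = nn.Conv2d(in_channels, inter, kernel_size=1)
        self.key = nn.Conv2d(in_channels, inter, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        # Learned residual weight, initialized to zero so the block
        # starts as an identity mapping.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (B, HW, C')
        k = self.key(x).flatten(2)                    # (B, C', HW)
        attn = torch.softmax(q @ k, dim=-1)           # (B, HW, HW)
        v = self.value(x).flatten(2)                  # (B, C, HW)
        # Aggregate values over all positions for each query position.
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x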
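Finally, a minimal sketch of a loss built as a linear combination of saliency evaluation metrics, here KL divergence and a negative correlation coefficient term. The abstract does not state which metrics or weights VGG-SSM actually combines, so the choice of terms and the coefficients alpha and beta are hypothetical.

import torch

def kld_loss(pred, gt, eps=1e-7):
    """KL divergence between predicted and ground-truth saliency maps,
    each normalized to a spatial probability distribution."""
    p = pred / (pred.sum(dim=(-2, -1), keepdim=True) + eps)
    g = gt / (gt.sum(dim=(-2, -1), keepdim=True) + eps)
    return (g * torch.log(g / (p + eps) + eps)).sum(dim=(-2, -1)).mean()

def cc_loss(pred, gt, eps=1e-7):
    """Negative linear correlation coefficient (CC is maximized)."""
    p = pred - pred.mean(dim=(-2, -1), keepdim=True)
    g = gt - gt.mean(dim=(-2, -1), keepdim=True)
    cc = (p * g).sum(dim=(-2, -1)) / (
        torch.sqrt((p ** 2).sum(dim=(-2, -1))
                   * (g ** 2).sum(dim=(-2, -1))) + eps)
    return -cc.mean()

def combined_loss(pred, gt, alpha=10.0, beta=1.0):
    # Hypothetical weights; tune per validation metric of interest.
    return alpha * kld_loss(pred, gt) + beta * cc_loss(pred, gt)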
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Cao, G., Tang, Q., Jo, K.-H. (2020). Aggregated Deep Saliency Prediction by Self-attention Network. In: Huang, D.-S., Premaratne, P. (eds.) Intelligent Computing Methodologies. ICIC 2020. Lecture Notes in Computer Science, vol. 12465. Springer, Cham. https://doi.org/10.1007/978-3-030-60796-8_8
DOI: https://doi.org/10.1007/978-3-030-60796-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60795-1
Online ISBN: 978-3-030-60796-8
eBook Packages: Computer Science, Computer Science (R0)