Abstract
Visual attention is one of the most important mechanisms by which the visual system selects and interprets the redundant outside world. Because of an information bottleneck, the human visual system cannot process all incoming visual information simultaneously; to reduce this redundant input, it concentrates on the dominant parts of a scene. Predicting which parts those are is commonly known as visual saliency prediction. This paper proposes a new psychophysically oriented saliency prediction architecture inspired by the multi-channel model of visual cortex function in humans. The model combines opponent color channels, a wavelet transform, a wavelet energy map, and a contrast sensitivity function to extract low-level image features and closely approximate the low-level human visual system. The proposed model is evaluated on several datasets, including MIT1003, MIT300, TORONTO, SID4VAM, and UCF Sports, and its saliency prediction performance is compared quantitatively and qualitatively with that of other state-of-the-art models. Our model achieves stable and superior performance across different metrics on natural images, psychophysical synthetic images, and dynamic videos. We further show that Fourier- and spectrum-inspired saliency prediction models outperform other state-of-the-art non-neural-network models, and even deep neural network models, on psychophysical synthetic images, and we argue that deep neural networks require specific architectures and training objectives to predict saliency on psychophysical synthetic images reliably. Finally, the proposed model can serve as a computational model of the primate low-level visual system and help us understand its mechanisms. The project page is available at: https://sinodanishspain.github.io/HVS_SaliencyModel/.
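Since the abstract does not spell out implementation details, the following is a minimal, illustrative Python sketch (NumPy and PyWavelets) of a multi-channel pipeline of this kind. The specific opponent transform, the Haar wavelet with three decomposition levels, the Mannos-Sakrison CSF parameters, and the spectral weighting step are assumptions made for illustration, not the published model's exact configuration.

```python
import numpy as np
import pywt  # PyWavelets


def rgb_to_opponent(img):
    """Split an RGB image (H, W, 3, floats in [0, 1]) into three
    opponent channels: luminance, red-green and blue-yellow."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return [(r + g + b) / 3.0,    # achromatic (luminance) channel
            r - g,                # red-green opponency
            (r + g) / 2.0 - b]    # blue-yellow opponency


def csf_weights(shape):
    """Mannos-Sakrison band-pass CSF sampled on the FFT frequency grid
    (1974 parameters; used here as a plain spectral weighting)."""
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    f = np.hypot(fx, fy) * max(h, w)  # radial frequency, cycles/image
    csf = 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)
    return csf / csf.max()


def wavelet_energy_map(channel, wavelet="haar", level=3):
    """Sum squared wavelet detail coefficients over orientations and
    scales, upsampled back to the input resolution."""
    h, w = channel.shape
    coeffs = pywt.wavedec2(channel, wavelet, level=level)
    energy = np.zeros((h, w))
    for details in coeffs[1:]:        # skip the approximation band
        for band in details:          # (horizontal, vertical, diagonal)
            e = band ** 2
            # nearest-neighbour upsample back to (h, w)
            e = np.repeat(e, h // e.shape[0] + 1, axis=0)[:h]
            e = np.repeat(e, w // e.shape[1] + 1, axis=1)[:, :w]
            energy += e
    return energy


def predict_saliency(img):
    """Opponent channels -> wavelet energy -> CSF weighting -> fusion."""
    fused = np.zeros(img.shape[:2])
    for channel in rgb_to_opponent(img):
        energy = wavelet_energy_map(channel)
        spectrum = np.fft.fft2(energy) * csf_weights(energy.shape)
        fused += np.abs(np.fft.ifft2(spectrum))
    return (fused - fused.min()) / (fused.max() - fused.min() + 1e-12)
```

A full implementation would likely normalize each channel's energy before fusion and tune the CSF per opponent channel; the sketch only conveys the multi-channel structure described above.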
Data and Code availability
The code performing the main experiments described in this article is available at the project page: https://sinodanishspain.github.io/HVS_SaliencyModel/.
Abbreviations
- HVS: Human Vision System
- V1: Primary Visual Cortex
- ICL: Incremental Coding Length
- CNN: Convolutional Neural Network
- DNN: Deep Neural Network
- WT: Wavelet Transform
- IWT: Inverse Wavelet Transform
- AUC: Area Under Curve
- NSS: Normalized Scanpath Saliency
- CC: Pearson's Correlation Coefficient
- SIM: Similarity or Histogram Intersection
- IG: Information Gain
- KL: Kullback–Leibler Divergence
- CSFs: Contrast Sensitivity Functions
- FT: Fourier Transform
- DWT: Discrete Wavelet Transform
- IDWT: Inverse Discrete Wavelet Transform
- LGN: Lateral Geniculate Nucleus
- GT: Ground Truth
Acknowledgements
I thank the anonymous reviewers, whose suggestions helped to improve and clarify this manuscript. This work was partially funded by GVA Grisolía-P/2019/035.
Ethics declarations
Conflict of interest
The author declares no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
See Fig. 5.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, Q. Saliency prediction based on multi-channel models of visual processing. Machine Vision and Applications 34, 47 (2023). https://doi.org/10.1007/s00138-023-01405-2