IPHGaze: Image Pyramid Gaze Estimation with Head Pose Guidance

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 15328))

Included in the following conference series:

International Conference on Pattern Recognition

78 Accesses

Abstract

Gaze refers to the directed focus of an individual’s visual perception, playing a fundamental role in human communication and cognition. Recent studies have employed neural networks to predict gaze from standard RGB face images. However, obtaining effective face images is challenging due to their sensitivity to bounding box size. The interference from head pose further complicates gaze estimation, yet in real-world scenarios, it is not feasible to obtain accurate head pose values. To overcome these challenges, we design the IPHGaze network guided by head pose information for image pyramid face gaze estimation. We craft this network to capture diverse face perspectives by incorporating various face bounding box sizes, ensuring rich gaze features. Additionally, a Feature Ensemble Module (FEM) facilitates feature sharing across image pyramid levels. We use head pose features instead of precise labels in two stages: feature communication and fusion, enhancing robustness for stable gaze predictions. Our method achieves a \(5.4 \%\) improvement on EyeDiap dataset and a \(2.5 \%\) improvement on Gaze360 dataset compared with existing methods, which demonstrates its effectiveness and versatility across diverse indoor and outdoor scenarios.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 89.99; Price includes VAT (United Kingdom)

Softcover Book: GBP 109.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

EFG-Net: A Unified Framework for Estimating Eye Gaze and Face Gaze Simultaneously

Residual feature learning with hierarchical calibration for gaze estimation

Article 05 May 2024

Transgaze: exploring plain vision transformers for gaze estimation

Article 21 September 2024

References

Abdelrahman, A.A., Hempel, T., Khalifa, A., Al-Hamadi, A., Dinges, L.: L2CS-Net: fine-grained gaze estimation in unconstrained environments. In: 2023 8th International Conference on Frontiers of Signal Processing (ICFSP), pp. 98–102. IEEE (2023)
Google Scholar
Bao, Y., Cheng, Y., Liu, Y., Lu, F.: Adaptive feature fusion network for gaze tracking in mobile tablets. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 9936–9943. IEEE (2021)
Google Scholar
Bao, Y., Wang, J., Wang, Z., Lu, F.: Exploring 3D interaction with gaze guidance in augmented reality. In: 2023 IEEE Conference Virtual Reality and 3D User Interfaces (VR), pp. 22–32. IEEE (2023)
Google Scholar
Bektaş, K., Strecker, J., Mayer, S., Garcia, K.: Gaze-enabled activity recognition for augmented reality feedback. Comput. Graph. 119, 103909 (2024)
Article Google Scholar
Cai, X., et al.: Gaze estimation with an ensemble of four architectures. arXiv preprint arXiv:2107.01980 (2021)
Che, H., et al.: EFG-Net: a unified framework for estimating eye gaze and face gaze simultaneously. In: Yu, S., et al. (eds.) PRCV 2022. LNCS, vol. 13534, pp. 552–565. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-18907-4_43
Chapter Google Scholar
Chen, Z., Shi, B.E.: Appearance-based gaze estimation using dilated-convolutions. In: Jawahar, C.V., Li, H., Mori, G., Schindler, K. (eds.) ACCV 2018. LNCS, vol. 11366, pp. 309–324. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20876-9_20
Chapter Google Scholar
Cheng, Y., Huang, S., Wang, F., Qian, C., Lu, F.: A coarse-to-fine adaptive network for appearance-based gaze estimation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 10623–10630 (2020)
Google Scholar
Cheng, Y., Lu, F.: Gaze estimation using transformer. In: 2022 26th International Conference on Pattern Recognition (ICPR), pp. 3341–3347. IEEE (2022)
Google Scholar
Cheng, Y., Lu, F.: DVGaze: dual-view gaze estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 20632–20641 (2023)
Google Scholar
Cheng, Y., Wang, H., Bao, Y., Lu, F.: Appearance-based gaze estimation with deep learning: a review and benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 46, 7509–7528 (2024)
Article Google Scholar
Cheng, Y., et al.: What do you see in vehicle? Comprehensive vision solution for in-vehicle gaze estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1556–1565 (2024)
Google Scholar
Ding, X., Zhang, X., Ma, N., Han, J., Ding, G., Sun, J.: RepVGG: making VGG-style convnets great again. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13733–13742 (2021)
Google Scholar
Fischer, T., Chang, H.J., Demiris, Y.: RT-GENE: real-time eye gaze estimation in natural environments. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 339–357. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_21
Chapter Google Scholar
Funes Mora, K.A., Monay, F., Odobez, J.M.: EYEDIAP: a database for the development and evaluation of gaze estimation algorithms from RGB and RGB-D cameras. In: Proceedings of the Symposium on Eye Tracking Research and Applications, pp. 255–258 (2014)
Google Scholar
Gao, J., Geng, X., Zhang, Y., Wang, R., Shao, K.: Augmented weighted bidirectional feature pyramid network for marine object detection. Expert Syst. Appl. 237, 121688 (2024)
Article Google Scholar
Gideon, J., Su, S., Stent, S.: Unsupervised multi-view gaze representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5001–5009 (2022)
Google Scholar
Hempel, T., Abdelrahman, A.A., Al-Hamadi, A.: 6D rotation representation for unconstrained head pose estimation. In: 2022 IEEE International Conference on Image Processing (ICIP), pp. 2496–2500. IEEE (2022)
Google Scholar
Her, P., Manderle, L., Dias, P.A., Medeiros, H., Odone, F.: Uncertainty-aware gaze tracking for assisted living environments. IEEE Trans. Image Process. 32, 2335–2347 (2023)
Article Google Scholar
Hisadome, Y., Wu, T., Qin, J., Sugano, Y.: Rotation-constrained cross-view feature fusion for multi-view appearance-based gaze estimation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 5985–5994 (2024)
Google Scholar
Hsieh, Y.H., Granlund, M., Odom, S.L., Hwang, A.W., Hemmingsson, H.: Increasing participation in computer activities using eye-gaze assistive technology for children with complex needs. Disabil. Rehabil. Assist. Technol. 19(2), 492–505 (2024)
Article Google Scholar
Huang, S., Lu, Z., Cheng, R., He, C.: FAPN: feature-aligned pyramid network for dense image prediction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 864–873 (2021)
Google Scholar
Jha, S., Busso, C.: Estimation of driver’s gaze region from head position and orientation using probabilistic confidence regions. IEEE Trans. Intell. Veh. 8(1), 59–72 (2022)
Article Google Scholar
Kellnhofer, P., Recasens, A., Stent, S., Matusik, W., Torralba, A.: Gaze360: physically unconstrained gaze estimation in the wild. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6912–6921 (2019)
Google Scholar
Kim, T., Kim, K., Lee, J., Cha, D., Lee, J., Kim, D.: Revisiting image pyramid structure for high resolution salient object detection. In: Proceedings of the Asian Conference on Computer Vision, pp. 108–124 (2022)
Google Scholar
Lee, H.S., Weidner, F., Sidenmark, L., Gellersen, H.: Snap, pursuit and gain: virtual reality viewport control by gaze. In: Proceedings of the CHI Conference on Human Factors in Computing Systems, pp. 1–14 (2024)
Google Scholar
Li, Y., et al.: MViTv2: improved multiscale vision transformers for classification and detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4804–4814 (2022)
Google Scholar
Luo, X., et al.: Semi-supervised medical image segmentation via uncertainty rectified pyramid consistency. Med. Image Anal. 80, 102517 (2022)
Article Google Scholar
Nagpure, V., Okuma, K.: Searching efficient neural architecture with multi-resolution fusion transformer for appearance-based gaze estimation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 890–899 (2023)
Google Scholar
Tolstikhin, I.O., et al.: MLP-mixer: an all-MLP architecture for vision. Adv. Neural. Inf. Process. Syst. 34, 24261–24272 (2021)
Google Scholar
Wang, Y., Yuan, G., Fu, X.: Driver’s head pose and gaze zone estimation based on multi-zone templates registration and multi-frame point cloud fusion. Sensors 22(9), 3154 (2022)
Article Google Scholar
Xiang, X., Yin, H., Qiao, Y., El Saddik, A.: Temporal adaptive feature pyramid network for action detection. Comput. Vis. Image Underst. 240, 103945 (2024)
Article Google Scholar
Yin, X., Yu, Z., Fei, Z., Lv, W., Gao, X.: PE-YOLO: pyramid enhancement network for dark object detection. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds.) ICANN 2023. LNCS, vol. 14260, pp. 163–174. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-44195-0_14
Chapter Google Scholar
Yun, J.S., Na, Y., Kim, H.H., Kim, H.I., Yoo, S.B.: HAZE-Net: high-frequency attentive super-resolved gaze estimation in low-resolution face images. In: Proceedings of the Asian Conference on Computer Vision, pp. 3361–3378 (2022)
Google Scholar
Zhang, C., Chen, T., Nedungadi, R.R., Shaffer, E., Soltanaghai, E.: FocusFlow: leveraging focal depth for gaze interaction in virtual reality. In: Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, pp. 1–4 (2023)
Google Scholar
Zhang, X., Sugano, Y., Fritz, M., Bulling, A.: It’s written all over your face: Full-face appearance-based gaze estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 51–60 (2017)
Google Scholar
Zhu, M.: Dynamic feature pyramid networks for object detection. In: Fifteenth International Conference on Signal Processing Systems (ICSPS 2023), vol. 13091, pp. 503–511. SPIE (2024)
Google Scholar
Zhu, W., Deng, H.: Monocular free-head 3D gaze tracking with deep learning and geometry constraints. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3143–3152 (2017)
Google Scholar

Download references

Acknowledgements

This work was supported by National Science and Technology Major Project from Minister of Science and Technology, China (2021ZD0201403), Natural Science Foundation of Shanghai (23ZR1474200), Youth Innovation Promotion Association, Chinese Academy of Sciences (2021233, 2023242), Shanghai Academic Research Leader (22XD1424500).

Author information

Authors and Affiliations

Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China
Hekuangyi Che, Dongchen Zhu, Wenjun Shi, Guanghui Zhang, Hang Li, Lei Wang & Jiamao Li
University of Chinese Academy of Sciences, Beijing, 100049, China
Hekuangyi Che, Dongchen Zhu, Lei Wang & Jiamao Li

Authors

Hekuangyi Che
View author publications
You can also search for this author in PubMed Google Scholar
Dongchen Zhu
View author publications
You can also search for this author in PubMed Google Scholar
Wenjun Shi
View author publications
You can also search for this author in PubMed Google Scholar
Guanghui Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Hang Li
View author publications
You can also search for this author in PubMed Google Scholar
Lei Wang
View author publications
You can also search for this author in PubMed Google Scholar
Jiamao Li
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Dongchen Zhu .

Editor information

Editors and Affiliations

University of Salford, Salford, Lancashire, UK
Apostolos Antonacopoulos
Indian Institute of Technology Bombay, Mumbai, Maharashtra, India
Subhasis Chaudhuri
Johns Hopkins University, Baltimore, MD, USA
Rama Chellappa
Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu
IIT Kharagpur, Kharagpur, West Bengal, India
Saumik Bhattacharya
Indian Statistical Institute Kolkata, Kolkata, West Bengal, India
Umapada Pal

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Che, H. et al. (2025). IPHGaze: Image Pyramid Gaze Estimation with Head Pose Guidance. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15328. Springer, Cham. https://doi.org/10.1007/978-3-031-78104-9_27

Download citation

DOI: https://doi.org/10.1007/978-3-031-78104-9_27
Published: 02 December 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78103-2
Online ISBN: 978-3-031-78104-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

IPHGaze: Image Pyramid Gaze Estimation with Head Pose Guidance

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

EFG-Net: A Unified Framework for Estimating Eye Gaze and Face Gaze Simultaneously

Residual feature learning with hierarchical calibration for gaze estimation

Transgaze: exploring plain vision transformers for gaze estimation

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

IPHGaze: Image Pyramid Gaze Estimation with Head Pose Guidance

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

EFG-Net: A Unified Framework for Estimating Eye Gaze and Face Gaze Simultaneously

Residual feature learning with hierarchical calibration for gaze estimation

Transgaze: exploring plain vision transformers for gaze estimation

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation