Abstract
Gaze is the directed focus of an individual’s visual perception and plays a fundamental role in human communication and cognition. Recent studies have employed neural networks to predict gaze from standard RGB face images. However, obtaining effective face images is challenging because gaze estimation is sensitive to the size of the face bounding box. Interference from head pose further complicates gaze estimation, yet it is not feasible to obtain accurate head pose values in real-world scenarios. To overcome these challenges, we design IPHGaze, an image pyramid network for face gaze estimation guided by head pose information. The network captures diverse face perspectives by incorporating various face bounding box sizes, ensuring rich gaze features. Additionally, a Feature Ensemble Module (FEM) facilitates feature sharing across image pyramid levels. Instead of precise head pose labels, we use head pose features in two stages, feature communication and fusion, which enhances robustness and yields stable gaze predictions. Our method achieves a \(5.4 \%\) improvement on the EyeDiap dataset and a \(2.5 \%\) improvement on the Gaze360 dataset compared with existing methods, demonstrating its effectiveness and versatility across diverse indoor and outdoor scenarios.
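The idea of feeding the network several face crops with different bounding box sizes can be illustrated with a minimal sketch. It only shows how crops of one face at several box scales might be produced; it is not the authors' implementation. The function names (`pyramid_boxes`, `crop_clipped`, `face_pyramid`) and the scale factors are hypothetical choices made for illustration.

```python
import numpy as np

def pyramid_boxes(box, scales=(1.0, 1.5, 2.0)):
    """Return one box per scale, each sharing the centre of `box`.

    `box` is (x0, y0, x1, y1) in pixels. The scale values are illustrative
    assumptions, not the settings used in the paper.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w, half_h = (x1 - x0) / 2.0, (y1 - y0) / 2.0
    return [(cx - s * half_w, cy - s * half_h, cx + s * half_w, cy + s * half_h)
            for s in scales]

def crop_clipped(image, box):
    """Crop `box` from an H x W x C image, clipping the box to the image bounds."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    x0, x1 = int(max(0, round(x0))), int(min(w, round(x1)))
    y0, y1 = int(max(0, round(y0))), int(min(h, round(y1)))
    return image[y0:y1, x0:x1]

def face_pyramid(image, box, scales=(1.0, 1.5, 2.0)):
    """Build a list of face crops, from the tightest box to the widest."""
    return [crop_clipped(image, b) for b in pyramid_boxes(box, scales)]

# Usage on a dummy image with a 40 x 40 face box centred at (100, 100).
image = np.zeros((200, 200, 3), dtype=np.uint8)
crops = face_pyramid(image, (80, 80, 120, 120))
print([c.shape for c in crops])  # [(40, 40, 3), (60, 60, 3), (80, 80, 3)]
```

In a full pipeline each crop would typically be resized to a common input resolution before being passed to the network; that step is omitted here to keep the sketch dependency-free.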
Acknowledgements
This work was supported by the National Science and Technology Major Project from the Ministry of Science and Technology, China (2021ZD0201403), the Natural Science Foundation of Shanghai (23ZR1474200), the Youth Innovation Promotion Association, Chinese Academy of Sciences (2021233, 2023242), and Shanghai Academic Research Leader (22XD1424500).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Che, H. et al. (2025). IPHGaze: Image Pyramid Gaze Estimation with Head Pose Guidance. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15328. Springer, Cham. https://doi.org/10.1007/978-3-031-78104-9_27
DOI: https://doi.org/10.1007/978-3-031-78104-9_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78103-2
Online ISBN: 978-3-031-78104-9
eBook Packages: Computer Science, Computer Science (R0)