IPHGaze: Image Pyramid Gaze Estimation with Head Pose Guidance

  • Conference paper
  • First Online:
Pattern Recognition (ICPR 2024)

Abstract

Gaze refers to the directed focus of an individual’s visual perception and plays a fundamental role in human communication and cognition. Recent studies have employed neural networks to predict gaze from standard RGB face images. However, obtaining effective face images is challenging, because their usefulness for gaze estimation is sensitive to the size of the face bounding box. Head pose further complicates gaze estimation, yet accurate head pose values cannot be obtained in most real-world scenarios. To overcome these challenges, we design IPHGaze, an image pyramid network for face gaze estimation guided by head pose information. The network captures diverse face perspectives by taking face crops with several bounding box sizes, which yields rich gaze features. A Feature Ensemble Module (FEM) further enables feature sharing across the levels of the image pyramid. Rather than relying on precise head pose labels, we use head pose features in two stages, feature communication and feature fusion, which makes the gaze predictions more robust and stable. Our method achieves a 5.4% improvement on the EyeDiap dataset and a 2.5% improvement on the Gaze360 dataset over existing methods, demonstrating its effectiveness and versatility across diverse indoor and outdoor scenarios.
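The abstract does not say exactly how the pyramid levels are built. As a rough illustration only, the sketch below assumes each level is a square crop centred on a detected face box and enlarged by a per-level scale factor, then clipped to the image. The `Box` type, the `pyramid_boxes` function and the example scale values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels: (x0, y0) is top-left, (x1, y1) is bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float


def pyramid_boxes(face: Box, scales: List[float], img_w: int, img_h: int) -> List[Box]:
    """Return one crop box per scale, all centred on the detected face.

    Hypothetical scheme: each box is a square whose side is
    scale * max(face width, face height), clipped to the image bounds.
    Larger scales add context (hair, shoulders, background); smaller
    scales concentrate on the eye region.
    """
    cx = (face.x0 + face.x1) / 2.0
    cy = (face.y0 + face.y1) / 2.0
    side = max(face.x1 - face.x0, face.y1 - face.y0)
    boxes = []
    for s in scales:
        half = s * side / 2.0
        # Clip so a crop never extends outside the image.
        boxes.append(Box(
            max(0.0, cx - half),
            max(0.0, cy - half),
            min(float(img_w), cx + half),
            min(float(img_h), cy + half),
        ))
    return boxes
```

For example, `pyramid_boxes(Box(100, 80, 200, 200), [0.8, 1.0, 1.4], 640, 480)` gives three concentric squares with sides 96, 120 and 168 pixels around the face centre (150, 140). Clipping keeps boxes inside the frame but can make edge crops non-square; a real pipeline might instead pad the image before cropping and then resize every level to a common network input size.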

Acknowledgements

This work was supported by the National Science and Technology Major Project of the Ministry of Science and Technology of China (2021ZD0201403), the Natural Science Foundation of Shanghai (23ZR1474200), the Youth Innovation Promotion Association of the Chinese Academy of Sciences (2021233, 2023242), and the Shanghai Academic Research Leader (22XD1424500).

Author information

Corresponding author

Correspondence to Dongchen Zhu.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Che, H. et al. (2025). IPHGaze: Image Pyramid Gaze Estimation with Head Pose Guidance. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15328. Springer, Cham. https://doi.org/10.1007/978-3-031-78104-9_27

  • DOI: https://doi.org/10.1007/978-3-031-78104-9_27

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-78103-2

  • Online ISBN: 978-3-031-78104-9

  • eBook Packages: Computer Science, Computer Science (R0)