research-article
Open access

Zero-Shot Segmentation of Eye Features Using the Segment Anything Model (SAM)

Published: 17 May 2024

Abstract

The advent of foundation models signals a new era in artificial intelligence. The Segment Anything Model (SAM) is the first foundation model for image segmentation. In this study, we evaluate SAM's ability to segment features from eye images recorded in virtual reality setups. The growing demand for annotated eye-image datasets presents a significant opportunity for SAM to redefine the landscape of data annotation in gaze estimation. Our investigation centers on SAM's zero-shot learning abilities and on the effectiveness of prompts such as bounding boxes or point clicks. Consistent with studies in other domains, our results show that SAM's segmentation performance can match that of specialized models, depending on the feature, and that prompts improve its performance; for example, SAM achieved an IoU of 93.34% for pupil segmentation in one dataset. Foundation models like SAM could revolutionize gaze estimation by enabling quick and easy image segmentation, reducing reliance on specialized models and extensive manual annotation.
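
To make the prompting workflow described in the abstract concrete, the sketch below (our illustration, not the authors' released code) feeds one eye image and a single point-click prompt to a pretrained SAM checkpoint via Meta AI's segment-anything package, then scores the predicted pupil mask with IoU against a ground-truth mask. The file names, checkpoint path, and click coordinate are illustrative assumptions.

```python
# Minimal sketch: zero-shot pupil segmentation with a point-click prompt,
# evaluated with intersection-over-union (IoU). Paths and coordinates are
# hypothetical placeholders, not values from the paper.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

# Load a pretrained SAM checkpoint (ViT-H variant) and wrap it in a predictor.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Eye image (RGB) and binary ground-truth pupil mask (hypothetical file names).
image = np.array(Image.open("eye_frame.png").convert("RGB"))
gt_mask = np.array(Image.open("pupil_mask.png")) > 0

predictor.set_image(image)

# One foreground click near the pupil centre (label 1 = foreground point).
point_coords = np.array([[320, 240]])
point_labels = np.array([1])
masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,
)
pred_mask = masks[np.argmax(scores)]  # keep SAM's highest-scoring proposal

# IoU between the predicted mask and the ground-truth pupil mask.
iou = np.logical_and(pred_mask, gt_mask).sum() / np.logical_or(pred_mask, gt_mask).sum()
print(f"Pupil IoU: {iou:.4f}")
```

A bounding-box prompt follows the same pattern: pass `box=np.array([x0, y0, x1, y1])` to `predictor.predict` instead of (or in addition to) the point arguments.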


Cited By

  • (2024) Application and Evaluation of the AI-Powered Segment Anything Model (SAM) in Seafloor Mapping: A Case Study from Puck Lagoon, Poland. Remote Sensing 16(14), 2638. DOI: 10.3390/rs16142638. Online publication date: 18 Jul 2024.


Published In

Proceedings of the ACM on Computer Graphics and Interactive Techniques, Volume 7, Issue 2
May 2024
101 pages
EISSN: 2577-6193
DOI: 10.1145/3665652
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 May 2024
Published in PACMCGIT Volume 7, Issue 2


Author Tags

  1. Eye-tracking
  2. Foundational models
  3. Prompt Engineering
  4. Segment Anything Model
  5. Segmentation
  6. Zero-shot learning

Qualifiers

  • Research-article
  • Research
  • Refereed


