DOI: 10.1145/3654777.3676384
research-article
Open access

SonoHaptics: An Audio-Haptic Cursor for Gaze-Based Object Selection in XR

Published: 11 October 2024

Abstract

We introduce SonoHaptics, an audio-haptic cursor for gaze-based 3D object selection. SonoHaptics addresses challenges around providing accurate visual feedback during gaze-based selection in Extended Reality (XR), e.g., lack of world-locked displays in no- or limited-display smart glasses and visual inconsistencies. To enable users to distinguish objects without visual feedback, SonoHaptics employs the concept of cross-modal correspondence in human perception to map visual features of objects (color, size, position, material) to audio-haptic properties (pitch, amplitude, direction, timbre). We contribute data-driven models for determining cross-modal mappings of visual features to audio and haptic features, and a computational approach to automatically generate audio-haptic feedback for objects in the user’s environment. SonoHaptics provides global feedback that is unique to each object in the scene, and local feedback to amplify differences between nearby objects. Our comparative evaluation shows that SonoHaptics enables accurate object identification and selection in a cluttered scene without visual feedback.
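
To make the mapping idea in the abstract concrete, below is a minimal, hypothetical sketch of what a cross-modal mapping from visual object features to audio-haptic cursor parameters could look like in code. The feature names, value ranges, and mapping formulas are illustrative assumptions only; they are not the authors' SonoHaptics implementation or data-driven models.

```python
# Illustrative sketch of a cross-modal mapping, loosely following the pairings
# named in the abstract (color -> pitch, size -> amplitude, position -> direction,
# material -> timbre). All ranges and formulas are assumptions for illustration.

from dataclasses import dataclass


@dataclass
class VisualFeatures:
    hue: float        # color hue, normalized to [0, 1]
    size: float       # normalized object size, [0, 1]
    azimuth: float    # horizontal angle to the object in degrees, [-90, 90]
    roughness: float  # proxy for material, [0, 1]


@dataclass
class AudioHapticParams:
    pitch_hz: float           # tone frequency
    amplitude: float          # loudness / vibration strength, [0, 1]
    pan: float                # spatial direction, [-1 (left), 1 (right)]
    timbre_brightness: float  # brighter timbre for rougher materials, [0, 1]


def map_features(v: VisualFeatures) -> AudioHapticParams:
    """Map visual features to audio-haptic parameters (hypothetical mapping)."""
    # Assumed two-octave pitch range (220-880 Hz), rising with hue.
    pitch = 220.0 * (2.0 ** (2.0 * v.hue))
    # Larger objects -> stronger feedback (linear assumption).
    amplitude = 0.2 + 0.8 * v.size
    # Horizontal position -> stereo / spatial pan.
    pan = max(-1.0, min(1.0, v.azimuth / 90.0))
    # Rougher material -> brighter timbre (assumption).
    return AudioHapticParams(pitch, amplitude, pan, v.roughness)


if __name__ == "__main__":
    obj = VisualFeatures(hue=0.6, size=0.4, azimuth=30.0, roughness=0.7)
    print(map_features(obj))
```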

Supplemental Material

MP4 File: Video Figure




Published In

UIST '24: Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology
October 2024
2334 pages
ISBN:9798400706288
DOI:10.1145/3654777
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Computational Interaction
  2. Extended Reality
  3. Gaze-based Selection
  4. Haptics
  5. Multimodal Feedback
  6. Sonification

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

UIST '24

Acceptance Rates

Overall Acceptance Rate 561 of 2,567 submissions, 22%

