DOI: 10.1145/3611314.3615902
Research-article · Open access

Increasing Web3D Accessibility with Audio Captioning

Published: 09 October 2023

Abstract

Situational awareness plays a critical role in daily life, enabling individuals to comprehend their surroundings, make informed decisions, and navigate safely. However, individuals with low vision or visual impairments face difficulties in perceiving their real or virtual environment. To address this challenge, we propose a 3D computer-vision-based accessibility solution, powered by object detection and text-to-speech technology. Our application describes the visual content of a Web3D scene from the user’s perspective through auditory channels, thereby enhancing situational awareness for individuals with visual impairments in virtual and physical environments. We conducted a user study with 44 participants to compare a set of algorithms for specific tasks, such as Search or Summarize, and assessed the effectiveness of our captioning algorithms based on user ratings of naturalness, correctness, and satisfaction. Our results indicate positive subjective outcomes in accessibility for both sighted and visually-impaired subjects and reveal significant effects of both the task and the captioning algorithm.
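The pipeline the abstract describes (detect objects from the user’s viewpoint, then speak a caption) can be sketched roughly as below. This is a simplified, hypothetical illustration, not the authors’ implementation: the `Detection` shape, `composeSummary` function, and bearing categories are all invented here for clarity; in the paper the detections would come from a YOLO-style detector run against the rendered Web3D view.

```typescript
// Hypothetical shape of one object-detection result for the current viewpoint.
interface Detection {
  label: string;                       // e.g. "chair"
  confidence: number;                  // detector score in [0, 1]
  bearing: "left" | "ahead" | "right"; // coarse position relative to the viewer
}

// A "Summarize"-style captioner: filter low-confidence detections and
// flatten the rest into one spoken sentence.
function composeSummary(detections: Detection[], minConfidence = 0.5): string {
  const kept = detections.filter(d => d.confidence >= minConfidence);
  if (kept.length === 0) return "No objects detected.";
  const parts = kept.map(
    d => `a ${d.label} ${d.bearing === "ahead" ? "ahead" : `to your ${d.bearing}`}`
  );
  return `You see ${parts.join(", ")}.`;
}

// In a browser, the caption would then be handed to the Web Speech API, e.g.:
//   speechSynthesis.speak(new SpeechSynthesisUtterance(caption));
const caption = composeSummary([
  { label: "table", confidence: 0.9, bearing: "ahead" },
  { label: "chair", confidence: 0.8, bearing: "left" },
  { label: "plant", confidence: 0.3, bearing: "right" }, // below threshold, dropped
]);
console.log(caption); // "You see a table ahead, a chair to your left."
```

Keeping caption composition as a pure function (separate from detection and speech output) is what makes it possible to compare alternative captioning algorithms, as the study does, by swapping only this step.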


Cited By

  • (2024) Artificial Intelligence in Virtual Reality for Blind and Low Vision Individuals: Literature Review. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 68, 1 (2024), 1333–1338. https://doi.org/10.1177/10711813241266832. Online publication date: 9 September 2024.


Published In

Web3D '23: Proceedings of the 28th International ACM Conference on 3D Web Technology
October 2023
244 pages
ISBN:9798400703249
DOI:10.1145/3611314
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. YOLO
  2. narration
  3. neural network
  4. user study

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Web3D '23
Acceptance Rates

Overall Acceptance Rate 27 of 71 submissions, 38%

Article Metrics

  • Downloads (Last 12 months)230
  • Downloads (Last 6 weeks)31
Reflects downloads up to 04 Jan 2025
