
Voiceye: A Multimodal Inclusive Development Environment

Published: 03 July 2020

Abstract

People with physical impairments who are unable to use traditional input devices (i.e. mouse and keyboard) are often excluded from technical professions (e.g. web development). Alternative input methods such as eye gaze tracking and speech recognition have become more readily available in recent years with both being explored independently to support people with physical impairments in coding activities. This paper describes a novel multimodal application ("Voiceye") that combines voice input, gaze interaction, and mechanical switches as an alternative approach for writing code. The system was evaluated with non-disabled participants who have coding experience (N=29) to assess the feasibility of the application in writing HTML and CSS code. Results found that Voiceye was perceived positively and enabled successful completion of coding tasks. A follow-up study with disabled participants (N=5) demonstrated that this method of multimodal interaction can support people with physical impairments in writing and editing code.
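
To make the interaction model described above concrete, here is a minimal, hypothetical TypeScript sketch of how three input channels could be fused in an editor loop: a gaze point places the caret, a spoken command stages an HTML snippet (in the spirit of abbreviation expanders such as Emmet), and a mechanical switch press commits it, guarding against recognition errors. This is an illustration only, not the authors' implementation; the class, command set, and channel APIs are all invented for this sketch.

```typescript
// Hypothetical sketch of multimodal code entry: gaze places the caret,
// voice stages a snippet, a switch press commits it. Not the Voiceye code.

type GazePoint = { line: number; column: number };

// Map spoken phrases to HTML snippets (invented command set).
const VOICE_SNIPPETS: Record<string, string> = {
  "paragraph": "<p></p>",
  "heading one": "<h1></h1>",
  "unordered list": "<ul>\n  <li></li>\n</ul>",
};

class MultimodalEditor {
  private buffer: string[] = [""];
  private caret: GazePoint = { line: 0, column: 0 };
  private pending: string | null = null;

  // Gaze channel: move the caret to where the user is looking.
  onGaze(point: GazePoint): void {
    this.caret = point;
  }

  // Voice channel: stage a snippet for the spoken command, if recognized.
  onVoiceCommand(phrase: string): void {
    this.pending = VOICE_SNIPPETS[phrase.toLowerCase()] ?? null;
  }

  // Switch channel: a switch press commits the staged snippet at the
  // gaze-placed caret, so a misrecognized phrase is never auto-inserted.
  onSwitchPress(): void {
    if (this.pending === null) return;
    const line = this.buffer[this.caret.line] ?? "";
    this.buffer[this.caret.line] =
      line.slice(0, this.caret.column) +
      this.pending +
      line.slice(this.caret.column);
    this.pending = null;
  }

  text(): string {
    return this.buffer.join("\n");
  }
}

// Simulated interaction: look at line 0, say "paragraph", press the switch.
const editor = new MultimodalEditor();
editor.onGaze({ line: 0, column: 0 });
editor.onVoiceCommand("paragraph");
editor.onSwitchPress();
console.log(editor.text()); // -> <p></p>
```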



    Published In

    DIS '20: Proceedings of the 2020 ACM Designing Interactive Systems Conference
    July 2020
    2264 pages
    ISBN: 9781450369749
    DOI: 10.1145/3357236
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 03 July 2020

    Author Tags

    1. assistive technology
    2. eye gaze tracking
    3. programming tools
    4. speech recognition

    Qualifiers

    • Research-article

    Conference

    DIS '20: Designing Interactive Systems Conference 2020
    July 6-10, 2020
    Eindhoven, Netherlands

    Acceptance Rates

    Overall Acceptance Rate: 1,158 of 4,684 submissions, 25%

    Article Metrics

    • Downloads (last 12 months): 45
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 03 Jan 2025.

    Cited By

    • (2024) Jasay: Towards Voice Commands in Projectional Editors. Proceedings of the 1st ACM/IEEE Workshop on Integrated Development Environments, 30-34. DOI: 10.1145/3643796.3648449. Online publication date: 20-Apr-2024.
    • (2023) An End-to-End Review of Gaze Estimation and its Interactive Applications on Handheld Mobile Devices. ACM Computing Surveys 56, 2: 1-38. DOI: 10.1145/3606947. Online publication date: 15-Sep-2023.
    • (2022) Low-Cost Human–Machine Interface for Computer Control with Facial Landmark Detection and Voice Commands. Sensors 22, 23: 9279. DOI: 10.3390/s22239279. Online publication date: 29-Nov-2022.
    • (2022) Inclusive Multimodal Voice Interaction for Code Navigation. Proceedings of the 2022 International Conference on Multimodal Interaction, 509-519. DOI: 10.1145/3536221.3556600. Online publication date: 7-Nov-2022.
    • (2022) GazeBreath: Input Method Using Gaze Pointing and Breath Selection. Proceedings of the Augmented Humans International Conference 2022, 1-9. DOI: 10.1145/3519391.3519405. Online publication date: 13-Mar-2022.
    • (2021) Natural Interaction with Traffic Control Cameras Through Multimodal Interfaces. Artificial Intelligence in HCI, 501-515. DOI: 10.1007/978-3-030-77772-2_33. Online publication date: 3-Jul-2021.
