DOI: 10.1145/3411764.3445430
research-article
Open access

Acceptability of Speech and Silent Speech Input Methods in Private and Public

Published: 07 May 2021

Abstract

Silent speech input converts non-acoustic features such as tongue and lip movements into text. It has been demonstrated as a promising input method on mobile devices and has been explored for a variety of audiences and contexts where the acoustic signal is unavailable (e.g., people with speech disorders) or unreliable (e.g., noisy environments). Though the method shows promise, very little is known about people’s perceptions of using it. In this work, we first conduct two user studies to explore users’ attitudes towards the method, with a particular focus on social acceptance and error tolerance. Results show that people perceive silent speech as more socially acceptable than speech input and are willing to tolerate more errors with it to uphold privacy and security. We then conduct a third study to identify a suitable method for providing real-time feedback on silent speech input. Results show that users find an abstract feedback method effective and significantly more private and secure than a commonly used video feedback method.

Supplementary Material

MP4 File (3411764.3445430_videopreview.mp4)
Preview video
MP4 File (3411764.3445430_videofigure.mp4)
Supplemental video




    Published In

    CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems
    May 2021
    10862 pages
    ISBN: 9781450380966
    DOI: 10.1145/3411764
    This work is licensed under a Creative Commons Attribution 4.0 International License.


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 07 May 2021


    Author Tags

    1. contactless interaction
    2. input and interaction
    3. silent speech
    4. social acceptance
    5. speech
    6. voice assistant

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CHI '21

    Acceptance Rates

    Overall Acceptance Rate: 6,199 of 26,314 submissions (24%)

    Article Metrics

    • Downloads (Last 12 months): 668
    • Downloads (Last 6 weeks): 80
    Reflects downloads up to 12 Dec 2024


    Cited By

    • (2024) NearFetch: Enhancing Touch-Based Mobile Interaction on Public Displays with an Embedded Programmable NFC Array. Proceedings of the 26th International Conference on Multimodal Interaction, 234-243. https://doi.org/10.1145/3678957.3685749. Online publication date: 4-Nov-2024.
    • (2024) Whispering Wearables: Multimodal Approach to Silent Speech Recognition with Head-Worn Devices. Proceedings of the 26th International Conference on Multimodal Interaction, 214-223. https://doi.org/10.1145/3678957.3685720. Online publication date: 4-Nov-2024.
    • (2024) Poster Unvoiced: Designing an Unvoiced User Interface using Earables and LLMs. Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems, 871-872. https://doi.org/10.1145/3666025.3699413. Online publication date: 4-Nov-2024.
    • (2024) Unvoiced: Designing an LLM-assisted Unvoiced User Interface using Earables. Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems, 784-798. https://doi.org/10.1145/3666025.3699374. Online publication date: 4-Nov-2024.
    • (2024) Lipwatch: Enabling Silent Speech Recognition on Smartwatches using Acoustic Sensing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(2), 1-29. https://doi.org/10.1145/3659614. Online publication date: 15-May-2024.
    • (2024) Enabling Accessible and Ubiquitous Interaction in Next-Generation Wearables: An Unvoiced Speech Approach. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 2257-2259. https://doi.org/10.1145/3636534.3695908. Online publication date: 4-Dec-2024.
    • (2024) MELDER: The Design and Evaluation of a Real-time Silent Speech Recognizer for Mobile Devices. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-23. https://doi.org/10.1145/3613904.3642348. Online publication date: 11-May-2024.
    • (2024) ReHEarSSE: Recognizing Hidden-in-the-Ear Silently Spelled Expressions. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-16. https://doi.org/10.1145/3613904.3642095. Online publication date: 11-May-2024.
    • (2024) Alaryngeal Speech Enhancement for Noisy Environments Using a Pareto Denoising Gated LSTM. Journal of Voice. https://doi.org/10.1016/j.jvoice.2024.07.016. Online publication date: Aug-2024.
    • (2023) Inform, Explain, or Control: Techniques to Adjust End-User Performance Expectations for a Conversational Agent Facilitating Group Chat Discussions. Proceedings of the ACM on Human-Computer Interaction 7(CSCW2), 1-26. https://doi.org/10.1145/3610192. Online publication date: 4-Oct-2023.
