research-article

Open access

ScratchThat: Supporting Command-Agnostic Speech Repair in Voice-Driven Assistants

Authors:

Jeffrey Bigham

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 3, Issue 2

Article No.: 63, Pages 1 - 17

https://doi.org/10.1145/3328934

Published: 21 June 2019

Abstract

Speech interfaces have become an increasingly popular input method for smartphone-based virtual assistants, smart speakers, and Internet of Things (IoT) devices. While they facilitate rapid and natural interaction in the form of voice commands, current speech interfaces lack natural methods for command correction. We present ScratchThat, a method for supporting command-agnostic speech repair in voice-driven assistants, suitable for enabling corrective functionality within third-party commands. Unlike existing speech repair methods, ScratchThat is able to automatically infer query parameters and intelligently select entities in a correction clause for editing. We conducted three evaluations to (1) elicit natural forms of speech repair in voice commands, (2) compare the interaction speed and NASA TLX score of the system to existing voice-based correction methods, and (3) assess the accuracy of the ScratchThat algorithm. Our results show that (1) speech repair for voice commands differs from previous models for conversational speech repair, (2) methods for command correction based on speech repair are significantly faster than other voice-based methods, and (3) the ScratchThat algorithm facilitates accurate command repair as rated by humans (77% accuracy) and machines (0.94 BLEU score). Finally, we present several ScratchThat use cases, which collectively demonstrate its utility across many applications.

Supplementary Material

wu.zip (40.63 MB)

Supplemental movie, appendix, image, and software files for ScratchThat: Supporting Command-Agnostic Speech Repair in Voice-Driven Assistants

Index Terms

ScratchThat: Supporting Command-Agnostic Speech Repair in Voice-Driven Assistants
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction devices
      1. Sound-based input / output
    2. Interaction techniques
  2. Ubiquitous and mobile computing
    1. Ubiquitous and mobile devices

Published In

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies Volume 3, Issue 2

June 2019

802 pages

EISSN:2474-9567

DOI:10.1145/3341982

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 June 2019

Accepted: 01 April 2019

Revised: 01 February 2019

Received: 01 November 2018

Published in IMWUT Volume 3, Issue 2

Qualifiers

Research-article
Research
Refereed

Cited By

Zhu H, Cheung Y, Xie L (2023). Stability and Efficiency of Personalised Cultural Markets. Proceedings of the ACM Web Conference 2023, 3447-3455. Online publication date: 30-Apr-2023.
https://dl.acm.org/doi/10.1145/3543507.3583315

Meck A, Draxler C, Vogt T (2023). Failing with Grace: Exploring the Role of Repair Costs in Conversational Breakdowns with in-Car Voice Assistants. International Journal of Human–Computer Interaction 40:22, 7574-7592. Online publication date: 11-Oct-2023.
https://doi.org/10.1080/10447318.2023.2266791

Chen Y, Jiang T, Gel Y (2023). H-Nets: Hyper-hodge Convolutional Neural Networks for Time-Series Forecasting. Machine Learning and Knowledge Discovery in Databases: Research Track, 271-289. Online publication date: 18-Sep-2023.
https://dl.acm.org/doi/10.1007/978-3-031-43424-2_17

Nouna B, Nadia E, Mohammed B (2022). Input Multimodal Interactions for IoT Environments: smartphones instead of intermediate equipment. 2022 13th International Conference on Information and Communication Systems (ICICS), 65-69. Online publication date: 21-Jun-2022.
https://doi.org/10.1109/ICICS55353.2022.9811235

Chen X, Liesenfeld A, Li S, Yao Y (2022). Effects of Filled Pauses on Memory Recall in Human-Robot Interaction in Mandarin Chinese. Engineering Psychology and Cognitive Ergonomics, 3-17. Online publication date: 26-Jun-2022.
https://dl.acm.org/doi/10.1007/978-3-031-06086-1_1

Koh J (2021). Discourse Analysis in Voice User Interface Research. Proceedings of the 3rd Conference on Conversational User Interfaces, 1-5. Online publication date: 27-Jul-2021.
https://dl.acm.org/doi/10.1145/3469595.3469622

Cho J, Rader E (2020). The Role of Conversational Grounding in Supporting Symbiosis Between People and Digital Assistants. Proceedings of the ACM on Human-Computer Interaction 4:CSCW1, 1-28. Online publication date: 29-May-2020.
https://dl.acm.org/doi/10.1145/3392838
