research-article
Open access

ScratchThat: Supporting Command-Agnostic Speech Repair in Voice-Driven Assistants

Published: 21 June 2019

Abstract

Speech interfaces have become an increasingly popular input method for smartphone-based virtual assistants, smart speakers, and Internet of Things (IoT) devices. While they enable rapid and natural interaction through voice commands, current speech interfaces lack natural methods for correcting a command. We present ScratchThat, a method for supporting command-agnostic speech repair in voice-driven assistants that is suitable for enabling corrective functionality within third-party commands. Unlike existing speech repair methods, ScratchThat automatically infers query parameters and intelligently selects entities in a correction clause for editing. We conducted three evaluations to (1) elicit natural forms of speech repair in voice commands, (2) compare the interaction speed and NASA-TLX workload scores of the system with those of existing voice-based correction methods, and (3) assess the accuracy of the ScratchThat algorithm. Our results show that (1) speech repair in voice commands differs from previous models of conversational speech repair, (2) command-correction methods based on speech repair are significantly faster than other voice-based methods, and (3) the ScratchThat algorithm repairs commands accurately, as rated by human judges (77% accuracy) and by an automatic metric (0.94 BLEU score). Finally, we present several ScratchThat use cases that collectively demonstrate its utility across many applications.
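The abstract describes repairing a voice command from a correction clause spoken in the same utterance. To make the idea concrete, here is a minimal sketch under stated assumptions; it is not the ScratchThat algorithm. The cue phrases in `CUE_PATTERN`, the crude `shape` typing, and the helpers `split_on_cue`, `splice`, and `repair` are all hypothetical. They replace the parameter inference and entity selection the paper describes with two simple heuristics: reuse an anchor word the speaker repeats, otherwise overwrite the last command token of a matching shape.

```python
import re

# Hypothetical repair cues for this sketch; ScratchThat's actual cue handling is not shown here.
CUE_PATTERN = re.compile(r"\b(?:scratch that|i mean|no wait)\b", re.IGNORECASE)


def split_on_cue(utterance: str) -> tuple[str, str]:
    """Split an utterance into (original command, correction clause) at the first cue."""
    match = CUE_PATTERN.search(utterance)
    if match is None:
        return utterance.strip(), ""
    return utterance[:match.start()].strip(" ,"), utterance[match.end():].strip(" ,")


def shape(token: str) -> str:
    """Crude token 'type': a hypothetical stand-in for real parameter/entity inference."""
    if token[:1].isdigit():
        return "number"
    if token[:1].isupper():
        return "capitalized"
    return "lower"


def splice(tokens: list[str], start: int, fix: list[str]) -> str:
    """Overwrite len(fix) tokens beginning at `start` with the correction tokens."""
    return " ".join(tokens[:start] + fix + tokens[start + len(fix):])


def repair(utterance: str) -> str:
    """Apply an in-utterance correction to the command that precedes the cue."""
    command, correction = split_on_cue(utterance)
    if not correction:
        return command
    orig, fix = command.split(), correction.split()
    # Heuristic 1: the correction repeats an anchor word ("for 7 am ... for 8 am").
    for i in range(len(orig) - 1, -1, -1):
        if orig[i].lower() == fix[0].lower():
            return splice(orig, i, fix)
    # Heuristic 2: overwrite the last command token whose shape matches the correction's first token.
    for i in range(len(orig) - 1, -1, -1):
        if shape(orig[i]) == shape(fix[0]):
            return splice(orig, i, fix)
    # No plausible target: keep the command unchanged.
    return command


if __name__ == "__main__":
    print(repair("set an alarm for 7 am, scratch that, for 8 am"))  # set an alarm for 8 am
    print(repair("call Bob at 5 pm scratch that 6 pm"))             # call Bob at 6 pm
```

Both heuristics break easily. For example, `repair("call Mary Ann scratch that Bob")` returns `call Mary Bob`, because only one capitalized token is overwritten; handling cases like this plausibly requires the kind of parameter inference and entity selection the abstract attributes to ScratchThat.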
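The 0.94 BLEU score above is an automatic comparison between a repaired command and a reference command. For readers unfamiliar with the metric, the following is a minimal sentence-level BLEU sketch, assuming whitespace tokenization, uniform weights up to 4-grams, the standard brevity penalty, and add-one smoothing for n-gram orders of 2 and above (one of the smoothing techniques compared by Chen and Cherry, 2014). The abstract does not say which BLEU variant produced 0.94, so this sketch is not expected to reproduce that figure; `ngram_counts` and `sentence_bleu` are illustrative names.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """Sentence-level BLEU with uniform weights and add-one smoothing for n >= 2."""
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngram_counts(hyp, n), ngram_counts(ref, n)
        # Clipped matches: a hypothesis n-gram is credited at most as often as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n > 1:
            # Add one to both numerator and denominator for higher-order n-grams.
            matches, total = matches + 1, total + 1
        if matches == 0:
            return 0.0  # only reachable for unigrams: no word overlap at all
        log_precision_sum += math.log(matches / total)
    geometric_mean = math.exp(log_precision_sum / max_n)
    # Brevity penalty: hypotheses shorter than the reference are penalized.
    brevity_penalty = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * geometric_mean
```

For example, scoring the hypothesis `set an alarm for 7 am` against the reference `set an alarm for 8 am` gives (5/6 · 4/6 · 3/5 · 2/4)^(1/4) = (1/6)^(1/4) ≈ 0.64, since a single wrong token breaks one unigram, two bigrams, two trigrams, and two 4-grams.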

Supplementary Material

wu (wu.zip)
Supplemental movie, appendix, image, and software files for ScratchThat: Supporting Command-Agnostic Speech Repair in Voice-Driven Assistants

Cited By

  • (2023) Stability and Efficiency of Personalised Cultural Markets. Proceedings of the ACM Web Conference 2023, 3447-3455. DOI: 10.1145/3543507.3583315. Online publication date: 30-Apr-2023
  • (2023) Failing with Grace: Exploring the Role of Repair Costs in Conversational Breakdowns with in-Car Voice Assistants. International Journal of Human–Computer Interaction 40:22, 7574-7592. DOI: 10.1080/10447318.2023.2266791. Online publication date: 11-Oct-2023
  • (2023) H-Nets: Hyper-hodge Convolutional Neural Networks for Time-Series Forecasting. Machine Learning and Knowledge Discovery in Databases: Research Track, 271-289. DOI: 10.1007/978-3-031-43424-2_17. Online publication date: 18-Sep-2023

Information

Published In

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 3, Issue 2
June 2019
802 pages
EISSN:2474-9567
DOI:10.1145/3341982
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 June 2019
Accepted: 01 April 2019
Revised: 01 February 2019
Received: 01 November 2018
Published in IMWUT Volume 3, Issue 2

Author Tags

  1. Conversational Agents
  2. Dialog Interaction
  3. Error Correction
  4. Speech Interfaces
  5. Speech Repair
  6. Voice User Interfaces

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 120
  • Downloads (last 6 weeks): 13
Reflects downloads up to 16 Dec 2024

Citations
  • (2022) Input Multimodal Interactions for IoT Environments: smartphones instead of intermediate equipment. 2022 13th International Conference on Information and Communication Systems (ICICS), 65-69. DOI: 10.1109/ICICS55353.2022.9811235. Online publication date: 21-Jun-2022
  • (2022) Effects of Filled Pauses on Memory Recall in Human-Robot Interaction in Mandarin Chinese. Engineering Psychology and Cognitive Ergonomics, 3-17. DOI: 10.1007/978-3-031-06086-1_1. Online publication date: 26-Jun-2022
  • (2021) Discourse Analysis in Voice User Interface Research. Proceedings of the 3rd Conference on Conversational User Interfaces, 1-5. DOI: 10.1145/3469595.3469622. Online publication date: 27-Jul-2021
  • (2020) The Role of Conversational Grounding in Supporting Symbiosis Between People and Digital Assistants. Proceedings of the ACM on Human-Computer Interaction 4:CSCW1, 1-28. DOI: 10.1145/3392838. Online publication date: 29-May-2020