research-article

A smart speaker performance measurement tool

Authors: Hyunsu Mun, Hyungjin Lee, Soohyun Kim, Youngseok Lee

SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing

Pages 755 - 762

https://doi.org/10.1145/3341105.3373990

Published: 30 March 2020

Abstract

Voice-controlled virtual assistants (VAs) in smart speakers and smartphones have recently become popular. Because a VA provides interactive services through complex processing steps such as speech recognition, natural language understanding, service invocation, and text-to-speech (TTS) generation, its functions are performed in the cloud. However, it is not known why the response time of voice commands is slow or where the performance bottleneck of the VA service lies. In this paper, we present a comprehensive VA performance measurement framework that analyzes timing events and response time by processing audio, video, and packets. From experiments with 414 voice commands on five smart speakers and 178 commands on two smartphone VAs, we observed that 24.9% of voice commands complete within two seconds and 63.2% within three seconds; the remaining 36.8% take longer than three seconds, resulting in a poor user experience. In particular, 96.2% of music commands and 66.7% of IoT control commands have response times longer than three seconds. We found that our performance measurement tool is useful for identifying slow services such as music and news, whose delay comes from extracting the user intent from the voice command, the content app startup delay, and the initial playback time. Our tool also shows that IoT control with a smart speaker yields slow response times.
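The threshold-based shares reported in the abstract (within two seconds, within three seconds, over three seconds) can be illustrated with a minimal sketch. This is not the authors' tool: the function name `response_time_shares`, the inclusive treatment of the thresholds, and the sample values are all assumptions made for illustration, and the sample list is not the paper's data.

```python
from typing import Sequence


def response_time_shares(times_s: Sequence[float]) -> dict[str, float]:
    """Return the share of commands finishing within 2 s, within 3 s, and over 3 s.

    Thresholds are treated as inclusive (t <= 2.0, t <= 3.0); the abstract does
    not say how boundary values are handled, so this is an assumption.
    """
    if not times_s:
        raise ValueError("need at least one response time")
    n = len(times_s)
    within_2 = sum(t <= 2.0 for t in times_s) / n
    within_3 = sum(t <= 3.0 for t in times_s) / n
    return {
        "within_2s": within_2,
        "within_3s": within_3,
        "over_3s": 1.0 - within_3,
    }


# Hypothetical response times in seconds (illustrative only, not measured data).
sample = [1.2, 1.8, 2.4, 2.9, 3.5, 4.1, 2.2, 5.0]
shares = response_time_shares(sample)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Note that the "within three seconds" share is cumulative and includes the commands that finished within two seconds, which is why it and the "over three seconds" share sum to 100%, as 63.2% and 36.8% do in the abstract.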


Cited By

Rostami M, Liu A, Sundaresan K (2024). Scalable Acoustic IoT through Composable Distributed Beamforming Tags. 2024 23rd ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), pp. 39-50. Online publication date: 13-May-2024. https://doi.org/10.1109/IPSN61024.2024.00008

Wei J, Tag B, Trippas J, Dingler T, Kostakos V (2022). What Could Possibly Go Wrong When Interacting with Proactive Smart Speakers? A Case Study Using an ESM Application. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, pp. 1-15. Online publication date: 29-Apr-2022. https://dl.acm.org/doi/10.1145/3491102.3517432

Chang J (2022). Enabling progressive system integration for AIoT and speech-based HCI through semantic-aware computing. The Journal of Supercomputing 78:3, pp. 3288-3324. Online publication date: 1-Feb-2022. https://dl.acm.org/doi/10.1007/s11227-021-03996-x

Index Terms

A smart speaker performance measurement tool
1. Human-centered computing
  1. Ubiquitous and mobile computing
    1. Ubiquitous and mobile computing systems and tools

Published In

SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing

March 2020

2348 pages

ISBN:9781450368667

DOI:10.1145/3341105

Conference Chairs: Chih-Cheng Hung (Kennesaw State University), Tomas Cerny (Baylor University)

Program Chairs: Dongwan Shin (New Mexico Tech), Alessio Bechini (University of Pisa, Italy)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2016R1D1A1A09916326).
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC support program (IITP-2019-2016-0-00304) supervised by the IITP.

Conference

SAC '20

Sponsor:

SIGAPP

SAC '20: The 35th ACM/SIGAPP Symposium on Applied Computing

March 30 - April 3, 2020

Brno, Czech Republic

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Bibliometrics

Total Citations: 3
Total Downloads: 451
Downloads (Last 12 months): 48
Downloads (Last 6 weeks): 9

Reflects downloads up to 13 Dec 2024
