demonstration

LiveLocalizer: Augmenting Mobile Speech-to-Text with Microphone Arrays, Optimized Localization and Beamforming

Authors:

Artem Dementyev,

Dimitri Kanevsky,

Mathieu Parvaix,

Alex Olwal

UIST '23 Adjunct: Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology

Article No.: 75, Pages 1 - 3

https://doi.org/10.1145/3586182.3615789

Published: 29 October 2023

Abstract

Speech-to-text capabilities on mobile devices have proven helpful for language translation, note-taking, hearing and speech accessibility, and meeting transcripts. However, their usefulness is constrained by being unable to distinguish between multiple speakers, track which direction speech is coming from, and provide acceptable performance in noisy environments.

This work introduces efficient real-time audio localization and adaptive beamforming algorithms on custom sound perception hardware running on a low-power microcontroller and four integrated microphones. A prototype is implemented in a phone case form factor and is plug-and-play with modern smartphones.

We characterize the performance in technical evaluations of localization, beamforming, and diarization. We demonstrate how the phone case extends existing smartphones with speaker diarization in a speech-to-text app, sound direction visualization, and sound enhancement through beamforming. In the future, we hope our approach will inspire the widespread adoption of advanced microphone arrays that natively unlock the potential of spatial sound processing and perception in mobile and wearable devices.
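The localization and beamforming steps named in the abstract can be illustrated with a textbook two-microphone sketch. It is not the authors' algorithm: it assumes plain time-domain cross-correlation for delay estimation, a far-field source, integer-sample delays, and fixed delay-and-sum beamforming (the paper describes an adaptive beamformer on four microphones). The function names, the 343 m/s speed of sound, and the microphone spacing used in the usage note are all hypothetical values chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the paper


def estimate_delay(ref, sig, max_lag):
    """Return the lag in samples by which `sig` trails `ref`.

    Brute-force cross-correlation over lags in [-max_lag, max_lag].
    """
    n = len(ref)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += ref[i] * sig[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def angle_of_arrival(lag, sample_rate, mic_spacing):
    """Convert an inter-microphone delay to a far-field angle in degrees.

    0 degrees is broadside (source equidistant from both microphones).
    """
    tau = lag / sample_rate
    # Clamp to the valid asin domain in case noise yields an impossible delay.
    s = max(-1.0, min(1.0, tau * SPEED_OF_SOUND / mic_spacing))
    return math.degrees(math.asin(s))


def delay_and_sum(channels, lags):
    """Advance each channel by its integer lag, then average the channels."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, lag in zip(channels, lags):
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                out[i] += ch[j]
    return [v / len(channels) for v in out]


def delayed(x, d):
    """Hypothetical test helper: delay `x` by `d` samples, zero-padded."""
    return [0.0] * d + x[: len(x) - d]
```

As a usage example, two simulated channels can be built from one short pulse with `delayed(src, 3)`; `estimate_delay` then recovers the three-sample lag, `angle_of_arrival(3, 16000, 0.1)` maps it to a direction under an assumed 16 kHz sample rate and 10 cm spacing, and `delay_and_sum` realigns and averages the channels toward that direction.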

Index Terms

LiveLocalizer: Augmenting Mobile Speech-to-Text with Microphone Arrays, Optimized Localization and Beamforming
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction devices
      1. Graphics input devices
  2. Ubiquitous and mobile computing
    1. Ubiquitous and mobile computing systems and tools

Published In

UIST '23 Adjunct: Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology

October 2023

424 pages

ISBN:9798400700965

DOI:10.1145/3586182

Editors:
Sean Follmer (Stanford University, USA),
Jeff Han,
Jürgen Steimle (Saarland University, Germany),
Nathalie Henry Riche (Microsoft Research, USA)

Copyright © 2023 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Demonstration
Research
Refereed limited

Conference

UIST '23

UIST '23: The 36th Annual ACM Symposium on User Interface Software and Technology

October 29 - November 1, 2023

San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 355 of 1,733 submissions, 20%
