research-article
Open access

AmbiEar: mmWave Based Voice Recognition in NLoS Scenarios

Published: 07 September 2022

Abstract

Millimeter wave (mmWave) based sensing is a key technique that enables innovative smart applications such as voice recognition. Existing work in this area requires direct sensing of the speaker's near-throat region and consequently has limited applicability in non-line-of-sight (NLoS) scenarios. This paper proposes AmbiEar, the first mmWave based voice recognition approach applicable in NLoS scenarios. AmbiEar is based on the insight that a human voice causes correlated vibrations of the surrounding objects, regardless of the speaker's position and posture. AmbiEar therefore treats the surrounding objects as ears that perceive sound, and senses the voice indirectly by sensing the vibrations of those objects. By incorporating designs such as common component extraction, signal superimposition, and an encoder-decoder network, AmbiEar tackles the challenges induced by low-SNR and distorted signals. We implement AmbiEar on a commercial mmWave radar and evaluate its performance under different settings. The experimental results show that AmbiEar achieves a word recognition accuracy of 87.21% in NLoS scenarios and reduces the recognition error by 35.1% compared to the direct sensing approach.
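
The intuition behind superimposing signals from several vibrating objects can be illustrated with a small simulation. The sketch below is not the paper's pipeline: it assumes the per-object signals are already time-aligned, share one common component, and carry independent Gaussian noise. The tone frequency, sampling rate, noise level, and all function names are hypothetical choices made for illustration. Under those assumptions, averaging K signals should raise the SNR by roughly 10·log10(K) dB.

```python
import math
import random


def make_tone(n, freq_hz, rate_hz):
    """Synthetic stand-in for the common voice-induced vibration (assumption)."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]


def add_noise(signal, sigma, rng):
    """Return the signal plus independent Gaussian noise, i.e. one 'object'."""
    return [s + rng.gauss(0.0, sigma) for s in signal]


def superimpose(signals):
    """Average time-aligned signals sample by sample."""
    k = len(signals)
    return [sum(samples) / k for samples in zip(*signals)]


def snr_db(clean, observed):
    """SNR of an observation against the known clean reference, in dB."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((o - c) ** 2 for c, o in zip(clean, observed))
    return 10 * math.log10(signal_power / noise_power)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
clean = make_tone(n=4000, freq_hz=200.0, rate_hz=4000.0)

# Eight hypothetical objects, each observing the same tone under heavy noise.
observations = [add_noise(clean, sigma=2.0, rng=rng) for _ in range(8)]

single_snr = snr_db(clean, observations[0])
combined_snr = snr_db(clean, superimpose(observations))
print(f"single object: {single_snr:.1f} dB")
print(f"8 objects superimposed: {combined_snr:.1f} dB")
```

Real reflections would differ in delay, amplitude, and distortion, which is presumably why the abstract pairs superimposition with common component extraction and a learned encoder-decoder rather than relying on plain averaging.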

Information & Contributors

Information

Published In

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies  Volume 6, Issue 3
September 2022
1612 pages
EISSN:2474-9567
DOI:10.1145/3563014
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 September 2022
Published in IMWUT Volume 6, Issue 3

Author Tags

  1. Millimeter Wave
  2. Voice Recognition
  3. Wireless Sensing

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • NSFC
  • The Guoqiang Institute, Tsinghua University
