AdaStreamLite: Environment-adaptive Streaming Speech Recognition on Mobile Devices

Published: 12 January 2024 in Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Volume 7, Issue 4.

Abstract

Streaming speech recognition aims to transcribe speech to text in a streaming manner, providing real-time speech interaction for smartphone users. However, developing a high-performance streaming speech recognition system that runs purely on mobile platforms is not trivial, due to complex real-world acoustic environments and the limited computational resources of smartphones. Most existing solutions generalize poorly to unseen environments and have difficulty working with streaming speech. In this paper, we design AdaStreamLite, an environment-adaptive streaming speech recognition tool for smartphones. AdaStreamLite interacts with its surroundings to capture the characteristics of the current acoustic environment, improving its robustness against ambient noise in a lightweight manner. We design an environment representation extractor to model acoustic environments with compact feature vectors, and construct a representation lookup table to improve the generalization of AdaStreamLite to unseen environments. We train our system using large publicly available speech datasets covering different languages, and we conduct experiments in a wide range of real acoustic environments with different smartphones. The results show that AdaStreamLite outperforms state-of-the-art methods in terms of recognition accuracy, computational resource consumption, and robustness against unseen environments.
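To make the lookup-table idea concrete, the following is a minimal sketch of how an environment representation lookup might work, assuming environments are stored as fixed-length embedding vectors and matched by cosine similarity. The class name, the embedding dimension, and the similarity threshold are illustrative assumptions, not AdaStreamLite's actual interface.

    # Sketch of a representation lookup table: known environments are stored
    # as L2-normalized embeddings; an incoming noise embedding is mapped to
    # its nearest stored environment, or kept as-is when nothing is close
    # enough (the unseen-environment case). Names and the 0.7 threshold are
    # hypothetical, not from the paper.
    import numpy as np

    class EnvironmentLookupTable:
        def __init__(self, names, embeddings):
            # embeddings: (n_envs, dim) array, one row per known environment
            self.names = list(names)
            vecs = np.asarray(embeddings, dtype=np.float32)
            self.vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

        def lookup(self, query, min_sim=0.7):
            # Return the closest stored representation, falling back to the
            # query embedding itself for unseen environments.
            q = np.asarray(query, dtype=np.float32)
            q = q / np.linalg.norm(q)
            sims = self.vecs @ q  # cosine similarity against every row
            best = int(np.argmax(sims))
            if sims[best] >= min_sim:
                return self.names[best], self.vecs[best]
            return "unseen", q

    # Usage: match the embedding of the current ambient noise (random
    # placeholder vectors here) against three known environments.
    table = EnvironmentLookupTable(
        names=["quiet-office", "street", "cafe"],
        embeddings=np.random.randn(3, 128),
    )
    env_name, env_vec = table.lookup(np.random.randn(128))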

Author Tags

1. acoustic environment sensing
2. ambient noise adaptation
3. on-device speech recognition
4. streaming speech recognition
