Automatic speech recognition (ASR) systems are finding increasing use in everyday life. Many of the commonplace environments where these systems are used are noisy, for example when users call a voice search system from a busy cafeteria or street. The resulting degraded speech recordings adversely affect the performance of speech recognition systems. As the use of ASR systems increases, knowledge of the state of the art in techniques for dealing with such problems becomes critical for system and application engineers and for researchers who work with or on ASR technologies. This book presents a comprehensive survey of state-of-the-art techniques used to improve the robustness of speech recognition systems to these degrading external influences.

Key features:
- Reviews all the main noise-robust ASR approaches, including signal separation, voice activity detection, robust feature extraction, model compensation and adaptation, missing-data techniques and recognition of reverberant speech.
- Acts as a timely exposition of the topic in light of the more widespread future use of ASR technology in challenging environments.
- Addresses robustness issues and signal degradation, both key concerns for practitioners of ASR.
- Includes contributions from top ASR researchers at leading research units in the field.
Cited By
- Borgström B and Brandstein M (2024). A Multiscale Autoencoder (MSAE) Framework for End-to-End Neural Network Speech Enhancement, IEEE/ACM Transactions on Audio, Speech and Language Processing, 32, (2418-2431), Online publication date: 1-Jan-2024.
- Krause D, García-Barrios G, Politis A and Mesaros A (2024). Binaural Sound Source Distance Estimation and Localization for a Moving Listener, IEEE/ACM Transactions on Audio, Speech and Language Processing, 32, (996-1011), Online publication date: 1-Jan-2024.
- Cho B and Park H (2021). Convolutional Maximum-Likelihood Distortionless Response Beamforming With Steering Vector Estimation for Robust Speech Recognition, IEEE/ACM Transactions on Audio, Speech and Language Processing, 29, (1352-1367), Online publication date: 1-Jan-2021.
- Bishop J, Falzon G, Trotter M, Kwan P and Meek P (2019). Livestock vocalisation classification in farm soundscapes, Computers and Electronics in Agriculture, 162:C, (531-542), Online publication date: 1-Jul-2019.
- Zhang Z, Geiger J, Pohjalainen J, Mousa A, Jin W and Schuller B (2018). Deep Learning for Environmentally Robust Speech Recognition, ACM Transactions on Intelligent Systems and Technology, 9:5, (1-28), Online publication date: 30-Sep-2018.
- Ebrahim Kafoori K and Ahadi S (2018). Robust Recognition of Noisy Speech Through Partial Imputation of Missing Data, Circuits, Systems, and Signal Processing, 37:4, (1625-1648), Online publication date: 1-Apr-2018.
- Sivasankaran S, Vincent E and Illina I (2017). A combined evaluation of established and new approaches for speech recognition in varied reverberation conditions, Computer Speech and Language, 46:C, (444-460), Online publication date: 1-Nov-2017.
- Lin P, Lyu D, Chen F, Wang S and Tsao Y (2017). Multi-style learning with denoising autoencoders for acoustic modeling in the internet of things (IoT), Computer Speech and Language, 46:C, (481-495), Online publication date: 1-Nov-2017.
- Gonzalez J, Gómez A, Peinado A, Ma N and Barker J (2017). Spectral Reconstruction and Noise Model Estimation Based on a Masking Model for Noise Robust Speech Recognition, Circuits, Systems, and Signal Processing, 36:9, (3731-3760), Online publication date: 1-Sep-2017.
- Trowitzsch I, Mohr J, Kashef Y and Obermayer K (2017). Robust Detection of Environmental Sounds in Binaural Auditory Scenes, IEEE/ACM Transactions on Audio, Speech and Language Processing, 25:6, (1344-1356), Online publication date: 1-Jun-2017.
- Gannot S, Vincent E, Markovich-Golan S and Ozerov A (2017). A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation, IEEE/ACM Transactions on Audio, Speech and Language Processing, 25:4, (692-730), Online publication date: 1-Apr-2017.
- Meutzner H, Gupta S, Nguyen V, Holz T and Kolossa D (2016). Toward Improved Audio CAPTCHAs Based on Auditory Perception and Language Understanding, ACM Transactions on Privacy and Security, 19:4, (1-31), Online publication date: 3-Feb-2017.
- Barker T and Virtanen T (2016). Blind Separation of Audio Mixtures Through Nonnegative Tensor Factorization of Modulation Spectrograms, IEEE/ACM Transactions on Audio, Speech and Language Processing, 24:12, (2377-2389), Online publication date: 1-Dec-2016.
- Adiloglu K and Vincent E (2016). Variational Bayesian Inference for Source Separation and Robust Feature Extraction, IEEE/ACM Transactions on Audio, Speech and Language Processing, 24:10, (1746-1758), Online publication date: 1-Oct-2016.
- Cho J and Park H (2016). Independent vector analysis followed by HMM-based feature enhancement for robust speech recognition, Signal Processing, 120:C, (200-208), Online publication date: 1-Mar-2016.
- Gerazov B and Ivanovski Z (2015). Kernel power flow orientation coefficients for noise-robust speech recognition, IEEE/ACM Transactions on Audio, Speech and Language Processing, 23:2, (407-419), Online publication date: 1-Feb-2015.
- Meutzner H, Nguyen V, Holz T and Kolossa D. Using automatic speech recognition for attacking acoustic CAPTCHAs, Proceedings of the 30th Annual Computer Security Applications Conference, (276-285).