Environment mismatch compensation using average eigenspace-based methods for robust speech recognition

John H. L. Hansen¹,²,
Abhishek Kumar¹ &
Pongtep Angkititrakul¹


Abstract

The performance of speech recognition systems is adversely affected by mismatch between training and test conditions caused by environmental factors. Besides the case where test data come from noisy environments, there are scenarios where the training data itself is noisy. In this study, we propose a series of methods for compensating mismatch between training and test environments, based on our “average eigenspace” approach. These methods are also shown to be effective under non-stationary mismatch conditions. One advantage is that no explicit adaptation data are needed, since the method is applied directly to incoming test data to find the compensatory transform. We evaluate these approaches on two separate corpora collected in realistic car environments: CU-Move and UTDrive. Compared with a baseline system incorporating spectral subtraction, highpass filtering, and cepstral mean normalization, the proposed techniques yield a relative word error rate reduction of 17–26%. The methods also reduce the dimensionality of the feature vectors, allowing a more compact set of acoustic models in the phoneme space. This property is important for automatic speech recognition (ASR) on small-footprint mobile devices, such as cell phones or PDAs, which require ASR in diverse environments.
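
As a reading aid for the "relative word error rate reduction" quoted above, the following minimal sketch shows how such a figure is conventionally computed from a baseline and an improved WER. The function name and the example WER values are hypothetical illustrations; they are not taken from the paper, which reports only the 17–26% range.

```python
def relative_wer_reduction(baseline_wer: float, proposed_wer: float) -> float:
    """Return the relative WER reduction, in percent, of a proposed
    system over a baseline. Both inputs are WERs in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - proposed_wer) / baseline_wer


# Hypothetical WERs chosen for illustration only (not results from the paper).
example = relative_wer_reduction(baseline_wer=20.0, proposed_wer=16.0)
print(f"{example:.1f}% relative reduction")
```

A relative reduction is measured against the baseline error rather than in absolute percentage points, so the same absolute gain counts for more when the baseline WER is already low.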



Acknowledgments

This study was funded by AFRL under contract FA8750-12-1-0188 and partially by the University of Texas at Dallas from the Distinguished University Chair in Telecommunications Engineering held by J.H.L. Hansen.

Author information

Authors and Affiliations

Center for Robust Speech Systems (CRSS), The University of Texas at Dallas, Richardson, TX 75083-0688, USA
John H. L. Hansen, Abhishek Kumar & Pongtep Angkititrakul
Department of Electrical Engineering, Center for Robust Speech Systems (CRSS), Erik Jonsson School of Engineering and Computer Science, The University of Texas at Dallas, 2601 N. Floyd Road, EC33, Richardson, TX 75080-1407, USA
John H. L. Hansen


Corresponding author

Correspondence to John H. L. Hansen.


About this article

Cite this article

Hansen, J.H.L., Kumar, A. & Angkititrakul, P. Environment mismatch compensation using average eigenspace-based methods for robust speech recognition. Int J Speech Technol 17, 353–364 (2014). https://doi.org/10.1007/s10772-014-9233-9


Received: 16 October 2013
Accepted: 02 April 2014
Published: 24 April 2014
Issue Date: December 2014
DOI: https://doi.org/10.1007/s10772-014-9233-9
