Abstract
The performance of speech recognition systems is adversely affected by mismatch between training and test conditions due to environmental factors. In addition to the case of test data from noisy environments, there are scenarios where the training data itself is noisy. In this study, we propose a series of methods for mismatch compensation between training and test environments, based on our “average eigenspace” approach. These methods are also shown to be effective for non-stationary mismatch conditions. An advantage is that no explicit adaptation data are needed, since the method is applied to incoming test data to find the compensatory transform. We evaluate these approaches on two separate corpora collected from realistic car environments: CU-Move and UTDrive. Compared with a baseline system incorporating spectral subtraction, highpass filtering, and cepstral mean normalization, we obtain a relative word error rate reduction of 17–26% by applying the proposed techniques. These methods also reduce the dimensionality of the feature vectors, allowing a more compact set of acoustic models in the phoneme space, a property that is important for automatic speech recognition (ASR) on small-footprint mobile devices such as cell phones or PDAs, which require ASR in diverse environments.
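As a reading aid for the "relative word error rate reduction" quoted above, the following minimal sketch shows how such a figure is conventionally computed from a baseline WER and a compensated-system WER. The function name and the numbers are hypothetical and chosen only for illustration; they are not results from this study.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    # Relative reduction = (baseline - new) / baseline, expressed in percent.
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical WER values (in percent), for illustration only.
    reduction = relative_wer_reduction(baseline_wer=20.0, new_wer=16.0)
    print(f"{reduction:.1f}% relative reduction")
```

Note that a relative reduction is measured against the baseline error rate, so it is generally larger than the absolute difference in WER points.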
Acknowledgments
This study was funded by AFRL under contract FA8750-12-1-0188 and partially by the University of Texas at Dallas from the Distinguished University Chair in Telecommunications Engineering held by J.H.L. Hansen.
Cite this article
Hansen, J.H.L., Kumar, A. & Angkititrakul, P. Environment mismatch compensation using average eigenspace-based methods for robust speech recognition. Int J Speech Technol 17, 353–364 (2014). https://doi.org/10.1007/s10772-014-9233-9
DOI: https://doi.org/10.1007/s10772-014-9233-9