Abstract
In this article we evaluate acoustic modeling techniques for speech recognition when audio resources are limited. The objective was to build different sets of acoustic models: one trained on a small set of telephone speech recordings, the other trained on a larger database of broadband speech recordings and later adapted to a different audio environment. Several adaptation methods (MLLR, MAP) were examined in combination with several feature parameterizations (MFCC, PLP, RPLP). We show that adaptation methods, which are mainly used for speaker adaptation, can increase the robustness of speech recognition when the training and operating acoustic environments are mismatched.
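To make the idea of adapting a well-trained model to a new environment concrete, the following is a minimal sketch of the textbook MAP update for a single Gaussian mean, in which the prior mean is interpolated with the occupancy-weighted average of the adaptation frames. It is only an illustration: the function name, the argument layout, and the value of the prior weight `tau` are assumptions, not the configuration used in the paper.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP update of one Gaussian mean vector (illustrative sketch).

    prior_mean : mean from the model trained on the large corpus
    frames     : feature vectors from the target environment
    posteriors : per-frame occupation probabilities for this Gaussian
    tau        : prior weight (hypothetical default); must be > 0
    """
    occupancy = sum(posteriors)
    dim = len(prior_mean)

    # Accumulate the occupancy-weighted sum of the adaptation frames.
    weighted_sum = [0.0] * dim
    for gamma, frame in zip(posteriors, frames):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]

    # Interpolate between the prior mean and the adaptation data.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


adapted = map_adapt_mean(
    prior_mean=[0.0, 0.0],
    frames=[[2.0, 4.0], [2.0, 4.0]],
    posteriors=[1.0, 1.0],
    tau=2.0,
)
```

With little adaptation data the result stays close to the prior mean, and with more data it moves toward the statistics of the target environment, which is why this kind of update suits the limited-resource setting described above.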
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gajšek, R., Žibert, J., Mihelič, F. (2008). Acoustic Modeling for Speech Recognition in Telephone Based Dialog System Using Limited Audio Resources. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_40
DOI: https://doi.org/10.1007/978-3-540-87391-4_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87390-7
Online ISBN: 978-3-540-87391-4
eBook Packages: Computer Science, Computer Science (R0)