Acoustic Modeling for Speech Recognition in Telephone Based Dialog System Using Limited Audio Resources

Rok Gajšek¹,
Janez Žibert¹ &
France Mihelič¹

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5246)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

In this article we evaluate different acoustic modeling techniques for speech recognition when audio resources are limited. The objective was to build different sets of acoustic models: one trained on a small set of telephone speech recordings, and another trained on a larger database of broadband speech recordings and later adapted to a different audio environment. Different adaptation methods (MLLR, MAP) were examined in combination with different parameterization features (MFCC, PLP, RPLP). We show that adaptation methods, which are mainly used for speaker adaptation, can increase the robustness of speech recognition when the acoustic conditions of the training and working environments are mismatched.
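
To make the idea of MAP adaptation concrete, the following is a minimal sketch of the standard MAP update for a single Gaussian mean: the mean from the model trained on the large database acts as a prior and is pulled toward the statistics of the small target-environment data set. It is not taken from the paper. The function name, the argument names, and the default prior weight `tau` are assumptions chosen for illustration, and the occupation probabilities are assumed to have been computed beforehand (for example by forced alignment).

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """Illustrative MAP update of one Gaussian mean (hypothetical helper).

    prior_mean: mean vector of the Gaussian in the source-trained model
    frames:     feature vectors from the target environment
    posteriors: occupation probability of this Gaussian for each frame
    tau:        prior weight; the default of 10.0 is an assumed value
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)  # total soft count of adaptation frames
    weighted_sum = [0.0] * dim   # posterior-weighted sum of the frames
    for frame, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]
    # Interpolate between the prior mean and the adaptation data statistics.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]
```

With little adaptation data the result stays close to the prior mean, and with more data it moves toward the sample mean of the target frames. For example, a prior mean of `[0.0, 0.0]`, ten frames equal to `[2.0, 4.0]` with posterior 1.0 each, and `tau=10.0` give a result halfway between the two, `[1.0, 2.0]`.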

Author information

Authors and Affiliations

Faculty of Electrical Engineering, University of Ljubljana, Tržaška 25, SI-1000, Ljubljana, Slovenia
Rok Gajšek, Janez Žibert & France Mihelič

Editor information

Petr Sojka, Aleš Horák, Ivan Kopeček, Karel Pala

About this paper

Cite this paper

Gajšek, R., Žibert, J., Mihelič, F. (2008). Acoustic Modeling for Speech Recognition in Telephone Based Dialog System Using Limited Audio Resources. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_40

DOI: https://doi.org/10.1007/978-3-540-87391-4_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87390-7
Online ISBN: 978-3-540-87391-4
eBook Packages: Computer Science, Computer Science (R0)
