
Acoustic Modeling for Speech Recognition in Telephone Based Dialog System Using Limited Audio Resources

  • Conference paper
Text, Speech and Dialogue (TSD 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5246)


Abstract

In this article we evaluate different acoustic modeling techniques for speech recognition when audio resources are limited. The objective was to build different sets of acoustic models: one was trained on a small set of telephone speech recordings, and the other was trained on a larger database of broadband speech recordings and later adapted to a different audio environment. Different adaptation methods (MLLR, MAP) were examined in combination with different parameterization features (MFCC, PLP, RPLP). We show that adaptation methods that are mainly used for speaker adaptation can increase the robustness of speech recognition when the training and working acoustic environments are mismatched.
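As a rough illustration of the MAP adaptation idea mentioned in the abstract, the sketch below applies the standard mean-only MAP update to a single Gaussian: the adapted mean is an occupancy-weighted interpolation between the prior mean (from the model trained on the larger database) and the adaptation data (from the target environment). This is not the authors' implementation. The function name `map_adapt_mean`, the prior weight `tau`, and the toy numbers are assumptions made for the example.

```python
from typing import List, Sequence


def map_adapt_mean(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    occupancies: Sequence[float],
    tau: float = 10.0,
) -> List[float]:
    """Mean-only MAP update for one Gaussian (illustrative sketch).

    prior_mean  -- mean of the Gaussian before adaptation
    frames      -- adaptation feature vectors (e.g. MFCC or PLP frames)
    occupancies -- posterior probability that each frame belongs to this Gaussian
    tau         -- assumed prior weight; larger values stay closer to the prior
    """
    if len(frames) != len(occupancies):
        raise ValueError("frames and occupancies must have the same length")
    if tau <= 0.0:
        raise ValueError("tau must be positive")

    dim = len(prior_mean)
    total_occupancy = sum(occupancies)

    # Occupancy-weighted sum of the adaptation frames, per dimension.
    weighted_sum = [0.0] * dim
    for frame, gamma in zip(frames, occupancies):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]

    # Interpolate between the prior mean and the adaptation data.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + total_occupancy)
        for d in range(dim)
    ]


# Toy example: two adaptation frames, each fully assigned to the Gaussian.
adapted = map_adapt_mean(
    prior_mean=[0.0, 0.0],
    frames=[[2.0, 4.0], [2.0, 4.0]],
    occupancies=[1.0, 1.0],
    tau=2.0,
)
print(adapted)  # [1.0, 2.0]
```

With little adaptation data the total occupancy is small relative to `tau`, so the adapted mean stays near the prior. As more target-environment frames are observed, the estimate moves toward the data mean. This is why MAP tends to suit cases where the amount of adaptation data grows over time.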




Editor information

Petr Sojka, Aleš Horák, Ivan Kopeček, Karel Pala


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gajšek, R., Žibert, J., Mihelič, F. (2008). Acoustic Modeling for Speech Recognition in Telephone Based Dialog System Using Limited Audio Resources. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_40


  • DOI: https://doi.org/10.1007/978-3-540-87391-4_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87390-7

  • Online ISBN: 978-3-540-87391-4

  • eBook Packages: Computer Science, Computer Science (R0)
