Abstract
In this paper, the application of LVCSR (Large Vocabulary Continuous Speech Recognition) technology to real-time, resource-limited broadcast closed captioning is investigated. The work focuses on transcribing live broadcast conversation speech to make such programs accessible to deaf viewers. Due to computational limitations, the real-time factor (RTF) and memory requirements are kept low during decoding, using various models tailored to Hungarian broadcast speech recognition. Two decoders are compared on the direct transcription task of broadcast conversation recordings, and setups employing re-speakers are also tested. Moreover, the models are evaluated on a broadcast news transcription task, and different language models (LMs) are tested in order to demonstrate the performance of our systems in settings where low memory consumption is less crucial.
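The real-time factor mentioned above is the ratio of decoding time to the duration of the audio being decoded; a live captioning system can only keep pace with the broadcast if this ratio does not exceed 1.0. The sketch below illustrates that arithmetic. It is not the authors' measurement code: the abstract does not describe how RTF was timed, and the function names and the `budget` parameter are hypothetical, chosen for illustration only.

```python
import time
from typing import Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return RTF = decoding time / duration of the decoded audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(decode: Callable[[], object], audio_seconds: float) -> float:
    """Time a (hypothetical) decoding call with a monotonic clock and return its RTF."""
    start = time.perf_counter()
    decode()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


def meets_live_budget(rtf: float, budget: float = 1.0) -> bool:
    """True if decoding keeps up with the stream, i.e. RTF does not exceed the budget.

    The budget of 1.0 is the plain real-time threshold; a stricter value
    (an assumption, not from the paper) could leave headroom for other tasks.
    """
    return rtf <= budget


if __name__ == "__main__":
    # 30 s of decoding for 60 s of audio gives RTF 0.5: fast enough for live use.
    rtf = real_time_factor(30.0, 60.0)
    print(rtf, meets_live_budget(rtf))
```

For example, a decoder that needs 72 s to process a 60 s segment has an RTF of 1.2 and would fall progressively further behind a live programme.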
Acknowledgement
This research has been partially funded by the PIAC_13-1-2013-0234 (Patimedia) and KMR_12-1-2012-0207 (DIANA) projects. The authors would also like to thank MTVA for their support of this work.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Varga, Á. et al. (2015). Automatic Close Captioning for Live Hungarian Television Broadcast Speech: A Fast and Resource-Efficient Approach. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_13
DOI: https://doi.org/10.1007/978-3-319-23132-7_13
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7
eBook Packages: Computer Science, Computer Science (R0)