Automatic Close Captioning for Live Hungarian Television Broadcast Speech: A Fast and Resource-Efficient Approach

Ádám Varga, Balázs Tarján, Zoltán Tobler, György Szaszák, Tibor Fegyó, Csaba Bordás & Péter Mihajlik

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9319)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

This paper investigates the application of large vocabulary continuous speech recognition (LVCSR) technology to real-time, resource-limited broadcast close captioning. The work focuses on transcribing live broadcast conversation speech to make such programs accessible to deaf viewers. Because computational resources are limited, the real-time factor (RTF) and memory requirements are kept low during decoding, using various models tailored for Hungarian broadcast speech recognition. Two decoders are compared on the direct transcription of broadcast conversation recordings, and setups employing re-speakers are also tested. The models are also evaluated on a broadcast news transcription task, and different language models (LMs) are tested to demonstrate the performance of our systems in settings where low memory consumption is less crucial.
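
As a brief illustration of the RTF constraint mentioned in the abstract, the sketch below uses the common definition of the real-time factor: decoding time divided by the duration of the audio being decoded. The function names and the numbers are hypothetical and are not taken from the paper; a budget of 1.0 is assumed here as the point at which a decoder just keeps pace with live audio.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return decoding time divided by audio duration (the usual RTF definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def keeps_up_with_live_audio(rtf: float, budget: float = 1.0) -> bool:
    """True if the decoder is at least as fast as the incoming audio.

    The default budget of 1.0 is an assumption for illustration; a live
    captioning system would likely need headroom below it.
    """
    return rtf <= budget


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    rtf = real_time_factor(processing_seconds=45.0, audio_seconds=60.0)
    print(f"RTF = {rtf:.2f}, live-capable: {keeps_up_with_live_audio(rtf)}")
```

An RTF below 1 means the decoder finishes faster than the audio plays, which is a precondition for live captioning on limited hardware.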

Acknowledgement

This research has been partially funded by the PIAC_13-1-2013-0234 (Patimedia) and KMR_12-1-2012-0207 (DIANA) projects. The authors would also like to thank MTVA for their support towards this work.

Author information

Authors and Affiliations

THINKTech Research Center, Budapest, Hungary
Ádám Varga, Balázs Tarján, Zoltán Tobler, György Szaszák & Péter Mihajlik
SpeechTex Ltd., Budapest, Hungary
Tibor Fegyó
Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary
Balázs Tarján, György Szaszák, Tibor Fegyó & Péter Mihajlik
Media Service Support and Asset Management Fund (MTVA), Budapest, Hungary
Csaba Bordás

Corresponding author

Correspondence to Balázs Tarján.

Editor information

Editors and Affiliations

SPIIRAS, Saint-Petersburg, Russia
Andrey Ronzhin
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Patras, Patras, Greece
Nikos Fakotakis

About this paper

Cite this paper

Varga, Á. et al. (2015). Automatic Close Captioning for Live Hungarian Television Broadcast Speech: A Fast and Resource-Efficient Approach. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_13

DOI: https://doi.org/10.1007/978-3-319-23132-7_13
Published: 04 September 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7
eBook Packages: Computer Science, Computer Science (R0)
