Abstract
This paper proposes a real-time algorithmic framework for Automatic Speech Recognition (ASR) in the presence of multiple sources in a reverberant environment. This real-life acoustic scenario calls for a robust signal processing solution that reduces the impact of source mixing and reverberation on ASR performance. The authors show how the implemented approach improves recognition accuracy under real-time processing constraints with overlapping distant-talking speakers. A suitable database has been generated for this purpose by adapting an existing large vocabulary continuous speech recognition (LVCSR) corpus to the acoustic conditions under study.
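The abstract does not say how the corpus was adapted. As a rough illustration of the acoustic condition it describes (overlapping talkers in a reverberant room), the sketch below convolves each source with a room impulse response (RIR) and sums the results at each microphone. Everything in it is an assumption made for illustration: the function names, the toy exponentially decaying noise RIR, and the parameter values are hypothetical and are not taken from the paper.

```python
import random

def synthetic_rir(fs, rt60, length_s=0.05, seed=0):
    """Toy RIR (assumption): Gaussian noise with a 60 dB decay over rt60 seconds."""
    rng = random.Random(seed)
    n = int(fs * length_s)
    rir = [rng.gauss(0.0, 1.0) * 10.0 ** (-3.0 * (i / fs) / rt60) for i in range(n)]
    rir[0] = 1.0  # direct path
    peak = max(abs(v) for v in rir)
    return [v / peak for v in rir]  # normalise to unit peak

def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def reverberant_mixture(sources, rirs):
    """Mix sources at each microphone.

    sources: list of K sample lists (one per talker).
    rirs:    rirs[m][k] is the RIR from talker k to microphone m.
    Returns one sample list per microphone.
    """
    mics = []
    for row in rirs:
        paths = [convolve(s, h) for s, h in zip(sources, row)]
        mic = [0.0] * max(len(p) for p in paths)
        for p in paths:
            for i, v in enumerate(p):
                mic[i] += v  # superposition of the reverberated talkers
        mics.append(mic)
    return mics
```

For example, with two talkers and two microphones, `rirs` would be a 2x2 nested list of impulse responses. A realistic simulation would replace `synthetic_rir` with measured or image-method RIRs.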
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rotili, R., Principi, E., Squartini, S., Schuller, B. (2012). Real-Time Speech Recognition in a Multi-talker Reverberated Acoustic Scenario. In: Huang, DS., Gan, Y., Gupta, P., Gromiha, M.M. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence. ICIC 2011. Lecture Notes in Computer Science, vol 6839. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25944-9_49
DOI: https://doi.org/10.1007/978-3-642-25944-9_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25943-2
Online ISBN: 978-3-642-25944-9
eBook Packages: Computer Science, Computer Science (R0)