
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6839)

Abstract

This paper proposes a real-time algorithmic framework for Automatic Speech Recognition (ASR) in the presence of multiple sources in a reverberant environment. This real-life acoustic scenario calls for a robust signal processing solution that reduces the impact of source mixing and reverberation on ASR performance. The authors show that the implemented approach improves recognition accuracy under real-time processing constraints with overlapping distant-talking speakers. A dedicated database was generated by adapting an existing large vocabulary continuous speech recognition (LVCSR) corpus to the acoustic conditions under study.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rotili, R., Principi, E., Squartini, S., Schuller, B. (2012). Real-Time Speech Recognition in a Multi-talker Reverberated Acoustic Scenario. In: Huang, DS., Gan, Y., Gupta, P., Gromiha, M.M. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence. ICIC 2011. Lecture Notes in Computer Science, vol 6839. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25944-9_49

  • DOI: https://doi.org/10.1007/978-3-642-25944-9_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25943-2

  • Online ISBN: 978-3-642-25944-9

  • eBook Packages: Computer Science, Computer Science (R0)
