Abstract
This paper presents techniques that enable talker tracking for effective human-robot interaction. We propose a new way of integrating an EM algorithm and a particle filter to select an appropriate path for tracking the talker. Our system can easily incorporate new kinds of tracking information, because it estimates the position of the desired talker from the means, variances, and weights obtained by EM training, regardless of the number or kind of information sources. In addition, to enhance a robot's ability to track a talker in real-world environments, we apply the particle filter to talker tracking after executing the EM algorithm. We also integrate a variety of auditory and visual information: sound localization, face localization, and lip-movement detection. Moreover, we apply a sound classification function that allows our system to distinguish among voice, music, and noise, and we developed a vision module that can locate moving objects.
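The pipeline described above (EM over combined auditory-visual direction estimates, then a particle filter on the talker's position) can be sketched in miniature. This is a minimal illustrative sketch, not the authors' implementation: the 1-D azimuth representation, the two-component mixture, the random-walk motion model, and all parameter values are assumptions made for the example.

```python
import math
import random

def em_gaussian_mixture(obs, k=2, iters=30):
    """Fit a 1-D Gaussian mixture to direction observations (degrees)
    with a few EM iterations; returns (weights, means, variances)."""
    # deterministic spread-out initialization of the component means
    means = [min(obs), max(obs)][:k]
    varis = [25.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each observation
        resp = []
        for x in obs:
            p = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, varis)]
            s = sum(p) or 1e-12
            resp.append([pi / s for pi in p])
        # M-step: re-estimate weights, means, variances
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / len(obs)
            means[j] = sum(r[j] * x for r, x in zip(resp, obs)) / nj
            varis[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, obs)) / nj + 1e-6
    return weights, means, varis

def particle_filter_step(particles, measurement, meas_std=5.0, proc_std=2.0):
    """One predict-update-resample step of a bootstrap particle filter
    on the talker's azimuth, fed by the EM-derived measurement."""
    # predict: random-walk motion model
    particles = [p + random.gauss(0.0, proc_std) for p in particles]
    # update: Gaussian likelihood of the measurement
    w = [math.exp(-(p - measurement) ** 2 / (2 * meas_std ** 2)) for p in particles]
    s = sum(w) or 1e-12
    w = [wi / s for wi in w]
    # resample (multinomial)
    return random.choices(particles, weights=w, k=len(particles))

# toy data: talker near +30 deg, a noise source near -40 deg
random.seed(1)
obs = [random.gauss(30, 3) for _ in range(40)] + [random.gauss(-40, 3) for _ in range(10)]
weights, means, varis = em_gaussian_mixture(obs)
# pick the dominant mixture component as the talker direction
talker_dir = means[max(range(len(weights)), key=lambda j: weights[j])]

particles = [random.uniform(-90.0, 90.0) for _ in range(200)]
for _ in range(20):
    particles = particle_filter_step(particles, talker_dir)
estimate = sum(particles) / len(particles)
```

In this sketch the EM stage fuses all observations, whatever their origin, into mixture parameters, which is why adding a new information source (e.g. lip-movement detection) only means appending more observations; the particle filter then smooths the per-frame EM estimate over time.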
© 2007 Springer Berlin Heidelberg
Cite this paper
Kim, HD., Komatani, K., Ogata, T., Okuno, H.G. (2007). Real-Time Auditory and Visual Talker Tracking Through Integrating EM Algorithm and Particle Filter. In: Okuno, H.G., Ali, M. (eds) New Trends in Applied Artificial Intelligence. IEA/AIE 2007. Lecture Notes in Computer Science(), vol 4570. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73325-6_28
Print ISBN: 978-3-540-73322-5
Online ISBN: 978-3-540-73325-6