Abstract
This paper presents techniques that enable talker tracking for effective human-robot interaction. We propose a new way of integrating an EM algorithm and a particle filter to select an appropriate path for tracking the talker. Our system can easily incorporate new kinds of tracking information, because it estimates the position of the desired talker from the means, variances, and weights obtained by EM training, regardless of the number or kind of information sources. In addition, to enhance a robot's ability to track a talker in real-world environments, we apply the particle filter to talker tracking after executing the EM algorithm. We also integrate a variety of auditory and visual information: sound localization, face localization, and lip-movement detection. Moreover, we apply a sound classification function that allows our system to distinguish among voice, music, and noise, and we developed a vision module that can locate moving objects.
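The pipeline described above (EM over combined auditory-visual direction estimates, then a particle filter on the talker's position) can be sketched in miniature. This is a minimal illustrative sketch, not the authors' implementation: the 1-D azimuth representation, the two-component mixture, the random-walk motion model, and all parameter values are assumptions made for the example.

```python
import math
import random

def em_gaussian_mixture(obs, k=2, iters=30):
    """Fit a 1-D Gaussian mixture to direction observations (degrees)
    with a few EM iterations; returns (weights, means, variances)."""
    # deterministic spread-out initialization of the component means
    means = [min(obs), max(obs)][:k]
    varis = [25.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each observation
        resp = []
        for x in obs:
            p = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, varis)]
            s = sum(p) or 1e-12
            resp.append([pi / s for pi in p])
        # M-step: re-estimate weights, means, variances
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / len(obs)
            means[j] = sum(r[j] * x for r, x in zip(resp, obs)) / nj
            varis[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, obs)) / nj + 1e-6
    return weights, means, varis

def particle_filter_step(particles, measurement, meas_std=5.0, proc_std=2.0):
    """One predict-update-resample step of a bootstrap particle filter
    on the talker's azimuth, fed by the EM-derived measurement."""
    # predict: random-walk motion model
    particles = [p + random.gauss(0.0, proc_std) for p in particles]
    # update: Gaussian likelihood of the measurement
    w = [math.exp(-(p - measurement) ** 2 / (2 * meas_std ** 2)) for p in particles]
    s = sum(w) or 1e-12
    w = [wi / s for wi in w]
    # resample (multinomial)
    return random.choices(particles, weights=w, k=len(particles))

# toy data: talker near +30 deg, a noise source near -40 deg
random.seed(1)
obs = [random.gauss(30, 3) for _ in range(40)] + [random.gauss(-40, 3) for _ in range(10)]
weights, means, varis = em_gaussian_mixture(obs)
# pick the dominant mixture component as the talker direction
talker_dir = means[max(range(len(weights)), key=lambda j: weights[j])]

particles = [random.uniform(-90.0, 90.0) for _ in range(200)]
for _ in range(20):
    particles = particle_filter_step(particles, talker_dir)
estimate = sum(particles) / len(particles)
```

In this sketch the EM stage fuses all observations, whatever their origin, into mixture parameters, which is why adding a new information source (e.g. lip-movement detection) only means appending more observations; the particle filter then smooths the per-frame EM estimate over time.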
© 2007 Springer Berlin Heidelberg
Cite this paper
Kim, HD., Komatani, K., Ogata, T., Okuno, H.G. (2007). Real-Time Auditory and Visual Talker Tracking Through Integrating EM Algorithm and Particle Filter. In: Okuno, H.G., Ali, M. (eds) New Trends in Applied Artificial Intelligence. IEA/AIE 2007. Lecture Notes in Computer Science(), vol 4570. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73325-6_28
Print ISBN: 978-3-540-73322-5
Online ISBN: 978-3-540-73325-6