Abstract
In human-human interaction, para-verbal and non-verbal communication are naturally aligned and synchronized. Coordinating speech with head gestures is difficult in several respects: the meaning conveyed, the way the gesture is performed with respect to speech characteristics, the relative temporal arrangement of the two, and their coordinated organization within the phrasal structure of an utterance. In this research, we focus on the mechanism for mapping between head gestures and speech prosodic characteristics in natural human-robot interaction. Prosody patterns and head gestures are aligned separately using a parallel multi-stream HMM model. The mapping between speech and head gestures is based on Coupled Hidden Markov Models (CHMMs), which can be seen as a collection of HMMs, one for the video stream and one for the audio stream. Experimental results with Nao robots are reported.
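To make the coupling idea concrete, the following is a minimal sketch of the forward (likelihood) computation for a two-chain coupled HMM. It is not the chapter's implementation. It assumes discrete observation symbols (for example, a quantized prosody class and a quantized head-motion class), and every name and probability value in it is hypothetical and chosen only for illustration. Each chain's next state depends on the previous states of both chains, which is what distinguishes a CHMM from two independent HMMs.

```python
from itertools import product


def chmm_forward(obs_a, obs_v, init_a, init_v, trans_a, trans_v, emit_a, emit_v):
    """Likelihood of paired discrete sequences under a two-chain coupled HMM.

    trans_a[i][j][k] = P(a_t = k | a_{t-1} = i, v_{t-1} = j)
    trans_v[i][j][l] = P(v_t = l | a_{t-1} = i, v_{t-1} = j)
    emit_a[k][o]     = P(audio symbol o | a_t = k)
    emit_v[l][o]     = P(video symbol o | v_t = l)
    """
    states = list(product(range(len(init_a)), range(len(init_v))))
    # alpha[(i, j)] = P(observations up to time t, a_t = i, v_t = j)
    alpha = {
        (i, j): init_a[i] * init_v[j] * emit_a[i][obs_a[0]] * emit_v[j][obs_v[0]]
        for i, j in states
    }
    for oa, ov in zip(obs_a[1:], obs_v[1:]):
        new_alpha = {}
        for k, l in states:
            # Sum over all previous joint states; both chains condition on both.
            reach = sum(
                alpha[(i, j)] * trans_a[i][j][k] * trans_v[i][j][l]
                for i, j in states
            )
            new_alpha[(k, l)] = reach * emit_a[k][oa] * emit_v[l][ov]
        alpha = new_alpha
    return sum(alpha.values())


# Hypothetical toy model: 2 hidden states and 2 symbols per stream.
init_a = [0.6, 0.4]
init_v = [0.5, 0.5]
trans_a = [[[0.7, 0.3], [0.5, 0.5]], [[0.4, 0.6], [0.2, 0.8]]]
trans_v = [[[0.8, 0.2], [0.3, 0.7]], [[0.6, 0.4], [0.1, 0.9]]]
emit_a = [[0.9, 0.1], [0.2, 0.8]]
emit_v = [[0.7, 0.3], [0.4, 0.6]]

# Made-up paired sequences: audio symbols and video symbols, frame by frame.
likelihood = chmm_forward(
    [0, 1, 1], [0, 0, 1], init_a, init_v, trans_a, trans_v, emit_a, emit_v
)
print(likelihood)
```

The sketch runs the recursion over the joint state space, which is exact but grows with the product of the chain sizes. With continuous prosodic or head-pose features, the emission tables would be replaced by density functions.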
© 2012 Springer-Verlag GmbH Berlin Heidelberg
About this chapter
Cite this chapter
Aly, A., Tapus, A. (2012). Speech to Head Gesture Mapping in Multimodal Human-Robot Interaction. In: Borangiu, T., Thomas, A., Trentesaux, D. (eds) Service Orientation in Holonic and Multi-Agent Manufacturing Control. Studies in Computational Intelligence, vol 402. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27449-7_14
DOI: https://doi.org/10.1007/978-3-642-27449-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27448-0
Online ISBN: 978-3-642-27449-7
eBook Packages: Engineering, Engineering (R0)