Research Article · Open Access

Understanding the Predictability of Gesture Parameters from Speech and their Perceptual Importance

Published: 19 October 2020

Abstract

Gesture behavior is a natural part of human conversation. Much work has focused on removing the need for tedious hand animation to create embodied conversational agents by designing speech-driven gesture generators. However, these generators often work in a black-box manner, assuming a general relationship between input speech and output motion. As their success remains limited, we investigate in more detail how speech may relate to different aspects of gesture motion. We determine a number of parameters characterizing gesture, such as speed and gesture size, and explore their relationship to the speech signal in a two-fold manner. First, we train multiple recurrent networks to predict the gesture parameters from speech, to understand how well gesture attributes can be modeled from speech alone. We find that gesture parameters can be partially predicted from speech, with some parameters, such as path length, predicted more accurately than others, like velocity. Second, we design a perceptual study to assess the importance of each gesture parameter for producing motion that people perceive as appropriate for the speech. Results show that a degradation in any parameter was viewed negatively, but some changes, such as hand shape, are more impactful than others. A video summary can be found at https://youtu.be/aw6-_5kmLjY.
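The first experiment frames the problem as sequence-to-vector regression: a window of per-frame speech features in, a small vector of gesture parameters out. The following is a minimal sketch of that setup, not the authors' implementation; the GRU architecture, the 64-dimensional acoustic feature input, the window length, and the restriction to two parameters (path length and mean velocity, computed here from a wrist trajectory) are illustrative assumptions.

```python
# Sketch only: hypothetical dimensions and a simplified parameter set,
# not the paper's actual network or feature pipeline.
import torch
import torch.nn as nn

def gesture_parameters(wrist_positions: torch.Tensor, fps: float = 60.0):
    """Two of the gesture parameters studied, computed from a (T, 3)
    trajectory of wrist positions: path length and mean velocity."""
    steps = wrist_positions[1:] - wrist_positions[:-1]   # (T-1, 3) frame deltas
    step_lengths = steps.norm(dim=-1)                    # distance per frame
    path_length = step_lengths.sum()                     # total distance travelled
    mean_velocity = (step_lengths * fps).mean()          # distance per second
    return path_length, mean_velocity

class SpeechToGestureParams(nn.Module):
    """Recurrent regressor mapping per-frame speech features to a vector
    of gesture parameters (all sizes are assumptions for illustration)."""
    def __init__(self, n_speech_feats: int = 64, n_params: int = 2, hidden: int = 128):
        super().__init__()
        self.rnn = nn.GRU(n_speech_feats, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_params)

    def forward(self, speech: torch.Tensor) -> torch.Tensor:
        # speech: (batch, frames, n_speech_feats); the final hidden state
        # summarizes the window, and a linear head regresses the parameters.
        _, h = self.rnn(speech)
        return self.head(h[-1])          # (batch, n_params)

# Usage: a batch of 8 two-second windows of 100 fps speech features.
model = SpeechToGestureParams()
speech = torch.randn(8, 200, 64)
params = model(speech)                   # (8, 2): path length, mean velocity
```

Under this framing, the paper's finding that path length is predicted more accurately than velocity corresponds to a lower regression error on the first output dimension than on the second.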





      Published In

IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents
October 2020, 394 pages
ISBN: 9781450375863
DOI: 10.1145/3383652


Publisher

Association for Computing Machinery, New York, NY, United States



      Author Tags

      1. gesture modelling
      2. machine learning
      3. perception
      4. speech gestures

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

IVA '20: ACM International Conference on Intelligent Virtual Agents
October 20-22, 2020, Virtual Event, Scotland, UK

      Acceptance Rates

Overall acceptance rate: 53 of 196 submissions (27%)


      Cited By

• (2023) The Importance of Multimodal Emotion Conditioning and Affect Consistency for Embodied Conversational Agents. In Proceedings of the 28th International Conference on Intelligent User Interfaces, 790-801. https://doi.org/10.1145/3581641.3584045
• (2023) A Roadmap for Technological Innovation in Multimodal Communication Research. In Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management, 402-438. https://doi.org/10.1007/978-3-031-35748-0_30
• (2022) Multimodal Analysis of the Predictability of Hand-gesture Properties. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, 770-779. https://doi.org/10.5555/3535850.3535937
• (2022) GestureMaster: Graph-based Speech-driven Gesture Generation. In Proceedings of the 2022 International Conference on Multimodal Interaction, 764-770. https://doi.org/10.1145/3536221.3558063
• (2022) The IVI Lab entry to the GENEA Challenge 2022 – A Tacotron2 Based Method for Co-Speech Gesture Generation With Locality-Constraint Attention Mechanism. In Proceedings of the 2022 International Conference on Multimodal Interaction, 784-789. https://doi.org/10.1145/3536221.3558060
• (2022) A Motion Matching-based Framework for Controllable Gesture Synthesis from Speech. In ACM SIGGRAPH 2022 Conference Proceedings, 1-9. https://doi.org/10.1145/3528233.3530750
• (2022) Gesture–vocal coupling in Karnatak music performance: A neuro–bodily distributed aesthetic entanglement. Annals of the New York Academy of Sciences 1515(1), 219-236. https://doi.org/10.1111/nyas.14806
• (2021) A Framework for Integrating Gesture Generation Models into Interactive Conversational Agents. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, 1779-1781. https://doi.org/10.5555/3463952.3464235
• (2021) It's A Match! Gesture Generation Using Expressive Parameter Matching. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, 1495-1497. https://doi.org/10.5555/3463952.3464137
• (2021) Human or Robot? In Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents, 76-83. https://doi.org/10.1145/3472306.3478338
