Research article · ICMI '23 · DOI: 10.1145/3577190.3616115

FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation

Published: 09 October 2023

Abstract

Human communication relies on multiple modalities such as verbal expressions, facial cues, and bodily gestures. Developing computational approaches to process and generate these multimodal signals is critical for seamless human-agent interaction. A particular challenge is the generation of co-speech gestures due to the large variability and number of gestures that can accompany a verbal utterance, leading to a one-to-many mapping problem. This paper presents an approach based on a Feature Extraction Infusion Network (FEIN-Z) that adopts insights from robot imitation learning and applies them to co-speech gesture generation. Building on the BC-Z architecture, our framework combines transformer architectures and Wasserstein generative adversarial networks. We describe the FEIN-Z methodology and evaluation results obtained within the GENEA Challenge 2023, demonstrating good results and significant improvements in human-likeness over the GENEA baseline. We discuss potential areas for improvement, such as refining input segmentation, employing more fine-grained control networks, and exploring alternative inference methods.
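
Since this page carries only the abstract, the concrete FEIN-Z architecture is not reproduced here. As a rough illustration of the ingredients the abstract names (an autoregressive transformer generator trained by behavior cloning and regularized by a Wasserstein critic), a minimal PyTorch sketch might look like the one below. Every module name, dimension, and loss weight is an assumption made for illustration, not the authors' implementation.

```python
# Hedged sketch: autoregressive behavior cloning for gesture generation with a
# Wasserstein critic. All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class GestureGenerator(nn.Module):
    """Transformer decoder that predicts the next pose frame from past poses,
    cross-attending to frame-aligned speech features."""

    def __init__(self, speech_dim=128, pose_dim=56, d_model=256, n_layers=4):
        super().__init__()
        self.speech_proj = nn.Linear(speech_dim, d_model)
        self.pose_proj = nn.Linear(pose_dim, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, pose_dim)

    def forward(self, speech, prev_poses):
        # Causal mask: frame t may only attend to gesture frames <= t.
        T = prev_poses.size(1)
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(prev_poses.device)
        h = self.decoder(self.pose_proj(prev_poses),
                         self.speech_proj(speech), tgt_mask=mask)
        return self.head(h)


class Critic(nn.Module):
    """Wasserstein critic scoring how plausible a (speech, motion) pair is."""

    def __init__(self, speech_dim=128, pose_dim=56, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(speech_dim + pose_dim, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, 1),
        )

    def forward(self, speech, motion):
        # Score each frame, then average over time to one scalar per clip.
        return self.net(torch.cat([speech, motion], dim=-1)).mean(dim=1)


def generator_step(gen, critic, speech, poses, bc_weight=1.0, adv_weight=0.1):
    """One behavior-cloning step plus an adversarial Wasserstein term.

    speech: (batch, T, speech_dim) frame-aligned speech features (assumed)
    poses:  (batch, T, pose_dim) ground-truth gesture frames
    """
    pred = gen(speech[:, :-1], poses[:, :-1])       # predict frames 1..T-1
    bc_loss = (pred - poses[:, 1:]).abs().mean()    # imitation (L1) loss
    adv_loss = -critic(speech[:, 1:], pred).mean()  # try to fool the critic
    return bc_weight * bc_loss + adv_weight * adv_loss
```

A full WGAN setup would additionally train the critic to widen the score gap between real and generated motion (typically with a gradient penalty or a Wasserstein divergence term), and inference would roll the generator out frame by frame, feeding its own predictions back in.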


Cited By

  • (2024) Gesture Area Coverage to Assess Gesture Expressiveness and Human-Likeness. Companion Proceedings of the 26th International Conference on Multimodal Interaction, 165–169. https://doi.org/10.1145/3686215.3688822. Online publication date: 4-Nov-2024.
  • (2023) The GENEA Challenge 2023: A large-scale evaluation of gesture generation models in monadic and dyadic settings. Proceedings of the 25th International Conference on Multimodal Interaction, 792–801. https://doi.org/10.1145/3577190.3616120. Online publication date: 9-Oct-2023.


Information

Published In

ICMI '23: Proceedings of the 25th International Conference on Multimodal Interaction
October 2023, 858 pages
ISBN: 9798400700552
DOI: 10.1145/3577190
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. behavior cloning
  2. co-speech gesture generation
  3. deep learning
  4. gesture synthesis
  5. machine learning
  6. multimodal data
  7. reinforcement learning
  8. transformer

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMI '23

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)


Article Metrics

  • Downloads (last 12 months): 53
  • Downloads (last 6 weeks): 5

Reflects downloads up to 06 Jan 2025

