research-article

Text2Action: Generative Adversarial Synthesis from Language to Action

Authors:

Songhwai Oh

2018 IEEE International Conference on Robotics and Automation (ICRA)

Pages 1 - 5

https://doi.org/10.1109/ICRA.2018.8460608

Published: 21 May 2018

Abstract

In this paper, we propose a generative model that learns the relationship between language and human action in order to generate a human action sequence from a sentence describing human behavior. The proposed model is a generative adversarial network (GAN) based on the sequence-to-sequence (SEQ2SEQ) model. Using a text encoder recurrent neural network (RNN) and an action decoder RNN, the proposed generative network can synthesize various actions for a robot or a virtual agent. The network is trained on 29,770 pairs of actions and sentence annotations extracted from MSR-Video-to-Text (MSR-VTT), a large-scale video dataset. We demonstrate that the network generates human-like actions that can be transferred to a Baxter robot, so that the robot performs an action based on a provided sentence. Results show that the proposed generative network correctly models the relationship between language and action and can generate a diverse set of actions from the same sentence.
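
The encoder-decoder structure described in the abstract can be illustrated with a toy forward pass. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class `ToyGenerator`, the helper `embed`, all layer sizes, the vanilla RNN cell, and the use of a Gaussian noise vector concatenated with the sentence encoding are hypothetical choices made for illustration. The weights are random and untrained, and the discriminator and adversarial training loop are omitted.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random, untrained weights; a trained model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rnn_step(w_in, w_hid, x, h):
    """One vanilla RNN update: h_next = tanh(W_in x + W_hid h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_hid, h))]


def embed(word, dim):
    """Hypothetical stand-in for a learned word embedding (deterministic per word)."""
    seed = sum(ord(ch) * (i + 1) for i, ch in enumerate(word))
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


class ToyGenerator:
    """Text encoder RNN followed by an action decoder RNN (all sizes are assumptions)."""

    def __init__(self, embed_dim=8, hidden_dim=16, noise_dim=4, pose_dim=6, seed=0):
        rng = random.Random(seed)
        self.embed_dim, self.hidden_dim = embed_dim, hidden_dim
        self.noise_dim = noise_dim
        self.enc_in = make_matrix(hidden_dim, embed_dim, rng)
        self.enc_hid = make_matrix(hidden_dim, hidden_dim, rng)
        self.dec_in = make_matrix(hidden_dim, hidden_dim + noise_dim, rng)
        self.dec_hid = make_matrix(hidden_dim, hidden_dim, rng)
        self.dec_out = make_matrix(pose_dim, hidden_dim, rng)

    def encode(self, sentence):
        """Run the text encoder over the words; the final hidden state is the text code."""
        h = [0.0] * self.hidden_dim
        for word in sentence.lower().split():
            h = rnn_step(self.enc_in, self.enc_hid, embed(word, self.embed_dim), h)
        return h

    def decode(self, text_code, noise, num_frames):
        """Unroll the action decoder, emitting one pose vector per frame."""
        x = text_code + noise  # assumed conditioning: text code concatenated with noise
        h = [0.0] * self.hidden_dim
        frames = []
        for _ in range(num_frames):
            h = rnn_step(self.dec_in, self.dec_hid, x, h)
            frames.append(matvec(self.dec_out, h))
        return frames

    def generate(self, sentence, num_frames, noise_rng):
        """Sample a noise vector and decode an action sequence for the sentence."""
        noise = [noise_rng.gauss(0.0, 1.0) for _ in range(self.noise_dim)]
        return self.decode(self.encode(sentence), noise, num_frames)


gen = ToyGenerator()
sentence = "a man waves his hand"
action_a = gen.generate(sentence, num_frames=5, noise_rng=random.Random(1))
action_b = gen.generate(sentence, num_frames=5, noise_rng=random.Random(2))
print(len(action_a), len(action_a[0]))  # frames per action, values per pose
print(action_a != action_b)             # different noise, same sentence
```

In this sketch, sampling a new noise vector for a fixed sentence is what yields different pose sequences, which is one common way a GAN generator can produce a diverse set of outputs for the same input.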

Cited By

Liu Q, Niu Z, Lu K, Dong K, Xue J, Qin X, Wang J, Zhu H, Song J, Liu W, Zhang D, Huang W, Wang X (2024). AdaptControl: Adaptive Human Motion Control and Generation via User Prompt and Spatial Trajectory Guidance. Proceedings of the 5th International Workshop on Human-centric Multimedia Analysis, pp. 13-22. Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3688865.3689476
Li Y, Chen B, Ren Z, Ding Y, Liu L, Shao T, Zhou K (2024). CPoser: An Optimization-after-Parsing Approach for Text-to-Pose Generation Using Large Language Models. ACM Transactions on Graphics 43(6), pp. 1-13. Online publication date: 19-Dec-2024.
https://dl.acm.org/doi/10.1145/3687932
Tang X, Wu L, Wang H, Wu Y, Hu B, Li S, Gong X, Liao Y, Kou Q, Jin X (2024). Decoupling Contact for Fine-Grained Motion Style Transfer. SIGGRAPH Asia 2024 Conference Papers, pp. 1-11. Online publication date: 3-Dec-2024.
https://dl.acm.org/doi/10.1145/3680528.3687609

Index Terms

Text2Action: Generative Adversarial Synthesis from Language to Action
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Index terms have been assigned to the content through auto-classification.

Information & Contributors

Information

Published In

2018 IEEE International Conference on Robotics and Automation (ICRA)

May 2018

5954 pages

Copyright © 2018.

Publisher

IEEE Press

Qualifiers

Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 11
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 25 Dec 2024

Citations

Cited By

Yao H, Song Z, Zhou Y, Ao T, Chen B, Liu L (2024). MoConVQ: Unified Physics-Based Motion Control via Scalable Discrete Representations. ACM Transactions on Graphics 43(4), pp. 1-21. Online publication date: 19-Jul-2024.
https://dl.acm.org/doi/10.1145/3658137
Cohan S, Tevet G, Reda D, Peng X, van de Panne M (2024). Flexible Motion In-betweening with Diffusion Models. ACM SIGGRAPH 2024 Conference Papers, pp. 1-9. Online publication date: 13-Jul-2024.
https://dl.acm.org/doi/10.1145/3641519.3657414
Jin P, Wu Y, Fan Y, Sun Z, Wei Y, Yuan L, Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S (2023). Act as you wish. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 15497-15518. Online publication date: 10-Dec-2023.
https://dl.acm.org/doi/10.5555/3666122.3666803
Schmuck V, Tuyen N, Celiktutan O (2023). The KCL-SAIR team's entry to the GENEA Challenge 2023 Exploring Role-based Gesture Generation in Dyadic Interactions: Listener vs. Speaker. Companion Publication of the 25th International Conference on Multimodal Interaction, pp. 214-219. Online publication date: 9-Oct-2023.
https://dl.acm.org/doi/10.1145/3610661.3616555
Ideno A, Kaneko T, Harada T (2023). Frame-Level Event Representation Learning for Semantic-Level Generation and Editing of Avatar Motion. Proceedings of the 25th International Conference on Multimodal Interaction, pp. 292-300. Online publication date: 9-Oct-2023.
https://dl.acm.org/doi/10.1145/3577190.3614175
Hong F, Zhang M, Pan L, Cai Z, Yang L, Liu Z (2022). AvatarCLIP. ACM Transactions on Graphics 41(4), pp. 1-19. Online publication date: 22-Jul-2022.
https://dl.acm.org/doi/10.1145/3528223.3530094
Ideno A, Mukuta Y, Harada T (2021). Generation of Variable-Length Time Series from Text using Dynamic Time Warping-Based Method. Proceedings of the 3rd ACM International Conference on Multimedia in Asia, pp. 1-7. Online publication date: 1-Dec-2021.
https://dl.acm.org/doi/10.1145/3469877.3495644
