DOI: 10.1145/3577190.3614175
research-article
Open access

Frame-Level Event Representation Learning for Semantic-Level Generation and Editing of Avatar Motion

Published: 09 October 2023

Abstract

Understanding an avatar’s motion and controlling its content is important for content creation and has been actively studied in computer vision and graphics. An avatar’s motion consists of frames, each representing a pose at a point in time, and subsequences of frames can be grouped into segments according to their semantic meaning. Semantic-level control of motion therefore requires understanding this semantic division of the avatar’s motion. We define a semantic division of an avatar’s motion as an “event”, which switches only when a frame of the motion cannot be predicted from the previous frames and the information of the last event, and we tackle editing motion and inferring motion from text on the basis of events. This is challenging because we must both obtain the event information and control the content of the motion based on it. To overcome this challenge, we propose obtaining a frame-level event representation from a pair of motion and text and using it to edit events in the motion and to predict motion from the text. Specifically, we learn a frame-level event representation by reconstructing the avatar’s motion from the corresponding sequence of frame-level event representations while inferring that sequence from the text. This allows motion to be predicted from text. Moreover, since the event at each motion frame is captured by the corresponding event representation, events in the motion can be edited by editing the corresponding event representation sequence. We evaluated our method on the HumanML3D dataset and demonstrated that our model can generate motion from text while editing motion flexibly (e.g., changing an event’s duration, modifying an event’s characteristics, and adding new events).
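
To make the pipeline described above concrete, here is a minimal sketch, assuming a PyTorch VAE-style setup with GRU encoders and decoders; all class names, layer choices, and dimensions (including the 263-dimensional HumanML3D-style pose features) are illustrative assumptions rather than the authors’ implementation. A motion encoder infers one event latent per frame, a decoder reconstructs the motion from that latent sequence, and a text encoder predicts the same sequence so that motion can be generated from text alone.

```python
# Minimal sketch, assuming a PyTorch VAE-style setup; every class name,
# layer choice, and dimension below is an illustrative assumption, not the
# authors' actual architecture.
import torch
import torch.nn as nn


class FrameEventVAE(nn.Module):
    def __init__(self, pose_dim=263, event_dim=64, hidden=256):
        # pose_dim=263 assumes the common HumanML3D pose feature size.
        super().__init__()
        # Motion -> per-frame event posterior (mean and log-variance).
        self.motion_enc = nn.GRU(pose_dim, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, event_dim)
        self.to_logvar = nn.Linear(hidden, event_dim)
        # Per-frame event latent sequence -> reconstructed motion.
        self.motion_dec = nn.GRU(event_dim, hidden, batch_first=True)
        self.to_pose = nn.Linear(hidden, pose_dim)

    def encode(self, motion):                      # motion: (B, T, pose_dim)
        h, _ = self.motion_enc(motion)             # (B, T, hidden)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return z, mu, logvar                       # z: one event latent per frame

    def decode(self, z):                           # z: (B, T, event_dim)
        h, _ = self.motion_dec(z)
        return self.to_pose(h)                     # reconstructed poses (B, T, pose_dim)


class TextToEvents(nn.Module):
    """Maps tokenized text to a per-frame event latent sequence (simplified)."""
    def __init__(self, vocab_size=10000, event_dim=64, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.text_enc = nn.GRU(hidden, hidden, batch_first=True)
        self.to_event = nn.Linear(hidden, event_dim)

    def forward(self, tokens, num_frames):         # tokens: (B, L) int64
        _, h = self.text_enc(self.embed(tokens))   # h: (1, B, hidden)
        # Broadcasting the sentence state over all frames is a simplification;
        # the paper infers a genuinely frame-varying sequence.
        h = h[-1].unsqueeze(1).expand(-1, num_frames, -1)
        return self.to_event(h)                    # (B, T, event_dim)
```

Under these assumptions, editing an event corresponds to editing the matching slice of the per-frame latent sequence before decoding, e.g., repeating latents to lengthen an event or splicing in new latents to add one.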

Supplemental Material

• MP4 File: presentation video
• PDF File: supplementary file of the paper
• ZIP File: videos in the main paper



Published In

ICMI '23: Proceedings of the 25th International Conference on Multimodal Interaction
October 2023
858 pages
ISBN: 9798400700552
DOI: 10.1145/3577190
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published: 09 October 2023

Author Tags

1. motion content control
2. natural language
3. variational autoencoder

Qualifiers

• Research-article
• Research
• Refereed limited

Funding Sources

• JST Moonshot R&D
• CREST

Conference

ICMI '23

Acceptance Rates

Overall Acceptance Rate: 453 of 1,080 submissions, 42%
