Habibie et al., 2021

Learning speech-driven 3D conversational gestures from video

Document ID: 12700202956387754433
Author: Habibie I; Xu W; Mehta D; Liu L; Seidel H; Pons-Moll G; Elgharib M; Theobalt C
Publication year: 2021
Publication venue: Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents

Snippet

We propose the first approach to synthesize the synchronous 3D conversational body and hand gestures, as well as 3D face and head animations, of a virtual character from speech input. Our algorithm uses a CNN architecture that leverages the inherent correlation …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00268—Feature extraction; Face representation
- G06K9/00281—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00335—Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis

Similar Documents

Publication	Publication Date	Title
Habibie et al.	2021	Learning speech-driven 3d conversational gestures from video
Lu et al.	2021	Live speech portraits: real-time photorealistic talking-head animation
Kim et al.	2019	Neural style-preserving visual dubbing
Fan et al.	2022	Faceformer: Speech-driven 3d facial animation with transformers
Yi et al.	2023	Generating holistic 3d human motion from speech
Zhang et al.	2021	Facial: Synthesizing dynamic talking face with implicit attribute learning
Bhattacharya et al.	2021	Speech2affectivegestures: Synthesizing co-speech gestures with generative adversarial affective expression learning
Suwajanakorn et al.	2017	Synthesizing obama: learning lip sync from audio
Chen et al.	2020	What comprises a good talking-head video generation?: A survey and benchmark
US11514634B2 (en)	2022-11-29	Personalized speech-to-video with three-dimensional (3D) skeleton regularization and expressive body poses
Yu et al.	2020	Multimodal inputs driven talking face generation with spatial–temporal dependency
Tian et al.	2019	Audio2face: Generating speech/face animation from single audio with attention-based bidirectional lstm networks
Thambiraja et al.	2023	Imitator: Personalized speech-driven 3d facial animation
US20210390945A1 (en)	2021-12-16	Text-driven video synthesis with phonetic dictionary
Filntisis et al.	2022	Visual speech-aware perceptual 3d facial expression reconstruction from videos
Yu et al.	2019	Mining audio, text and visual information for talking face generation
Liu et al.	2020	Synthesizing talking faces from text and audio: an autoencoder and sequence-to-sequence convolutional neural network
Liu et al.	2011	Real-time speech-driven animation of expressive talking faces
Nazarieh et al.	2024	A Survey of Cross-Modal Visual Content Generation
Tran et al.	2024	Dyadic Interaction Modeling for Social Behavior Generation
Bhattacharya et al.	2024	Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs
Wang et al.	2023	Flow2Flow: Audio-visual cross-modality generation for talking face videos with rhythmic head
Chuang	2004	Analysis, synthesis, and retargeting of facial expressions
Liu et al.	2023	A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation
Kumar Das et al.	2022	Audio driven artificial video face synthesis using gan and machine learning approaches