Habibie et al., 2021 - Google Patents
Learning speech-driven 3D conversational gestures from video
- Document ID
- 12700202956387754433
- Author
- Habibie I
- Xu W
- Mehta D
- Liu L
- Seidel H
- Pons-Moll G
- Elgharib M
- Theobalt C
- Publication year
- 2021
- Publication venue
- Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents
Snippet
We propose the first approach to synthesize the synchronous 3D conversational body and hand gestures, as well as 3D face and head animations, of a virtual character from speech input. Our algorithm uses a CNN architecture that leverages the inherent correlation …
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
          - G06K9/00268—Feature extraction; Face representation
            - G06K9/00281—Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
        - G06K9/00335—Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
        - G06K9/62—Methods or arrangements for recognition using electronic means
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N99/00—Subject matter not provided for in other groups of this subclass
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T7/00—Image analysis
      - G06T13/00—Animation
        - G06T13/20—3D [Three Dimensional] animation
          - G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
Similar Documents
Publication | Title |
---|---|
Habibie et al. | Learning speech-driven 3D conversational gestures from video |
Lu et al. | Live speech portraits: real-time photorealistic talking-head animation |
Kim et al. | Neural style-preserving visual dubbing |
Fan et al. | FaceFormer: Speech-driven 3D facial animation with transformers |
Yi et al. | Generating holistic 3D human motion from speech |
Zhang et al. | FACIAL: Synthesizing dynamic talking face with implicit attribute learning |
Bhattacharya et al. | Speech2AffectiveGestures: Synthesizing co-speech gestures with generative adversarial affective expression learning |
Suwajanakorn et al. | Synthesizing Obama: learning lip sync from audio |
Chen et al. | What comprises a good talking-head video generation?: A survey and benchmark |
US11514634B2 (en) | Personalized speech-to-video with three-dimensional (3D) skeleton regularization and expressive body poses |
Yu et al. | Multimodal inputs driven talking face generation with spatial–temporal dependency |
Tian et al. | Audio2Face: Generating speech/face animation from single audio with attention-based bidirectional LSTM networks |
Thambiraja et al. | Imitator: Personalized speech-driven 3D facial animation |
US20210390945A1 (en) | Text-driven video synthesis with phonetic dictionary |
Filntisis et al. | Visual speech-aware perceptual 3D facial expression reconstruction from videos |
Yu et al. | Mining audio, text and visual information for talking face generation |
Liu et al. | Synthesizing talking faces from text and audio: an autoencoder and sequence-to-sequence convolutional neural network |
Liu et al. | Real-time speech-driven animation of expressive talking faces |
Nazarieh et al. | A Survey of Cross-Modal Visual Content Generation |
Tran et al. | Dyadic Interaction Modeling for Social Behavior Generation |
Bhattacharya et al. | Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs |
Wang et al. | Flow2Flow: Audio-visual cross-modality generation for talking face videos with rhythmic head |
Chuang | Analysis, synthesis, and retargeting of facial expressions |
Liu et al. | A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation |
Kumar Das et al. | Audio driven artificial video face synthesis using GAN and machine learning approaches |