A Data-Driven Approach For Automatic Visual Speech In Swedish Speech Synthesis Applications
Hagrot, 2019 - Google Patents
- Document ID
- 18158540553671972058
- Author
- Hagrot J
- Publication year
- 2019
Snippet
This project investigates the use of artificial neural networks for visual speech synthesis. The objective was to produce a framework for animated chat bots in Swedish. A survey of the literature on the topic revealed that the state-of-the-art approach was using ANNs with either …
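The snippet describes the general technique: a neural network that maps acoustic features to facial animation. As a rough illustration of that idea only (the truncated snippet does not specify the thesis's actual architecture), here is a minimal PyTorch sketch of a sequence model regressing per-frame animation parameters from audio features; every layer choice, dimension, and name below is a hypothetical assumption.

```python
# Illustrative sketch only: a sequence model regressing per-frame facial
# animation parameters (e.g. lip landmarks or blendshape weights) from
# acoustic features such as MFCCs. All sizes and names are assumptions,
# not the model used in the thesis.
import torch
import torch.nn as nn

class AudioToVisualSpeech(nn.Module):
    def __init__(self, n_audio_feats=26, n_anim_params=20, hidden=128):
        super().__init__()
        # A bidirectional LSTM lets each frame see past and future context,
        # which matters for coarticulation between neighbouring phonemes.
        self.rnn = nn.LSTM(n_audio_feats, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_anim_params)

    def forward(self, audio_feats):
        # audio_feats: (batch, frames, n_audio_feats)
        out, _ = self.rnn(audio_feats)
        # One vector of animation parameters per audio frame.
        return self.head(out)

model = AudioToVisualSpeech()
mfccs = torch.randn(1, 100, 26)   # 100 frames of dummy acoustic features
params = model(mfccs)             # -> (1, 100, 20) animation parameters
```

Training such a model would minimize a regression loss (e.g. mean squared error) against motion-captured or video-tracked facial parameters time-aligned with the audio.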
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
          - G06N3/04—Architectures, e.g. interconnection topology
          - G06N3/08—Learning methods
      - G06N99/00—Subject matter not provided for in other groups of this subclass
        - G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T13/00—Animation
        - G06T13/20—3D [Three Dimensional] animation
          - G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L13/00—Speech synthesis; Text to speech systems
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
      - G10L17/00—Speaker identification or verification
        - G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/003—Changing voice quality, e.g. pitch or formants
          - G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
        - G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
          - G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
            - G10L2021/105—Synthesis of the lips movements from speech, e.g. for talking heads
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
            - G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
Similar Documents
Publication | Title
---|---
Cudeiro et al. | Capture, learning, and synthesis of 3D speaking styles
US11847727B2 (en) | Generating facial position data based on audio data
Karras et al. | Audio-driven facial animation by joint end-to-end learning of pose and emotion
Sadoughi et al. | Speech-driven expressive talking lips with conditional sequential generative adversarial networks
CN112465935A (en) | Virtual image synthesis method and device, electronic equipment and storage medium
US20100082345A1 (en) | Speech and text driven HMM-based body animation synthesis
CN113781610A (en) | Virtual face generation method
Bozkurt et al. | Multimodal analysis of speech and arm motion for prosody-driven synthesis of beat gestures
Taylor et al. | Audio-to-visual speech conversion using deep neural networks
WO2021023869A1 (en) | Audio-driven speech animation using recurrent neutral network
Charalambous et al. | Audio-driven emotional speech animation for interactive virtual characters
EP4309133A1 (en) | Three-dimensional face animation from speech
Fares et al. | Zero-shot style transfer for gesture animation driven by text and speech using adversarial disentanglement of multimodal style encoding
Bozkurt et al. | Affect-expressive hand gestures synthesis and animation
Liu et al. | MusicFace: Music-driven expressive singing face synthesis
CN115953521A (en) | Remote digital human rendering method, device and system
Delbosc et al. | Automatic facial expressions, gaze direction and head movements generation of a virtual agent
Filntisis et al. | Video-realistic expressive audio-visual speech synthesis for the Greek language
Tang et al. | Real-time conversion from a single 2D face image to a 3D text-driven emotive audio-visual avatar
Hagrot | A Data-Driven Approach For Automatic Visual Speech In Swedish Speech Synthesis Applications
Khan | An Approach of Lip Synchronization With Facial Expression Rendering for an ECA
Gustafson et al. | Casual chatter or speaking up? Adjusting articulatory effort in generation of speech and animation for conversational characters
Han et al. | Facial landmark predictions with applications to metaverse
Deena | Visual speech synthesis by learning joint probabilistic models of audio and video
Edge et al. | Model-based synthesis of visual speech movements from 3D video