A Data-Driven Approach For Automatic Visual Speech In Swedish Speech Synthesis Applications
Hagrot, 2019 - Google Patents
- Document ID
- 18158540553671972058
- Author
- Hagrot J
- Publication year
- 2019
Snippet
This project investigates the use of artificial neural networks for visual speech synthesis. The objective was to produce a framework for animated chat bots in Swedish. A survey of the literature on the topic revealed that the state-of-the-art approach was using ANNs with either …
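The snippet describes the general technique: a neural network that maps acoustic features to facial animation. As a rough illustration of that idea only (the truncated snippet does not specify the thesis's actual architecture), here is a minimal PyTorch sketch of a sequence model regressing per-frame animation parameters from audio features; every layer choice, dimension, and name below is a hypothetical assumption.

```python
# Illustrative sketch only: a sequence model regressing per-frame facial
# animation parameters (e.g. lip landmarks or blendshape weights) from
# acoustic features such as MFCCs. All sizes and names are assumptions,
# not the model used in the thesis.
import torch
import torch.nn as nn

class AudioToVisualSpeech(nn.Module):
    def __init__(self, n_audio_feats=26, n_anim_params=20, hidden=128):
        super().__init__()
        # A bidirectional LSTM lets each frame see past and future context,
        # which matters for coarticulation between neighbouring phonemes.
        self.rnn = nn.LSTM(n_audio_feats, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_anim_params)

    def forward(self, audio_feats):
        # audio_feats: (batch, frames, n_audio_feats)
        out, _ = self.rnn(audio_feats)
        # One vector of animation parameters per audio frame.
        return self.head(out)

model = AudioToVisualSpeech()
mfccs = torch.randn(1, 100, 26)   # 100 frames of dummy acoustic features
params = model(mfccs)             # -> (1, 100, 20) animation parameters
```

Training such a model would minimize a regression loss (e.g. mean squared error) against motion-captured or video-tracked facial parameters time-aligned with the audio.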
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
          - G06N3/04—Architectures, e.g. interconnection topology
          - G06N3/08—Learning methods
      - G06N99/00—Subject matter not provided for in other groups of this subclass
        - G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T13/00—Animation
        - G06T13/20—3D [Three Dimensional] animation
          - G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L13/00—Speech synthesis; Text to speech systems
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
      - G10L17/00—Speaker identification or verification
        - G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/003—Changing voice quality, e.g. pitch or formants
          - G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
        - G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
          - G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
            - G10L2021/105—Synthesis of the lips movements from speech, e.g. for talking heads
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
            - G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
Similar Documents
Publication | Title
---|---
Cudeiro et al. | Capture, learning, and synthesis of 3D speaking styles
US11847727B2 (en) | Generating facial position data based on audio data
Karras et al. | Audio-driven facial animation by joint end-to-end learning of pose and emotion
Sadoughi et al. | Speech-driven expressive talking lips with conditional sequential generative adversarial networks
CN112465935A (en) | Virtual image synthesis method and device, electronic equipment and storage medium
US20100082345A1 (en) | Speech and text driven HMM-based body animation synthesis
CN113781610A (en) | Virtual face generation method
Bozkurt et al. | Multimodal analysis of speech and arm motion for prosody-driven synthesis of beat gestures
Taylor et al. | Audio-to-visual speech conversion using deep neural networks
WO2021023869A1 (en) | Audio-driven speech animation using recurrent neutral network
Charalambous et al. | Audio-driven emotional speech animation for interactive virtual characters
EP4309133A1 (en) | Three-dimensional face animation from speech
Fares et al. | Zero-shot style transfer for gesture animation driven by text and speech using adversarial disentanglement of multimodal style encoding
Bozkurt et al. | Affect-expressive hand gestures synthesis and animation
Liu et al. | MusicFace: Music-driven expressive singing face synthesis
CN115953521A (en) | Remote digital human rendering method, device and system
Delbosc et al. | Automatic facial expressions, gaze direction and head movements generation of a virtual agent
Filntisis et al. | Video-realistic expressive audio-visual speech synthesis for the Greek language
Tang et al. | Real-time conversion from a single 2D face image to a 3D text-driven emotive audio-visual avatar
Hagrot | A Data-Driven Approach For Automatic Visual Speech In Swedish Speech Synthesis Applications
Khan | An Approach of Lip Synchronization With Facial Expression Rendering for an ECA
Gustafson et al. | Casual chatter or speaking up? Adjusting articulatory effort in generation of speech and animation for conversational characters
Han et al. | Facial landmark predictions with applications to metaverse
Deena | Visual speech synthesis by learning joint probabilistic models of audio and video
Edge et al. | Model-based synthesis of visual speech movements from 3D video