Gao et al., 2024 - Google Patents

GesGPT: Speech Gesture Synthesis With Text Parsing from ChatGPT

Document ID: 16211239975666645435
Author: Gao N; Zhao Z; Zeng Z; Zhang S; Weng D; Bao Y
Publication year: 2024
Publication venue: IEEE Robotics and Automation Letters

Snippet

Gesture synthesis has gained significant attention as a critical research field, aiming to produce contextually appropriate and natural gestures corresponding to speech or textual input. Although deep learning-based approaches have achieved remarkable progress, they …

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
        - G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/20—Handling natural language data
          - G06F17/21—Text processing
            - G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
            - G06F17/24—Editing, e.g. insert/delete
          - G06F17/27—Automatic analysis, e.g. parsing
            - G06F17/2705—Parsing
            - G06F17/2765—Recognition
            - G06F17/2785—Semantic analysis
          - G06F17/28—Processing or translating of natural language
            - G06F17/2872—Rule based translation
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T13/00—Animation
        - G06T13/20—3D [Three Dimensional] animation
          - G06T13/205—3D [Three Dimensional] animation driven by audio data
          - G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
  - G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    - G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
      - G09B5/00—Electrically-operated educational appliances
        - G09B5/06—Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
          - G09B5/065—Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L13/00—Speech synthesis; Text to speech systems
        - G10L13/02—Methods for producing synthetic speech; Speech synthesisers
          - G10L13/027—Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
          - G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
            - G10L2021/105—Synthesis of the lips movements from speech, e.g. for talking heads

Similar Documents

Publication	Publication Date	Title
Gibet et al.	2011	The SignCom system for data-driven animation of interactive virtual signers: Methodology and evaluation
Nyatsanga et al.	2023	A Comprehensive Review of Data‐Driven Co‐Speech Gesture Generation
Bragg et al.	2019	Sign language recognition, generation, and translation: An interdisciplinary perspective
Chiu et al.	2011	How to train your avatar: A data driven approach to gesture generation
Tao et al.	2005	Affective computing: A review
US7412389B2 (en)	2008-08-12	Document animation system
Naert et al.	2020	A survey on the animation of signing avatars: From sign representation to utterance synthesis
Fernández-Baena et al.	2014	Gesture synthesis adapted to speech emphasis
CN106991172B (en)	2020-04-28	Method for establishing multi-mode emotion interaction database
Lu et al.	2023	Sentiment analysis: Comprehensive reviews, recent advances, and open challenges
Neidle et al.	2018	New shared & interconnected ASL resources: SignStream® 3 software; DAI 2 for web access to linguistically annotated video corpora; and a sign bank
Qi et al.	2024	EmotionGesture: Audio-driven diverse emotional co-speech 3D gesture generation
Zeng et al.	2022	GestureLens: Visual analysis of gestures in presentation videos
Gao et al.	2023	GesGPT: Speech gesture synthesis with text parsing from GPT
Pang et al.	2023	BodyFormer: Semantics-guided 3D body gesture synthesis with transformer
Gao et al.	2024	GesGPT: Speech Gesture Synthesis With Text Parsing from ChatGPT
Liu	2022	Analysis of gender differences in speech and hand gesture coordination for the design of multimodal interface systems
Jin et al.	2024	MtArtGPT: A Multi-task Art Generation System with Pre-Trained Transformer
Jia et al.	2019	A model of emotional speech generation based on conditional generative adversarial networks
Segouat et al.	2009	Toward the study of sign language coarticulation: methodology proposal
Gibet et al.	2023	Signing avatars - multimodal challenges for text-to-sign generation
Sun et al.	2024	Beyond Talking--Generating Holistic 3D Human Dyadic Motion for Communication
Zhang et al.	2024	Speech-driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference
Navarretta	2014	Predicting emotions in facial expressions from the annotations in naturally occurring first encounters
Courty et al.	2010	Why is the creation of a virtual signer challenging computer animation?