GesGPT: Speech Gesture Synthesis With Text Parsing from ChatGPT
Gao et al., 2024
- Document ID
- 16211239975666645435
- Author
- Gao N
- Zhao Z
- Zeng Z
- Zhang S
- Weng D
- Bao Y
- Publication year
- 2024
- Publication venue
- IEEE Robotics and Automation Letters
Snippet
Gesture synthesis has gained significant attention as a critical research field, aiming to produce contextually appropriate and natural gestures corresponding to speech or textual input. Although deep learning-based approaches have achieved remarkable progress, they …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2785—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/24—Editing, e.g. insert/delete
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/205—3D [Three Dimensional] animation driven by audio data
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/027—Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/06—Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
- G09B5/065—Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
- G10L2021/105—Synthesis of the lips movements from speech, e.g. for talking heads
Similar Documents
Publication | Title |
---|---|
Gibet et al. | The SignCom system for data-driven animation of interactive virtual signers: Methodology and evaluation |
Nyatsanga et al. | A Comprehensive Review of Data‐Driven Co‐Speech Gesture Generation |
Bragg et al. | Sign language recognition, generation, and translation: An interdisciplinary perspective |
Chiu et al. | How to train your avatar: A data driven approach to gesture generation |
Tao et al. | Affective computing: A review |
US7412389B2 (en) | Document animation system |
Naert et al. | A survey on the animation of signing avatars: From sign representation to utterance synthesis |
Fernández-Baena et al. | Gesture synthesis adapted to speech emphasis |
CN106991172B (en) | Method for establishing multi-mode emotion interaction database |
Lu et al. | Sentiment analysis: Comprehensive reviews, recent advances, and open challenges |
Neidle et al. | New shared & interconnected ASL resources: SignStream® 3 software; DAI 2 for web access to linguistically annotated video corpora; and a sign bank |
Qi et al. | EmotionGesture: Audio-driven diverse emotional co-speech 3D gesture generation |
Zeng et al. | GestureLens: Visual analysis of gestures in presentation videos |
Gao et al. | GesGPT: Speech gesture synthesis with text parsing from GPT |
Pang et al. | Bodyformer: Semantics-guided 3D body gesture synthesis with transformer |
Gao et al. | GesGPT: Speech Gesture Synthesis With Text Parsing from ChatGPT |
Liu | Analysis of gender differences in speech and hand gesture coordination for the design of multimodal interface systems |
Jin et al. | MtArtGPT: A Multi-task Art Generation System with Pre-Trained Transformer |
Jia et al. | A model of emotional speech generation based on conditional generative adversarial networks |
Segouat et al. | Toward the study of sign language coarticulation: methodology proposal |
Gibet et al. | Signing avatars-multimodal challenges for text-to-sign generation |
Sun et al. | Beyond Talking--Generating Holistic 3D Human Dyadic Motion for Communication |
Zhang et al. | Speech-driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference |
Navarretta | Predicting emotions in facial expressions from the annotations in naturally occurring first encounters |
Courty et al. | Why is the creation of a virtual signer challenging computer animation? |