Liu et al., 2021 - Google Patents
Strengthnet: Deep learning-based emotion strength assessment for emotional speech synthesisLiu et al., 2021
View PDF- Document ID
- 4600620686443583384
- Author
- Liu R
- Sisman B
- Li H
- Publication year
- Publication venue
- arXiv preprint arXiv:2110.03156
External Links
Snippet
Recently, emotional speech synthesis has achieved remarkable performance. The emotion strength of synthesized speech can be controlled flexibly using a strength descriptor, which is obtained by an emotion attribute ranking function. However, a trained ranking function on …
- 230000002996 emotional 0 title abstract description 34
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11210470B2 (en) | Automatic text segmentation based on relevant context | |
CN113987179B (en) | Dialogue emotion recognition network model based on knowledge enhancement and backtracking loss, construction method, electronic equipment and storage medium | |
CN111400601B (en) | Video recommendation method and related equipment | |
Kuchibhotla et al. | An optimal two stage feature selection for speech emotion recognition using acoustic features | |
Lian et al. | Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition. | |
Atmaja et al. | Evaluating self-supervised speech representations for speech emotion recognition | |
CN113505198B (en) | Keyword-driven generation type dialogue reply method and device and electronic equipment | |
Liu et al. | Accurate emotion strength assessment for seen and unseen speech based on data-driven deep learning | |
CN117312500B (en) | Semantic retrieval model building method based on ANN and BERT | |
Zhang et al. | Unsupervised domain adaptation integrating transformer and mutual information for cross-corpus speech emotion recognition | |
CN113628640B (en) | Cross-library voice emotion recognition method based on sample equalization and maximum mean difference | |
Liu et al. | Strengthnet: Deep learning-based emotion strength assessment for emotional speech synthesis | |
CN118093936B (en) | Video tag processing method, device, computer equipment and storage medium | |
CN116663523B (en) | Semantic text similarity calculation method for multi-angle enhanced network | |
CN115116470B (en) | Audio processing method, device, computer equipment and storage medium | |
Li et al. | Frame-level emotional state alignment method for speech emotion recognition | |
Liu et al. | Keyword retrieving in continuous speech using connectionist temporal classification | |
Bao et al. | Multi-dimensional Convolutional Neural Network for Speech Emotion Recognition | |
Yu et al. | Tri-Attention: Explicit Context-Aware Attention Mechanism for Natural Language Processing | |
Esteban-Romero et al. | THAU-UPM at EmoSPeech-IberLEF2024: Efficient Adaptation of Mono-modal and Multi-modal Large Language Models for Automatic Speech Emotion Recognition | |
CN114036946B (en) | Text feature extraction and auxiliary retrieval system and method | |
Geng et al. | A Multi-View Co-Learning Method for Multimodal Sentiment Analysis | |
CN113723458B (en) | Long file classification method based on layer attention transducer network | |
Pandelea et al. | Selecting Language Models Features VIA Software-Hardware Co-Design | |
Bonaccorsi | Speech-Text Cross-Modal Learning through Self-Attention Mechanisms |