Liu et al., 2021 - Google Patents

StrengthNet: Deep learning-based emotion strength assessment for emotional speech synthesis

Document ID: 4600620686443583384
Author: Liu R; Sisman B; Li H
Publication year: 2021
Publication venue: arXiv preprint arXiv:2110.03156

Snippet

Recently, emotional speech synthesis has achieved remarkable performance. The emotion strength of synthesized speech can be controlled flexibly using a strength descriptor, which is obtained by an emotion attribute ranking function. However, a trained ranking function on …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models

Similar Documents

Publication	Publication Date	Title
US11210470B2 (en)	2021-12-28	Automatic text segmentation based on relevant context
CN113987179B (en)	2024-03-22	Dialogue emotion recognition network model based on knowledge enhancement and backtracking loss, construction method, electronic equipment and storage medium
CN111400601B (en)	2023-03-10	Video recommendation method and related equipment
Kuchibhotla et al.	2016	An optimal two stage feature selection for speech emotion recognition using acoustic features
Lian et al.	2020	Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition
Atmaja et al.	2022	Evaluating self-supervised speech representations for speech emotion recognition
CN113505198B (en)	2023-12-29	Keyword-driven generation type dialogue reply method and device and electronic equipment
Liu et al.	2022	Accurate emotion strength assessment for seen and unseen speech based on data-driven deep learning
CN117312500B (en)	2024-02-27	Semantic retrieval model building method based on ANN and BERT
Zhang et al.	2022	Unsupervised domain adaptation integrating transformer and mutual information for cross-corpus speech emotion recognition
CN113628640B (en)	2024-09-20	Cross-library voice emotion recognition method based on sample equalization and maximum mean difference
Liu et al.	2021	StrengthNet: Deep learning-based emotion strength assessment for emotional speech synthesis
CN118093936B (en)	2024-07-16	Video tag processing method, device, computer equipment and storage medium
CN116663523B (en)	2024-09-24	Semantic text similarity calculation method for multi-angle enhanced network
CN115116470B (en)	2024-09-27	Audio processing method, device, computer equipment and storage medium
Li et al.	2024	Frame-level emotional state alignment method for speech emotion recognition
Liu et al.	2020	Keyword retrieving in continuous speech using connectionist temporal classification
Bao et al.	2021	Multi-dimensional Convolutional Neural Network for Speech Emotion Recognition
Yu et al.	2022	Tri-Attention: Explicit Context-Aware Attention Mechanism for Natural Language Processing
Esteban-Romero et al.	2024	THAU-UPM at EmoSPeech-IberLEF2024: Efficient Adaptation of Mono-modal and Multi-modal Large Language Models for Automatic Speech Emotion Recognition
CN114036946B (en)	2023-07-07	Text feature extraction and auxiliary retrieval system and method
Geng et al.	2023	A Multi-View Co-Learning Method for Multimodal Sentiment Analysis
CN113723458B (en)	2024-08-13	Long file classification method based on layer attention transducer network
Pandelea et al.	2023	Selecting Language Models Features VIA Software-Hardware Co-Design
Bonaccorsi	2023	Speech-Text Cross-Modal Learning through Self-Attention Mechanisms