Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis (Lu et al., 2013)
- Document ID: 17506550162325821482
- Authors: Lu H, King S, Watts O
- Publication year: 2013
- Publication venue: 8th ISCA Speech Synthesis Workshop
Snippet
Conventional statistical parametric speech synthesis relies on decision trees to cluster together similar contexts, resulting in tied-parameter context-dependent hidden Markov models (HMMs). However, decision tree clustering has a major weakness: it uses hard …
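The snippet points at the paper's central idea: instead of hard decision-tree clustering of linguistic contexts, represent the context as a continuous vector and let a deep neural network map it to acoustic parameters. Below is a minimal illustrative sketch of that mapping, not the authors' implementation; the dimensions, the untrained random weights, and the `predict_acoustics` helper are all hypothetical.

```python
# Minimal sketch (not the authors' code): mapping a continuous vector-space
# representation of linguistic context to acoustic parameters with a small
# feed-forward network, rather than assigning the context to a hard
# decision-tree cluster. All sizes and data are illustrative.
import numpy as np

rng = np.random.default_rng(0)

CONTEXT_DIM = 50   # hypothetical size of the vector-space context representation
HIDDEN_DIM = 128   # hypothetical hidden layer width
ACOUSTIC_DIM = 40  # hypothetical acoustic parameter vector (e.g. spectral features)

# Randomly initialised weights stand in for a trained model.
W1 = rng.normal(0.0, 0.1, (CONTEXT_DIM, HIDDEN_DIM))
b1 = np.zeros(HIDDEN_DIM)
W2 = rng.normal(0.0, 0.1, (HIDDEN_DIM, ACOUSTIC_DIM))
b2 = np.zeros(ACOUSTIC_DIM)

def predict_acoustics(context_vector: np.ndarray) -> np.ndarray:
    """Map a continuous linguistic-context vector to acoustic parameters.

    Because the input space is continuous, an unseen context falls between
    training contexts and receives a smoothly interpolated output, instead
    of the hard cluster assignment a decision tree would make.
    """
    hidden = np.tanh(context_vector @ W1 + b1)
    return hidden @ W2 + b2

# Example: one frame's linguistic context, encoded as a dense vector.
context = rng.normal(size=CONTEXT_DIM)
acoustic_params = predict_acoustics(context)
print(acoustic_params.shape)  # (40,)
```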
Classifications
- G10L15/144—Training of HMMs
- G10L15/1822—Parsing for meaning understanding
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/07—Adaptation to the speaker
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L15/28—Constructional details of speech recognition systems
- G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
- G10L13/06—Elementary speech units used in speech synthesisers; Concatenation rules
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L17/00—Speaker identification or verification
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G06F17/20—Handling natural language data
Similar Documents
Publication | Title
---|---
Lu et al. | Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis
US11361751B2 (en) | Speech synthesis method and device
Zen et al. | Statistical parametric speech synthesis using deep neural networks
Sun et al. | Voice conversion using deep bidirectional long short-term memory based recurrent neural networks
Kirchhoff et al. | Combining acoustic and articulatory feature information for robust speech recognition
Wu et al. | Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis
Athanaselis et al. | ASR for emotional speech: clarifying the issues and enhancing performance
Hashimoto et al. | The effect of neural networks in statistical parametric speech synthesis
CN106935239A (en) | The construction method and device of a kind of pronunciation dictionary
Chengalvarayan et al. | Speech trajectory discrimination using the minimum classification error learning
Hashimoto et al. | Trajectory training considering global variance for speech synthesis based on neural networks
EP1647970B1 (en) | Hidden conditional random field models for phonetic classification and speech recognition
Rosenberg et al. | Modeling phrasing and prominence using deep recurrent learning.
Shahin et al. | Talking condition recognition in stressful and emotional talking environments based on CSPHMM2s
Ng et al. | Spoken language recognition with prosodic features
Du et al. | Phone-level prosody modelling with GMM-based MDN for diverse and controllable speech synthesis
CN110415725A (en) | Use the method and system of first language data assessment second language pronunciation quality
Lai et al. | Phone-aware LSTM-RNN for voice conversion
Kuzdeuov et al. | Speech command recognition: Text-to-speech and speech corpus scraping are all you need
Zangar et al. | Duration modelling and evaluation for Arabic statistical parametric speech synthesis
Ribeiro et al. | Learning word vector representations based on acoustic counts
Saba et al. | Urdu text-to-speech conversion using deep learning
Barman et al. | State of the art review of speech recognition using genetic algorithm
Pour et al. | Persian automatic speech recognition by the use of whisper model
Vazirnezhad et al. | Hybrid statistical pronunciation models designed to be trained by a medium-size corpus