Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis (Lu et al., 2013)
- Document ID: 17506550162325821482
- Authors: Lu H, King S, Watts O
- Publication year: 2013
- Publication venue: 8th ISCA Speech Synthesis Workshop
Snippet
Conventional statistical parametric speech synthesis relies on decision trees to cluster together similar contexts, resulting in tied-parameter context-dependent hidden Markov models (HMMs). However, decision tree clustering has a major weakness: it uses hard …
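The snippet points at the paper's central idea: instead of hard decision-tree clustering of linguistic contexts, represent the context as a continuous vector and let a deep neural network map it to acoustic parameters. Below is a minimal illustrative sketch of that mapping, not the authors' implementation; the dimensions, the untrained random weights, and the `predict_acoustics` helper are all hypothetical.

```python
# Minimal sketch (not the authors' code): mapping a continuous vector-space
# representation of linguistic context to acoustic parameters with a small
# feed-forward network, rather than assigning the context to a hard
# decision-tree cluster. All sizes and data are illustrative.
import numpy as np

rng = np.random.default_rng(0)

CONTEXT_DIM = 50   # hypothetical size of the vector-space context representation
HIDDEN_DIM = 128   # hypothetical hidden layer width
ACOUSTIC_DIM = 40  # hypothetical acoustic parameter vector (e.g. spectral features)

# Randomly initialised weights stand in for a trained model.
W1 = rng.normal(0.0, 0.1, (CONTEXT_DIM, HIDDEN_DIM))
b1 = np.zeros(HIDDEN_DIM)
W2 = rng.normal(0.0, 0.1, (HIDDEN_DIM, ACOUSTIC_DIM))
b2 = np.zeros(ACOUSTIC_DIM)

def predict_acoustics(context_vector: np.ndarray) -> np.ndarray:
    """Map a continuous linguistic-context vector to acoustic parameters.

    Because the input space is continuous, an unseen context falls between
    training contexts and receives a smoothly interpolated output, instead
    of the hard cluster assignment a decision tree would make.
    """
    hidden = np.tanh(context_vector @ W1 + b1)
    return hidden @ W2 + b2

# Example: one frame's linguistic context, encoded as a dense vector.
context = rng.normal(size=CONTEXT_DIM)
acoustic_params = predict_acoustics(context)
print(acoustic_params.shape)  # (40,)
```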
Classifications
- G10L15/144—Training of HMMs
- G10L15/1822—Parsing for meaning understanding
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/07—Adaptation to the speaker
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L15/28—Constructional details of speech recognition systems
- G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
- G10L13/06—Elementary speech units used in speech synthesisers; Concatenation rules
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L17/00—Speaker identification or verification
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G06F17/20—Handling natural language data
Similar Documents
Publication | Title
---|---
Lu et al. | Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis
US11361751B2 (en) | Speech synthesis method and device
Zen et al. | Statistical parametric speech synthesis using deep neural networks
Sun et al. | Voice conversion using deep bidirectional long short-term memory based recurrent neural networks
Kirchhoff et al. | Combining acoustic and articulatory feature information for robust speech recognition
Wu et al. | Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis
Athanaselis et al. | ASR for emotional speech: clarifying the issues and enhancing performance
Hashimoto et al. | The effect of neural networks in statistical parametric speech synthesis
CN106935239A (en) | The construction method and device of a kind of pronunciation dictionary
Chengalvarayan et al. | Speech trajectory discrimination using the minimum classification error learning
Hashimoto et al. | Trajectory training considering global variance for speech synthesis based on neural networks
EP1647970B1 (en) | Hidden conditional random field models for phonetic classification and speech recognition
Rosenberg et al. | Modeling phrasing and prominence using deep recurrent learning.
Shahin et al. | Talking condition recognition in stressful and emotional talking environments based on CSPHMM2s
Ng et al. | Spoken language recognition with prosodic features
Du et al. | Phone-level prosody modelling with GMM-based MDN for diverse and controllable speech synthesis
CN110415725A (en) | Use the method and system of first language data assessment second language pronunciation quality
Lai et al. | Phone-aware LSTM-RNN for voice conversion
Kuzdeuov et al. | Speech command recognition: Text-to-speech and speech corpus scraping are all you need
Zangar et al. | Duration modelling and evaluation for Arabic statistical parametric speech synthesis
Ribeiro et al. | Learning word vector representations based on acoustic counts
Saba et al. | Urdu text-to-speech conversion using deep learning
Barman et al. | State of the art review of speech recognition using genetic algorithm
Pour et al. | Persian automatic speech recognition by the use of whisper model
Vazirnezhad et al. | Hybrid statistical pronunciation models designed to be trained by a medium-size corpus